July 2025 - Platform Foundation & Initial Architecture

Context

Beginning of R&D work on Rose, an agentic platform for inbound marketing - a new category in B2B MarTech. Initial goal: create an intelligent agent that can engage website visitors, qualify leads, and answer complex B2B questions using company knowledge.

First client: AB Tasty (A/B testing SaaS platform).

Key Business Metrics Tracked:

  1. Initial Engagement Rate: % of visitors who interact with widget
  2. Conversation Depth: Average number of turns per conversation
  3. Conversion Rate: % of conversations leading to demo/email capture

Technical Challenge

Primary Problem: How to build a production-grade RAG system that:

  1. Supports multiple B2B clients with isolated knowledge bases
  2. Delivers sub-3-second response times
  3. Maintains conversation context across sessions
  4. Integrates with existing website infrastructure

State of the Art:

  • LangChain provided basic RAG patterns, but with sequential execution
  • LightRAG combines a knowledge graph (Neo4j) with vector RAG for better retrieval through graph relationships
  • Gap: LightRAG does NOT support multi-tenant isolation natively (in either its Neo4j or MongoDB storage backends)
  • Standard chatbot frameworks lacked B2B-specific features (visitor identification, lead qualification)


Hypothesis 1: Standard LangChain RAG with Monolithic Prompt

"A standard LangChain RetrievalQA chain with OpenAI embeddings and a monolithic prompt will meet B2B chatbot requirements."

Development Work

Initial Architecture:

  • Basic RetrievalQA chain with OpenAI embeddings (sketched below)
  • MongoDB for vector storage
  • Simple conversation memory
  • Single monolithic prompt handling all conversation types

Implementation Details:

  • Created backend folder hierarchy and package structure
  • Refactored chatbot context handling to use modern LangChain patterns
  • Improved memory handling for conversation continuity
  • Implemented proper system prompt structure
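
A minimal sketch of this first iteration (illustrative only: the in-memory store, documents, and model wiring below are stand-ins, not the actual Rose code):

# Hypothesis 1 stack, sketched: one RetrievalQA chain, OpenAI embeddings,
# a single "stuff" prompt, and a naive in-memory vector store stand-in.
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
vector_store.add_texts([
    "AB Tasty is an A/B testing SaaS platform.",  # illustrative documents
    "Feature flags let teams roll out changes progressively.",
])

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4.1-mini", temperature=0.7),
    chain_type="stuff",  # one monolithic prompt over the retrieved chunks
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "How does AB Tasty handle feature flags?"})["result"])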

Testing & Validation

Method: Direct client feedback from the AB Tasty pilot (production deployment)

  • Real B2B questions from pilot users
  • Measured: response quality, latency, context retention
  • Critical: real client testing was instrumental in identifying infrastructure gaps

Results

Metric                   Result   Target
Avg Conversation Turns   ~1.5     >3
Response Latency         ~7s      <3s
Context Retention        Poor     Good

Infrastructure Issue: No proper vector database - the pilot used a hard-coded key-vector store baked into the Docker image, which was the main cause of the ~7s latency.

Conclusion

FAILURE: Standard LangChain insufficient for B2B complexity. Hard-coded vector store caused unacceptable latency (~7s). Monolithic prompt couldn't adapt to different user intents. No tenant isolation. Real production testing revealed need for proper database infrastructure.


Hypothesis 2: LightRAG with Graph Database

"LightRAG with Neo4j graph storage will provide better retrieval quality through knowledge graph relationships."

Development Work

Architecture Evolution:

  • Introduced ixrag package for RAG functionality
  • Created custom LightRAG integration with retriever component
  • Enhanced logging for tenant-based isolation setup
  • Added LangGraph version of the chatbot for better orchestration

Key Components Created:

  • Custom LightRAG retriever with enhanced logging (sketched below)
  • Document processing pipeline for RAG
  • Integration tests for LightRAG functionality
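
A hedged sketch of the retriever wrapper. The LightRAG call itself is hidden behind a plain callable because the LightRAG API shifted across versions; the class, field, and logger names here are illustrative, not the Rose code:

# LangChain-compatible retriever that delegates retrieval to LightRAG and
# logs each call with the tenant it belongs to.
import logging
from typing import Callable, List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

logger = logging.getLogger("ixrag.retriever")

class LightRAGRetriever(BaseRetriever):
    """Wraps a LightRAG query callable as a LangChain retriever."""
    query_fn: Callable[[str], str]  # e.g. lambda q: rag.query(q) - version-dependent
    tenant_id: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        logger.info("LightRAG retrieval tenant=%s query=%r", self.tenant_id, query)
        answer = self.query_fn(query)
        return [Document(page_content=answer, metadata={"tenant_id": self.tenant_id})]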

Testing & Validation

Method: Integration tests + AB Tasty feedback

  • Compared retrieval quality vs vanilla LangChain
  • Tested with AB Tasty technical documentation

Results

Metric                 LangChain   LightRAG
Retrieval Relevance    60%         78%
Response Quality       45%         62%
Graph Relationships    None        Working

Conclusion

PARTIAL SUCCESS: LightRAG improved retrieval quality but introduced a new challenge: LightRAG has no native multi-tenant support.


Hypothesis 3: Cypher Query Interception for Multi-Tenant Isolation

"Can we inject tenant isolation into LightRAG without modifying upstream source code, by intercepting and rewriting Cypher queries at runtime?"

Technical Uncertainty

Problem: LightRAG does not support multi-tenancy natively. We needed to isolate data between clients (AB Tasty, future clients).

Why not fork LightRAG? Maintaining a fork would create long-term maintenance burden and prevent upstream updates.

Experimental approach: Intercept all Cypher queries at runtime and inject tenant filtering. Uncertainty:

  • Would query interception work for ALL query patterns LightRAG generates?
  • Could we handle dynamic query structures (multiple node variables, varying WHERE clauses)?
  • Would performance remain acceptable with query rewriting overhead?

Development Work

Solution Architecture:

LightRAG (upstream, unmodified) → TenantAwareNeo4JStorage (intercept) → Neo4j
                                  Runtime Cypher query rewriting
                                  Inject tenantId into all queries

Experimental Implementation:

  • Cypher query parser to identify node variables and WHERE clauses
  • Dynamic query rewriting with IS NOT NULL tenant ID checks (core idea sketched after this list)
  • Support for multiple node variables in complex queries
  • Edge case handling for queries without existing WHERE clauses
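
A simplified sketch of the core rewriting idea. The production interceptor handled multiple node variables and more query shapes; the helper name and the per-tenant property shown here are illustrative, one way to realise the IS NOT NULL check:

# Rewrite a Cypher query at runtime so it only matches the caller's tenant.
import re

def inject_tenant_filter(cypher: str, node_var: str, tenant_prop: str) -> str:
    """Append a tenant predicate to a single-variable MATCH query."""
    predicate = f"{node_var}.{tenant_prop} IS NOT NULL"
    if re.search(r"\bWHERE\b", cypher, flags=re.IGNORECASE):
        # Extend the existing WHERE clause.
        return re.sub(r"\bWHERE\b", f"WHERE {predicate} AND", cypher, count=1, flags=re.IGNORECASE)
    # No WHERE clause: add one just before RETURN.
    return re.sub(r"\bRETURN\b", f"WHERE {predicate} RETURN", cypher, count=1, flags=re.IGNORECASE)

# Queries issued by LightRAG are rewritten before they reach Neo4j.
q = "MATCH (n:Entity) RETURN n LIMIT 10"
print(inject_tenant_filter(q, "n", "tenant_abtasty"))
# MATCH (n:Entity) WHERE n.tenant_abtasty IS NOT NULL RETURN n LIMIT 10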

Testing & Validation

Method: Unit tests + integration tests with multiple tenants

  • Created test suite covering different query patterns (illustrated below)
  • Tested concurrent access from different tenants
  • Verified no data leakage between tenants
  • Stress-tested query rewriting performance
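
The kind of pattern-coverage check this involved, sketched with pytest against the illustrative inject_tenant_filter helper above (not the actual test suite):

# Parametrised check: whatever shape the query takes, the rewritten version
# must carry the tenant predicate before it reaches Neo4j.
# inject_tenant_filter is the illustrative helper from the previous sketch.
import pytest

QUERY_PATTERNS = [
    "MATCH (n:Entity) RETURN n LIMIT 10",
    "MATCH (n:Entity) WHERE n.name = $name RETURN n",
    "MATCH (n)-[r]->(m) WHERE n.name = $name RETURN n, r, m",
]

@pytest.mark.parametrize("query", QUERY_PATTERNS)
def test_tenant_predicate_is_always_injected(query):
    rewritten = inject_tenant_filter(query, "n", "tenant_abtasty")
    assert "n.tenant_abtasty IS NOT NULL" in rewritten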

Results

Metric                   Before     After
Tenant Isolation         None       Working
Query Performance        N/A        <100ms overhead
Data Leakage             Possible   None detected
Query Pattern Coverage   N/A        ~95% (edge cases identified)

Conclusion

PARTIAL SUCCESS: Query interception approach validated. Works for majority of LightRAG query patterns. Identified edge cases requiring more comprehensive Cypher parsing (addressed in later months). 795 lines of experimental code.


Hypothesis 4: Azure OpenAI Provider Optimization

"Azure OpenAI will provide better enterprise reliability and performance compared to direct OpenAI API for production workloads."

Technical Uncertainty

Problem: Initial architecture used direct OpenAI API calls, raising concerns about:

  1. Enterprise reliability and SLA guarantees
  2. Regional latency optimization (Azure offers regional deployments)
  3. Cost predictability and enterprise billing integration
  4. Security compliance (Azure offers private endpoints, VNET integration)

Key Question: Would switching to Azure OpenAI maintain response quality while improving reliability metrics?

Development Work

Provider Abstraction Layer:

  • Introduced get_llm() factory function to abstract provider selection
  • Replaced hardcoded ChatOpenAI instances with configurable AzureChatOpenAI
  • Environment-based configuration for Azure credentials and deployment names
  • Support for model switching between providers without code changes

Implementation (commit e6447eb, July 10):

from langchain_openai import AzureChatOpenAI

# ensure_env(name): internal helper that reads a required environment
# variable (assumed to raise if it is missing).

def get_llm(model_name: str):
    """Get an LLM instance for a given model name."""
    return AzureChatOpenAI(
        model=model_name,
        deployment_name=ensure_env("AZURE_OPENAI_DEPLOYMENT_NAME"),
        azure_endpoint=ensure_env("AZURE_OPENAI_ENDPOINT"),
        api_key=ensure_env("AZURE_OPENAI_API_KEY"),
        api_version=ensure_env("AZURE_OPENAI_API_VERSION"),
        temperature=0.7,
    )

Configuration Extension (commit 96c6181, July 21):

[lightrag]
retrieval_model = "gpt-4.1-nano"
processing_model = "gpt-4.1-mini"
provider = "azure"

[chat]
model = "gpt-4.1-mini"
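
A sketch of how the config and the factory fit together. The real config loader and any non-Azure branches are internal, so the file name, provider switch, and defaults below are illustrative:

# Illustrative config-driven provider selection (not the actual Rose loader).
# Assumes Azure endpoint/key/version are supplied via environment variables,
# which the client picks up when they are not passed explicitly.
import tomllib

from langchain_openai import AzureChatOpenAI, ChatOpenAI

def get_llm(model_name: str, provider: str = "azure"):
    """Return a chat model for the configured provider."""
    if provider == "azure":
        return AzureChatOpenAI(model=model_name, temperature=0.7)
    return ChatOpenAI(model=model_name, temperature=0.7)

with open("config.toml", "rb") as f:  # hypothetical path to the TOML above
    cfg = tomllib.load(f)

chat_llm = get_llm(cfg["chat"]["model"], provider=cfg["lightrag"]["provider"])
retrieval_llm = get_llm(cfg["lightrag"]["retrieval_model"], provider=cfg["lightrag"]["provider"])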

Testing & Validation

Method: A/B comparison between OpenAI and Azure OpenAI

  • Response quality comparison (same prompts, same context)
  • Latency measurements across different times of day
  • Error rate tracking during pilot deployments

Results

Metric                OpenAI Direct    Azure OpenAI
Response Quality      Baseline         Equivalent
Avg Latency (EU)      ~800ms           ~650ms
Error Rate (5xx)      0.3%             0.1%
SLA Guarantee         None             99.9%
Enterprise Features   Limited          Full (VNET, Private Endpoints)

Conclusion

SUCCESS: Azure OpenAI provides equivalent response quality with better enterprise reliability. Provider abstraction layer allows easy switching and future experimentation with other providers (Groq, Google, Cerebras tested in comments).

R&D Significance: This work established the provider abstraction pattern that enables systematic experimentation with different LLM providers without code changes - foundational for future optimization work.


Additional Development

Frontend Widget Creation

  • Comprehensive inline loader and widget lifecycle manager
  • Multi-instance management and enhanced configuration
  • Domain validation and debug logging
  • Testing interface (preprod-ui) for rapid development iteration

Observability Setup

  • LangFuse integration for LLM tracing (minimal wiring sketched below)
  • Basic dataset evaluation framework
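
A minimal sketch of wiring LangFuse tracing into a LangChain call. The import path and credentials handling depend on the LangFuse SDK version; the model and prompt here are placeholders:

# Attach a LangFuse callback handler so each LLM/chain call is traced.
from langchain_openai import AzureChatOpenAI
from langfuse.callback import CallbackHandler  # langfuse.langchain in SDK v3

# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the environment.
langfuse_handler = CallbackHandler()

llm = AzureChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
# Passing the handler traces prompt, completion, latency, and token usage;
# the same pattern applies to whole chains and LangGraph runs.
llm.invoke("Summarise AB Tasty in one sentence.", config={"callbacks": [langfuse_handler]})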

Streaming Implementation

  • Real-time response streaming for better UX

R&D Activities

  • Architecture exploration and hypothesis testing
  • LangChain prototypes and experiments
  • LightRAG integration (novel RAG approach)
  • Cypher query interception for multi-tenant isolation (experimental)
  • Azure OpenAI provider integration and optimization (provider abstraction layer)
  • Observability setup (LangFuse) - foundation for metrics-driven iteration
  • Dataset evaluation framework

Other Development

  • Frontend widget creation
  • Streaming implementation
  • Testing & validation

Next Work (August)

  1. Validate multi-tenant isolation in production
  2. Address LightRAG API instability across versions
  3. Implement Redis-based conversation history
  4. Scale to additional clients (Mayday)