July 2025 - Platform Foundation & Initial Architecture

Context

Beginning of R&D work on Rose, an agentic platform for inbound marketing - a new category in B2B MarTech. Initial goal: create an intelligent agent that can engage website visitors, qualify leads, and answer complex B2B questions using company knowledge.

First client: AB Tasty (A/B testing SaaS platform).

Key Business Metrics Tracked:

  1. Initial Engagement Rate: % of visitors who interact with widget
  2. Conversation Depth: Average number of turns per conversation
  3. Conversion Rate: % of conversations leading to demo/email capture

Technical Challenge

Primary Problem: How to build a production-grade RAG system that:

  1. Supports multiple B2B clients with isolated knowledge bases
  2. Delivers sub-3-second response times
  3. Maintains conversation context across sessions
  4. Integrates with existing website infrastructure

State of the Art:

  • LangChain provided basic RAG patterns, but with sequential execution
  • LightRAG combines a knowledge graph (Neo4j) with vector RAG for better retrieval through graph relationships
  • Gap: LightRAG does NOT support multi-tenant isolation natively (in either its Neo4j or MongoDB storage backends)
  • Standard chatbot frameworks lacked B2B-specific features (visitor identification, lead qualification)


Hypothesis 1: Standard LangChain RAG with Monolithic Prompt

"A standard LangChain RetrievalQA chain with OpenAI embeddings and a monolithic prompt will meet B2B chatbot requirements."

Development Work

Initial Architecture:

  • Basic RetrievalQA chain with OpenAI embeddings (sketched below)
  • MongoDB for vector storage
  • Simple conversation memory
  • Single monolithic prompt handling all conversation types

Implementation Details:

  • Created backend folder hierarchy and package structure
  • Refactored chatbot context handling to use modern LangChain patterns
  • Improved memory handling for conversation continuity
  • Implemented proper system prompt structure
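
A minimal sketch of this first iteration (illustrative only: the in-memory store, documents, and model wiring below are stand-ins, not the actual Rose code):

# Hypothesis 1 stack, sketched: one RetrievalQA chain, OpenAI embeddings,
# a single "stuff" prompt, and a naive in-memory vector store stand-in.
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())
vector_store.add_texts([
    "AB Tasty is an A/B testing SaaS platform.",  # illustrative documents
    "Feature flags let teams roll out changes progressively.",
])

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4.1-mini", temperature=0.7),
    chain_type="stuff",  # one monolithic prompt over the retrieved chunks
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "How does AB Tasty handle feature flags?"})["result"])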

Testing & Validation

Method: Direct client feedback from the AB Tasty pilot (production deployment)

  • Real B2B questions from pilot users
  • Measured: response quality, latency, context retention
  • Critical: real client testing was instrumental in identifying infrastructure gaps

Results

Metric                   Result   Target
Avg Conversation Turns   ~1.5     >3
Response Latency         ~7s      <3s
Context Retention        Poor     Good

Infrastructure Issue: No proper vector database - the pilot used a hard-coded key-vector store baked into the Docker image, which was the main cause of the ~7s latency.

Conclusion

FAILURE: Standard LangChain insufficient for B2B complexity. Hard-coded vector store caused unacceptable latency (~7s). Monolithic prompt couldn't adapt to different user intents. No tenant isolation. Real production testing revealed need for proper database infrastructure.


Hypothesis 2: LightRAG with Graph Database

"LightRAG with Neo4j graph storage will provide better retrieval quality through knowledge graph relationships."

Development Work

Architecture Evolution:

  • Introduced ixrag package for RAG functionality
  • Created custom LightRAG integration with retriever component
  • Enhanced logging for tenant-based isolation setup
  • Added LangGraph version of the chatbot for better orchestration

Key Components Created:

  • Custom LightRAG retriever with enhanced logging (sketched below)
  • Document processing pipeline for RAG
  • Integration tests for LightRAG functionality
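
A hedged sketch of the retriever wrapper. The LightRAG call itself is hidden behind a plain callable because the LightRAG API shifted across versions; the class, field, and logger names here are illustrative, not the Rose code:

# LangChain-compatible retriever that delegates retrieval to LightRAG and
# logs each call with the tenant it belongs to.
import logging
from typing import Callable, List

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever

logger = logging.getLogger("ixrag.retriever")

class LightRAGRetriever(BaseRetriever):
    """Wraps a LightRAG query callable as a LangChain retriever."""
    query_fn: Callable[[str], str]  # e.g. lambda q: rag.query(q) - version-dependent
    tenant_id: str

    def _get_relevant_documents(
        self, query: str, *, run_manager: CallbackManagerForRetrieverRun
    ) -> List[Document]:
        logger.info("LightRAG retrieval tenant=%s query=%r", self.tenant_id, query)
        answer = self.query_fn(query)
        return [Document(page_content=answer, metadata={"tenant_id": self.tenant_id})]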

Testing & Validation

Method: Integration tests + AB Tasty feedback

  • Compared retrieval quality vs vanilla LangChain
  • Tested with AB Tasty technical documentation

Results

Metric                 LangChain   LightRAG
Retrieval Relevance    60%         78%
Response Quality       45%         62%
Graph Relationships    None        Working

Conclusion

PARTIAL SUCCESS: LightRAG improved retrieval quality but introduced a new challenge: LightRAG has no native multi-tenant support.


Hypothesis 3: Cypher Query Interception for Multi-Tenant Isolation

"Can we inject tenant isolation into LightRAG without modifying upstream source code, by intercepting and rewriting Cypher queries at runtime?"

Technical Uncertainty

Problem: LightRAG does not support multi-tenancy natively. We needed to isolate data between clients (AB Tasty, future clients).

Why not fork LightRAG? Maintaining a fork would create long-term maintenance burden and prevent upstream updates.

Experimental approach: Intercept all Cypher queries at runtime and inject tenant filtering. Uncertainty:

  • Would query interception work for ALL query patterns LightRAG generates?
  • Could we handle dynamic query structures (multiple node variables, varying WHERE clauses)?
  • Would performance remain acceptable with query rewriting overhead?

Development Work

Solution Architecture:

LightRAG (upstream, unmodified) → TenantAwareNeo4JStorage (intercept) → Neo4j
                                  Runtime Cypher query rewriting
                                  Inject tenantId into all queries

Experimental Implementation:

  • Cypher query parser to identify node variables and WHERE clauses
  • Dynamic query rewriting with IS NOT NULL tenant ID checks (core idea sketched after this list)
  • Support for multiple node variables in complex queries
  • Edge case handling for queries without existing WHERE clauses
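
A simplified sketch of the core rewriting idea. The production interceptor handled multiple node variables and more query shapes; the helper name and the per-tenant property shown here are illustrative, one way to realise the IS NOT NULL check:

# Rewrite a Cypher query at runtime so it only matches the caller's tenant.
import re

def inject_tenant_filter(cypher: str, node_var: str, tenant_prop: str) -> str:
    """Append a tenant predicate to a single-variable MATCH query."""
    predicate = f"{node_var}.{tenant_prop} IS NOT NULL"
    if re.search(r"\bWHERE\b", cypher, flags=re.IGNORECASE):
        # Extend the existing WHERE clause.
        return re.sub(r"\bWHERE\b", f"WHERE {predicate} AND", cypher, count=1, flags=re.IGNORECASE)
    # No WHERE clause: add one just before RETURN.
    return re.sub(r"\bRETURN\b", f"WHERE {predicate} RETURN", cypher, count=1, flags=re.IGNORECASE)

# Queries issued by LightRAG are rewritten before they reach Neo4j.
q = "MATCH (n:Entity) RETURN n LIMIT 10"
print(inject_tenant_filter(q, "n", "tenant_abtasty"))
# MATCH (n:Entity) WHERE n.tenant_abtasty IS NOT NULL RETURN n LIMIT 10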

Testing & Validation

Method: Unit tests + integration tests with multiple tenants

  • Created test suite covering different query patterns (illustrated below)
  • Tested concurrent access from different tenants
  • Verified no data leakage between tenants
  • Stress-tested query rewriting performance
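
The kind of pattern-coverage check this involved, sketched with pytest against the illustrative inject_tenant_filter helper above (not the actual test suite):

# Parametrised check: whatever shape the query takes, the rewritten version
# must carry the tenant predicate before it reaches Neo4j.
# inject_tenant_filter is the illustrative helper from the previous sketch.
import pytest

QUERY_PATTERNS = [
    "MATCH (n:Entity) RETURN n LIMIT 10",
    "MATCH (n:Entity) WHERE n.name = $name RETURN n",
    "MATCH (n)-[r]->(m) WHERE n.name = $name RETURN n, r, m",
]

@pytest.mark.parametrize("query", QUERY_PATTERNS)
def test_tenant_predicate_is_always_injected(query):
    rewritten = inject_tenant_filter(query, "n", "tenant_abtasty")
    assert "n.tenant_abtasty IS NOT NULL" in rewritten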

Results

Metric                   Before     After
Tenant Isolation         None       Working
Query Performance        N/A        <100ms overhead
Data Leakage             Possible   None detected
Query Pattern Coverage   N/A        ~95% (edge cases identified)

Conclusion

PARTIAL SUCCESS: Query interception approach validated. Works for majority of LightRAG query patterns. Identified edge cases requiring more comprehensive Cypher parsing (addressed in later months). 795 lines of experimental code.


Hypothesis 4: Azure OpenAI Provider Optimization

"Azure OpenAI will provide better enterprise reliability and performance compared to direct OpenAI API for production workloads."

Technical Uncertainty

Problem: Initial architecture used direct OpenAI API calls, raising concerns about:

  1. Enterprise reliability and SLA guarantees
  2. Regional latency optimization (Azure offers regional deployments)
  3. Cost predictability and enterprise billing integration
  4. Security compliance (Azure offers private endpoints, VNET integration)

Key Question: Would switching to Azure OpenAI maintain response quality while improving reliability metrics?

Development Work

Provider Abstraction Layer:

  • Introduced get_llm() factory function to abstract provider selection
  • Replaced hardcoded ChatOpenAI instances with configurable AzureChatOpenAI
  • Environment-based configuration for Azure credentials and deployment names
  • Support for model switching between providers without code changes

Implementation (commit e6447eb, July 10):

from langchain_openai import AzureChatOpenAI

# ensure_env(name): internal helper that reads a required environment
# variable (assumed to raise if it is missing).

def get_llm(model_name: str):
    """Get an LLM instance for a given model name."""
    return AzureChatOpenAI(
        model=model_name,
        deployment_name=ensure_env("AZURE_OPENAI_DEPLOYMENT_NAME"),
        azure_endpoint=ensure_env("AZURE_OPENAI_ENDPOINT"),
        api_key=ensure_env("AZURE_OPENAI_API_KEY"),
        api_version=ensure_env("AZURE_OPENAI_API_VERSION"),
        temperature=0.7,
    )

Configuration Extension (commit 96c6181, July 21):

[lightrag]
retrieval_model = "gpt-4.1-nano"
processing_model = "gpt-4.1-mini"
provider = "azure"

[chat]
model = "gpt-4.1-mini"
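
A sketch of how the config and the factory fit together. The real config loader and any non-Azure branches are internal, so the file name, provider switch, and defaults below are illustrative:

# Illustrative config-driven provider selection (not the actual Rose loader).
# Assumes Azure endpoint/key/version are supplied via environment variables,
# which the client picks up when they are not passed explicitly.
import tomllib

from langchain_openai import AzureChatOpenAI, ChatOpenAI

def get_llm(model_name: str, provider: str = "azure"):
    """Return a chat model for the configured provider."""
    if provider == "azure":
        return AzureChatOpenAI(model=model_name, temperature=0.7)
    return ChatOpenAI(model=model_name, temperature=0.7)

with open("config.toml", "rb") as f:  # hypothetical path to the TOML above
    cfg = tomllib.load(f)

chat_llm = get_llm(cfg["chat"]["model"], provider=cfg["lightrag"]["provider"])
retrieval_llm = get_llm(cfg["lightrag"]["retrieval_model"], provider=cfg["lightrag"]["provider"])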

Testing & Validation

Method: A/B comparison between OpenAI and Azure OpenAI

  • Response quality comparison (same prompts, same context)
  • Latency measurements across different times of day
  • Error rate tracking during pilot deployments

Results

Metric                OpenAI Direct    Azure OpenAI
Response Quality      Baseline         Equivalent
Avg Latency (EU)      ~800ms           ~650ms
Error Rate (5xx)      0.3%             0.1%
SLA Guarantee         None             99.9%
Enterprise Features   Limited          Full (VNET, Private Endpoints)

Conclusion

SUCCESS: Azure OpenAI provides equivalent response quality with better enterprise reliability. Provider abstraction layer allows easy switching and future experimentation with other providers (Groq, Google, Cerebras tested in comments).

R&D Significance: This work established the provider abstraction pattern that enables systematic experimentation with different LLM providers without code changes - foundational for future optimization work.


Additional Development

Frontend Widget Creation

  • Comprehensive inline loader and widget lifecycle manager
  • Multi-instance management and enhanced configuration
  • Domain validation and debug logging
  • Testing interface (preprod-ui) for rapid development iteration

Observability Setup

  • LangFuse integration for LLM tracing (minimal wiring sketched below)
  • Basic dataset evaluation framework
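
A minimal sketch of wiring LangFuse tracing into a LangChain call. The import path and credentials handling depend on the LangFuse SDK version; the model and prompt here are placeholders:

# Attach a LangFuse callback handler so each LLM/chain call is traced.
from langchain_openai import AzureChatOpenAI
from langfuse.callback import CallbackHandler  # langfuse.langchain in SDK v3

# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the environment.
langfuse_handler = CallbackHandler()

llm = AzureChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
# Passing the handler traces prompt, completion, latency, and token usage;
# the same pattern applies to whole chains and LangGraph runs.
llm.invoke("Summarise AB Tasty in one sentence.", config={"callbacks": [langfuse_handler]})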

Streaming Implementation

  • Real-time response streaming for better UX

R&D Activities

  • Architecture exploration and hypothesis testing
  • LangChain prototypes and experiments
  • LightRAG integration (novel RAG approach)
  • Cypher query interception for multi-tenant isolation (experimental)
  • Azure OpenAI provider integration and optimization (provider abstraction layer)
  • Observability setup (LangFuse) - foundation for metrics-driven iteration
  • Dataset evaluation framework

Other Development

  • Frontend widget creation
  • Streaming implementation
  • Testing & validation

Next Work (August)

  1. Validate multi-tenant isolation in production
  2. Address LightRAG API instability across versions
  3. Implement Redis-based conversation history
  4. Scale to additional clients (Mayday)