# IXChat Package
LangGraph-based chatbot with retrieval-augmented generation (RAG), conversation memory, and intelligent visitor enrichment.
## Overview
The ixchat package provides the core chatbot functionality for the Rose platform. It uses LangGraph to orchestrate a complex workflow of specialized nodes that handle:
- Document Retrieval: Fetches relevant context from LightRAG
- Visitor Enrichment: Identifies companies from IP addresses
- Response Generation: Produces contextual answers with LLM
- Suggestion Generation: Creates follow-up questions or answer options
- Dialog Supervision: Tracks conversation state and signals
## Architecture Diagram

The following diagram shows the LangGraph structure. It is auto-generated from the graph definition during `just build` or `just dev`.
## Multi-Agent Router Architecture (WIP)

**Partial Implementation**

This architecture is partially implemented. Only the redirect handler agent is active, and only in test/development environments. Production uses the legacy `legacy_answer_writer` node with monolithic prompts.

See ADR: Prompt Modularization for the full design.
### Overview
The multi-agent router architecture replaces the monolithic prompt approach with specialized agents for different visitor intents. Instead of one large prompt handling all scenarios, the system:
- Classifies intent using a fast LLM (gpt-4.1-nano)
- Routes to specialized agents based on intent + interest signals
- Uses 3-level prompt hierarchy for each agent (meta-template → agent template → client instructions)
### Intent Classification

The `intent_classifier` node classifies each message into one of five visitor intents:

| Intent | Description | Example |
|---|---|---|
| `LEARN` | Product questions, feature inquiries | "How does your A/B testing work?" |
| `CONTEXT` | User sharing business context | "We have 50k monthly visitors" |
| `SUPPORT` | Existing customer issues | "I can't log into my dashboard" |
| `OFFTOPIC` | Unrelated to product | "What's the weather today?" |
| `OTHER` | Job inquiries, press, partnerships | "Are you hiring?" |

The classifier runs in parallel with other background nodes from START, using the Langfuse prompt `rose-internal-intent-router`.
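A minimal sketch of what the classification call might look like, assuming a Pydantic result model and a LangChain chat client with structured output; the real node loads its prompt from Langfuse and routes through Azure OpenAI via `ixllm`, so everything below is illustrative:

```python
# Hypothetical sketch: structured intent classification with a small, fast model.
from enum import Enum

from langchain_openai import ChatOpenAI  # assumption; the project wraps Azure OpenAI via ixllm
from pydantic import BaseModel


class VisitorIntent(str, Enum):
    LEARN = "LEARN"
    CONTEXT = "CONTEXT"
    SUPPORT = "SUPPORT"
    OFFTOPIC = "OFFTOPIC"
    OTHER = "OTHER"


class IntentResult(BaseModel):
    intent: VisitorIntent
    confidence: float


async def classify_intent(message: str, history: list[str]) -> IntentResult:
    """Classify a single visitor message into one of the five intents."""
    classifier = ChatOpenAI(model="gpt-4.1-nano").with_structured_output(IntentResult)
    prompt = (
        "Classify the visitor's intent as LEARN, CONTEXT, SUPPORT, OFFTOPIC or OTHER.\n"
        f"History: {history}\nMessage: {message}"
    )
    return await classifier.ainvoke(prompt)
```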
### Action Routing

The `action_router` node uses deterministic logic (no LLM) to decide the next action based on:

- Current visitor intent
- Cumulative interest score (from `interest_signals_detector`)
- Site-specific interest threshold (from the `agent_config` table)

| Action | Trigger | Handler |
|---|---|---|
| `EDUCATE` | `LEARN` intent | `legacy_answer` (planned: educator agent) |
| `QUALIFY` | `CONTEXT` intent | `legacy_answer` (planned: qualifier agent) |
| `PROPOSE_DEMO` | Qualified + buying signals | `legacy_answer` (planned: CTA agent) |
| `HANDLE_SUPPORT` | `SUPPORT` intent | `redirect_handler` ✅ |
| `HANDLE_OFFTOPIC` | `OFFTOPIC` intent | `redirect_handler` ✅ |
| `HANDLE_OTHER` | `OTHER` intent | `redirect_handler` ✅ |
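Because the router is deterministic, its core can be read as a small pure function. The sketch below is a simplification under assumed names (`interest_score`, `interest_threshold`, and a reduced demo trigger), not the actual implementation in `ixchat/nodes/action_router.py`:

```python
# Illustrative sketch of the deterministic routing rules in the table above.
from enum import Enum


class NextAction(str, Enum):
    EDUCATE = "EDUCATE"
    QUALIFY = "QUALIFY"
    PROPOSE_DEMO = "PROPOSE_DEMO"
    HANDLE_SUPPORT = "HANDLE_SUPPORT"
    HANDLE_OFFTOPIC = "HANDLE_OFFTOPIC"
    HANDLE_OTHER = "HANDLE_OTHER"
    CONTINUE = "CONTINUE"


def route_action(intent: str, interest_score: float, interest_threshold: float) -> NextAction:
    """Map intent + cumulative interest to the next action, with no LLM call."""
    redirects = {
        "SUPPORT": NextAction.HANDLE_SUPPORT,
        "OFFTOPIC": NextAction.HANDLE_OFFTOPIC,
        "OTHER": NextAction.HANDLE_OTHER,
    }
    if intent in redirects:
        return redirects[intent]
    if interest_score >= interest_threshold:
        # Simplified stand-in for "qualified + buying signals"
        return NextAction.PROPOSE_DEMO
    if intent == "CONTEXT":
        return NextAction.QUALIFY
    if intent == "LEARN":
        return NextAction.EDUCATE
    return NextAction.CONTINUE
```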
### Redirect Handler

The `redirect_handler` is the only specialized agent currently implemented. It handles support, off-topic, and other requests by redirecting users to appropriate resources.

**3-Level Prompt Hierarchy:**
```
rose-internal/response-agents/meta-template (Level 1 - shared)
└── {{lf_agent_instructions}} ← Agent template inserted
    └── rose-internal/response-agents/redirect/template (Level 2)
        └── {{lf_client_agent_instructions}} ← Client instructions inserted
            └── rose-internal/response-agents/redirect/instructions/{domain} (Level 3)
```
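One way to read the hierarchy: each level is a template with a placeholder that the next level fills. A minimal sketch using plain string substitution; `fetch_prompt()` is a hypothetical stand-in for the Langfuse prompt client:

```python
# Illustrative composition of the three prompt levels via placeholder substitution.
def compose_redirect_prompt(domain: str, fetch_prompt) -> str:
    meta = fetch_prompt("rose-internal/response-agents/meta-template")                      # Level 1
    agent = fetch_prompt("rose-internal/response-agents/redirect/template")                 # Level 2
    client = fetch_prompt(f"rose-internal/response-agents/redirect/instructions/{domain}")  # Level 3

    # Level 3 fills the Level 2 placeholder; the result fills the Level 1 placeholder.
    agent = agent.replace("{{lf_client_agent_instructions}}", client)
    return meta.replace("{{lf_agent_instructions}}", agent)
```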
Key features:
- Skips RAG retrieval: The redirect handler cancels the retrieval task for faster responses, since it doesn't need knowledge base content. This saves 10-30% of retrieval costs on redirect cases.
- Uses gpt-4.1-nano: Optimized for fast, lightweight redirect responses.
### Environment-Based Routing

| Environment | Support/Offtopic/Other | Educate/Qualify/Demo |
|---|---|---|
| Production | `legacy_answer` | `legacy_answer` |
| Test/Development | `redirect_handler` ✅ | `legacy_answer` |
### Current Limitations

- Educator agent: Not implemented (routes to `legacy_answer`)
- Qualifier agent: Not implemented (routes to `legacy_answer`)
- CTA/Demo agent: Not implemented (routes to `legacy_answer`)
- A/B testing: No framework for comparing router vs. monolithic performance
## Deferred Retrieval Pattern

The graph uses a deferred retrieval pattern to optimize performance and reduce costs:

1. `retrieval_task_starter` fires the retrieval task at START without waiting (fire-and-forget)
2. `action_router` decides the path based on intent + signals (doesn't need retrieval results)
3. Answer path: `retrieval_awaiter` awaits the deferred task before `legacy_answer_writer`
4. Redirect path: `redirect_handler` cancels the retrieval task (saves 10-30% of retrieval costs)
This pattern ensures retrieval only happens when needed (answer path), avoiding wasted work on redirects.
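The pattern boils down to `asyncio.create_task` at START plus a shared, per-session task store that either path can await or cancel. A hedged sketch with assumed names; the real logic lives in `retrieval_task_starter`, `retrieval_awaiter`, and `ixchat/retrieval_task_store.py`:

```python
# Minimal sketch of the deferred retrieval pattern with asyncio tasks.
# The task store and retrieve_docs() stub are assumptions for illustration.
import asyncio

_retrieval_tasks: dict[str, asyncio.Task] = {}


async def retrieve_docs(query: str) -> str:
    """Stub standing in for the LightRAG retrieval call."""
    await asyncio.sleep(1)
    return f"context for: {query}"


def start_retrieval(session_id: str, query: str) -> None:
    """Fire-and-forget at START: schedule retrieval without awaiting it."""
    _retrieval_tasks[session_id] = asyncio.create_task(retrieve_docs(query))


async def await_retrieval(session_id: str) -> str:
    """Answer path: block on the deferred task just before writing the answer."""
    return await _retrieval_tasks.pop(session_id)


def cancel_retrieval(session_id: str) -> None:
    """Redirect path: the result is never needed, so cancel and skip the cost."""
    task = _retrieval_tasks.pop(session_id, None)
    if task is not None:
        task.cancel()
```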
## Parallel Execution and Race Conditions

Five nodes start in parallel from START:

```
START ──┬── retrieval_task_starter ──→ END (fires async task, doesn't wait)
        ├── third_party_enricher ──→ END (background)
        ├── visitor_profiler ──→ END (background)
        ├── interest_signals_detector ──→ END (background)
        └── intent_classifier ──→ END (background)
```

After background nodes complete, `action_router` routes to the appropriate handler:

```
action_router ──┬── retrieval_awaiter ──→ legacy_answer_writer ──→ suggestions (answer path)
                └── redirect_handler ──→ finalize (redirect path, cancels retrieval)
```

Race condition behavior:

- `action_router` waits ONLY for `intent_classifier` and `interest_signals_detector` (it doesn't wait for retrieval)
- Background nodes (`third_party_enricher`, `visitor_profiler`, `interest_signals_detector`, `intent_classifier`) run in parallel
- If background nodes complete before `action_router` starts → their data IS available in state
- If background nodes are still running → `action_router` proceeds WITHOUT waiting (uses default intent)
This means enrichment data (company name, sector, interest signals, intent) is available to the response handler on a best-effort basis. Any data not ready in time is persisted for the next conversation turn.
## Streaming Architecture
The chatbot uses a two-phase streaming approach:
### Phase 1: Token Streaming (response handler)

- Uses LangGraph's `astream_events` API to capture `on_chat_model_stream` events
- Only streams from response handler nodes (`legacy_answer_writer`, `redirect_handler`); other nodes are filtered out to avoid streaming JSON
- Tokens are sent as Server-Sent Events: `{"type": "token", "content": "..."}`
### Phase 2: Completion Event (after finalize)

After streaming completes, the API layer:

1. Waits for all graph nodes to complete (including background nodes)
2. Fetches the final state: `chatbot.graph.aget_state(config_dict)`
3. Extracts from state:
   - `suggested_follow_ups` - Follow-up questions from `follow_up_suggester`
   - `suggested_answers` - Answer options from `answer_suggester`
   - `cta_url_overrides` - Dynamic CTA URLs from `form_field_extractor`
   - `visitor_profile` - Enriched company data
4. Sends the completion event: `{"type": "complete", "metadata": {...}}`
### Why This Design?
- Fast time-to-first-token: User sees response immediately from the response handler
- Best-effort enrichment: Background data used if ready, otherwise next turn
- Complete data at end: Suggestions and metadata require all nodes to finish
### Code Flow

```python
# chatbot.py - Streams only response handler tokens
async for event in self.graph.astream_events(...):
    if event_type == "on_chat_model_stream":
        node_name = metadata.get("langgraph_node", "")
        if node_name not in ("legacy_answer_writer", "redirect_handler"):  # Skip other nodes
            continue
        yield chunk.content  # Stream token to client
```

```python
# chat.py (API) - Fetches final state after streaming
state = await chatbot.graph.aget_state(config_dict)
suggested_follow_ups = state.values.get("suggested_follow_ups", [])
suggested_answers = state.values.get("suggested_answers", [])
# ... send completion event with metadata
```
## Execution Flow

1. START: Five nodes launch in parallel
   - `retrieval_task_starter` - Fires retrieval task asynchronously (doesn't wait)
   - `third_party_enricher` - Enriches visitor profile from IP
   - `visitor_profiler` - Infers company/sector from conversation
   - `interest_signals_detector` - Detects buying signals
   - `intent_classifier` - Classifies visitor intent using LLM (gpt-4.1-nano)

2. Action Routing: The `action_router` decides based on intent + signals (doesn't wait for retrieval):
   - Support/Offtopic/Other intents (test/dev) → `redirect_handler` (cancels retrieval task)
   - All other cases → `retrieval_awaiter` → `legacy_answer_writer`

3. Answer Path: Sequential response generation with retrieval
   - `retrieval_awaiter` - Awaits deferred retrieval task
   - `legacy_answer_writer` - Generates LLM response with RAG context
   - `suggestion_router` - Routes to appropriate suggester

4. Redirect Path: Fast response without retrieval
   - `redirect_handler` - Cancels retrieval task, generates redirect response (gpt-4.1-nano)
   - Goes directly to `finalize` (no suggestions needed)

5. Conditional Routing: After `legacy_answer_writer`, the `suggestion_router` decides:
   - Response contains 👉 → `answer_suggester` (generate answer options)
   - Response contains 💌 or URLs → `skip_suggestions` (go to finalize)
   - Default → `follow_up_suggester` (generate follow-up questions)

6. Background Nodes: Run in parallel without blocking response
   - `dialog_state_extractor` - Extracts emoji markers and emails
   - `form_field_extractor` - Extracts form field values for CTA URLs
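Wiring-wise, the flow above maps onto a LangGraph `StateGraph` with parallel edges from START and conditional edges at the router. The sketch below is a simplification: node bodies are placeholders, the suggestion and background branches are omitted, and the routing key is an assumption, so it only illustrates the shape of the graph, not the real `graph_structure.py`:

```python
# Simplified, illustrative wiring of the flow above as a LangGraph StateGraph.
from typing_extensions import TypedDict

from langgraph.graph import END, START, StateGraph


class State(TypedDict, total=False):  # stand-in for the real RoseChatState
    input: str
    action: str


def noop(state: State) -> dict:  # placeholder node body
    return {}


def route_by_action(state: State) -> str:
    # Assumed routing key; the real router reads action_router_state.
    return "redirect_handler" if state.get("action", "").startswith("HANDLE_") else "retrieval_awaiter"


builder = StateGraph(State)

background = [
    "retrieval_task_starter",
    "third_party_enricher",
    "visitor_profiler",
    "interest_signals_detector",
    "intent_classifier",
]
for name in background + [
    "action_router",
    "retrieval_awaiter",
    "legacy_answer_writer",
    "redirect_handler",
    "finalize",
]:
    builder.add_node(name, noop)

for name in background:  # five-way fan-out from START
    builder.add_edge(START, name)
for name in ("retrieval_task_starter", "third_party_enricher", "visitor_profiler"):
    builder.add_edge(name, END)  # background-only nodes

# action_router waits only on these two before routing
builder.add_edge("intent_classifier", "action_router")
builder.add_edge("interest_signals_detector", "action_router")
builder.add_conditional_edges("action_router", route_by_action, ["retrieval_awaiter", "redirect_handler"])

builder.add_edge("retrieval_awaiter", "legacy_answer_writer")
builder.add_edge("legacy_answer_writer", "finalize")  # suggestion routing omitted
builder.add_edge("redirect_handler", "finalize")
builder.add_edge("finalize", END)

graph = builder.compile()
```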
## Graph Nodes

| Node | Type | Blocking | Purpose |
|---|---|---|---|
| `retrieval_task_starter` | async | No | Fires retrieval task asynchronously at START (deferred pattern) |
| `retrieval_awaiter` | async | Yes | Awaits deferred retrieval task on answer path only |
| `third_party_enricher` | async | No | Enriches visitor profile from IP address |
| `visitor_profiler` | async | No | Infers company/sector from conversation |
| `interest_signals_detector` | async | No | Detects buying signals (engagement, pricing interest) |
| `intent_classifier` | async | No | Classifies visitor intent using LLM (gpt-4.1-nano) |
| `action_router` | sync | N/A | Determines next action based on intent + signals |
| `legacy_answer_writer` | async | Yes | Generates LLM response with RAG context (formerly `legacy_answer`) |
| `redirect_handler` | async | Yes | Handles support/offtopic/other redirects, cancels retrieval (test/dev only, uses gpt-4.1-nano) |
| `suggestion_router` | sync | N/A | Routes to appropriate suggester based on response |
| `answer_suggester` | async | Yes | Generates suggested answers (when bot asks questions) |
| `follow_up_suggester` | async | Yes | Generates follow-up questions |
| `dialog_state_extractor` | async | No | Extracts emoji markers and captured emails |
| `form_field_extractor` | async | No | Extracts form field values for CTA URLs |
| `finalize` | sync | Yes | Assembles final response for client |
## State Model

The graph uses `RoseChatState` (a `TypedDict`) with custom reducers for parallel updates:
### Core Fields

| Field | Type | Description |
|---|---|---|
| `messages` | `list[BaseMessage]` | Conversation history |
| `input` | `str` | Current user input |
| `response` | `str` | Generated LLM response |
| `retrieved_docs` | `str` | Context from LightRAG |
| `site_name` | `str` | Client site identifier |
| `session_id` | `str` | Conversation session ID |
| `turn_number` | `int` | Current conversation turn (0-indexed) |
### Profile & Signals

| Field | Type | Reducer |
|---|---|---|
| `visitor_profile` | `VisitorProfile` | `merge_visitor_profiles` |
| `dialog_supervision_state` | `DialogSupervisionState` | `merge_dialog_supervision_states` |
| `interest_signals_state` | `InterestSignalsState` | `merge_interest_signals_states` |
| `form_collection_state` | `FormCollectionState` | `merge_form_collection_states` |
### Intent & Action Router State

| Field | Type | Description |
|---|---|---|
| `intent_classification_state` | `IntentClassificationState` | Current intent + history |
| `action_router_state` | `ActionRouterState` | Next action + reasoning |

`VisitorIntent` enum values: `LEARN`, `CONTEXT`, `SUPPORT`, `OFFTOPIC`, `OTHER`

`NextAction` enum values: `EDUCATE`, `QUALIFY`, `PROPOSE_DEMO`, `HANDLE_SUPPORT`, `HANDLE_OFFTOPIC`, `HANDLE_OTHER`, `CONTINUE`
### Custom Reducers

Parallel nodes update state using custom merge functions:

- `merge_visitor_profiles`: Merges enrichment results, preferring non-"unknown" values
- `merge_dialog_supervision_states`: Cumulative "ever" flags + latest turn flags
- `merge_interest_signals_states`: Simple replacement
- `merge_form_collection_states`: Merges collected values, tracks max turn
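In LangGraph terms, each reducer is attached to its state field via `Annotated`, so parallel node updates are merged rather than overwritten. A hedged sketch using a simplified dict-based profile; the real models live in `ixchat/pydantic_models/`:

```python
# Illustrative reducer wiring: parallel writes to visitor_profile are merged,
# preferring non-"unknown" values. The dict shape is a simplification.
from typing import Annotated

from typing_extensions import TypedDict


def merge_visitor_profiles(current: dict, incoming: dict) -> dict:
    """Keep existing values unless the incoming update has something more concrete."""
    merged = dict(current or {})
    for key, value in (incoming or {}).items():
        if value not in (None, "", "unknown"):
            merged[key] = value
    return merged


class ChatState(TypedDict, total=False):
    input: str
    visitor_profile: Annotated[dict, merge_visitor_profiles]  # merged across parallel nodes
```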
## Memory Management

Session state is persisted using LangGraph checkpointers:

| Mode | Backend | Use Case |
|---|---|---|
| Redis | `AsyncRedisSaver` | Production (distributed) |
| Memory | `MemorySaver` | Development/Testing |
Configuration:
- TTL: Configurable session timeout
- Keepalive: TCP socket keepalive enabled
- Health checks: 30-second pings prevent idle disconnection
```python
# Memory manager initialization
memory_manager = IXChatMemoryManager()
checkpointer = await memory_manager.get_checkpointer()
graph = graph_builder.compile(checkpointer=checkpointer)
```
## Enrichment System
Multi-source visitor enrichment pipeline with priority-based fallbacks:
| Priority | Source | Description |
|---|---|---|
| 1 | Redis Cache | Fast, short-lived cache |
| 2 | Supabase Lookup | IP hash lookup for returning visitors |
| 3 | Browser Reveal | Client-side data (`window.reveal`) |
| 4 | Snitcher Radar | Session UUID identification |
| 5 | Enrich.so | Server-side API fallback |
Once a source returns "completed" status, remaining sources are skipped.
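The priority order translates into a simple first-success loop. A hedged sketch in which each source is a hypothetical async callable returning a dict with a `status` field:

```python
# Illustrative priority-ordered fallback over enrichment sources.
async def enrich_visitor(ip_address: str, sources: list) -> dict:
    """Try sources by priority; stop at the first one that completes."""
    for source in sources:  # e.g. [redis_cache, supabase_lookup, browser_reveal, snitcher_radar, enrich_so]
        result = await source(ip_address)
        if result.get("status") == "completed":
            return result  # remaining, lower-priority sources are skipped
    return {"status": "unknown"}
```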
### VisitorProfile Fields

- Enrichment: `status`, `tier`, `source`, `ip_address`
- Company: `company_name`, `company_description`, `company_domain`, `sector`, `sub_sector`
- User Context: `email`, `job_to_be_done`, `feature_list`, `intent`
- Confidence: `sector_confidence_level`
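Shaped as a Pydantic model, the profile might look roughly like this; field names follow the groups above, while types, optionality, and defaults are assumptions rather than the canonical model from `ixchat/pydantic_models/`:

```python
# Approximate, assumed shape of VisitorProfile based on the field groups above.
from pydantic import BaseModel


class VisitorProfile(BaseModel):
    # Enrichment
    status: str = "unknown"
    tier: str | None = None
    source: str | None = None
    ip_address: str | None = None
    # Company
    company_name: str | None = None
    company_description: str | None = None
    company_domain: str | None = None
    sector: str | None = None
    sub_sector: str | None = None
    # User context
    email: str | None = None
    job_to_be_done: str | None = None
    feature_list: list[str] | None = None
    intent: str | None = None
    # Confidence
    sector_confidence_level: float | None = None
```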
## Integration Points
| System | Purpose | Package |
|---|---|---|
| LightRAG | Document retrieval with graph & chunk ranking | ixrag |
| Supabase | Conversation storage, client configs, lead data | ixdata |
| LangFuse | Observability & tracing | ixllm |
| Azure OpenAI | LLM client | ixllm |
| Redis | Session checkpointing | ixchat.memory |
## Key Files

| File | Description |
|---|---|
| `ixchat/__init__.py` | Public API: `get_chatbot_service()` |
| `ixchat/service.py` | `IXChatbotService` singleton manager |
| `ixchat/chatbot.py` | `IXChatbot` with LangGraph orchestration |
| `ixchat/graph_structure.py` | Graph structure with node/edge definitions |
| `ixchat/memory.py` | `IXChatMemoryManager` for session persistence |
| `ixchat/nodes/` | Node implementations |
| `ixchat/nodes/intent_classifier.py` | Intent classification using LLM |
| `ixchat/nodes/action_router.py` | Deterministic action routing |
| `ixchat/nodes/redirect_handler.py` | Redirect agent for support/offtopic/other |
| `ixchat/nodes/retrieval_task_starter.py` | Fires retrieval task asynchronously |
| `ixchat/nodes/retrieval_awaiter.py` | Awaits deferred retrieval task |
| `ixchat/retrieval_task_store.py` | Manages retrieval tasks for cancellation |
| `ixchat/pydantic_models/` | State definitions and reducers |
| `ixchat/pydantic_models/intent_router.py` | Intent/action router state models |
| `ixchat/utils/agent_config.py` | `AgentConfigResolver` for site-specific config |
| `ixchat/enrichment/` | Multi-source visitor enrichment |
## Usage

```python
from ixchat import get_chatbot_service

# Get singleton service
service = get_chatbot_service()

# Get chatbot for a site
chatbot = await service.get_chatbot("example-site")

# Query with streaming
async for chunk in chatbot.query_stream(
    input="Tell me about your product",
    site_name="example-site",
    session_id="session-123",
    person_id="posthog-distinct-id",
):
    print(chunk, end="")

# Non-streaming query (for evaluations)
response, metadata = await chatbot.query(
    input="What are your pricing plans?",
    site_name="example-site",
    session_id="session-123",
)
```
## Evaluations

The `just eval` command runs LLM evaluation tests for quality assessment and regression testing of ixchat components using Langfuse datasets.

### How It Works

```
Langfuse Dataset ──→ Evaluator ──→ Classifier (LLM) ──→ Results logged to Langfuse
(labeled examples)   (real API calls)                   (runs + scores)
```
- Test data is fetched from Langfuse datasets (labeled input/expected_output pairs)
- Evaluator runs the classifier on each dataset item
- Results are logged back to Langfuse as runs with scores (correct, confidence, F1, etc.)
- Metrics are computed (accuracy, F1, precision, recall) and asserted against thresholds
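The aggregation step amounts to standard classification metrics plus a threshold assertion. A hedged sketch using scikit-learn with toy labels; the real evaluator also logs every run and score back to Langfuse:

```python
# Illustrative metric aggregation and threshold check for one evaluation run.
from sklearn.metrics import accuracy_score, f1_score

expected = ["LEARN", "SUPPORT", "LEARN", "OTHER"]    # labels from the Langfuse dataset
predicted = ["LEARN", "SUPPORT", "CONTEXT", "OTHER"]  # classifier outputs

accuracy = accuracy_score(expected, predicted)
macro_f1 = f1_score(expected, predicted, average="macro")
weighted_f1 = f1_score(expected, predicted, average="weighted")

# Threshold mirrors the documented Macro F1 default (see Quality Thresholds below)
assert macro_f1 >= 0.80, f"Macro F1 {macro_f1:.2f} below threshold"
```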
### Usage

```bash
cd backend

# Run a specific evaluation
just eval intent-classifier  # Intent classification accuracy
just eval skill-selector     # Skill selection accuracy
just eval e2e-api            # End-to-end API evaluation

# Run all evaluations
just eval all
```
### Available Targets

| Target | Langfuse Dataset | Description |
|---|---|---|
| `intent-classifier` | `intent-classifier` | Tests intent classification (LEARN, CONTEXT, SUPPORT, OFFTOPIC, OTHER) |
| `skill-selector` | `skill-selector` | Tests skill/action routing decisions |
| `e2e-api` | `main-dataset` | End-to-end API response quality |
### Langfuse Dataset Structure

Each dataset item in Langfuse should have:

| Field | Description | Example |
|---|---|---|
| `input` | Classifier input (dict or string) | `{"message": "How does pricing work?", "history": [...]}` |
| `expected_output` | Expected classification result | `{"intent": "LEARN"}` |
| `metadata` | Optional context | `{"source": "production", "site_name": "example"}` |
### Adding Traces to Datasets

To expand test coverage, add production traces to Langfuse datasets:

Option 1: Langfuse UI

1. Go to Traces in Langfuse
2. Find a trace with interesting/edge-case behavior
3. Click Add to Dataset → select the target dataset
4. Fill in the `expected_output` (ground truth label)

Option 2: Langfuse API

```python
from langfuse import Langfuse

langfuse = Langfuse()

# Add item to existing dataset
langfuse.create_dataset_item(
    dataset_name="intent-classifier",
    input={"message": "Can you help me debug?", "history": []},
    expected_output={"intent": "SUPPORT"},
    metadata={"source": "manual", "notes": "Edge case for support detection"},
)
```
### Environment Configuration

The `just eval` command automatically configures:

- `IX_LANGFUSE_ENABLED=true` - Enables Langfuse for real prompt fetching
- `LANGFUSE_ENABLED=true` - Langfuse integration flag
- `IX_ENVIRONMENT=test` - Uses the test environment (overridden to `development` for credentials)
### Test Markers

```python
@pytest.mark.evaluation       # Marks as evaluation test
@pytest.mark.llm_integration  # Requires real LLM API calls
```

Running `just eval all` applies the pytest filter `-m "evaluation and llm_integration"`.
### Results in Langfuse

After running evaluations, results appear in Langfuse:

| Score | Description |
|---|---|
| `correct` | Per-item: 1.0 if prediction matches expected, 0.0 otherwise |
| `confidence` | Per-item: Model confidence score (if available) |
| `macro_f1` | Aggregate: Macro-averaged F1 score across all classes |
| `weighted_f1` | Aggregate: Weighted F1 score |
| `accuracy` | Aggregate: Overall accuracy |
| `passed` | Aggregate: 1.0 if F1 >= threshold, 0.0 otherwise |
### Quality Thresholds

Default thresholds (configurable in `conftest.py`):
| Metric | Threshold | Description |
|---|---|---|
| Macro F1 | 0.80 | Overall classification quality |
| Min Class F1 | 0.60 | No single class below this |
| Skill Recall | 0.90 | Multi-label skill coverage |
| Answer Accuracy | 0.70 | E2E response quality |