IXChat Package

LangGraph-based chatbot with retrieval-augmented generation (RAG), conversation memory, and intelligent visitor enrichment.

Overview

The ixchat package provides the core chatbot functionality for the Rose platform. It uses LangGraph to orchestrate a complex workflow of specialized nodes that handle:

  • Document Retrieval: Fetches relevant context from LightRAG
  • Visitor Enrichment: Identifies companies from IP addresses
  • Response Generation: Produces contextual answers with LLM
  • Suggestion Generation: Creates follow-up questions or answer options
  • Dialog Supervision: Tracks conversation state and signals

Architecture Diagram

The following diagram shows the LangGraph structure. It is auto-generated from the graph definition during just build or just dev.

---
config:
  flowchart:
    curve: linear
---
graph TD;
    __start__([<p>__start__</p>]):::first
    retrieval_task_starter(retrieval_task_starter)
    retrieval_awaiter(retrieval_awaiter)
    third_party_enricher(third_party_enricher)
    visitor_profiler(visitor_profiler)
    interest_signals_detector(interest_signals_detector)
    intent_classifier(intent_classifier)
    action_router(action_router)
    skill_selector(skill_selector)
    dialog_state_extractor(dialog_state_extractor)
    form_field_extractor(form_field_extractor)
    answer_writer(answer_writer)
    legacy_answer_writer(legacy_answer_writer)
    follow_up_suggester(follow_up_suggester)
    answer_suggester(answer_suggester)
    finalize(finalize)
    redirect_handler(redirect_handler)
    booking_handler(booking_handler)
    __end__([<p>__end__</p>]):::last
    __start__ --> intent_classifier;
    __start__ --> interest_signals_detector;
    __start__ --> retrieval_task_starter;
    __start__ --> third_party_enricher;
    __start__ --> visitor_profiler;
    action_router -. &nbsp;booking&nbsp; .-> booking_handler;
    action_router -. &nbsp;redirect&nbsp; .-> redirect_handler;
    action_router -. &nbsp;answer&nbsp; .-> retrieval_awaiter;
    answer_suggester --> finalize;
    answer_writer -.-> answer_suggester;
    answer_writer --> dialog_state_extractor;
    answer_writer -. &nbsp;skip_suggestions&nbsp; .-> finalize;
    answer_writer -.-> follow_up_suggester;
    answer_writer --> form_field_extractor;
    booking_handler --> finalize;
    follow_up_suggester --> finalize;
    form_field_extractor --> finalize;
    intent_classifier --> skill_selector;
    interest_signals_detector --> skill_selector;
    legacy_answer_writer -.-> answer_suggester;
    legacy_answer_writer --> dialog_state_extractor;
    legacy_answer_writer -. &nbsp;skip_suggestions&nbsp; .-> finalize;
    legacy_answer_writer -.-> follow_up_suggester;
    legacy_answer_writer --> form_field_extractor;
    redirect_handler --> finalize;
    retrieval_awaiter -. &nbsp;new_writer&nbsp; .-> answer_writer;
    retrieval_awaiter -. &nbsp;legacy_writer&nbsp; .-> legacy_answer_writer;
    skill_selector -. &nbsp;new_system&nbsp; .-> action_router;
    skill_selector -. &nbsp;legacy_system&nbsp; .-> retrieval_awaiter;
    dialog_state_extractor --> __end__;
    finalize --> __end__;
    retrieval_task_starter --> __end__;
    third_party_enricher --> __end__;
    visitor_profiler --> __end__;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc

Multi-Agent Router Architecture (WIP)

Partial Implementation

This architecture is partially implemented. Only the redirect handler agent is active, and only in test/development environments. Production still uses the legacy_answer_writer node with its monolithic prompts.

See ADR: Prompt Modularization for the full design.

Overview

The multi-agent router architecture replaces the monolithic prompt approach with specialized agents for different visitor intents. Instead of one large prompt handling all scenarios, the system:

  1. Classifies intent using a fast LLM (gpt-4.1-nano)
  2. Routes to specialized agents based on intent + interest signals
  3. Uses 3-level prompt hierarchy for each agent (meta-template → agent template → client instructions)

Intent Classification

The intent_classifier node classifies each message into one of 5 visitor intents:

Intent | Description | Example
LEARN | Product questions, feature inquiries | "How does your A/B testing work?"
CONTEXT | User sharing business context | "We have 50k monthly visitors"
SUPPORT | Existing customer issues | "I can't log into my dashboard"
OFFTOPIC | Unrelated to product | "What's the weather today?"
OTHER | Job inquiries, press, partnerships | "Are you hiring?"

The classifier runs in parallel with other background nodes from START, using the Langfuse prompt rose-internal-intent-router.
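A minimal sketch of what this classification step could look like, assuming a structured-output call to a small model; get_langfuse_prompt and get_llm are illustrative placeholders, not the actual helpers in ixchat/nodes/intent_classifier.py:

# Hypothetical sketch of the intent classification node (not the actual implementation).
from enum import Enum
from pydantic import BaseModel


class VisitorIntent(str, Enum):
    LEARN = "LEARN"
    CONTEXT = "CONTEXT"
    SUPPORT = "SUPPORT"
    OFFTOPIC = "OFFTOPIC"
    OTHER = "OTHER"


class IntentClassification(BaseModel):
    intent: VisitorIntent
    confidence: float


async def intent_classifier(state: dict) -> dict:
    # get_langfuse_prompt / get_llm stand in for the project's Langfuse and LLM helpers.
    prompt = get_langfuse_prompt("rose-internal-intent-router")
    llm = get_llm("gpt-4.1-nano").with_structured_output(IntentClassification)
    result = await llm.ainvoke(prompt.compile(message=state["input"]))
    return {"intent_classification_state": {"current_intent": result.intent}}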

Action Routing

The action_router node uses deterministic logic (no LLM) to decide the next action based on:

  • Current visitor intent
  • Cumulative interest score (from interest_signals_detector)
  • Site-specific interest threshold (from agent_config table)

Action | Trigger | Handler
EDUCATE | LEARN intent | legacy_answer (planned: educator agent)
QUALIFY | CONTEXT intent | legacy_answer (planned: qualifier agent)
PROPOSE_DEMO | Qualified + buying signals | legacy_answer (planned: CTA agent)
HANDLE_SUPPORT | SUPPORT intent | redirect_handler
HANDLE_OFFTOPIC | OFFTOPIC intent | redirect_handler
HANDLE_OTHER | OTHER intent | redirect_handler
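
Because the routing is deterministic, it can be expressed as a plain function over the state. The sketch below is illustrative only; it simplifies the real action_router, and the exact precedence between EDUCATE, QUALIFY, and PROPOSE_DEMO is an assumption about how the table above is applied:

# Illustrative sketch of deterministic action routing (simplified, not the real node).
def route_action(intent: str, interest_score: float, threshold: float) -> str:
    if intent == "SUPPORT":
        return "HANDLE_SUPPORT"
    if intent == "OFFTOPIC":
        return "HANDLE_OFFTOPIC"
    if intent == "OTHER":
        return "HANDLE_OTHER"
    if intent == "CONTEXT":
        return "QUALIFY"
    # LEARN (or unknown) intent: propose a demo once the cumulative interest
    # score crosses the site-specific threshold, otherwise keep educating.
    if interest_score >= threshold:
        return "PROPOSE_DEMO"
    return "EDUCATE"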

Redirect Handler

The redirect_handler is the only specialized agent currently implemented. It handles support, off-topic, and other requests by redirecting users to appropriate resources.

3-Level Prompt Hierarchy:

rose-internal/response-agents/meta-template (Level 1 - shared)
  └── {{lf_agent_instructions}} ← Agent template inserted
        └── rose-internal/response-agents/redirect/template (Level 2)
              └── {{lf_client_agent_instructions}} ← Client instructions inserted
                    └── rose-internal/response-agents/redirect/instructions/{domain} (Level 3)
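
A minimal sketch of how the three levels could compose into one system prompt, assuming simple placeholder substitution; the prompts themselves are managed in Langfuse, and fetch_prompt below is a hypothetical helper:

# Hypothetical sketch: composing the 3-level prompt hierarchy by placeholder substitution.
def build_redirect_prompt(domain: str) -> str:
    meta = fetch_prompt("rose-internal/response-agents/meta-template")                       # Level 1
    agent = fetch_prompt("rose-internal/response-agents/redirect/template")                  # Level 2
    client = fetch_prompt(f"rose-internal/response-agents/redirect/instructions/{domain}")   # Level 3

    # Insert the agent template into the meta-template, then the client
    # instructions into the agent template.
    prompt = meta.replace("{{lf_agent_instructions}}", agent)
    prompt = prompt.replace("{{lf_client_agent_instructions}}", client)
    return prompt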

Key features:

  • Skips RAG retrieval: The redirect handler cancels the retrieval task for faster responses since it doesn't need knowledge base content. This saves 10-30% retrieval costs on redirect cases.
  • Uses gpt-4.1-nano: Optimized for fast, lightweight redirect responses.

Environment-Based Routing

Environment | Support/Offtopic/Other | Educate/Qualify/Demo
Production | legacy_answer | legacy_answer
Test/Development | redirect_handler | legacy_answer

Current Limitations

  • Educator agent: Not implemented (routes to legacy_answer)
  • Qualifier agent: Not implemented (routes to legacy_answer)
  • CTA/Demo agent: Not implemented (routes to legacy_answer)
  • A/B testing: No framework for comparing router vs monolithic performance

Deferred Retrieval Pattern

The graph uses a deferred retrieval pattern to optimize performance and reduce costs:

  1. retrieval_task_starter fires the retrieval task at START without waiting (fire-and-forget)
  2. action_router decides the path based on intent + signals (doesn't need retrieval results)
  3. Answer path: retrieval_awaiter awaits the deferred task before legacy_answer_writer
  4. Redirect path: redirect_handler cancels the retrieval task (saves 10-30% retrieval costs)

This pattern ensures retrieval only happens when needed (answer path), avoiding wasted work on redirects.
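
A minimal sketch of the fire-and-forget pattern with asyncio, assuming a task store keyed by session so a later node can await or cancel the same task; retrieval_tasks and fetch_context are illustrative names, not the actual ixchat.retrieval_task_store API:

import asyncio

# Illustrative task store: maps session_id -> asyncio.Task (simplified sketch).
retrieval_tasks: dict[str, asyncio.Task] = {}

async def retrieval_task_starter(state: dict) -> dict:
    # Fire the retrieval without awaiting it; the answer path awaits it later.
    task = asyncio.create_task(fetch_context(state["input"], state["site_name"]))
    retrieval_tasks[state["session_id"]] = task
    return {}

async def retrieval_awaiter(state: dict) -> dict:
    # Answer path: block until the deferred retrieval finishes.
    docs = await retrieval_tasks.pop(state["session_id"])
    return {"retrieved_docs": docs}

async def cancel_retrieval(session_id: str) -> None:
    # Redirect path: drop the pending retrieval to save cost.
    task = retrieval_tasks.pop(session_id, None)
    if task is not None and not task.done():
        task.cancel()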

Parallel Execution and Race Conditions

Five nodes start in parallel from START:

START ──┬── retrieval_task_starter ──→ END (fires async task, doesn't wait)
        ├── third_party_enricher ──→ END (background)
        ├── visitor_profiler ──→ END (background)
        ├── interest_signals_detector ──→ END (background)
        └── intent_classifier ──→ END (background)

After background nodes complete, action_router routes to the appropriate handler:

action_router ──┬── retrieval_awaiter ──→ legacy_answer_writer ──→ suggestions (answer path)
                └── redirect_handler ──→ finalize (redirect path, cancels retrieval)

Race condition behavior:

  • action_router waits ONLY for intent_classifier and interest_signals_detector (doesn't wait for retrieval)
  • Background nodes (third_party_enricher, visitor_profiler, interest_signals_detector, intent_classifier) run in parallel
  • If background nodes complete before action_router starts → their data IS available in state
  • If background nodes are still running → action_router proceeds WITHOUT waiting (uses default intent)

This means enrichment data (company name, sector, interest signals, intent) is available to the response handler on a best-effort basis. Any data not ready in time is persisted for the next conversation turn.
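
A minimal sketch of how this fan-out is declared with LangGraph's graph builder; the state class and node callables are placeholders, and only the edge wiring mirrors the diagram above:

from langgraph.graph import StateGraph, START, END

builder = StateGraph(RoseChatState)  # State class from ixchat/pydantic_models/

# Each node must be registered before wiring edges (remaining add_node calls omitted).
builder.add_node("retrieval_task_starter", retrieval_task_starter)

# Five parallel branches fan out from START; LangGraph runs them concurrently
# and merges their state updates via the custom reducers.
for node in (
    "retrieval_task_starter",
    "third_party_enricher",
    "visitor_profiler",
    "interest_signals_detector",
    "intent_classifier",
):
    builder.add_edge(START, node)

# Background branches terminate at END on their own; the response path continues
# through conditional edges into action_router (wiring omitted here).
builder.add_edge("third_party_enricher", END)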

Streaming Architecture

The chatbot uses a two-phase streaming approach:

Phase 1: Token Streaming (response handler)

Client ← token ← token ← token ← ... ← legacy_answer_writer or redirect_handler node

  • Uses LangGraph's astream_events API to capture on_chat_model_stream events
  • Only streams from response handler nodes (legacy_answer_writer, redirect_handler) - other nodes are filtered out to avoid streaming JSON
  • Tokens sent as Server-Sent Events: {"type": "token", "content": "..."}

Phase 2: Completion Event (after finalize)

Client ← complete event ← API fetches final state from graph

After streaming completes, the API layer:

  1. Waits for all graph nodes to complete (including background nodes)
  2. Fetches final state: chatbot.graph.aget_state(config_dict)
  3. Extracts from state:
    • suggested_follow_ups - Follow-up questions from follow_up_suggester
    • suggested_answers - Answer options from answer_suggester
    • cta_url_overrides - Dynamic CTA URLs from form_field_extractor
    • visitor_profile - Enriched company data
  4. Sends completion event: {"type": "complete", "metadata": {...}}

Why This Design?

  • Fast time-to-first-token: User sees response immediately from the response handler
  • Best-effort enrichment: Background data used if ready, otherwise next turn
  • Complete data at end: Suggestions and metadata require all nodes to finish

Code Flow

# chatbot.py - Streams only response handler tokens
async for event in self.graph.astream_events(...):
    if event["event"] == "on_chat_model_stream":
        node_name = event.get("metadata", {}).get("langgraph_node", "")
        if node_name not in ("legacy_answer_writer", "redirect_handler"):  # Skip non-response nodes
            continue
        yield event["data"]["chunk"].content  # Stream token to client

# chat.py (API) - Fetches final state after streaming
state = await chatbot.graph.aget_state(config_dict)
suggested_follow_ups = state.values.get("suggested_follow_ups", [])
suggested_answers = state.values.get("suggested_answers", [])
# ... send completion event with metadata

Execution Flow

  1. START: Five nodes launch in parallel

    • retrieval_task_starter - Fires retrieval task asynchronously (doesn't wait)
    • third_party_enricher - Enriches visitor profile from IP
    • visitor_profiler - Infers company/sector from conversation
    • interest_signals_detector - Detects buying signals
    • intent_classifier - Classifies visitor intent using LLM (gpt-4.1-nano)
  2. Action Routing: The action_router decides based on intent + signals (doesn't wait for retrieval):

    • Support/Offtopic/Other intents (test/dev) → redirect_handler (cancels retrieval task)
    • All other cases → retrieval_awaiter → legacy_answer_writer
  3. Answer Path: Sequential response generation with retrieval

    • retrieval_awaiter - Awaits deferred retrieval task
    • legacy_answer_writer - Generates LLM response with RAG context
    • suggestion_router - Routes to appropriate suggester
  4. Redirect Path: Fast response without retrieval

    • redirect_handler - Cancels retrieval task, generates redirect response (gpt-4.1-nano)
    • Goes directly to finalize (no suggestions needed)
  5. Conditional Routing: After legacy_answer_writer, the suggestion_router decides (see the sketch after this list):

    • Response contains 👉 → answer_suggester (generate answer options)
    • Response contains 💌 or URLs → skip_suggestions (go to finalize)
    • Default → follow_up_suggester (generate follow-up questions)
  6. Background Nodes: Run in parallel without blocking response

    • dialog_state_extractor - Extracts emoji markers and emails
    • form_field_extractor - Extracts form field values for CTA URLs
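
A minimal sketch of the suggestion routing decision described in step 5, assuming the router only inspects the generated response text (simplified and illustrative; the real suggestion_router lives in the graph definition):

import re

URL_PATTERN = re.compile(r"https?://\S+")

def suggestion_router(state: dict) -> str:
    response = state.get("response", "")
    if "👉" in response:
        return "answer_suggester"        # Bot offered answer options
    if "💌" in response or URL_PATTERN.search(response):
        return "skip_suggestions"        # CTA or link present, go straight to finalize
    return "follow_up_suggester"         # Default: suggest follow-up questions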

Graph Nodes

Node | Type | Blocking | Purpose
retrieval_task_starter | async | No | Fires retrieval task asynchronously at START (deferred pattern)
retrieval_awaiter | async | Yes | Awaits deferred retrieval task on answer path only
third_party_enricher | async | No | Enriches visitor profile from IP address
visitor_profiler | async | No | Infers company/sector from conversation
interest_signals_detector | async | No | Detects buying signals (engagement, pricing interest)
intent_classifier | async | No | Classifies visitor intent using LLM (gpt-4.1-nano)
action_router | sync | N/A | Determines next action based on intent + signals
legacy_answer_writer | async | Yes | Generates LLM response with RAG context (formerly legacy_answer)
redirect_handler | async | Yes | Handles support/offtopic/other redirects, cancels retrieval (test/dev only, uses gpt-4.1-nano)
suggestion_router | sync | N/A | Routes to appropriate suggester based on response
answer_suggester | async | Yes | Generates suggested answers (when bot asks questions)
follow_up_suggester | async | Yes | Generates follow-up questions
dialog_state_extractor | async | No | Extracts emoji markers and captured emails
form_field_extractor | async | No | Extracts form field values for CTA URLs
finalize | sync | Yes | Assembles final response for client

State Model

The graph uses RoseChatState (TypedDict) with custom reducers for parallel updates:

Core Fields

Field | Type | Description
messages | list[BaseMessage] | Conversation history
input | str | Current user input
response | str | Generated LLM response
retrieved_docs | str | Context from LightRAG
site_name | str | Client site identifier
session_id | str | Conversation session ID
turn_number | int | Current conversation turn (0-indexed)

Profile & Signals

Field | Type | Reducer
visitor_profile | VisitorProfile | merge_visitor_profiles
dialog_supervision_state | DialogSupervisionState | merge_dialog_supervision_states
interest_signals_state | InterestSignalsState | merge_interest_signals_states
form_collection_state | FormCollectionState | merge_form_collection_states

Intent & Action Router State

Field | Type | Description
intent_classification_state | IntentClassificationState | Current intent + history
action_router_state | ActionRouterState | Next action + reasoning

VisitorIntent enum values: LEARN, CONTEXT, SUPPORT, OFFTOPIC, OTHER

NextAction enum values: EDUCATE, QUALIFY, PROPOSE_DEMO, HANDLE_SUPPORT, HANDLE_OFFTOPIC, HANDLE_OTHER, CONTINUE

Custom Reducers

Parallel nodes update state using custom merge functions:

  • merge_visitor_profiles: Merges enrichment results, preferring non-"unknown" values
  • merge_dialog_supervision_states: Cumulative "ever" flags + latest turn flags
  • merge_interest_signals_states: Simple replacement
  • merge_form_collection_states: Merges collected values, tracks max turn
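
A minimal sketch of how one of these reducers could be attached to the state via an Annotated type, assuming the "prefer non-unknown values" behavior described above; a plain dict stands in for the real VisitorProfile model, and the state fields shown are only a subset:

from typing import Annotated, TypedDict

def merge_visitor_profiles(existing: dict, update: dict) -> dict:
    # Sketch: keep existing values, but let updates overwrite missing/"unknown" ones.
    merged = dict(existing or {})
    for key, value in (update or {}).items():
        if value not in (None, "", "unknown"):
            merged[key] = value
    return merged

class RoseChatState(TypedDict, total=False):
    # LangGraph invokes the reducer when parallel nodes write to the same key.
    visitor_profile: Annotated[dict, merge_visitor_profiles]
    response: str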

Memory Management

Session state is persisted using LangGraph checkpointers:

Mode | Backend | Use Case
Redis | AsyncRedisSaver | Production (distributed)
Memory | MemorySaver | Development/Testing

Configuration:

  • TTL: Configurable session timeout
  • Keepalive: TCP socket keepalive enabled
  • Health checks: 30-second pings prevent idle disconnection

# Memory manager initialization
memory_manager = IXChatMemoryManager()
checkpointer = await memory_manager.get_checkpointer()
graph = graph_builder.compile(checkpointer=checkpointer)
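
Once compiled with a checkpointer, each conversation is addressed by a thread ID in the run config; a reasonable assumption (not confirmed by the source) is that the session ID doubles as the thread ID:

# Sketch: resuming a session via the checkpointer (assumes session_id maps to thread_id).
config_dict = {"configurable": {"thread_id": "session-123"}}
result = await graph.ainvoke({"input": "And how much does it cost?"}, config_dict)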

Enrichment System

Multi-source visitor enrichment pipeline with priority-based fallbacks:

Priority | Source | Description
1 | Redis Cache | Fast, short-lived cache
2 | Supabase Lookup | IP hash lookup for returning visitors
3 | Browser Reveal | Client-side data (window.reveal)
4 | Snitcher Radar | Session UUID identification
5 | Enrich.so | Server-side API fallback

Once a source returns "completed" status, remaining sources are skipped.
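
A minimal sketch of the priority-based fallback loop, assuming each source exposes an async lookup returning a profile dict with a status field; the source interface shown here is illustrative, not the actual ixchat.enrichment API:

# Illustrative sketch of the enrichment fallback chain.
async def enrich_visitor(ip_address: str, sources: list) -> dict:
    profile = {"status": "pending"}
    for source in sources:  # Ordered by priority: cache, Supabase, reveal, Snitcher, Enrich.so
        result = await source.lookup(ip_address)
        if result and result.get("status") == "completed":
            return result             # First completed source wins; the rest are skipped
        profile.update(result or {})  # Keep any partial data as a fallback
    return profile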

VisitorProfile Fields

  • Enrichment: status, tier, source, ip_address
  • Company: company_name, company_description, company_domain, sector, sub_sector
  • User Context: email, job_to_be_done, feature_list, intent
  • Confidence: sector_confidence_level

Integration Points

System | Purpose | Package
LightRAG | Document retrieval with graph & chunk ranking | ixrag
Supabase | Conversation storage, client configs, lead data | ixdata
LangFuse | Observability & tracing | ixllm
Azure OpenAI | LLM client | ixllm
Redis | Session checkpointing | ixchat.memory

Key Files

File | Description
ixchat/__init__.py | Public API: get_chatbot_service()
ixchat/service.py | IXChatbotService singleton manager
ixchat/chatbot.py | IXChatbot with LangGraph orchestration
ixchat/graph_structure.py | Graph structure with node/edge definitions
ixchat/memory.py | IXChatMemoryManager for session persistence
ixchat/nodes/ | Node implementations
ixchat/nodes/intent_classifier.py | Intent classification using LLM
ixchat/nodes/action_router.py | Deterministic action routing
ixchat/nodes/redirect_handler.py | Redirect agent for support/offtopic/other
ixchat/nodes/retrieval_task_starter.py | Fires retrieval task asynchronously
ixchat/nodes/retrieval_awaiter.py | Awaits deferred retrieval task
ixchat/retrieval_task_store.py | Manages retrieval tasks for cancellation
ixchat/pydantic_models/ | State definitions and reducers
ixchat/pydantic_models/intent_router.py | Intent/action router state models
ixchat/utils/agent_config.py | AgentConfigResolver for site-specific config
ixchat/enrichment/ | Multi-source visitor enrichment

Usage

from ixchat import get_chatbot_service

# Get singleton service
service = get_chatbot_service()

# Get chatbot for a site
chatbot = await service.get_chatbot("example-site")

# Query with streaming
async for chunk in chatbot.query_stream(
    input="Tell me about your product",
    site_name="example-site",
    session_id="session-123",
    person_id="posthog-distinct-id",
):
    print(chunk, end="")

# Non-streaming query (for evaluations)
response, metadata = await chatbot.query(
    input="What are your pricing plans?",
    site_name="example-site",
    session_id="session-123",
)

Evaluations

The just eval command runs LLM evaluation tests for quality assessment and regression testing of ixchat components using Langfuse datasets.

How It Works

Langfuse Dataset ──→ Evaluator ──→ Classifier (LLM) ──→ Results logged to Langfuse
   (labeled examples)              (real API calls)        (runs + scores)

  1. Test data is fetched from Langfuse datasets (labeled input/expected_output pairs)
  2. Evaluator runs the classifier on each dataset item
  3. Results are logged back to Langfuse as runs with scores (correct, confidence, F1, etc.)
  4. Metrics are computed (accuracy, F1, precision, recall) and asserted against thresholds
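
A minimal sketch of this loop using the Langfuse Python SDK's dataset API; classify is a placeholder for the component under test, and the logging of runs and scores back to Langfuse is omitted for brevity:

from langfuse import Langfuse

langfuse = Langfuse()
dataset = langfuse.get_dataset("intent-classifier")

correct = 0
for item in dataset.items:
    # Run the real classifier on the labeled input (placeholder call).
    prediction = classify(item.input)
    correct += int(prediction["intent"] == item.expected_output["intent"])

accuracy = correct / len(dataset.items)
assert accuracy >= 0.80  # Illustrative threshold; real runs also compute macro/weighted F1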

Usage

cd backend

# Run a specific evaluation
just eval intent-classifier    # Intent classification accuracy
just eval skill-selector       # Skill selection accuracy
just eval e2e-api              # End-to-end API evaluation

# Run all evaluations
just eval all

Available Targets

Target | Langfuse Dataset | Description
intent-classifier | intent-classifier | Tests intent classification (LEARN, CONTEXT, SUPPORT, OFFTOPIC, OTHER)
skill-selector | skill-selector | Tests skill/action routing decisions
e2e-api | main-dataset | End-to-end API response quality

Langfuse Dataset Structure

Each dataset item in Langfuse should have:

Field | Description | Example
input | Classifier input (dict or string) | {"message": "How does pricing work?", "history": [...]}
expected_output | Expected classification result | {"intent": "LEARN"}
metadata | Optional context | {"source": "production", "site_name": "example"}

Adding Traces to Datasets

To expand test coverage, add production traces to Langfuse datasets:

Option 1: Langfuse UI

  1. Go to Traces in Langfuse
  2. Find a trace with interesting/edge-case behavior
  3. Click Add to Dataset → select target dataset
  4. Fill in the expected_output (ground truth label)

Option 2: Langfuse API

from langfuse import Langfuse

langfuse = Langfuse()

# Add item to existing dataset
langfuse.create_dataset_item(
    dataset_name="intent-classifier",
    input={"message": "Can you help me debug?", "history": []},
    expected_output={"intent": "SUPPORT"},
    metadata={"source": "manual", "notes": "Edge case for support detection"}
)

Environment Configuration

The eval command automatically configures:

  • IX_LANGFUSE_ENABLED=true - Enables Langfuse for real prompt fetching
  • LANGFUSE_ENABLED=true - Langfuse integration flag
  • IX_ENVIRONMENT=test - Uses test environment (overridden to development for credentials)

Test Markers

@pytest.mark.evaluation      # Marks as evaluation test
@pytest.mark.llm_integration # Requires real LLM API calls

Running just eval all applies the pytest filter: -m "evaluation and llm_integration"

Results in Langfuse

After running evaluations, results appear in Langfuse:

Score | Description
correct | Per-item: 1.0 if prediction matches expected, 0.0 otherwise
confidence | Per-item: Model confidence score (if available)
macro_f1 | Aggregate: Macro-averaged F1 score across all classes
weighted_f1 | Aggregate: Weighted F1 score
accuracy | Aggregate: Overall accuracy
passed | Aggregate: 1.0 if F1 >= threshold, 0.0 otherwise

Quality Thresholds

Default thresholds (configurable in conftest.py):

Metric | Threshold | Description
Macro F1 | 0.80 | Overall classification quality
Min Class F1 | 0.60 | No single class below this
Skill Recall | 0.90 | Multi-label skill coverage
Answer Accuracy | 0.70 | E2E response quality
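
A minimal sketch of how these thresholds might be asserted at the end of an evaluation test; the metrics dict shape is an assumption, but the values match the table above:

# Illustrative pytest-style assertions against the default thresholds.
def assert_quality(metrics: dict) -> None:
    assert metrics["macro_f1"] >= 0.80, "Macro F1 below threshold"
    assert min(metrics["per_class_f1"].values()) >= 0.60, "A class fell below the minimum F1"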