Skip to content

December 2025 - Intent Router POC & Multi-Agent Architecture

Context

With the core agentic platform stable in production, December's R&D efforts focused on designing a next-generation multi-agent architecture. The goal was to move from a monolithic conversational agent to an intent-based routing system capable of directing conversations to specialized handlers.

Active clients during this period included AB Tasty, Pennylane, Skaleet, Skello, and the Rose website.

Key Business Metrics Tracked:

  1. Initial Engagement Rate: Percentage of website visitors who initiate an interaction with the widget
  2. Conversation Depth: Average number of message exchanges per conversation
  3. Conversion Rate: Percentage of conversations that result in a demo request or email capture

Technical Challenge

The existing architecture presented several fundamental limitations that could not be addressed through incremental improvements:

Single-agent limitation: The system used a single conversational agent with one unified prompt to handle all types of user interactions. This meant technical questions, sales inquiries, and support requests all received the same treatment, regardless of their distinct characteristics and optimal handling strategies.

Prompt sprawl: As the client base grew, each client required a customized prompt. Without a modular architecture, prompts became increasingly difficult to maintain, with shared behaviors duplicated across all configurations.

State serialization fragility: The LangGraph framework (an orchestration library built on LangChain for building stateful multi-agent applications) stores conversation state through a checkpointing mechanism. Changes to the serialization format between versions caused failures when deserializing nested data structures.

Business rationale: Analysis of conversion patterns revealed that technical deep-dive conversations convert at approximately [TO BE MEASURED] times the rate of general inquiries. This suggested that specialized handling based on detected intent could significantly improve business outcomes.


Hypothesis 1: Intent Router for Multi-Agent Orchestration

"Classifying user intent and routing to specialized agents will improve conversion rates by providing more relevant responses."

Technical Uncertainty

The existing architecture relied on a single, monolithic prompt per client. Each client's prompt contained approximately 2000+ lines, combining client-specific content with shared behaviors such as guardrails, tone guidelines, and safety constraints. This architecture created several technical challenges:

Scalability problem: Every client required a complete, self-contained prompt. Core behaviors were duplicated across all client configurations, creating O(n) maintenance costs where n represents the number of clients. Any improvement to shared logic required manual propagation to each client's prompt.

Configuration drift risk: Without a single source of truth for shared behaviors, client configurations could diverge over time. One client might receive updated guardrails while another retained outdated versions, leading to inconsistent behavior across the platform.

LLM instruction-following limitations: Large language models demonstrate reduced instruction-following accuracy when presented with long, complex prompts containing multiple conditional branches. The monolithic approach forced all decision logic into a single prompt, potentially degrading response quality.

Token inefficiency: The complete prompt was loaded for every request regardless of the actual complexity of the user's query. A simple greeting consumed the same computational resources as a complex technical inquiry.

Scientific Question: Can we decompose the monolithic prompt architecture into an intent-based routing system where shared behaviors are defined once and injected consistently, specialized handlers address specific conversation types, and computational resources scale with actual conversation complexity?

State of the Art Gap: Existing conversational AI systems typically employ either monolithic prompts (simple to implement but difficult to scale) or heavyweight orchestration frameworks (powerful but operationally complex). No established patterns existed for lightweight intent-based routing that balances implementation simplicity with behavioral specialization.

Experimental Methodology

The experiment involved designing an intent classification system capable of categorizing visitor messages into distinct intent categories: general product questions, technical inquiries, sales-related conversations, support requests, demo coordination, and pricing discussions.

The classification system was integrated into the LangGraph execution pipeline as a parallel node. LangGraph organizes processing into "supersteps" where nodes within the same superstep execute concurrently. By placing the intent classifier alongside other analysis nodes (knowledge retrieval, visitor profiling, interest signal detection), we could perform intent classification without adding to the critical path latency.

The classified intent was then passed to a routing component that would determine the appropriate handling strategy. For this proof-of-concept, all intents were routed to the standard answer generation, but the architecture was designed to support specialized handlers in future iterations.

Testing & Validation

Method: The plan was to conduct a production A/B test, routing 50% of traffic through the intent-enabled pipeline and measuring key business metrics.

Results

Note: The following metrics represent preliminary observations from the POC phase. Full A/B testing with statistical significance analysis is planned for Q1 2026.

Metric Without Intent Router With Intent Router (POC)
Initial Engagement [PENDING A/B TEST] [PENDING A/B TEST]
Avg Conversation Turns [PENDING A/B TEST] [PENDING A/B TEST]
Conversion Rate [PENDING A/B TEST] [PENDING A/B TEST]

Intent Classification Accuracy (based on manual review of sample conversations):

Intent Type Accuracy
Technical Questions [TO BE MEASURED]
Sales Inquiries [TO BE MEASURED]
Support Requests [TO BE MEASURED]
Demo Requests [TO BE MEASURED]

Conclusion

PARTIAL SUCCESS (POC): The intent classification architecture was successfully implemented and integrated into the execution graph. The system correctly classifies intents in observed conversations, though formal accuracy measurement requires a labeled evaluation dataset. Full agent specialization with differentiated handling strategies is planned for Q1 2026.


Hypothesis 2: Agent Configuration Resolver

"A cascading configuration system will enable per-client prompt customization without code changes."

Technical Uncertainty

The multi-tenant nature of the platform required different behavioral configurations for different clients and industries. However, implementing these variations through code changes created deployment overhead and increased the risk of regressions.

Scientific Question: Can we design a configuration resolution system that allows runtime customization of agent behavior while maintaining a clear hierarchy of defaults, global settings, and site-specific overrides?

Experimental Methodology

The solution involved implementing a cascading configuration resolver that follows a priority hierarchy: site-specific configurations take precedence over global configurations, which in turn override default values. This pattern allows operators to customize behavior at any level without modifying application code.

The resolver handles various configuration types including interest signal definitions (which buying signals to detect and how to weight them), client descriptions (contextual information about the client's business for prompt injection), and behavioral parameters (thresholds, timing rules, and response strategies).

Different industries require different signal definitions. For example, a SaaS company might consider integration-related questions as strong buying signals, while an accounting software provider might weight compliance questions differently. The configuration resolver allows these industry-specific rules to be defined declaratively.

Testing & Validation

Method: Cross-client comparison using identical base prompts with different configuration overlays. Business metrics were tracked per client to validate that the configuration system allowed meaningful behavioral differentiation.

Results

Client Industry Engagement Turns Conversion
AB Tasty SaaS [TO BE MEASURED] [TO BE MEASURED] [TO BE MEASURED]
Pennylane Accounting [TO BE MEASURED] [TO BE MEASURED] [TO BE MEASURED]
Skello HR [TO BE MEASURED] [TO BE MEASURED] [TO BE MEASURED]

Conclusion

SUCCESS: The configuration resolver architecture was successfully implemented, enabling industry-specific behavioral customization without code changes. The system correctly loads and applies hierarchical configurations at runtime.


Hypothesis 3: Robust State Serialization for Nested Pydantic Models

"Custom deserialization handlers will prevent state corruption when LangGraph checkpointing format changes."

Technical Uncertainty

LangGraph maintains conversation state through a checkpointing mechanism that serializes the current state to persistent storage (in our case, MongoDB) and deserializes it when resuming a conversation. This enables stateful multi-turn conversations across multiple requests.

The challenge arose when LangGraph's internal serialization format changed between versions. The state object contains nested Pydantic models (Python data validation classes) representing visitor profiles, dialog states, detected interest signals, and intent classifications. When the checkpoint format changed, the deserialization process failed to correctly reconstruct these nested structures, causing state corruption or application crashes.

Scientific Question: Can we implement a deserialization layer that abstracts away LangGraph's internal format changes while correctly handling complex nested data structures?

Experimental Methodology

The solution involved implementing a custom deserialization handler that performs two key functions:

Format detection: The handler first inspects the raw checkpoint data to determine which serialization format was used. Different LangGraph versions structure the data differently (for example, earlier versions stored state directly while later versions wrap it in a "channel_values" container). The handler detects the format and extracts the state values accordingly.

Type-aware reconstruction: For each field containing a nested Pydantic model, the handler explicitly validates and reconstructs the model from the raw dictionary data. This ensures that even if the serialization changed how dictionaries are stored, the resulting objects are properly typed Python instances. Special handling was implemented for list fields, where each element must be individually validated and reconstructed.

Testing & Validation

Method: Regression tests were created using checkpoint fixtures captured from both old and new LangGraph versions. The test suite validated that deserialization succeeds across format versions and during migration scenarios where some checkpoints use the old format and others use the new format.

Results

Scenario Before After
Old format checkpoints Application crash Successful deserialization
New format checkpoints Working Working
Mixed format (migration) Application crash Successful deserialization
Nested model preservation Data structure lost Data structure preserved

Conclusion

SUCCESS: The custom deserialization layer provides backward compatibility during LangGraph version upgrades. Existing conversation checkpoints remain valid after framework updates, eliminating the need for data migration.


Hypothesis 4: Graph Architecture Optimization for Latency

"Parallel execution from START with convergence at action_router will minimize response latency while preserving complete context for routing decisions."

Technical Uncertainty

LangGraph employs a "superstep" execution model inspired by Google's Pregel framework for distributed graph processing. Understanding this model is essential to the optimization challenge:

What is the superstep model? In Pregel-style graph processing, computation proceeds in discrete phases called supersteps. Within a superstep, all active nodes execute their computation concurrently. At the end of each superstep, a synchronization barrier ensures all nodes complete before the next superstep begins. State updates are applied atomically at these boundaries.

The optimization challenge: This model creates a fundamental tradeoff. Placing nodes in the same superstep allows parallel execution, reducing latency from the sum of execution times to the maximum. However, the synchronization barrier means the slowest node determines the superstep duration. Additionally, nodes that require outputs from other nodes must be placed in subsequent supersteps.

Scientific Question: How should we organize graph nodes to minimize end-to-end latency while ensuring the routing component has access to all required contextual information (intent classification, interest signals, retrieved knowledge)?

State of the Art Gap: LangGraph documentation primarily addresses simple linear or branching graph topologies. No established patterns existed for multi-agent architectures requiring parallel analysis with convergence at a routing decision point.

Experimental Methodology

The research proceeded through architectural analysis and iterative refinement:

Initial sequential architecture: The original design processed nodes sequentially. Knowledge retrieval completed before intent routing, which completed before answer generation. Total latency equaled the sum of all node execution times.

Parallel convergence architecture: The redesigned architecture fans out from the start node to multiple parallel analysis nodes: knowledge retriever, intent classifier, interest signals detector, visitor profiler, and third-party enricher. These nodes execute concurrently in the first superstep. The action router node then serves as the convergence point, receiving edges from the relevant analysis nodes. LangGraph's superstep model guarantees that all upstream nodes complete before the router executes.

Key architectural decisions:

Separation of classification from routing: The design separates two distinct operations. Intent classification requires an LLM call, which is computationally expensive. Routing logic is deterministic and lightweight. By separating these concerns, the expensive LLM-based classification can run in parallel with other analysis nodes, while the fast routing logic remains on the critical path with minimal latency overhead.

Convergence point design: The action router receives inputs from three upstream nodes: the knowledge retriever (providing relevant documents), the intent classifier (providing the visitor's intent category), and the interest signals detector (providing the cumulative interest score). The superstep model guarantees all three inputs are available before the router executes, ensuring complete context for routing decisions.

Priority-based routing algorithm: The routing logic follows a deterministic priority order. Certain intents (support requests, off-topic conversations) take precedence regardless of other factors, as they require immediate specialized handling. When these "blocking intents" are not present, the router considers the cumulative interest score to determine if a demo proposal is appropriate, then falls back to intent-specific handling.

Conditional edges for extensibility: The architecture uses conditional edges (a LangGraph feature that allows runtime path selection) extending from the action router. Currently, all paths lead to the answer generation node, but the structure supports future specialized handlers for different conversation types.

Testing & Validation

Method: Architecture validation through production observation and distributed tracing.

Superstep timing analysis: All nodes were instrumented with Langfuse tracing to measure individual execution times and verify that nodes in the first superstep execute in parallel rather than sequentially.

Context availability validation: Testing confirmed that the action router receives all required inputs (intent classification, interest signals, and retrieved documents) with complete state integrity across the superstep boundary.

Routing logic testing: Unit tests cover all combinations of intent types and interest scores, verifying correct action determination. Integration tests confirm proper graph traversal to the answer generation node.

Results

Latency Improvement:

Architecture Superstep 1 Nodes Latency Context Quality
Sequential (v1) 1 [TO BE MEASURED] Partial
Parallel (v2) 5 [TO BE MEASURED] Complete

Latency Reduction: [PENDING BENCHMARK] (expected significant improvement through parallel execution)

Node Execution Times (to be measured via Langfuse production traces):

Node Execution Time Parallel Group
knowledge_retriever [TO BE MEASURED] Superstep 1
intent_classifier [TO BE MEASURED] Superstep 1
interest_signals_detector [TO BE MEASURED] Superstep 1
visitor_profiler [TO BE MEASURED] Superstep 1
third_party_enricher [TO BE MEASURED] Superstep 1 (background)
action_router [TO BE MEASURED] Superstep 2
answer_writer [TO BE MEASURED] Superstep 3

Context Completeness at action_router: Verified through integration testing that all three required inputs (visitor intent, cumulative interest score, and retrieved documents) are available at the router.

Conclusion

SUCCESS: The parallel architecture with convergence at the action router was successfully implemented. The architecture achieves:

  1. Latency reduction through parallel execution of analysis nodes (specific improvement pending production benchmarks)
  2. Complete context for routing decisions, combining intent, interest signals, and retrieved knowledge
  3. Future extensibility through conditional edges prepared for specialized handlers
  4. Type safety through a centralized enumeration serving as the single source of truth for node names

The key insight from this research is that computationally expensive operations (LLM-based classification) and lightweight operations (deterministic routing) should be architecturally separated. This allows the expensive operations to run in parallel while keeping the routing logic on the critical path for minimal latency overhead. The superstep execution model, while imposing synchronization constraints, provides strong guarantees about state consistency that simplify the routing logic design.


R&D Innovations Identified

The following components represent original R&D work that lifts technical uncertainties not addressed by existing solutions:

Component Complexity Description
Multi-Agent LangGraph Architecture Very High Parallel execution of multiple AI agents with coherent state merging
TenantAwareNeo4JStorage Very High Multi-tenant isolation layer for LightRAG on shared Neo4j infrastructure
5-Tier Enrichment Pipeline High Cascading B2B visitor enrichment from multiple data sources
Visitor Profile Merge Algorithm High Conflict resolution for merging visitor data from concurrent sources
Interest Signals Detection High LLM-based detection of buying signals with configurable thresholds
Graph Latency Optimization High Parallel node arrangement respecting superstep execution constraints
Intent Classifier + Action Router Medium Separation of LLM classification from deterministic routing
Agent Configuration Resolver Medium Cascading configuration for multi-tenant behavioral customization

Technical Challenges Documented ("Verrous Technologiques")

The following technical uncertainties required original research and experimentation:

  1. Multi-agent state coherence: How to parallelize LLM agents while ensuring coherent state merging when multiple agents update shared state simultaneously?
  2. Multi-tenant RAG isolation: How to isolate tenant data on LightRAG infrastructure that does not natively support multi-tenancy?
  3. Cascading B2B enrichment: How to enrich B2B visitor profiles from multiple sources with intelligent fallback when primary sources fail?
  4. Configurable signal detection: How to detect buying signals with industry-specific rules that can be configured per client without code changes?
  5. Intent-based routing: How to route user intents to specialized agents while maintaining a lightweight, maintainable architecture?
  6. Graph latency optimization: How to arrange graph nodes for minimal latency while respecting superstep execution model constraints?

Estimated R&D Hours (July-December 2025)

Month Focus Hours
July Foundation & Multi-Tenant Architecture 210
August Production Deployment & Redis History 200
September MongoDB Integration & Observability 260
October Visitor Profiling & Widget Development 280
November Enrichment Pipeline & Dashboard 300
December Intent Router POC & JEI Documentation 200
Total ~1,450h

Current Month Development Summary

Graph Structure Refinement

The graph architecture was consolidated into a shared module serving as the single source of truth for the execution topology. This included implementing parallel fan-out from the start node to five concurrent analysis nodes, establishing the action router as the convergence point for intent, interest signals, and retrieved knowledge, and adding conditional edges to support future routing extensibility.

Intent Classification and Action Routing

The monolithic intent router was decomposed into two specialized components: an intent classifier that performs LLM-based visitor intent classification (executing in parallel with other analysis nodes), and an action router that implements deterministic priority-based routing logic (positioned on the critical path). Comprehensive observability was added through Langfuse tracing integration.

Type Safety Improvements

A centralized enumeration was introduced as the single source of truth for node names, eliminating string literals throughout the codebase. Type checking with mypy was enabled across all packages, with enhanced type hints in test files and improved Pydantic model validation.

Testing Framework Enhancement

Unit and integration tests were separated into distinct test suites with appropriate markers. LLM integration test markers were added to distinguish tests requiring actual LLM API calls. Mocking patterns for Supabase were improved, and comprehensive unit tests were added for the action routing logic.


R&D Activities Summary

The following R&D activities were conducted during December 2025:

  • Intent Router POC: Design and implementation of intent classification and multi-agent routing architecture
  • Agent Configuration Resolver: Cascading configuration system for per-client behavioral customization
  • State Serialization Fixes: Custom deserialization handlers for nested Pydantic model compatibility
  • Graph Architecture Optimization: Parallel execution design with convergence at action router
  • Superstep Execution Model Analysis: Research into Pregel-inspired node orchestration patterns

Other Development (Non-R&D)

The following activities represent standard software development work that does not qualify as R&D under CIR/JEI criteria:

  • JEI documentation preparation
  • Type safety enforcement with mypy
  • Testing framework enhancement
  • Dashboard user interface improvements

Next Steps (Q1 2026)

The following research directions are planned for continuation in Q1 2026:

  1. Full Intent-Based Routing: Complete the agent specialization work by implementing differentiated prompts and handling strategies for each intent category. This will test the hypothesis that specialized responses improve conversion rates.

  2. Prompt Modularization: Implement the architecture decision record (ADR) approved design for composable prompts, enabling shared behaviors to be defined once and composed with intent-specific content.

  3. Advanced Analytics: Develop funnel analysis, cohort comparison, and A/B testing infrastructure to enable rigorous measurement of R&D outcomes.

  4. Multi-Language Support: Extend the platform to support additional languages beyond English and French, addressing internationalization challenges in intent classification and response generation.


Measured Business Impact (Q4 2025)

This section presents measured business metrics that validate the R&D investments made during Q4 2025. These metrics were collected from production analytics across all active clients.

Engagement Rate Evolution

The engagement rate measures the percentage of website visitors who initiate an interaction with the conversational widget. The following data shows the progression from October to December 2025:

Engagement Rate Q4 2025

Month Engagement Rate Change vs Oct
October 2025 1.24% baseline
November 2025 1.73% +40%
December 2025 1.86% +50%

Engaged Conversations (2+ turns)

This metric measures the percentage of conversations that reach meaningful depth, defined as two or more message exchanges. Higher values indicate that visitors find the conversation valuable enough to continue beyond the initial interaction.

Engaged Conversations Q4 2025

Month Engaged Conversations Change vs Oct
October 2025 29.24% baseline
November 2025 39.27% +34%
December 2025 46.32% +58%

Conversion Rate (Interaction to Form Submitted)

The conversion rate measures the percentage of widget interactions that result in a completed form submission (demo request or email capture). This is the primary business outcome metric.

Conversion Rate Q4 2025

Month Conversion Rate Change vs Nov
November 2025 2.89% baseline
December 2025 3.15% +9%

Attribution Analysis

The following analysis attributes the observed improvements to specific R&D innovations:

Engagement Rate improvement (+50% vs October) is attributed to the Dynamic Questions feature, which generates context-aware opening questions based on visitor profile and page context. This R&D work involved developing algorithms for question selection, rotation, and timing optimization.

Engaged Conversations improvement (+58% vs October) is attributed to the Suggested Answers and Follow-ups features developed in November. Suggested answers provide pre-computed relevant responses that visitors can select with a single click, reducing friction. Suggested follow-ups proactively offer next questions to continue the conversation.

Conversion Rate improvement (+9% vs November) is attributed to the Interest Signals Detection system, which uses LLM-based analysis to detect buying signals such as pricing inquiries, demo requests, and technical deep-dives. The system proposes demos at optimal moments based on cumulative interest scoring with configurable per-client thresholds.

Business Impact Summary (Q4 2025)

Metric Nov 2025 Dec 2025 Month-over-Month
Initial Engagement 1.73% 1.86% +8%
Engaged Conversations (2+ turns) 39.27% 46.32% +18%
Conversion Rate 2.89% 3.15% +9%

These measured improvements provide evidence that the R&D investments in interest signals detection, suggested answers, and dynamic questions produce measurable business value.