ADR: Prompt Modularization - Multi-Agent Router Architecture

Status

Accepted (see also: Langfuse Folder Hierarchy for prompt organization)

Date

2025-12-19

Context

The current AI Sales Agent uses a monolithic prompt architecture with several problems:

  1. One prompt per client (~2000+ lines each) - duplicated across all clients in Langfuse
  2. Shared behavior duplicated - guardrails, tone, formatting repeated in every client prompt
  3. Maintenance burden - changes require updating each client's prompt separately
  4. Risk of drift - clients can diverge in behavior over time
  5. Instruction following issues - LLM struggles with complex, branching instructions
  6. Token inefficiency - full prompt loaded for every request regardless of intent

Decision

Refactor into a multi-agent router architecture with:

  1. LLM-based intent router (fast model) classifying each message
  2. 6 specialized agent prompts shared across all clients
  3. Client-specific config in database (qualification criteria, CTAs, case studies)
  4. Composable prompt fragments in Langfuse (guardrails, tone, formatting)

Architecture Overview

flowchart TB
    classDef router fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
    classDef agent fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef data fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000
    classDef assembly fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    MSG([User Message])

    subgraph ROUTER["LLM ROUTER (Claude Haiku)"]
        direction LR
        CLASSIFY["Classify Intent<br/>+ Confidence Score"]:::router
    end

    subgraph AGENTS["SPECIALIZED AGENTS"]
        direction LR
        EDU["EDUCATOR<br/>~400 tokens"]:::agent
        QUAL["QUALIFIER<br/>~400 tokens"]:::agent
        CTA["CTA PROPOSAL<br/>~300 tokens"]:::agent
        SUPPORT["SUPPORT<br/>~300 tokens"]:::agent
        OFFTOPIC["OFF-TOPIC<br/>~200 tokens"]:::agent
        OTHER["OTHER<br/>~200 tokens"]:::agent
    end

    subgraph SOURCES["DATA SOURCES"]
        direction TB
        LANGFUSE["Langfuse<br/>Agent Templates"]:::data
        DB["Database<br/>Client Config"]:::data
        RUNTIME["Runtime<br/>Context"]:::data
    end

    ASSEMBLY["PROMPT ASSEMBLY<br/>Template + Config + Context"]:::assembly
    RESPONSE([Response to User])

    MSG --> ROUTER
    ROUTER --> AGENTS
    AGENTS --> ASSEMBLY
    SOURCES --> ASSEMBLY
    ASSEMBLY --> RESPONSE

Agent Roles

| Agent | Trigger | Goal | Behavior |
| --- | --- | --- | --- |
| Educator | How/What/Why questions, feature inquiries | Engagement, build interest | Answer → Hook to next topic |
| Qualifier | Traffic/company info shared, missing qualification data | Qualify lead, set expectations | Validate → Ask ONE question → Promise value |
| CTA Proposal | Qualified + buying signals | Get email / book demo | Assumptive close using gathered context |
| Redirect | Support issues, off-topic, jobs/press/partnerships | Route to correct channel | Acknowledge → Redirect to docs/support/contact |

Note: The Redirect agent merges Support, Off-Topic, and Other Requests into a single handler that redirects users to appropriate resources (documentation, support portal, contact form). It skips RAG retrieval for faster responses.
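As a minimal sketch of that merge (names are illustrative, not taken from the codebase), the router's six intent labels can be collapsed onto the four agent prompts with a simple lookup:

```python
# Map the router's six intent labels onto the four agent prompts.
# support/offtopic/other all resolve to Redirect, which skips RAG retrieval.
INTENT_TO_AGENT = {
    "educator": "educator",
    "qualifier": "qualifier",
    "cta": "cta",
    "support": "redirect",
    "offtopic": "redirect",
    "other": "redirect",
}

def resolve_agent(intent: str) -> str:
    """Return the agent prompt to load for a classified intent."""
    # Unrecognized labels fall back to the Educator, the safest default.
    return INTENT_TO_AGENT.get(intent, "educator")
```

Keeping the mapping as data (rather than branching logic) makes it trivial to split Redirect back into separate agents later without touching the router prompt.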

LLM Router

Small/fast model (Claude Haiku) classifies each message:

Classify the user's message into one of these agents:
- educator: Product questions, how/what/why about features
- qualifier: User shared traffic/company info, or need to gather qualification data
- cta: User is qualified + showing buying signals
- support: Existing customer with issues
- offtopic: Completely unrelated
- other: Job inquiries, press, partnerships

Respond with JSON:
{
  "agent": "educator|qualifier|cta|support|offtopic|other",
  "confidence": 0.0-1.0,
  "reasoning": "Brief explanation"
}

Latency: ~200-500ms (runs in parallel with RAG retrieval)
Cost: ~$0.0001 per classification
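A hedged sketch of consuming the router's JSON output. The 0.7 confidence cutoff and the fallback-to-Educator policy are assumptions for illustration, not decisions recorded in this ADR:

```python
import json

CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune against routing evals

def route(raw_router_output: str) -> str:
    """Parse the router's JSON reply and pick an agent label.

    Falls back to "educator" on malformed output or low confidence,
    on the theory that answering is safer than redirecting.
    """
    try:
        result = json.loads(raw_router_output)
        agent = result["agent"]
        confidence = float(result["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "educator"  # malformed output: default to the safest agent
    if confidence < CONFIDENCE_THRESHOLD:
        return "educator"  # low confidence: mitigate misclassification risk
    return agent
```

This is also the natural place to log `reasoning` and `confidence` for the router-accuracy monitoring called out under Consequences.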

Data Sources

| Source | Content | Examples |
| --- | --- | --- |
| Langfuse | Meta-template | rose-internal/response-agents/meta-template (shared structure with embedded tone/guardrails) |
| Langfuse | Agent templates | rose-internal/response-agents/redirect/template (template with {{lf_client_agent_instructions}} slot) |
| Langfuse | Client instructions | rose-internal/response-agents/redirect/instructions/abtasty.com (inserted into template) |
| Database | Client-specific config | agent_config table: identity, prompt content, behavior settings |
| Runtime | Conversation context | Message, history, visitor profile, RAG context |

3-Level Prompt Hierarchy

Prompts are assembled from three levels, with client instructions inserted into the agent template:

rose-internal/response-agents/meta-template (Level 1 - shared across all agents)
  ├── Embedded: Role, Tone, Formatting, Guardrails
  └── {{lf_agent_instructions}} ← Replaced by agent template
        └── rose-internal/response-agents/{agent}/template (Level 2 - agent template)
              └── {{lf_client_agent_instructions}} ← Client instructions inserted here
                    └── rose-internal/response-agents/{agent}/instructions/{domain} (Level 3)

Key Principles:

  • RAG context is NOT in the meta-template - it is included only in agent instructions that need it
  • Redirect agent skips RAG - no knowledge base is needed for redirects (faster response)
  • Client instructions are optional - fall back to an empty string if not found in Langfuse

Template Variable Naming Convention

Variables follow the pattern: {source}_{scope}_{name}

Source Prefixes (where the data comes from):

| Prefix | Source | Owner |
| --- | --- | --- |
| lf_ | Langfuse | Engineering team, versioned in Langfuse |
| db_ | Database | Client success team, stored in agent_config table |
| rt_ | Runtime | System-generated per request/session |

Key Principle: Role, tone, formatting, and guardrails are embedded directly in the meta-template (not injected as variables). Only agent-specific instructions, client data, and runtime context are injected.
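A small validator for the naming convention might look like this; the character rules beyond the three prefixes are an assumption (lowercase snake_case, matching the variables listed below):

```python
import re

# {source}_{scope}_{name}: the source prefix must be lf_, db_, or rt_,
# followed by lowercase snake_case (assumed rule, inferred from the examples).
VARIABLE_PATTERN = re.compile(r"^(lf|db|rt)_[a-z][a-z0-9_]*$")

def is_valid_variable(name: str) -> bool:
    """Check a template variable name against the naming convention."""
    return bool(VARIABLE_PATTERN.match(name))
```

Running this check in CI over all Langfuse templates would catch stray variables before they silently render as empty strings.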

Variable Reference:

| Variable | Source | Description |
| --- | --- | --- |
| lf_agent_type | Langfuse/Router | Agent type name (Educator, Qualifier, CTA, Redirect) |
| lf_agent_instructions | Langfuse | Agent-specific behavior and goals (Level 2) |
| lf_client_agent_instructions | Langfuse | Client-specific instructions (Level 3, optional) |
| db_client_company | Database | Company name |
| db_client_website | Database | Company website URL |
| db_client_guardrails | Database | Client-specific rules (pricing, case studies, calculators) |
| rt_session_language | Runtime | Conversation language |
| rt_session_visitor_profile | Runtime | Visitor data, qualification state |
| rt_session_conversation_history | Runtime | Previous messages in conversation |
| rt_session_turn_number | Runtime | Current turn number |
| rt_request_rag_context | Runtime | Retrieved knowledge base content (included in agent instructions, not meta-template) |

Langfuse Prompt Structure

There is ONE meta-template (agent-meta-template) with shared content embedded directly and variable slots for injected content:

# {{lf_agent_type}} Agent

**ROLE** — You are a consultative Sales Agent on the {{db_client_company}}
website ({{db_client_website}}) who provides accurate, concrete and convincing
information to keep the conversation moving forward.

**LANGUAGE:** ALWAYS reply in {{rt_session_language}}. This is a hard rule.

---

# WRITING STYLE, TONE & FORMATTING
[Embedded directly: tone guidelines, formatting rules]

---

# YOUR TASK
{{lf_agent_instructions}}

---

# CRITICAL GUARDRAILS
[Embedded directly: universal safety rules]

{{db_client_guardrails}}

---

# VISITOR PROFILE
{{rt_session_visitor_profile}}

---

# CONVERSATION HISTORY
**TURN NUMBER:** {{rt_session_turn_number}}
{{rt_session_conversation_history}}

Note: RAG context (rt_request_rag_context) is NOT in the meta-template. Agents that need knowledge base context include it in their lf_agent_instructions. The Redirect agent skips RAG entirely for faster responses.

Database Schema for Client Config

Client-specific configuration is stored in a dedicated agent_config table with clear column groupings:

| Column Group | Purpose | Examples |
| --- | --- | --- |
| Identity | Who the client is | company_name, website_url, client_description |
| Prompt Content (prompt_*) | Content injected into LLM prompts | prompt_pricing_first_response, prompt_case_studies, prompt_calculators |
| Behavior (behavior_*) | Agent runtime behavior | behavior_interest_signals, behavior_interest_threshold |

This table replaces the existing site_configs.agent_config JSONB column, migrating its contents into typed columns with a clear naming convention.
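Assuming Pydantic (planned in Phase 2), a sketch of the typed config using the example columns above; defaults and field types are illustrative, not the final schema:

```python
from pydantic import BaseModel

class AgentConfig(BaseModel):
    """Typed view of the agent_config table, grouped as in the table above."""
    # Identity: who the client is
    company_name: str
    website_url: str
    client_description: str = ""
    # Prompt content (prompt_*): injected into LLM prompts
    prompt_pricing_first_response: str = ""
    prompt_case_studies: str = ""
    prompt_calculators: str = ""
    # Behavior (behavior_*): agent runtime behavior
    behavior_interest_signals: list[str] = []
    behavior_interest_threshold: float = 0.5  # assumed default

# Example:
cfg = AgentConfig(company_name="AB Tasty",
                  website_url="https://www.abtasty.com")
```

Typed columns plus a model like this give validation at write time, which the old free-form JSONB column could not.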

Qualifier Agent Detail

The Qualifier agent has configurable qualification dimensions:

flowchart LR
    classDef state fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000

    UNKNOWN["UNKNOWN<br/>No qualification data"]:::state
    PARTIAL["PARTIAL<br/>Some data collected"]:::state
    QUALIFIED["QUALIFIED<br/>Ready for CTA"]:::state

    UNKNOWN -->|"User shares info"| PARTIAL
    PARTIAL -->|"Sandwich Rule:<br/>Validate → Ask ONE → Promise"| PARTIAL
    PARTIAL -->|"All criteria met"| QUALIFIED
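The state machine above can be sketched as a pure function over the configurable qualification dimensions (criterion names and the emptiness check are illustrative):

```python
def qualification_state(collected: dict, required: list[str]) -> str:
    """Derive the Qualifier's state from which required criteria are filled.

    `required` comes from client config (e.g. ["traffic", "company_size"]);
    `collected` is the visitor profile gathered so far.
    """
    filled = [k for k in required if collected.get(k) not in (None, "")]
    if not filled:
        return "UNKNOWN"
    if len(filled) < len(required):
        return "PARTIAL"
    return "QUALIFIED"
```

Deriving the state from data on every turn, rather than storing it, keeps the router and Qualifier agent from ever disagreeing about where the visitor stands.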

Traffic Segmentation Example:

| Segment | Threshold | Expectation Setting |
| --- | --- | --- |
| High | >100k monthly | "Strong fit. Tests: 2-3 weeks." |
| Mid | 30k-100k | "Can support experimentation. Tests: 3-4 weeks." |
| Low | 10k-30k | "Classic A/B testing challenging. Tests: months." |
| Very Low | <10k | "Traditional A/B too slow. Use heatmaps, surveys." |
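A sketch of the segmentation logic. Boundary handling at exactly 30k and 100k is an assumption here, since the table leaves those edges ambiguous:

```python
def traffic_segment(monthly_visitors: int) -> str:
    """Map monthly traffic to the segments in the table above.

    Boundaries are assumed inclusive on the lower bound (30k -> Mid,
    100k -> Mid, 100_001 -> High); adjust to the agreed definition.
    """
    if monthly_visitors > 100_000:
        return "High"
    if monthly_visitors >= 30_000:
        return "Mid"
    if monthly_visitors >= 10_000:
        return "Low"
    return "Very Low"
```

The segment label (and its expectation-setting line) would then be injected into the Qualifier's instructions rather than hard-coded per client.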

Implementation Phases

Phase 1: Extract & Design

  1. Export one client's monolithic prompt from Langfuse
  2. Identify shared vs client-specific content
  3. Design database schema extension for client config

Phase 2: Database Schema

  1. Create new agent_config table with identity, prompt content, and behavior columns
  2. Migrate existing site_configs.agent_config JSONB data to new table
  3. Create Pydantic models for validation

Phase 3: Langfuse Templates

  1. Create ONE meta-template (agent-meta-template) with embedded tone/guardrails
  2. Create per-agent instruction prompts (agent-{type}-instructions)
  3. Create per-agent example prompts (agent-{type}-examples)
  4. Test prompt assembly with placeholders

Phase 4: Router Implementation

  1. Create LLM router with classification prompt
  2. Add qualification state tracking
  3. Unit tests for routing accuracy

Phase 5: Integration

  1. Modify chat service to use router + prompt assembly
  2. Add agent_type to conversation metadata
  3. Feature flag for gradual rollout

Phase 6: Migration

  1. Migrate first client to new system
  2. A/B test against monolithic prompt
  3. Roll out to remaining clients
  4. Deprecate per-client prompts in Langfuse

Consequences

Positive

  • Single point of change: Update one template, all clients benefit
  • Guaranteed consistency: No drift between client behaviors
  • Faster responses: Load only the relevant agent prompt (~400 tokens vs the 2000+ line monolith)
  • Better instruction following: Smaller, focused prompts
  • Lower cost: Reduced token usage per request
  • Easier maintenance: Clear separation of concerns
  • Testable: Each agent can be tested independently
  • Configurable: Client-specific behavior via database, not prompt duplication

Negative

  • Router latency: Additional LLM call per message (~200-500ms)
  • Router cost: Extra ~$0.0001 per message
  • Complexity: More components to orchestrate
  • Migration effort: Significant refactoring of existing system
  • Edge cases: Router misclassification possible (mitigated by confidence scores)

Neutral

  • Infrastructure change: Requires prompt assembly layer
  • Testing strategy: Need new approach for multi-prompt system
  • Monitoring: Need to track router accuracy and agent usage

Alternatives Considered

1. Single Template with Dynamic Injection

One prompt template with {{response_instruction}} and {{response_examples}} swapped per phase.

Rejected because: Still loads full prompt every time; doesn't solve maintenance problem of per-client prompts.

2. Rule-Based Router

Python logic with keyword matching instead of LLM classification.

Rejected because: Misses nuanced cases; requires constant rule updates; less adaptable.

3. Prompt Compression

Reduce monolithic prompt size through aggressive compression.

Rejected because: Doesn't solve instruction following issues; still one massive document per client.

4. Fine-Tuned Model

Train a custom model on sales conversations.

Rejected because: High cost; less flexible; harder to iterate on behavior.