# ADR: Prompt Modularization - Multi-Agent Router Architecture
## Status
Accepted (see also: Langfuse Folder Hierarchy for prompt organization)
## Date
2025-12-19
## Context
The current AI Sales Agent uses a monolithic prompt architecture with several problems:
- **One prompt per client** (~2,000+ lines each): duplicated across all clients in Langfuse
- **Shared behavior duplicated**: guardrails, tone, and formatting repeated in every client prompt
- **Maintenance burden**: changes require updating each client's prompt separately
- **Risk of drift**: clients can diverge in behavior over time
- **Instruction-following issues**: the LLM struggles with complex, branching instructions
- **Token inefficiency**: the full prompt is loaded for every request regardless of intent
## Decision
Refactor into a multi-agent router architecture with:

- **LLM-based intent router**: a fast model classifies each message into one of six intent classes
- **Four specialized agent prompts** shared across all clients (the six router classes map onto four agents; see Agent Roles below)
- **Client-specific config in the database**: qualification criteria, CTAs, case studies
- **Composable prompt fragments in Langfuse**: guardrails, tone, formatting
## Architecture Overview

### Agent Roles
| Agent | Trigger | Goal | Behavior |
|---|---|---|---|
| Educator | How/What/Why questions, feature inquiries | Engagement, build interest | Answer → Hook to next topic |
| Qualifier | Traffic/company info shared, missing qualification data | Qualify lead, set expectations | Validate → Ask ONE question → Promise value |
| CTA Proposal | Qualified + buying signals | Get email / book demo | Assumptive close using gathered context |
| Redirect | Support issues, off-topic, jobs/press/partnerships | Route to correct channel | Acknowledge → Redirect to docs/support/contact |
Note: The Redirect agent merges Support, Off-Topic, and Other Requests into a single handler that redirects users to appropriate resources (documentation, support portal, contact form). It skips RAG retrieval for faster responses. The mapping is sketched below.
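A minimal sketch of that intent-to-agent mapping; the dictionary literal and the fallback choice are illustrative, not prescribed by this ADR:

```python
# Maps the router's six intent classes onto the four agent prompts.
# support/offtopic/other all resolve to the Redirect agent, which skips RAG.
INTENT_TO_AGENT = {
    "educator": "educator",
    "qualifier": "qualifier",
    "cta": "cta",
    "support": "redirect",
    "offtopic": "redirect",
    "other": "redirect",
}

def resolve_agent(intent: str) -> str:
    """Return the agent prompt to load for a classified intent."""
    # Assumption: unknown intents fall back to the Educator agent,
    # the safest default for a sales conversation.
    return INTENT_TO_AGENT.get(intent, "educator")
```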
### LLM Router
Small/fast model (e.g., Claude Haiku) classifies each message:

```text
Classify the user's message into one of these agents:

- educator: Product questions, how/what/why about features
- qualifier: User shared traffic/company info, or need to gather qualification data
- cta: User is qualified + showing buying signals
- support: Existing customer with issues
- offtopic: Completely unrelated
- other: Job inquiries, press, partnerships

Respond with JSON:

{
  "agent": "educator|qualifier|cta|support|offtopic|other",
  "confidence": 0.0-1.0,
  "reasoning": "Brief explanation"
}
```
- **Latency:** ~200-500 ms (runs in parallel with RAG retrieval)
- **Cost:** ~$0.0001 per classification
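A minimal sketch of the router call, assuming the Anthropic Python SDK; the model alias, the 0.7 confidence threshold, and the low-confidence fallback are illustrative assumptions, not part of this ADR:

```python
import json

from anthropic import Anthropic

# The classification prompt shown above goes into the system prompt.
ROUTER_SYSTEM_PROMPT = """Classify the user's message into one of these agents:
... (agent definitions as listed in the prompt above) ...
Respond with JSON: {"agent": "...", "confidence": 0.0-1.0, "reasoning": "..."}"""

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def classify(message: str) -> dict:
    """Classify one user message, falling back on low router confidence."""
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # assumption: any small/fast model works
        max_tokens=150,
        system=ROUTER_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": message}],
    )
    result = json.loads(response.content[0].text)  # assumes bare JSON output
    if result.get("confidence", 0.0) < 0.7:  # illustrative threshold
        result["agent"] = "educator"  # safe default for ambiguous messages
    return result
```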
### Data Sources
| Source | Content | Examples |
|---|---|---|
| Langfuse | Meta-template | `rose-internal/response-agents/meta-template` (shared structure with embedded tone/guardrails) |
| Langfuse | Agent templates | `rose-internal/response-agents/redirect/template` (template with `{{lf_client_agent_instructions}}` slot) |
| Langfuse | Client instructions | `rose-internal/response-agents/redirect/instructions/abtasty.com` (inserted into the template) |
| Database | Client-specific config | `agent_config` table: identity, prompt content, behavior settings |
| Runtime | Conversation context | Message, history, visitor profile, RAG context |
### 3-Level Prompt Hierarchy
Prompts are assembled from three levels, with client instructions inserted into the agent template:

```text
rose-internal/response-agents/meta-template   (Level 1 - shared across all agents)
│
├── Embedded: Role, Tone, Formatting, Guardrails
│
└── {{lf_agent_instructions}} ← replaced by the agent template
    │
    └── rose-internal/response-agents/{agent}/template   (Level 2 - agent template)
        │
        └── {{lf_client_agent_instructions}} ← client instructions inserted here
            │
            └── rose-internal/response-agents/{agent}/instructions/{domain}   (Level 3)
```
Key Principles:

- **RAG context is NOT in the meta-template**: it is included only in the agent instructions that need it
- **The Redirect agent skips RAG**: no knowledge base is needed for redirects (faster response)
- **Client instructions are optional**: fall back to an empty string if not found in Langfuse (see the assembly sketch below)
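A minimal assembly sketch, assuming the Langfuse Python SDK (`get_prompt`) and plain string substitution for the two `{{lf_*}}` slots; the `db_*` and `rt_*` variables would still be filled in afterwards:

```python
from langfuse import Langfuse

langfuse = Langfuse()  # credentials come from environment variables

def assemble_prompt(agent: str, client_domain: str) -> str:
    """Build the 3-level prompt: client instructions -> agent template -> meta-template."""
    meta = langfuse.get_prompt("rose-internal/response-agents/meta-template").prompt
    template = langfuse.get_prompt(f"rose-internal/response-agents/{agent}/template").prompt
    try:
        instructions = langfuse.get_prompt(
            f"rose-internal/response-agents/{agent}/instructions/{client_domain}"
        ).prompt
    except Exception:
        instructions = ""  # client instructions are optional (see key principles)
    # Level 3 into Level 2, then Level 2 into Level 1.
    agent_block = template.replace("{{lf_client_agent_instructions}}", instructions)
    return meta.replace("{{lf_agent_instructions}}", agent_block)
```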
### Template Variable Naming Convention

Variables follow the pattern `{source}_{scope}_{name}`.

Source Prefixes (where the data comes from):
| Prefix | Source | Owner |
|---|---|---|
| `lf_` | Langfuse | Engineering team, versioned in Langfuse |
| `db_` | Database | Client success team, stored in the `agent_config` table |
| `rt_` | Runtime | System-generated per request/session |
**Key Principle:** Role, tone, formatting, and guardrails are embedded directly in the meta-template (not injected as variables). Only agent-specific instructions, client data, and runtime context are injected.
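The convention can also be checked mechanically. A small helper, purely illustrative and not part of the decision, that flags non-conforming `{{variable}}` names in a template:

```python
import re

# {source}_{scope}_{name}: a known source prefix, then at least a scope
# and a name, all lower snake_case.
VARIABLE_PATTERN = re.compile(r"^(lf|db|rt)_[a-z]+_[a-z0-9_]+$")

def nonconforming_variables(template: str) -> list[str]:
    """Return every {{variable}} in a template that breaks the convention."""
    names = re.findall(r"\{\{\s*([^}\s]+)\s*\}\}", template)
    return [name for name in names if not VARIABLE_PATTERN.match(name)]
```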
Variable Reference:

| Variable | Source | Description |
|---|---|---|
| `lf_agent_type` | Langfuse/Router | Agent type name (Educator, Qualifier, CTA, Redirect) |
| `lf_agent_instructions` | Langfuse | Agent-specific behavior and goals (Level 2) |
| `lf_client_agent_instructions` | Langfuse | Client-specific instructions (Level 3, optional) |
| `db_client_company` | Database | Company name |
| `db_client_website` | Database | Company website URL |
| `db_client_guardrails` | Database | Client-specific rules (pricing, case studies, calculators) |
| `rt_session_language` | Runtime | Conversation language |
| `rt_session_visitor_profile` | Runtime | Visitor data, qualification state |
| `rt_session_conversation_history` | Runtime | Previous messages in the conversation |
| `rt_session_turn_number` | Runtime | Current turn number |
| `rt_request_rag_context` | Runtime | Retrieved knowledge base content (included in agent instructions, not the meta-template) |
### Langfuse Prompt Structure

There is ONE meta-template (`agent-meta-template`) with shared content embedded directly and variable slots for injected content:
```text
# {{lf_agent_type}} Agent

**ROLE** — You are a consultative Sales Agent on the {{db_client_company}}
website ({{db_client_website}}) who provides accurate, concrete and convincing
information to keep the conversation moving forward.

**LANGUAGE:** ALWAYS reply in {{rt_session_language}}. This is a hard rule.

---

# WRITING STYLE, TONE & FORMATTING

[Embedded directly: tone guidelines, formatting rules]

---

# YOUR TASK

{{lf_agent_instructions}}

---

# CRITICAL GUARDRAILS

[Embedded directly: universal safety rules]

{{db_client_guardrails}}

---

# VISITOR PROFILE

{{rt_session_visitor_profile}}

---

# CONVERSATION HISTORY

**TURN NUMBER:** {{rt_session_turn_number}}

{{rt_session_conversation_history}}
```
Note: RAG context (`rt_request_rag_context`) is NOT in the meta-template. Agents that need knowledge-base context include it in their `lf_agent_instructions`. The Redirect agent skips RAG entirely for faster responses, as the orchestration sketch below shows.
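A sketch of the per-turn orchestration this implies: the router and RAG retrieval run concurrently (see the LLM Router section), and the retrieval result is discarded when the Redirect agent wins. `classify_async` and `retrieve_context` are hypothetical helpers; `resolve_agent` is the mapping sketch from Agent Roles:

```python
import asyncio
import contextlib

async def prepare_turn(message: str) -> tuple[dict, str | None]:
    """Classify intent and retrieve RAG context for one message, concurrently."""
    router_task = asyncio.create_task(classify_async(message))  # hypothetical helper
    rag_task = asyncio.create_task(retrieve_context(message))   # hypothetical helper
    routing = await router_task
    if resolve_agent(routing["agent"]) == "redirect":
        # The Redirect agent never uses knowledge-base context,
        # so cancel retrieval and respond faster.
        rag_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await rag_task
        return routing, None
    return routing, await rag_task
```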
### Database Schema for Client Config

Client-specific configuration is stored in a dedicated `agent_config` table with clear column groupings:
| Column Group | Purpose | Examples |
|---|---|---|
| Identity | Who the client is | `company_name`, `website_url`, `client_description` |
| Prompt Content (`prompt_*`) | Content injected into LLM prompts | `prompt_pricing_first_response`, `prompt_case_studies`, `prompt_calculators` |
| Behavior (`behavior_*`) | Agent runtime behavior | `behavior_interest_signals`, `behavior_interest_threshold` |
This table consolidates the existing `site_configs.agent_config` JSONB column, migrating its data into typed columns with clear naming conventions.
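A minimal Pydantic sketch of the table (Phase 2 below calls for Pydantic validation models); the column names come from the examples above, while field types and defaults are assumptions:

```python
from pydantic import BaseModel, HttpUrl

class AgentConfig(BaseModel):
    """Typed view of one agent_config row."""
    # Identity: who the client is
    company_name: str
    website_url: HttpUrl
    client_description: str | None = None
    # Prompt content (prompt_*): injected into LLM prompts
    prompt_pricing_first_response: str | None = None
    prompt_case_studies: str | None = None
    prompt_calculators: str | None = None
    # Behavior (behavior_*): agent runtime behavior
    behavior_interest_signals: list[str] = []
    behavior_interest_threshold: float = 0.5  # assumed default
```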
### Qualifier Agent Detail
The Qualifier agent has configurable qualification dimensions:
Traffic Segmentation Example (thresholds are monthly visitors):

| Segment | Monthly Traffic | Expectation Setting |
|---|---|---|
| High | >100k | "Strong fit. Tests: 2-3 weeks." |
| Mid | 30k-100k | "Can support experimentation. Tests: 3-4 weeks." |
| Low | 10k-30k | "Classic A/B testing challenging. Tests: months." |
| Very Low | <10k | "Traditional A/B too slow. Use heatmaps, surveys." |
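A sketch of the segmentation logic, with the thresholds and expectation copy taken from the table; in practice both would live in `agent_config` rather than in code, and the handling of exact boundary values is an assumption:

```python
# (threshold, segment, expectation) ordered from highest traffic down.
SEGMENTS = [
    (100_000, "high", "Strong fit. Tests: 2-3 weeks."),
    (30_000, "mid", "Can support experimentation. Tests: 3-4 weeks."),
    (10_000, "low", "Classic A/B testing challenging. Tests: months."),
    (0, "very_low", "Traditional A/B too slow. Use heatmaps, surveys."),
]

def segment_traffic(monthly_visitors: int) -> tuple[str, str]:
    """Map monthly traffic to a segment name and its expectation-setting line."""
    for threshold, name, expectation in SEGMENTS:
        if monthly_visitors > threshold:
            return name, expectation
    return SEGMENTS[-1][1], SEGMENTS[-1][2]
```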
## Implementation Phases

### Phase 1: Extract & Design
- Export one client's monolithic prompt from Langfuse
- Identify shared vs client-specific content
- Design database schema extension for client config
### Phase 2: Database Schema

- Create new `agent_config` table with identity, prompt content, and behavior columns
- Migrate existing `site_configs.agent_config` JSONB data to the new table
- Create Pydantic models for validation
### Phase 3: Langfuse Templates

- Create ONE meta-template (`agent-meta-template`) with embedded tone/guardrails
- Create per-agent instruction prompts (`agent-{type}-instructions`)
- Create per-agent example prompts (`agent-{type}-examples`)
- Test prompt assembly with placeholders
### Phase 4: Router Implementation

- Create LLM router with classification prompt
- Add qualification state tracking
- Unit tests for routing accuracy (see the test sketch below)
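A sketch of what the routing-accuracy tests could look like, using pytest with the `classify` sketch from the LLM Router section; the labelled cases are illustrative, and a real suite would run against a stub or recorded responses rather than a live model:

```python
import pytest

# A few labelled messages per intent; a real suite would use a larger,
# client-reviewed set and report accuracy per intent class.
CASES = [
    ("How does your A/B testing engine work?", "educator"),
    ("We get about 50k visitors a month.", "qualifier"),
    ("Sounds great, can we book a demo?", "cta"),
    ("My account is locked, please help.", "support"),
]

@pytest.mark.parametrize("message,expected", CASES)
def test_router_classification(message, expected):
    assert classify(message)["agent"] == expected
```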
### Phase 5: Integration
- Modify chat service to use router + prompt assembly
- Add `agent_type` to conversation metadata
- Feature flag for gradual rollout
### Phase 6: Migration
- Migrate first client to new system
- A/B test against monolithic prompt
- Roll out to remaining clients
- Deprecate per-client prompts in Langfuse
## Consequences

### Positive
- **Single point of change**: update one template and all clients benefit
- **Guaranteed consistency**: no drift between client behaviors
- **Faster responses**: only the relevant agent prompt is loaded (~400 tokens vs ~2,000+)
- **Better instruction following**: smaller, focused prompts
- **Lower cost**: reduced token usage per request
- **Easier maintenance**: clear separation of concerns
- **Testable**: each agent can be tested independently
- **Configurable**: client-specific behavior lives in the database, not in duplicated prompts
### Negative
- **Router latency**: an additional LLM call per message (~200-500 ms, partly hidden by running in parallel with RAG retrieval)
- **Router cost**: an extra ~$0.0001 per message
- **Complexity**: more components to orchestrate
- **Migration effort**: significant refactoring of the existing system
- **Edge cases**: router misclassification is possible (mitigated by confidence scores)
### Neutral
- **Infrastructure change**: requires a prompt assembly layer
- **Testing strategy**: needs a new approach for a multi-prompt system
- **Monitoring**: must track router accuracy and agent usage
## Alternatives Considered

### 1. Single Template with Dynamic Injection

One prompt template with `{{response_instruction}}` and `{{response_examples}}` slots swapped per conversation phase.

Rejected because: it still loads the full prompt on every request and doesn't solve the maintenance problem of per-client prompts.
### 2. Rule-Based Router

Python logic with keyword matching instead of LLM classification.

Rejected because: it misses nuanced cases, requires constant rule updates, and is less adaptable.
### 3. Prompt Compression

Reduce the size of the monolithic prompt through aggressive compression.

Rejected because: it doesn't solve the instruction-following issues, and each client still has one massive document.
### 4. Fine-Tuned Model

Train a custom model on sales conversations.

Rejected because: high cost, less flexibility, and slower iteration on behavior.