ADR: Prompt Modularization - Multi-Agent Router Architecture

Status

Accepted (see also: Langfuse Folder Hierarchy for prompt organization)

Date

2025-12-19

Context

The current AI Sales Agent uses a monolithic prompt architecture with several problems:

  1. One prompt per client (~2000+ lines each) - duplicated across all clients in Langfuse
  2. Shared behavior duplicated - guardrails, tone, formatting repeated in every client prompt
  3. Maintenance burden - changes require updating each client's prompt separately
  4. Risk of drift - clients can diverge in behavior over time
  5. Instruction following issues - LLM struggles with complex, branching instructions
  6. Token inefficiency - full prompt loaded for every request regardless of intent

Decision

Refactor into a multi-agent router architecture with:

  1. LLM-based intent router (fast model) classifying each message
  2. 6 specialized agent prompts shared across all clients
  3. Client-specific config in database (qualification criteria, CTAs, case studies)
  4. Composable prompt fragments in Langfuse (guardrails, tone, formatting)

Architecture Overview

flowchart TB
    classDef router fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
    classDef agent fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef data fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000
    classDef assembly fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000

    MSG([User Message])

    subgraph ROUTER["LLM ROUTER (Claude Haiku)"]
        direction LR
        CLASSIFY["Classify Intent<br/>+ Confidence Score"]:::router
    end

    subgraph AGENTS["SPECIALIZED AGENTS"]
        direction LR
        EDU["EDUCATOR<br/>~400 tokens"]:::agent
        QUAL["QUALIFIER<br/>~400 tokens"]:::agent
        CTA["CTA PROPOSAL<br/>~300 tokens"]:::agent
        SUPPORT["SUPPORT<br/>~300 tokens"]:::agent
        OFFTOPIC["OFF-TOPIC<br/>~200 tokens"]:::agent
        OTHER["OTHER<br/>~200 tokens"]:::agent
    end

    subgraph SOURCES["DATA SOURCES"]
        direction TB
        LANGFUSE["Langfuse<br/>Agent Templates"]:::data
        DB["Database<br/>Client Config"]:::data
        RUNTIME["Runtime<br/>Context"]:::data
    end

    ASSEMBLY["PROMPT ASSEMBLY<br/>Template + Config + Context"]:::assembly
    RESPONSE([Response to User])

    MSG --> ROUTER
    ROUTER --> AGENTS
    AGENTS --> ASSEMBLY
    SOURCES --> ASSEMBLY
    ASSEMBLY --> RESPONSE

Agent Roles

| Agent | Trigger | Goal | Behavior |
| --- | --- | --- | --- |
| Educator | How/What/Why questions, feature inquiries | Engagement, build interest | Answer → Hook to next topic |
| Qualifier | Traffic/company info shared, missing qualification data | Qualify lead, set expectations | Validate → Ask ONE question → Promise value |
| CTA Proposal | Qualified + buying signals | Get email / book demo | Assumptive close using gathered context |
| Redirect | Support issues, off-topic, jobs/press/partnerships | Route to correct channel | Acknowledge → Redirect to docs/support/contact |

Note: The Redirect agent merges Support, Off-Topic, and Other Requests into a single handler that redirects users to appropriate resources (documentation, support portal, contact form). It skips RAG retrieval for faster responses.
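As a minimal sketch of that merge (names are illustrative, not taken from the codebase), the router's six intent labels can be collapsed onto the four agent prompts with a simple lookup:

```python
# Map the router's six intent labels onto the four agent prompts.
# support/offtopic/other all resolve to Redirect, which skips RAG retrieval.
INTENT_TO_AGENT = {
    "educator": "educator",
    "qualifier": "qualifier",
    "cta": "cta",
    "support": "redirect",
    "offtopic": "redirect",
    "other": "redirect",
}

def resolve_agent(intent: str) -> str:
    """Return the agent prompt to load for a classified intent."""
    # Unrecognized labels fall back to the Educator, the safest default.
    return INTENT_TO_AGENT.get(intent, "educator")
```

Keeping the mapping as data (rather than branching logic) makes it trivial to split Redirect back into separate agents later without touching the router prompt.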

LLM Router

Small/fast model (Claude Haiku) classifies each message:

Classify the user's message into one of these agents:
- educator: Product questions, how/what/why about features
- qualifier: User shared traffic/company info, or need to gather qualification data
- cta: User is qualified + showing buying signals
- support: Existing customer with issues
- offtopic: Completely unrelated
- other: Job inquiries, press, partnerships

Respond with JSON:
{
  "agent": "educator|qualifier|cta|support|offtopic|other",
  "confidence": 0.0-1.0,
  "reasoning": "Brief explanation"
}

Latency: ~200-500ms (runs in parallel with RAG retrieval)
Cost: ~$0.0001 per classification
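A hedged sketch of consuming the router's JSON output. The 0.7 confidence cutoff and the fallback-to-Educator policy are assumptions for illustration, not decisions recorded in this ADR:

```python
import json

CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; tune against routing evals

def route(raw_router_output: str) -> str:
    """Parse the router's JSON reply and pick an agent label.

    Falls back to "educator" on malformed output or low confidence,
    on the theory that answering is safer than redirecting.
    """
    try:
        result = json.loads(raw_router_output)
        agent = result["agent"]
        confidence = float(result["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return "educator"  # malformed output: default to the safest agent
    if confidence < CONFIDENCE_THRESHOLD:
        return "educator"  # low confidence: mitigate misclassification risk
    return agent
```

This is also the natural place to log `reasoning` and `confidence` for the router-accuracy monitoring called out under Consequences.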

Data Sources

| Source | Content | Examples |
| --- | --- | --- |
| Langfuse | Meta-template | rose-internal/response-agents/meta-template (shared structure with embedded tone/guardrails) |
| Langfuse | Agent templates | rose-internal/response-agents/redirect/template (template with {{lf_client_agent_instructions}} slot) |
| Langfuse | Client instructions | rose-internal/response-agents/redirect/instructions/abtasty.com (inserted into template) |
| Database | Client-specific config | agent_config table: identity, prompt content, behavior settings |
| Runtime | Conversation context | Message, history, visitor profile, RAG context |

3-Level Prompt Hierarchy

Prompts are assembled from three levels, with client instructions inserted into the agent template:

rose-internal/response-agents/meta-template (Level 1 - shared across all agents)
  ├── Embedded: Role, Tone, Formatting, Guardrails
  └── {{lf_agent_instructions}} ← Replaced by agent template
        └── rose-internal/response-agents/{agent}/template (Level 2 - agent template)
              └── {{lf_client_agent_instructions}} ← Client instructions inserted here
                    └── rose-internal/response-agents/{agent}/instructions/{domain} (Level 3)

Key Principles:

  • RAG context is NOT in the meta-template - it is included only in agent instructions that need it
  • Redirect agent skips RAG - no knowledge base is needed for redirects (faster response)
  • Client instructions are optional - fall back to an empty string if not found in Langfuse

Template Variable Naming Convention

Variables follow the pattern: {source}_{scope}_{name}

Source Prefixes (where the data comes from):

| Prefix | Source | Owner |
| --- | --- | --- |
| lf_ | Langfuse | Engineering team, versioned in Langfuse |
| db_ | Database | Client success team, stored in agent_config table |
| rt_ | Runtime | System-generated per request/session |

Key Principle: Role, tone, formatting, and guardrails are embedded directly in the meta-template (not injected as variables). Only agent-specific instructions, client data, and runtime context are injected.
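A small validator for the naming convention might look like this; the character rules beyond the three prefixes are an assumption (lowercase snake_case, matching the variables listed below):

```python
import re

# {source}_{scope}_{name}: the source prefix must be lf_, db_, or rt_,
# followed by lowercase snake_case (assumed rule, inferred from the examples).
VARIABLE_PATTERN = re.compile(r"^(lf|db|rt)_[a-z][a-z0-9_]*$")

def is_valid_variable(name: str) -> bool:
    """Check a template variable name against the naming convention."""
    return bool(VARIABLE_PATTERN.match(name))
```

Running this check in CI over all Langfuse templates would catch stray variables before they silently render as empty strings.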

Variable Reference:

| Variable | Source | Description |
| --- | --- | --- |
| lf_agent_type | Langfuse/Router | Agent type name (Educator, Qualifier, CTA, Redirect) |
| lf_agent_instructions | Langfuse | Agent-specific behavior and goals (Level 2) |
| lf_client_agent_instructions | Langfuse | Client-specific instructions (Level 3, optional) |
| db_client_company | Database | Company name |
| db_client_website | Database | Company website URL |
| db_client_guardrails | Database | Client-specific rules (pricing, case studies, calculators) |
| rt_session_language | Runtime | Conversation language |
| rt_session_visitor_profile | Runtime | Visitor data, qualification state |
| rt_session_conversation_history | Runtime | Previous messages in conversation |
| rt_session_turn_number | Runtime | Current turn number |
| rt_request_rag_context | Runtime | Retrieved knowledge base content (included in agent instructions, not meta-template) |

Langfuse Prompt Structure

There is ONE meta-template (agent-meta-template) with shared content embedded directly and variable slots for injected content:

# {{lf_agent_type}} Agent

**ROLE** — You are a consultative Sales Agent on the {{db_client_company}}
website ({{db_client_website}}) who provides accurate, concrete and convincing
information to keep the conversation moving forward.

**LANGUAGE:** ALWAYS reply in {{rt_session_language}}. This is a hard rule.

---

# WRITING STYLE, TONE & FORMATTING
[Embedded directly: tone guidelines, formatting rules]

---

# YOUR TASK
{{lf_agent_instructions}}

---

# CRITICAL GUARDRAILS
[Embedded directly: universal safety rules]

{{db_client_guardrails}}

---

# VISITOR PROFILE
{{rt_session_visitor_profile}}

---

# CONVERSATION HISTORY
**TURN NUMBER:** {{rt_session_turn_number}}
{{rt_session_conversation_history}}

Note: RAG context (rt_request_rag_context) is NOT in the meta-template. Agents that need knowledge base context include it in their lf_agent_instructions. The Redirect agent skips RAG entirely for faster responses.

Database Schema for Client Config

Client-specific configuration is stored in a dedicated agent_config table with clear column groupings:

| Column Group | Purpose | Examples |
| --- | --- | --- |
| Identity | Who the client is | company_name, website_url, client_description |
| Prompt Content (prompt_*) | Content injected into LLM prompts | prompt_pricing_first_response, prompt_case_studies, prompt_calculators |
| Behavior (behavior_*) | Agent runtime behavior | behavior_interest_signals, behavior_interest_threshold |

This table replaces the existing site_configs.agent_config JSONB column, migrating its contents into typed columns with a clear naming convention.
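Assuming Pydantic (planned in Phase 2), a sketch of the typed config using the example columns above; defaults and field types are illustrative, not the final schema:

```python
from pydantic import BaseModel

class AgentConfig(BaseModel):
    """Typed view of the agent_config table, grouped as in the table above."""
    # Identity: who the client is
    company_name: str
    website_url: str
    client_description: str = ""
    # Prompt content (prompt_*): injected into LLM prompts
    prompt_pricing_first_response: str = ""
    prompt_case_studies: str = ""
    prompt_calculators: str = ""
    # Behavior (behavior_*): agent runtime behavior
    behavior_interest_signals: list[str] = []
    behavior_interest_threshold: float = 0.5  # assumed default

# Example:
cfg = AgentConfig(company_name="AB Tasty",
                  website_url="https://www.abtasty.com")
```

Typed columns plus a model like this give validation at write time, which the old free-form JSONB column could not.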

Qualifier Agent Detail

The Qualifier agent has configurable qualification dimensions:

flowchart LR
    classDef state fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000

    UNKNOWN["UNKNOWN<br/>No qualification data"]:::state
    PARTIAL["PARTIAL<br/>Some data collected"]:::state
    QUALIFIED["QUALIFIED<br/>Ready for CTA"]:::state

    UNKNOWN -->|"User shares info"| PARTIAL
    PARTIAL -->|"Sandwich Rule:<br/>Validate → Ask ONE → Promise"| PARTIAL
    PARTIAL -->|"All criteria met"| QUALIFIED
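The state machine above can be sketched as a pure function over the configurable qualification dimensions (criterion names and the emptiness check are illustrative):

```python
def qualification_state(collected: dict, required: list[str]) -> str:
    """Derive the Qualifier's state from which required criteria are filled.

    `required` comes from client config (e.g. ["traffic", "company_size"]);
    `collected` is the visitor profile gathered so far.
    """
    filled = [k for k in required if collected.get(k) not in (None, "")]
    if not filled:
        return "UNKNOWN"
    if len(filled) < len(required):
        return "PARTIAL"
    return "QUALIFIED"
```

Deriving the state from data on every turn, rather than storing it, keeps the router and Qualifier agent from ever disagreeing about where the visitor stands.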

Traffic Segmentation Example:

| Segment | Threshold | Expectation Setting |
| --- | --- | --- |
| High | >100k monthly | "Strong fit. Tests: 2-3 weeks." |
| Mid | 30k-100k | "Can support experimentation. Tests: 3-4 weeks." |
| Low | 10k-30k | "Classic A/B testing challenging. Tests: months." |
| Very Low | <10k | "Traditional A/B too slow. Use heatmaps, surveys." |
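A sketch of the segmentation logic. Boundary handling at exactly 30k and 100k is an assumption here, since the table leaves those edges ambiguous:

```python
def traffic_segment(monthly_visitors: int) -> str:
    """Map monthly traffic to the segments in the table above.

    Boundaries are assumed inclusive on the lower bound (30k -> Mid,
    100k -> Mid, 100_001 -> High); adjust to the agreed definition.
    """
    if monthly_visitors > 100_000:
        return "High"
    if monthly_visitors >= 30_000:
        return "Mid"
    if monthly_visitors >= 10_000:
        return "Low"
    return "Very Low"
```

The segment label (and its expectation-setting line) would then be injected into the Qualifier's instructions rather than hard-coded per client.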

Implementation Phases

Phase 1: Extract & Design

  1. Export one client's monolithic prompt from Langfuse
  2. Identify shared vs client-specific content
  3. Design database schema extension for client config

Phase 2: Database Schema

  1. Create new agent_config table with identity, prompt content, and behavior columns
  2. Migrate existing site_configs.agent_config JSONB data to new table
  3. Create Pydantic models for validation

Phase 3: Langfuse Templates

  1. Create ONE meta-template (agent-meta-template) with embedded tone/guardrails
  2. Create per-agent instruction prompts (agent-{type}-instructions)
  3. Create per-agent example prompts (agent-{type}-examples)
  4. Test prompt assembly with placeholders

Phase 4: Router Implementation

  1. Create LLM router with classification prompt
  2. Add qualification state tracking
  3. Unit tests for routing accuracy

Phase 5: Integration

  1. Modify chat service to use router + prompt assembly
  2. Add agent_type to conversation metadata
  3. Feature flag for gradual rollout

Phase 6: Migration

  1. Migrate first client to new system
  2. A/B test against monolithic prompt
  3. Roll out to remaining clients
  4. Deprecate per-client prompts in Langfuse

Consequences

Positive

  • Single point of change: Update one template, all clients benefit
  • Guaranteed consistency: No drift between client behaviors
  • Faster responses: Load only the relevant agent prompt (~400 tokens vs the 2000+ line monolith)
  • Better instruction following: Smaller, focused prompts
  • Lower cost: Reduced token usage per request
  • Easier maintenance: Clear separation of concerns
  • Testable: Each agent can be tested independently
  • Configurable: Client-specific behavior via database, not prompt duplication

Negative

  • Router latency: Additional LLM call per message (~200-500ms)
  • Router cost: Extra ~$0.0001 per message
  • Complexity: More components to orchestrate
  • Migration effort: Significant refactoring of existing system
  • Edge cases: Router misclassification possible (mitigated by confidence scores)

Neutral

  • Infrastructure change: Requires prompt assembly layer
  • Testing strategy: Need new approach for multi-prompt system
  • Monitoring: Need to track router accuracy and agent usage

Alternatives Considered

1. Single Template with Dynamic Injection

One prompt template with {{response_instruction}} and {{response_examples}} swapped per phase.

Rejected because: Still loads full prompt every time; doesn't solve maintenance problem of per-client prompts.

2. Rule-Based Router

Python logic with keyword matching instead of LLM classification.

Rejected because: Misses nuanced cases; requires constant rule updates; less adaptable.

3. Prompt Compression

Reduce monolithic prompt size through aggressive compression.

Rejected because: Doesn't solve instruction following issues; still one massive document per client.

4. Fine-Tuned Model

Train a custom model on sales conversations.

Rejected because: High cost; less flexible; harder to iterate on behavior.