# Form Field Extraction & Dynamic CTA URLs
## Overview
The Form Field Extraction feature enables dynamic CTA (Call-To-Action) URLs based on information collected during the conversation. When a user provides answers to questions (e.g., company size, industry), the system extracts these values and injects them into CTA URLs, enabling personalized routing to the appropriate forms or landing pages.
### Key Features
- LLM-powered extraction: Uses AI to extract form field values from natural conversation
- Real-time URL updates: CTA URLs update dynamically as information is collected
- Flexible configuration: Define form fields and CTA templates per site in Supabase
- Confidence thresholds: Only uses high-confidence extractions (≥70%)
- Multi-form support: Supports multiple forms per site with different field configurations
## Architecture
### Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Frontend as React Frontend
    participant API as FastAPI Backend
    participant Graph as LangGraph
    participant AW as answer_writer
    participant FFE as form_field_extractor
    participant SR as suggestion_router
    participant Fin as finalize
    participant LLM as LLM (GPT-4)
    participant Supabase
    User->>Frontend: Sends message<br/>(typed or clicks suggestion)
    Frontend->>API: POST /api/lightrag/query/stream<br/>{query, siteName, sessionId}
    API->>Graph: Invoke graph with state
    Note over Graph: START node
    Graph->>AW: answer_writer generates response
    AW->>LLM: Generate answer
    LLM-->>AW: Response text
    par Parallel Execution
        AW->>FFE: form_field_extractor
        FFE->>Supabase: Get site form_config
        Supabase-->>FFE: Form schema + CTA templates
        FFE->>LLM: Extract field values from conversation
        LLM-->>FFE: {company_size: "1_to_5", confidence: 0.9}
        FFE->>FFE: Update form_collection_state
        FFE->>Fin: finalize
    and
        AW->>SR: suggestion_router
        SR->>Fin: finalize
    end
    Note over Fin: finalize node (response_node)
    Fin->>Fin: Add messages to conversation
    Graph-->>API: Final state with form_collection_state
    API->>API: compute_cta_url_overrides()<br/>Inject values into CTA templates
    API-->>Frontend: Stream complete event<br/>{cta_url_overrides: {pricing_cta: "https://example.com/pricing?tier=1_to_5"}}
    Frontend->>Frontend: setCTAOverrides(overrides)
    Frontend->>Frontend: CTA buttons re-render<br/>with new URLs
    Frontend-->>User: Display response with updated CTAs
```
### Component Overview

```mermaid
flowchart TB
    subgraph Frontend
        UI[React Components]
        Hook[useStreamingMessage]
        CTA[ctaReplacer.ts]
        BTN[InlineCTAButton]
    end
    subgraph Backend
        API[FastAPI Route]
        Graph[LangGraph]
        FFE[form_field_extractor_node]
        Builder[cta_url_builder.py]
    end
    subgraph Storage
        Supabase[(Supabase)]
        Redis[(Redis Memory)]
    end
    UI -->|Send Message| Hook
    Hook -->|API Call| API
    API -->|Invoke| Graph
    Graph -->|Extract| FFE
    FFE -->|Get Config| Supabase
    FFE -->|Store State| Redis
    Graph -->|Final State| API
    API -->|Compute URLs| Builder
    Builder -->|Read Config| Supabase
    API -->|Stream Response| Hook
    Hook -->|Set Overrides| CTA
    CTA -->|Notify| BTN
    BTN -->|Re-render| UI
```
## Configuration
### Form Config Structure (Supabase `site_configs.form_config`)

```json
{
  "forms": {
    "lead_qualification": {
      "id": "lead_qualification",
      "name": "Lead Qualification",
      "fields": [
        {
          "id": "company_size",
          "label": "Company Size",
          "type": "select",
          "extraction_prompt": "Number of employees in the company",
          "options": [
            {"value": "1_to_5", "label": "1-5 employees"},
            {"value": "6_to_20", "label": "6-20 employees"},
            {"value": "21_to_100", "label": "21-100 employees"},
            {"value": "100_plus", "label": "100+ employees"}
          ]
        },
        {
          "id": "industry",
          "label": "Industry",
          "type": "select",
          "extraction_prompt": "The industry or sector of the company",
          "options": [
            {"value": "tech", "label": "Technology"},
            {"value": "finance", "label": "Finance"},
            {"value": "healthcare", "label": "Healthcare"}
          ]
        }
      ],
      "ctas": [
        {
          "cta_id": "demo_cta",
          "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}"
        },
        {
          "cta_id": "pricing_cta",
          "url_template": "https://example.com/pricing?tier={{company_size}}"
        }
      ]
    }
  }
}
```
### Field Configuration Options

| Property | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for the field (used in URL templates) |
| `label` | string | Human-readable label |
| `type` | string | Field type: `select`, `text`, or `number` |
| `extraction_prompt` | string | Hint for the LLM on what to extract |
| `options` | array | For `select` fields: valid values with their labels |
### CTA Template Syntax

URL templates use `{{field_id}}` placeholders:

- Placeholders are replaced with extracted values
- Unmatched placeholders remain as-is (or can be configured to be removed)
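The substitution rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `cta_url_builder.py`: the signatures of `render_cta_url` and `compute_cta_url_overrides` (the latter name borrowed from the sequence diagram), the `\w+` placeholder pattern, and the choice to emit an override for every CTA are assumptions. Unmatched placeholders are left as-is, and values are URL-encoded with `urllib.parse.quote`.

```python
import re
from urllib.parse import quote

# Hypothetical sketch; not the actual cta_url_builder.py implementation.
# Assumption: placeholder field IDs are word characters, e.g. {{company_size}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_cta_url(template: str, values: dict[str, str]) -> str:
    """Replace each {{field_id}} with its URL-encoded value; leave unknown ones as-is."""

    def substitute(match: re.Match[str]) -> str:
        field_id = match.group(1)
        if field_id in values:
            return quote(values[field_id], safe="")  # encodes spaces, &, ?, etc.
        return match.group(0)  # unmatched placeholder stays in the URL

    return PLACEHOLDER.sub(substitute, template)


def compute_cta_url_overrides(ctas: list[dict], values: dict[str, str]) -> dict[str, str]:
    """Map every cta_id in a form config's "ctas" list to its rendered URL."""
    return {cta["cta_id"]: render_cta_url(cta["url_template"], values) for cta in ctas}


ctas = [
    {"cta_id": "demo_cta",
     "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}"},
    {"cta_id": "pricing_cta",
     "url_template": "https://example.com/pricing?tier={{company_size}}"},
]
print(compute_cta_url_overrides(ctas, {"company_size": "1_to_5"}))
# {'demo_cta': 'https://example.com/demo?size=1_to_5&industry={{industry}}', 'pricing_cta': 'https://example.com/pricing?tier=1_to_5'}
```

Leaving `{{industry}}` in the demo URL mirrors the default described above; a builder configured to remove unmatched placeholders would strip that query parameter instead.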
## Backend Implementation
### Key Files

| File | Purpose |
|---|---|
| `backend/packages/ixchat/ixchat/nodes/form_field_extractor.py` | LLM-powered value extraction |
| `backend/packages/ixchat/ixchat/utils/cta_url_builder.py` | URL template processing |
| `backend/packages/ixchat/ixchat/pydantic_models/form_collection.py` | State models |
| `backend/apps/api/search/ixsearch_api/routes/chat.py` | API endpoint |
### Form Collection State

```python
from pydantic import BaseModel


class FormCollectionState(BaseModel):
    form_id: str | None = None
    # {field_id: extracted_value}; Pydantic copies mutable defaults per instance
    collected_values: dict[str, str] = {}
    pending_fields: list[str] = []  # field IDs not yet collected
    is_complete: bool = False
    last_extraction_turn: int = -1  # Prevents duplicate extraction
```
### Extraction Process

- Get form config from Supabase for the site
- Identify unfilled fields (not yet in `collected_values`)
- Build conversation context from last 10 messages + current input
- Call LLM with structured output schema
- Filter by confidence (≥0.7 threshold)
- Update state with extracted values
### LLM Extraction Prompt

The system uses a Langfuse-managed prompt (`rose-internal-form-extraction`) or falls back to a built-in prompt. Rendered for the example config above, the fallback reads:

```text
Analyze this conversation and extract the following information:
- company_size: Number of employees in the company. Valid values: ["1_to_5", "6_to_20", "21_to_100", "100_plus"]
- industry: The industry or sector of the company. Valid values: ["tech", "finance", "healthcare"]
Conversation:
USER: I'm looking for a solution for my small team
ASSISTANT: Happy to help! How many people are on your team?
USER: We're about 5 people
For each field, return the extracted value or null if not found.
```
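The fallback prompt appears to be generated from the form config: each field contributes its `id` and `extraction_prompt`, plus its option values when it has any. A sketch that reproduces the layout shown above follows; the function name `build_fallback_prompt` and the `(role, text)` conversation format are assumptions, and the real template's exact wording and whitespace may differ.

```python
import json


def build_fallback_prompt(fields: list[dict], conversation: list[tuple[str, str]]) -> str:
    """Hypothetical reconstruction of the fallback extraction prompt."""
    lines = ["Analyze this conversation and extract the following information:"]
    for fld in fields:
        line = f"- {fld['id']}: {fld['extraction_prompt']}."
        values = [opt["value"] for opt in fld.get("options", [])]
        if values:
            # json.dumps renders ["1_to_5", "6_to_20"] with double quotes, as shown above
            line += f" Valid values: {json.dumps(values)}"
        lines.append(line)
    lines.append("Conversation:")
    lines.extend(f"{role.upper()}: {text}" for role, text in conversation)
    lines.append("For each field, return the extracted value or null if not found.")
    return "\n".join(lines)


print(build_fallback_prompt(
    [{"id": "industry", "extraction_prompt": "The industry or sector of the company",
      "options": [{"value": "tech"}, {"value": "finance"}, {"value": "healthcare"}]}],
    [("user", "We're in fintech")],
))
```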
## Frontend Implementation
### Key Files

| File | Purpose |
|---|---|
| `frontend/shared/src/hooks/chat/useStreamingMessage.ts` | Receives CTA overrides |
| `frontend/shared/src/utils/content/ctaReplacer.ts` | Stores and applies overrides |
| `frontend/shared/src/components/InlineCTAButton.tsx` | Renders CTA buttons |
### Override Flow

```typescript
// 1. Receive in the streaming hook (useStreamingMessage.ts:194)
//    when the stream's complete event arrives
const ctaUrlOverrides = chunk.metadata?.cta_url_overrides || null;
setCTAOverrides(ctaUrlOverrides);

// 2. Store globally and notify subscribers (ctaReplacer.ts)
let currentCTAOverrides: Record<string, string> | null = null;
let ctaOverridesVersion = 0;
const subscribers = new Set<() => void>(); // assumed shape of the subscriber registry

export function setCTAOverrides(overrides: Record<string, string> | null): void {
  currentCTAOverrides = overrides;
  ctaOverridesVersion++;
  subscribers.forEach(callback => callback()); // Notify subscribed CTA buttons
}

// 3. Apply in CTA resolution (ctaReplacer.ts:180)
export function getCTAData(domain: string, placeholder: string, language: string) {
  // ... look up `cta`, `staticUrl` and `text` for this placeholder (elided)
  const ctaId = cta?.cta_id;
  let url = staticUrl;
  if (ctaId && currentCTAOverrides && currentCTAOverrides[ctaId]) {
    url = currentCTAOverrides[ctaId]; // Use the dynamic URL
  }
  return { url, text, ctaId };
}

// 4. Re-render on changes (InlineCTAButton.tsx)
const overridesVersion = useCTAOverridesVersion(); // Subscribes to changes
const ctaData = useMemo(
  () => getCTAData(domain, placeholder, language),
  [domain, placeholder, language, overridesVersion], // Re-computes when overrides change
);
```
## Data Flow
### State Persistence

```mermaid
flowchart LR
    Turn1[Turn 1: User asks question]
    Turn2[Turn 2: Bot asks company size]
    Turn3[Turn 3: User answers '5 people']
    Extract[form_field_extractor]
    State[(form_collection_state)]
    CTA[CTA URL Override]
    Turn1 --> Turn2
    Turn2 --> Turn3
    Turn3 --> Extract
    Extract -->|company_size: '1_to_5'| State
    State -->|Persist via Redis| Turn3
    State --> CTA
```
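Because the state is carried between turns through Redis, it has to survive a serialize/deserialize round trip. The sketch below assumes JSON serialization and one key per session; the key format, the `save_state`/`load_state` helpers, and the dict-backed `InMemoryStore` (standing in for a Redis client with `get`/`set`) are all hypothetical.

```python
import copy
import json

# Hypothetical sketch; the real key naming and serialization are not documented here.
EMPTY_STATE = {
    "form_id": None,
    "collected_values": {},
    "pending_fields": [],
    "is_complete": False,
    "last_extraction_turn": -1,
}


def state_key(session_id: str) -> str:
    return f"form_collection_state:{session_id}"  # hypothetical key format


class InMemoryStore(dict):
    """Stand-in for a Redis client: redis-py's Redis.get/Redis.set have this shape."""

    def set(self, key: str, value: str) -> None:
        self[key] = value


def save_state(store, session_id: str, state: dict) -> None:
    store.set(state_key(session_id), json.dumps(state))


def load_state(store, session_id: str) -> dict:
    raw = store.get(state_key(session_id))  # redis-py returns bytes; json.loads accepts bytes
    if raw is None:
        return copy.deepcopy(EMPTY_STATE)  # fresh copy so callers can mutate it safely
    return json.loads(raw)
```

With redis-py, the same helpers would accept a `redis.Redis` instance in place of `InMemoryStore`.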
### Cross-Turn Accumulation

Form values accumulate across conversation turns:

| Turn | User Says | Extracted | Collected Values |
|---|---|---|---|
| 1 | "I need help with pricing" | - | `{}` |
| 2 | "We're a team of 5" | `company_size: "1_to_5"` | `{company_size: "1_to_5"}` |
| 3 | "We're in fintech" | `industry: "finance"` | `{company_size: "1_to_5", industry: "finance"}` |
## Debugging
### Backend Logging

Enable debug logging to see the extraction process:

```python
# In form_field_extractor.py
logger.debug(f"📋 [FORM EXTRACTOR] Starting form field extraction for {domain_id}, turn {turn_number}")
logger.debug(f"📋 [FORM EXTRACTOR] Unfilled fields: {[f.id for f in unfilled_fields]}")
logger.debug(f"📋 [FORM EXTRACTOR] Extracted values: {extracted_values}")
```
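These messages are only emitted when DEBUG is enabled for the extractor's logger. With the standard `logging` module that can be done as below, assuming the module creates its logger with `logging.getLogger(__name__)`, so the logger name follows the module path `ixchat.nodes.form_field_extractor`. If the project configures logging through its own settings, use that mechanism instead.

```python
import logging

# Assumption: form_field_extractor.py uses logging.getLogger(__name__).
EXTRACTOR_LOGGER = "ixchat.nodes.form_field_extractor"

# Attach a handler to the root logger so records have somewhere to go
# (basicConfig is a no-op if the root logger already has handlers).
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Lower only the extractor's threshold; other loggers keep their effective level.
logging.getLogger(EXTRACTOR_LOGGER).setLevel(logging.DEBUG)
```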
### Frontend Logging

Check the browser console for CTA override events:

```typescript
// In ctaReplacer.ts
logger.debug('CTA URL overrides set:', overrides, 'version:', ctaOverridesVersion);
logger.debug('Using CTA URL override:', { ctaId, originalUrl, overrideUrl });
```
### Common Issues

| Issue | Cause | Solution |
|---|---|---|
| CTA not updating | Low-confidence extraction | Check LLM output, adjust `extraction_prompt` |
| Wrong value extracted | Ambiguous user input | Add more specific options, improve prompt |
| Duplicate extraction | Turn number not incremented | Check `last_extraction_turn` logic |
| Overrides not applied | CTA ID mismatch | Verify `cta_id` matches in config and content |
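Two of these issues, CTA ID mismatches and placeholders that reference no field, can be caught before deployment by checking the config. The sketch below is hypothetical (`check_form_config` is not part of the codebase); it assumes the CTA IDs used by content placeholders are available as a set, and that `cta_id` must be unique across all forms of a site.

```python
import re

# Hypothetical config check; same {{field_id}} pattern assumption as elsewhere.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_form_config(form_config: dict, content_cta_ids: set[str]) -> list[str]:
    """Return human-readable problems that would stop CTA overrides from applying."""
    problems: list[str] = []
    seen_cta_ids: set[str] = set()
    for form_id, form in form_config.get("forms", {}).items():
        field_ids = {fld["id"] for fld in form.get("fields", [])}
        for cta in form.get("ctas", []):
            cta_id = cta["cta_id"]
            if cta_id in seen_cta_ids:
                problems.append(f"{form_id}: duplicate cta_id {cta_id!r}")
            seen_cta_ids.add(cta_id)
            if cta_id not in content_cta_ids:
                problems.append(f"{form_id}: cta_id {cta_id!r} is not used by any content placeholder")
            # Every {{placeholder}} should name a field defined in the same form.
            for name in PLACEHOLDER.findall(cta["url_template"]):
                if name not in field_ids:
                    problems.append(f"{form_id}: {cta_id!r} references unknown field {name!r}")
    return problems
```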
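### Testing Extraction

```shell
# From the backend/ directory; -s disables output capture,
# --log-cli-level=DEBUG streams debug log records live
MYPYPATH=packages/ixchat poetry run pytest \
  packages/ixchat/ixchat/tests/test_form_field_extractor.py -v -s --log-cli-level=DEBUG
```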
## Best Practices
### Field Configuration

- Be specific with extraction prompts: "Number of employees" is better than "company size"
- Use clear option values: Use snake_case IDs (`1_to_5`), not display text ("1-5 employees")
- Limit options: The LLM performs better with fewer, distinct options
- Add synonyms in prompts: "Number of employees, team size, or headcount"
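The option-value guideline can be checked mechanically. A minimal sketch, assuming "snake_case" means lowercase letters and digits joined by single underscores (so `1_to_5` and `100_plus` pass while display text such as `1-5 employees` fails); `non_snake_case_options` is a hypothetical helper.

```python
import re

# Assumed definition: lowercase letters/digits separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def non_snake_case_options(field: dict) -> list[str]:
    """Return option values of a field that are not snake_case IDs."""
    return [
        opt["value"]
        for opt in field.get("options", [])
        if not SNAKE_CASE.fullmatch(opt["value"])
    ]
```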
### URL Templates

- Use unique CTA IDs: IDs must match between `form_config.ctas` and content placeholders
- Handle missing values: Design forms to work with partial data
- URL-encode values: The builder handles encoding automatically
### Conversation Design
- Ask specific questions: "How many employees do you have?" extracts better than open-ended questions
- Offer suggested answers: Pre-defined answers map directly to options
- Confirm extractions: Show collected values to user for verification
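To confirm extractions, stored option IDs can be mapped back to their configured labels so the user sees "1-5 employees" rather than `1_to_5`. A minimal sketch; `summarize_collected` and the `"; "` separator are illustrative choices, not existing code.

```python
def summarize_collected(fields: list[dict], collected: dict[str, str]) -> str:
    """Render collected values with human-readable labels for a confirmation message."""
    parts = []
    for fld in fields:
        value = collected.get(fld["id"])
        if value is None:
            continue  # not collected yet
        # For select fields, show the option label instead of the raw ID.
        labels = {opt["value"]: opt["label"] for opt in fld.get("options", [])}
        parts.append(f"{fld['label']}: {labels.get(value, value)}")
    return "; ".join(parts)
```

With the example config and `{"company_size": "1_to_5", "industry": "finance"}`, this yields `Company Size: 1-5 employees; Industry: Finance`.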