Form Field Extraction & Dynamic CTA URLs¶

Overview¶

The Form Field Extraction feature enables dynamic CTA (Call-To-Action) URLs based on information collected during the conversation. When a user provides answers to questions (e.g., company size, industry), the system extracts these values and injects them into CTA URLs, enabling personalized routing to the appropriate forms or landing pages.

Key Features¶

- LLM-powered extraction: Uses AI to extract form field values from natural conversation
- Real-time URL updates: CTA URLs update dynamically as information is collected
- Flexible configuration: Define form fields and CTA templates per site in Supabase
- Confidence thresholds: Only uses high-confidence extractions (≥70%)
- Multi-form support: Supports multiple forms per site with different field configurations

Architecture¶

Sequence Diagram¶

sequenceDiagram
    participant User
    participant Frontend as React Frontend
    participant API as FastAPI Backend
    participant Graph as LangGraph
    participant AW as answer_writer
    participant FFE as form_field_extractor
    participant SR as suggestion_router
    participant Fin as finalize
    participant LLM as LLM (GPT-4)
    participant Supabase

    User->>Frontend: Sends message (typed or clicks suggestion)
    Frontend->>API: POST /api/lightrag/query/stream {query, siteName, sessionId}
    API->>Graph: Invoke graph with state
    Note over Graph: START node
    Graph->>AW: answer_writer generates response
    AW->>LLM: Generate answer
    LLM-->>AW: Response text

    par Parallel Execution
        AW->>FFE: form_field_extractor
        FFE->>Supabase: Get site form_config
        Supabase-->>FFE: Form schema + CTA templates
        FFE->>LLM: Extract field values from conversation
        LLM-->>FFE: {company_size: "1_to_5", confidence: 0.9}
        FFE->>FFE: Update form_collection_state
        FFE->>Fin: finalize
    and
        AW->>SR: suggestion_router
        SR->>Fin: finalize
    end

    Note over Fin: finalize node (response_node)
    Fin->>Fin: Add messages to conversation
    Graph-->>API: Final state with form_collection_state
    API->>API: compute_cta_url_overrides()<br/>Inject values into CTA templates
    API-->>Frontend: Stream complete event {cta_url_overrides: {cta_id: "url?size=1_to_5"}}
    Frontend->>Frontend: setCTAOverrides(overrides)
    Frontend->>Frontend: CTA buttons re-render with new URLs
    Frontend-->>User: Display response with updated CTAs

Component Overview¶

Configuration¶

Form Config Structure (Supabase `site_configs.form_config`)¶

{
  "forms": {
    "lead_qualification": {
      "id": "lead_qualification",
      "name": "Lead Qualification",
      "fields": [
        {
          "id": "company_size",
          "label": "Company Size",
          "type": "select",
          "extraction_prompt": "Number of employees in the company",
          "options": [
            {"value": "1_to_5", "label": "1-5 employees"},
            {"value": "6_to_20", "label": "6-20 employees"},
            {"value": "21_to_100", "label": "21-100 employees"},
            {"value": "100_plus", "label": "100+ employees"}
          ]
        },
        {
          "id": "industry",
          "label": "Industry",
          "type": "select",
          "extraction_prompt": "The industry or sector of the company",
          "options": [
            {"value": "tech", "label": "Technology"},
            {"value": "finance", "label": "Finance"},
            {"value": "healthcare", "label": "Healthcare"}
          ]
        }
      ],
      "ctas": [
        {
          "cta_id": "demo_cta",
          "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}"
        },
        {
          "cta_id": "pricing_cta",
          "url_template": "https://example.com/pricing?tier={{company_size}}"
        }
      ]
    }
  }
}

Field Configuration Options¶

Property	Type	Description
`id`	string	Unique identifier for the field (used in URL templates)
`label`	string	Human-readable label
`type`	string	Field type: `select`, `text`, `number`
`extraction_prompt`	string	Hint for the LLM on what to extract
`options`	array	For select fields: valid values with labels
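
The field properties above, together with the rule that `url_template` placeholders refer to field `id`s, lend themselves to a quick sanity check before a config is saved. The following is a minimal sketch, not part of the codebase: `validate_form` and its messages are hypothetical names, and the set of allowed types is taken from the table above.

```python
import re

ALLOWED_TYPES = {"select", "text", "number"}
# Matches {{field_id}} placeholders, tolerating inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def validate_form(form: dict) -> list[str]:
    """Return human-readable problems found in one form definition (hypothetical helper)."""
    problems: list[str] = []
    field_ids: set[str] = set()

    for field in form.get("fields", []):
        field_id = field.get("id")
        if not field_id:
            problems.append("field without an id")
            continue
        if field_id in field_ids:
            problems.append(f"duplicate field id: {field_id}")
        field_ids.add(field_id)
        if field.get("type") not in ALLOWED_TYPES:
            problems.append(f"{field_id}: unknown type {field.get('type')!r}")
        if field.get("type") == "select" and not field.get("options"):
            problems.append(f"{field_id}: select field has no options")

    # Every placeholder used in a CTA template should name a defined field.
    for cta in form.get("ctas", []):
        for name in PLACEHOLDER.findall(cta.get("url_template", "")):
            if name not in field_ids:
                problems.append(
                    f"{cta.get('cta_id')}: placeholder '{name}' has no matching field"
                )
    return problems


form = {
    "fields": [
        {
            "id": "company_size",
            "type": "select",
            "options": [{"value": "1_to_5", "label": "1-5 employees"}],
        }
    ],
    "ctas": [
        {
            "cta_id": "demo_cta",
            "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}",
        }
    ],
}
problems = validate_form(form)
```

Here `problems` flags the `industry` placeholder, because the sample form defines no field with that `id`.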

CTA Template Syntax¶

URL templates use {{field_id}} placeholders:

https://example.com/form?size={{company_size}}&source={{utm_source}}

- Placeholders are replaced with extracted values
- Unmatched placeholders remain as-is (or can be configured to be removed)
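
To make the substitution rules concrete, here is a minimal sketch of placeholder replacement. It is not the code in `cta_url_builder.py`: `render_cta_url`, `build_overrides`, and the `remove_unmatched` flag are hypothetical names. Two behaviors are assumptions: "removed" is taken to mean replaced with an empty string, and an override is only emitted for a CTA when at least one of its placeholders has a collected value.

```python
import re
from urllib.parse import quote

# Matches {{field_id}} placeholders, tolerating inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_cta_url(template: str, values: dict[str, str], remove_unmatched: bool = False) -> str:
    """Fill {{field_id}} placeholders with percent-encoded values."""

    def substitute(match: re.Match) -> str:
        field_id = match.group(1)
        if field_id in values:
            # safe="" also encodes "/" and "&", so a value cannot alter the URL structure.
            return quote(str(values[field_id]), safe="")
        # Unmatched: keep the placeholder as-is, or drop it (assumed meaning of "removed").
        return "" if remove_unmatched else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


def build_overrides(ctas: list[dict], values: dict[str, str]) -> dict[str, str]:
    """Return {cta_id: url} for every CTA with at least one fillable placeholder."""
    overrides: dict[str, str] = {}
    for cta in ctas:
        template = cta["url_template"]
        if any(name in values for name in PLACEHOLDER.findall(template)):
            overrides[cta["cta_id"]] = render_cta_url(template, values)
    return overrides


ctas = [
    {"cta_id": "demo_cta", "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}"},
    {"cta_id": "pricing_cta", "url_template": "https://example.com/pricing?tier={{company_size}}"},
]
overrides = build_overrides(ctas, {"company_size": "1_to_5"})
```

With only `company_size` collected, `pricing_cta` resolves fully, while `demo_cta` keeps its `{{industry}}` placeholder until that field is extracted.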

Backend Implementation¶

Key Files¶

File	Purpose
`backend/packages/ixchat/ixchat/nodes/form_field_extractor.py`	LLM-powered value extraction
`backend/packages/ixchat/ixchat/utils/cta_url_builder.py`	URL template processing
`backend/packages/ixchat/ixchat/pydantic_models/form_collection.py`	State models
`backend/apps/api/search/ixsearch_api/routes/chat.py`	API endpoint

Form Collection State¶

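from pydantic import BaseModel, Field


class FormCollectionState(BaseModel):
    """Per-conversation record of which form fields have been extracted so far."""

    form_id: str | None = None
    collected_values: dict[str, str] = Field(default_factory=dict)  # {field_id: extracted_value}
    pending_fields: list[str] = Field(default_factory=list)  # Field IDs still to be collected
    is_complete: bool = False  # True once no fields remain pending
    last_extraction_turn: int = -1  # Prevents duplicate extraction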
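
As an illustration of how such a state might evolve from turn to turn, the sketch below merges newly extracted values into the state. It uses a stdlib dataclass as a stand-in for the Pydantic model so that it runs without dependencies. `FormState` and `apply_extraction` are hypothetical names, and two behaviors are assumptions: a turn number at or below `last_extraction_turn` is skipped, and already collected values are never overwritten.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormState:
    """Stdlib stand-in mirroring the fields of FormCollectionState."""

    form_id: Optional[str] = None
    collected_values: dict[str, str] = field(default_factory=dict)
    pending_fields: list[str] = field(default_factory=list)
    is_complete: bool = False
    last_extraction_turn: int = -1


def apply_extraction(state: FormState, all_field_ids: list[str],
                     extracted: dict[str, str], turn: int) -> FormState:
    """Merge one turn's extracted values into the state (hypothetical helper)."""
    # Assumed guard: a turn that was already processed is ignored.
    if turn <= state.last_extraction_turn:
        return state

    collected = dict(state.collected_values)
    for field_id, value in extracted.items():
        # Assumption: only known, still-unfilled fields are accepted.
        if field_id in all_field_ids and field_id not in collected:
            collected[field_id] = value

    pending = [fid for fid in all_field_ids if fid not in collected]
    return FormState(
        form_id=state.form_id,
        collected_values=collected,
        pending_fields=pending,
        is_complete=not pending,
        last_extraction_turn=turn,
    )


FIELD_IDS = ["company_size", "industry"]
state = FormState(form_id="lead_qualification", pending_fields=list(FIELD_IDS))
state = apply_extraction(state, FIELD_IDS, {}, turn=1)
state = apply_extraction(state, FIELD_IDS, {"company_size": "1_to_5"}, turn=2)
state = apply_extraction(state, FIELD_IDS, {"industry": "finance"}, turn=3)
```

After the three turns, `state.collected_values` holds both fields and `state.is_complete` is `True`. Calling `apply_extraction` again with `turn=3` returns the state unchanged.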

Extraction Process¶

1. Get the form config from Supabase for the site
2. Identify unfilled fields (those not yet in `collected_values`)
3. Build the conversation context from the last 10 messages plus the current input
4. Call the LLM with a structured output schema
5. Filter by confidence (≥0.7 threshold)
6. Update the state with the extracted values
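
Steps 2 to 5 can be sketched as a single function. This is an illustration under stated assumptions, not the code in `form_field_extractor.py`: `extract_fields` is a hypothetical name, the LLM is replaced by a `call_llm` callable assumed to return `{field_id: {"value": ..., "confidence": ...}}`, and the check against a select field's option values is an extra safeguard that the source does not describe.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # from the process above
CONTEXT_MESSAGES = 10       # from the process above


def extract_fields(
    fields: list[dict],
    collected: dict[str, str],
    history: list[dict],
    current_input: str,
    call_llm: Callable[[str, list[dict]], dict],
) -> dict[str, str]:
    """Return newly accepted {field_id: value} pairs for this turn."""
    # Step 2: only ask about fields that have no value yet.
    unfilled = [f for f in fields if f["id"] not in collected]
    if not unfilled:
        return {}

    # Step 3: last 10 messages plus the current user input.
    recent = history[-CONTEXT_MESSAGES:] + [{"role": "user", "content": current_input}]
    context = "\n".join(f"{m['role'].upper()}: {m['content']}" for m in recent)

    # Step 4: stand-in for the structured-output LLM call.
    raw = call_llm(context, unfilled)

    # Step 5: keep only confident results.
    accepted: dict[str, str] = {}
    for f in unfilled:
        result = raw.get(f["id"])
        if not result or result.get("value") is None:
            continue
        if result.get("confidence", 0.0) < CONFIDENCE_THRESHOLD:
            continue
        # Assumed safeguard: select values must be one of the configured options.
        valid = {o["value"] for o in f.get("options", [])}
        if valid and result["value"] not in valid:
            continue
        accepted[f["id"]] = result["value"]
    return accepted


def fake_llm(context: str, unfilled: list[dict]) -> dict:
    """Canned response standing in for a real model."""
    return {
        "company_size": {"value": "1_to_5", "confidence": 0.9},
        "industry": {"value": "tech", "confidence": 0.4},
    }


fields = [
    {"id": "company_size", "options": [{"value": "1_to_5"}, {"value": "6_to_20"}]},
    {"id": "industry", "options": [{"value": "tech"}, {"value": "finance"}]},
]
history = [{"role": "assistant", "content": "How many people are on your team?"}]
accepted = extract_fields(fields, {}, history, "We're about 5 people", fake_llm)
```

In this run only `company_size` is accepted. The `industry` guess has a confidence of 0.4, below the threshold, so the field stays unfilled and is retried on a later turn.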

LLM Extraction Prompt¶

The system uses a Langfuse-managed prompt (`rose-internal-form-extraction`); otherwise it falls back to the following built-in prompt:

Analyze this conversation and extract the following information:

- company_size: Number of employees in the company. Valid values: ["1_to_5", "6_to_20", "21_to_100", "100_plus"]
- industry: The industry or sector of the company. Valid values: ["tech", "finance", "healthcare"]

Conversation:
USER: I'm looking for a solution for my small team
ASSISTANT: Happy to help! How many people are on your team?
USER: We're about 5 people

For each field, return the extracted value or null if not found.
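
A prompt of this shape can be assembled directly from the field configuration. The sketch below is one way to do it, assuming message dicts with `role` and `content` keys. `build_extraction_prompt` is a hypothetical name, not the project's function.

```python
import json


def build_extraction_prompt(fields: list[dict], messages: list[dict]) -> str:
    """Render the fallback-style prompt from field config and conversation."""
    lines = ["Analyze this conversation and extract the following information:", ""]
    for f in fields:
        line = f"- {f['id']}: {f['extraction_prompt']}."
        options = [o["value"] for o in f.get("options", [])]
        if options:
            # json.dumps yields the ["a", "b"] list format used in the prompt.
            line += f" Valid values: {json.dumps(options)}"
        lines.append(line)
    lines += ["", "Conversation:"]
    lines += [f"{m['role'].upper()}: {m['content']}" for m in messages]
    lines += ["", "For each field, return the extracted value or null if not found."]
    return "\n".join(lines)


fields = [
    {
        "id": "company_size",
        "extraction_prompt": "Number of employees in the company",
        "options": [
            {"value": "1_to_5"}, {"value": "6_to_20"},
            {"value": "21_to_100"}, {"value": "100_plus"},
        ],
    },
    {
        "id": "industry",
        "extraction_prompt": "The industry or sector of the company",
        "options": [{"value": "tech"}, {"value": "finance"}, {"value": "healthcare"}],
    },
]
messages = [
    {"role": "user", "content": "I'm looking for a solution for my small team"},
    {"role": "assistant", "content": "Happy to help! How many people are on your team?"},
    {"role": "user", "content": "We're about 5 people"},
]
prompt = build_extraction_prompt(fields, messages)
```

Fields without `options` (for example `text` or `number` fields) simply omit the "Valid values" suffix.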

Frontend Implementation¶

Key Files¶

File	Purpose
`frontend/shared/src/hooks/chat/useStreamingMessage.ts`	Receives CTA overrides
`frontend/shared/src/utils/content/ctaReplacer.ts`	Stores and applies overrides
`frontend/shared/src/components/InlineCTAButton.tsx`	Renders CTA buttons

Override Flow¶

// 1. Receive in streaming hook (useStreamingMessage.ts:194)
const ctaUrlOverrides = chunk.metadata?.cta_url_overrides || null;
setCTAOverrides(ctaUrlOverrides);

// 2. Store globally (ctaReplacer.ts)
// (`ctaOverridesVersion` and `subscribers` are module-level state declared elsewhere in the file)
let currentCTAOverrides: Record<string, string> | null = null;
export function setCTAOverrides(overrides: Record<string, string> | null): void {
    currentCTAOverrides = overrides;
    ctaOverridesVersion++;
    subscribers.forEach(callback => callback());  // Notify buttons
}

// 3. Apply in CTA resolution (ctaReplacer.ts:180)
export function getCTAData(domain, placeholder, language) {
    // ... resolution of `cta`, `staticUrl`, and `text` omitted in this excerpt ...
    const ctaId = cta?.cta_id;
    let url = staticUrl;

    if (ctaId && currentCTAOverrides && currentCTAOverrides[ctaId]) {
        url = currentCTAOverrides[ctaId];  // Use dynamic URL
    }

    return { url, text, ctaId };
}

// 4. Re-render on changes (InlineCTAButton.tsx)
const overridesVersion = useCTAOverridesVersion();  // Subscribes to changes
const ctaData = useMemo(() => getCTAData(...), [overridesVersion]);  // Re-computes

Data Flow¶

State Persistence¶

flowchart LR
    Turn1[Turn 1: User asks question]
    Turn2[Turn 2: Bot asks company size]
    Turn3[Turn 3: User answers '5 people']
    Extract[form_field_extractor]
    State[(form_collection_state)]
    CTA[CTA URL Override]

    Turn1 --> Turn2
    Turn2 --> Turn3
    Turn3 --> Extract
    Extract -->|company_size: '1_to_5'| State
    State -->|Persist via Redis| Turn3
    State --> CTA

Cross-Turn Accumulation¶

Form values accumulate across conversation turns:

Turn	User Says	Extracted	Collected Values
1	"I need help with pricing"	-	`{}`
2	"We're a team of 5"	`company_size: "1_to_5"`	`{company_size: "1_to_5"}`
3	"We're in fintech"	`industry: "finance"`	`{company_size: "1_to_5", industry: "finance"}`

Debugging¶

Backend Logging¶

Enable debug logging to see extraction process:

# In form_field_extractor.py
logger.debug(f"📋 [FORM EXTRACTOR] Starting form field extraction for {domain_id}, turn {turn_number}")
logger.debug(f"📋 [FORM EXTRACTOR] Unfilled fields: {[f.id for f in unfilled_fields]}")
logger.debug(f"📋 [FORM EXTRACTOR] Extracted values: {extracted_values}")
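
These messages only appear if the extractor's logger is enabled at `DEBUG` level. A minimal sketch using the standard `logging` module follows. The logger name is an assumption derived from the module path and may differ from the one the project actually uses.

```python
import logging

# Assumed logger name, derived from the module path; adjust to the real one.
LOGGER_NAME = "ixchat.nodes.form_field_extractor"

# Root stays at INFO so other modules remain quiet.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Raise verbosity for the extractor only.
extractor_logger = logging.getLogger(LOGGER_NAME)
extractor_logger.setLevel(logging.DEBUG)
```

Because handler levels, not the root logger's level, decide whether a propagated record is emitted, the extractor's debug lines reach the root handler even though the root logger itself is set to `INFO`.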

Frontend Logging¶

Check browser console for CTA override events:

// In ctaReplacer.ts
logger.debug('CTA URL overrides set:', overrides, 'version:', ctaOverridesVersion);
logger.debug('Using CTA URL override:', { ctaId, originalUrl, overrideUrl });

Common Issues¶

Issue	Cause	Solution
CTA not updating	Low-confidence extraction (below the 0.7 threshold)	Check the LLM output and adjust `extraction_prompt`
Wrong value extracted	Ambiguous user input	Add more specific options and improve the prompt
Duplicate extraction	Turn number not incremented	Check the `last_extraction_turn` logic
Overrides not applied	CTA ID mismatch	Verify that `cta_id` matches in config and content
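
For the CTA ID mismatch case, a small comparison of the two ID sets can narrow things down. The sketch below is a hypothetical helper: how CTA IDs are collected from the content is not described here, so they are passed in as a plain set.

```python
def find_cta_id_mismatches(form_config: dict, content_cta_ids: set[str]) -> dict[str, set[str]]:
    """Compare CTA IDs in form_config against IDs referenced by content."""
    configured = {
        cta["cta_id"]
        for form in form_config.get("forms", {}).values()
        for cta in form.get("ctas", [])
    }
    return {
        # Templates that no content CTA will ever pick up.
        "configured_but_unused": configured - content_cta_ids,
        # Content CTAs with no template; these keep their static URL.
        "used_but_unconfigured": content_cta_ids - configured,
    }


form_config = {
    "forms": {
        "lead_qualification": {
            "ctas": [
                {"cta_id": "demo_cta", "url_template": "https://example.com/demo?size={{company_size}}"},
                {"cta_id": "pricing_cta", "url_template": "https://example.com/pricing?tier={{company_size}}"},
            ]
        }
    }
}
mismatches = find_cta_id_mismatches(form_config, {"demo_cta", "contact_cta"})
```

An entry under `configured_but_unused` is the likelier sign of a typo. A content CTA without a template may be intentional.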

Testing Extraction¶

# Run the extractor tests verbosely (-v) with output capture disabled (-s)
MYPYPATH=packages/ixchat poetry run pytest \
    packages/ixchat/ixchat/tests/test_form_field_extractor.py -v -s

Best Practices¶

Field Configuration¶

- Be specific with extraction prompts: "Number of employees" is better than "company size"
- Use clear option values: Use snake_case IDs (1_to_5) not display text ("1-5 employees")
- Limit options: LLM performs better with fewer, distinct options
- Add synonyms in prompts: "Number of employees, team size, or headcount"

URL Templates¶

- Use unique CTA IDs: Must match between form_config.ctas and content placeholders
- Handle missing values: Design forms to work with partial data
- URL encode values: The builder handles encoding automatically

Conversation Design¶

- Ask specific questions: "How many employees do you have?" extracts better than open-ended questions
- Offer suggested answers: Pre-defined answers map directly to options
- Confirm extractions: Show collected values to user for verification

