
Form Field Extraction & Dynamic CTA URLs

Overview

The Form Field Extraction feature enables dynamic CTA (Call-To-Action) URLs based on information collected during the conversation. When a user provides answers to questions (e.g., company size, industry), the system extracts these values and injects them into CTA URLs, enabling personalized routing to the appropriate forms or landing pages.

Key Features

  • LLM-powered extraction: Uses AI to extract form field values from natural conversation
  • Real-time URL updates: CTA URLs update dynamically as information is collected
  • Flexible configuration: Define form fields and CTA templates per site in Supabase
  • Confidence thresholds: Only uses high-confidence extractions (≥70%)
  • Multi-form support: Supports multiple forms per site with different field configurations

Architecture

Sequence Diagram

sequenceDiagram
    participant User
    participant Frontend as React Frontend
    participant API as FastAPI Backend
    participant Graph as LangGraph
    participant AW as answer_writer
    participant FFE as form_field_extractor
    participant SR as suggestion_router
    participant Fin as finalize
    participant LLM as LLM (GPT-4)
    participant Supabase

    User->>Frontend: Sends message<br/>(typed or clicks suggestion)
    Frontend->>API: POST /api/lightrag/query/stream<br/>{query, siteName, sessionId}
    API->>Graph: Invoke graph with state
    Note over Graph: START node
    Graph->>AW: answer_writer generates response
    AW->>LLM: Generate answer
    LLM-->>AW: Response text

    par Parallel Execution
        AW->>FFE: form_field_extractor
        FFE->>Supabase: Get site form_config
        Supabase-->>FFE: Form schema + CTA templates
        FFE->>LLM: Extract field values from conversation
        LLM-->>FFE: {company_size: "1_to_5", confidence: 0.9}
        FFE->>FFE: Update form_collection_state
        FFE->>Fin: finalize
    and
        AW->>SR: suggestion_router
        SR->>Fin: finalize
    end

    Note over Fin: finalize node (response_node)
    Fin->>Fin: Add messages to conversation
    Graph-->>API: Final state with form_collection_state
    API->>API: compute_cta_url_overrides()<br/>Inject values into CTA templates
    API-->>Frontend: Stream complete event<br/>{cta_url_overrides: {cta_id: "url?size=1_to_5"}}
    Frontend->>Frontend: setCTAOverrides(overrides)
    Frontend->>Frontend: CTA buttons re-render<br/>with new URLs
    Frontend-->>User: Display response with updated CTAs

Component Overview

flowchart TB
    subgraph Frontend
        UI[React Components]
        Hook[useStreamingMessage]
        CTA[ctaReplacer.ts]
        BTN[InlineCTAButton]
    end
    subgraph Backend
        API[FastAPI Route]
        Graph[LangGraph]
        FFE[form_field_extractor_node]
        Builder[cta_url_builder.py]
    end
    subgraph Storage
        Supabase[(Supabase)]
        Redis[(Redis Memory)]
    end
    UI -->|Send Message| Hook
    Hook -->|API Call| API
    API -->|Invoke| Graph
    Graph -->|Extract| FFE
    FFE -->|Get Config| Supabase
    FFE -->|Store State| Redis
    Graph -->|Final State| API
    API -->|Compute URLs| Builder
    Builder -->|Read Config| Supabase
    API -->|Stream Response| Hook
    Hook -->|Set Overrides| CTA
    CTA -->|Notify| BTN
    BTN -->|Re-render| UI

Configuration

Form Config Structure (Supabase site_configs.form_config)

{
  "forms": {
    "lead_qualification": {
      "id": "lead_qualification",
      "name": "Lead Qualification",
      "fields": [
        {
          "id": "company_size",
          "label": "Company Size",
          "type": "select",
          "extraction_prompt": "Number of employees in the company",
          "options": [
            {"value": "1_to_5", "label": "1-5 employees"},
            {"value": "6_to_20", "label": "6-20 employees"},
            {"value": "21_to_100", "label": "21-100 employees"},
            {"value": "100_plus", "label": "100+ employees"}
          ]
        },
        {
          "id": "industry",
          "label": "Industry",
          "type": "select",
          "extraction_prompt": "The industry or sector of the company",
          "options": [
            {"value": "tech", "label": "Technology"},
            {"value": "finance", "label": "Finance"},
            {"value": "healthcare", "label": "Healthcare"}
          ]
        }
      ],
      "ctas": [
        {
          "cta_id": "demo_cta",
          "url_template": "https://example.com/demo?size={{company_size}}&industry={{industry}}"
        },
        {
          "cta_id": "pricing_cta",
          "url_template": "https://example.com/pricing?tier={{company_size}}"
        }
      ]
    }
  }
}

Field Configuration Options

| Property | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the field (used in URL templates) |
| label | string | Human-readable label |
| type | string | Field type: select, text, or number |
| extraction_prompt | string | Hint for the LLM on what to extract |
| options | array | For select fields: valid values with labels |
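To show how these properties fit together, here is a minimal sketch that loads one form's field definitions into typed Python objects. It assumes the JSON shape from the config example above; FieldOption, FormField, and load_fields are hypothetical names, not classes from the ixchat package, and no validation beyond the required keys is attempted.

```python
from dataclasses import dataclass, field


@dataclass
class FieldOption:
    value: str  # machine-readable value injected into CTA URLs
    label: str  # human-readable label


@dataclass
class FormField:
    id: str
    label: str
    type: str  # "select", "text", or "number"
    extraction_prompt: str = ""
    options: list[FieldOption] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> "FormField":
        """Build a field from one entry of a form's "fields" array."""
        return cls(
            id=raw["id"],
            label=raw["label"],
            type=raw["type"],
            extraction_prompt=raw.get("extraction_prompt", ""),
            options=[FieldOption(**opt) for opt in raw.get("options", [])],
        )

    def valid_values(self) -> list[str]:
        """Option values accepted by a select field; empty for text/number fields."""
        return [opt.value for opt in self.options]


def load_fields(form_config: dict, form_id: str) -> list[FormField]:
    """Parse the fields of one form in site_configs.form_config."""
    return [FormField.from_dict(raw) for raw in form_config["forms"][form_id]["fields"]]
```

valid_values() yields the same list that the "Valid values" line of the extraction prompt enumerates for a select field.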

CTA Template Syntax

URL templates use {{field_id}} placeholders:

https://example.com/form?size={{company_size}}&source={{utm_source}}
  • Placeholders are replaced with extracted values
  • Unmatched placeholders remain as-is (or can be configured to be removed)
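A minimal Python sketch of these rules, extended to turn a form's ctas list into the {cta_id: url} map that the API streams back as cta_url_overrides. compute_cta_url_overrides is the name used in the sequence diagram, but its signature and body here are assumptions, as are render_url_template, the remove_unmatched flag, the percent-encoding via urllib.parse.quote, and the rule that a CTA with no filled placeholder gets no override. The actual logic lives in cta_url_builder.py.

```python
import re
from urllib.parse import quote

# {{field_id}} placeholders; field IDs are assumed to be word characters only
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_url_template(template: str, values: dict[str, str], remove_unmatched: bool = False) -> str:
    """Replace {{field_id}} placeholders with percent-encoded collected values."""

    def substitute(match: re.Match) -> str:
        field_id = match.group(1)
        if field_id in values:
            # safe="" also encodes "/", "&" and "=", so a value cannot break the query string
            return quote(values[field_id], safe="")
        # No value yet: keep the placeholder, or blank it out if configured to
        return "" if remove_unmatched else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


def compute_cta_url_overrides(form_config: dict, collected: dict[str, str]) -> dict[str, str]:
    """Build the {cta_id: url} map sent to the frontend as cta_url_overrides."""
    overrides: dict[str, str] = {}
    for form in form_config.get("forms", {}).values():
        for cta in form.get("ctas", []):
            template = cta["url_template"]
            # Assumption: skip CTAs with no filled placeholder so the static URL stays in use
            if not any(fid in collected for fid in PLACEHOLDER.findall(template)):
                continue
            overrides[cta["cta_id"]] = render_url_template(template, collected)
    return overrides
```

With the example config and {"company_size": "1_to_5"}, demo_cta becomes https://example.com/demo?size=1_to_5&industry={{industry}}, keeping the literal placeholder as described above. Passing remove_unmatched=True instead leaves an empty parameter (industry=) rather than dropping it from the query string.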

Backend Implementation

Key Files

| File | Purpose |
| --- | --- |
| backend/packages/ixchat/ixchat/nodes/form_field_extractor.py | LLM-powered value extraction |
| backend/packages/ixchat/ixchat/utils/cta_url_builder.py | URL template processing |
| backend/packages/ixchat/ixchat/pydantic_models/form_collection.py | State models |
| backend/apps/api/search/ixsearch_api/routes/chat.py | API endpoint |

Form Collection State

from pydantic import BaseModel, Field


class FormCollectionState(BaseModel):
    form_id: str | None = None
    collected_values: dict[str, str] = Field(default_factory=dict)  # {field_id: extracted_value}
    pending_fields: list[str] = Field(default_factory=list)  # Field IDs still awaiting a value
    is_complete: bool = False
    last_extraction_turn: int = -1  # Prevents duplicate extraction

Extraction Process

  1. Get form config from Supabase for the site
  2. Identify unfilled fields (not yet in collected_values)
  3. Build conversation context from last 10 messages + current input
  4. Call LLM with structured output schema
  5. Filter by confidence (≥0.7 threshold)
  6. Update state with extracted values

LLM Extraction Prompt

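The system uses a Langfuse-managed prompt (rose-internal-form-extraction); if that prompt cannot be loaded, it falls back to the template below, shown here rendered for the example config and a sample conversation: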

Analyze this conversation and extract the following information:

- company_size: Number of employees in the company. Valid values: ["1_to_5", "6_to_20", "21_to_100", "100_plus"]
- industry: The industry or sector of the company. Valid values: ["tech", "finance", "healthcare"]

Conversation:
USER: I'm looking for a solution for my small team
ASSISTANT: Happy to help! How many people are on your team?
USER: We're about 5 people

For each field, return the extracted value or null if not found.
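The fallback layout above can be assembled mechanically from the form config. A minimal sketch, assuming each field dict carries id, extraction_prompt, and optional options; build_extraction_prompt is a hypothetical helper, and the exact whitespace of the real fallback prompt may differ.

```python
import json


def build_extraction_prompt(fields: list[dict], conversation: list[tuple[str, str]]) -> str:
    """Render an extraction prompt in the fallback layout shown above.

    fields: entries from a form's "fields" array (only the unfilled ones, per step 2)
    conversation: (role, text) pairs, oldest first
    """
    lines = ["Analyze this conversation and extract the following information:", ""]
    for f in fields:
        line = f"- {f['id']}: {f['extraction_prompt']}."
        values = [opt["value"] for opt in f.get("options", [])]
        if values:
            # json.dumps gives the ["a", "b"] list syntax used in the template
            line += f" Valid values: {json.dumps(values)}"
        lines.append(line)
    lines += ["", "Conversation:"]
    lines += [f"{role.upper()}: {text}" for role, text in conversation]
    lines += ["", "For each field, return the extracted value or null if not found."]
    return "\n".join(lines)
```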

Frontend Implementation

Key Files

| File | Purpose |
| --- | --- |
| frontend/shared/src/hooks/chat/useStreamingMessage.ts | Receives CTA overrides |
| frontend/shared/src/utils/content/ctaReplacer.ts | Stores and applies overrides |
| frontend/shared/src/components/InlineCTAButton.tsx | Renders CTA buttons |

Override Flow

// 1. Receive in streaming hook (useStreamingMessage.ts:194)
const ctaUrlOverrides = chunk.metadata?.cta_url_overrides || null;
setCTAOverrides(ctaUrlOverrides);

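// 2. Store globally (ctaReplacer.ts)
let currentCTAOverrides: Record<string, string> | null = null;
let ctaOverridesVersion = 0;  // Bumped on every update so subscribers can re-render
const subscribers = new Set<() => void>();  // Callbacks registered by CTA buttons

export function setCTAOverrides(overrides: Record<string, string> | null): void {
    currentCTAOverrides = overrides;
    ctaOverridesVersion++;
    subscribers.forEach(callback => callback());  // Notify buttons
}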

// 3. Apply in CTA resolution (ctaReplacer.ts:180)
export function getCTAData(domain, placeholder, language) {
    // (elided) look up `cta`, `staticUrl`, and `text` for this placeholder
    const ctaId = cta?.cta_id;
    let url = staticUrl;

    if (ctaId && currentCTAOverrides && currentCTAOverrides[ctaId]) {
        url = currentCTAOverrides[ctaId];  // Use dynamic URL
    }

    return { url, text, ctaId };
}

// 4. Re-render on changes (InlineCTAButton.tsx)
const overridesVersion = useCTAOverridesVersion();  // Subscribes to changes
const ctaData = useMemo(() => getCTAData(...), [overridesVersion]);  // Re-computes

Data Flow

State Persistence

flowchart LR
    Turn1[Turn 1: User asks question]
    Turn2[Turn 2: Bot asks company size]
    Turn3[Turn 3: User answers '5 people']
    Extract[form_field_extractor]
    State[(form_collection_state)]
    CTA[CTA URL Override]
    Turn1 --> Turn2
    Turn2 --> Turn3
    Turn3 --> Extract
    Extract -->|company_size: '1_to_5'| State
    State -->|Persist via Redis| Turn3
    State --> CTA

Cross-Turn Accumulation

Form values accumulate across conversation turns:

| Turn | User Says | Extracted | Collected Values |
| --- | --- | --- | --- |
| 1 | "I need help with pricing" | - | {} |
| 2 | "We're a team of 5" | company_size: "1_to_5" | {company_size: "1_to_5"} |
| 3 | "We're in fintech" | industry: "finance" | {company_size: "1_to_5", industry: "finance"} |
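The accumulation in the table can be replayed with a small sketch. It uses a dataclass stand-in for FormCollectionState so it runs without Pydantic, and it assumes two behaviors the source does not spell out: a turn at or below last_extraction_turn is skipped as a duplicate, and is_complete becomes true once pending_fields is empty. apply_turn is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class FormState:
    """Stand-in for FormCollectionState (a dataclass instead of Pydantic)."""
    collected_values: dict[str, str] = field(default_factory=dict)
    pending_fields: list[str] = field(default_factory=list)
    is_complete: bool = False
    last_extraction_turn: int = -1


def apply_turn(state: FormState, turn: int, extracted: dict[str, str]) -> FormState:
    """Merge one turn's accepted extractions into the accumulated state."""
    if turn <= state.last_extraction_turn:
        return state  # assumption: this turn was already processed, so skip it
    collected = {**state.collected_values, **extracted}
    pending = [fid for fid in state.pending_fields if fid not in collected]
    return FormState(
        collected_values=collected,
        pending_fields=pending,
        is_complete=not pending,  # assumption: complete when nothing is pending
        last_extraction_turn=turn,
    )
```

Starting from FormState(pending_fields=["company_size", "industry"]) and feeding the three turns in order reproduces the Collected Values column; replaying turn 3 returns the state unchanged.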

Debugging

Backend Logging

Enable debug logging to see the extraction process:

# In form_field_extractor.py
logger.debug(f"📋 [FORM EXTRACTOR] Starting form field extraction for {domain_id}, turn {turn_number}")
logger.debug(f"📋 [FORM EXTRACTOR] Unfilled fields: {[f.id for f in unfilled_fields]}")
logger.debug(f"📋 [FORM EXTRACTOR] Extracted values: {extracted_values}")

Frontend Logging

Check the browser console for CTA override events:

// In ctaReplacer.ts
logger.debug('CTA URL overrides set:', overrides, 'version:', ctaOverridesVersion);
logger.debug('Using CTA URL override:', { ctaId, originalUrl, overrideUrl });

Common Issues

| Issue | Cause | Solution |
| --- | --- | --- |
| CTA not updating | Low-confidence extraction | Check the LLM output; adjust extraction_prompt |
| Wrong value extracted | Ambiguous user input | Add more specific options; improve the prompt |
| Duplicate extraction | Turn number not incremented | Check the last_extraction_turn logic |
| Overrides not applied | CTA ID mismatch | Verify that cta_id matches in the config and the content |

Testing Extraction

# Run from backend/: -s disables output capture, --log-cli-level=DEBUG streams debug logs
MYPYPATH=packages/ixchat poetry run pytest \
    packages/ixchat/ixchat/tests/test_form_field_extractor.py -v -s --log-cli-level=DEBUG

Best Practices

Field Configuration

  1. Be specific with extraction prompts: "Number of employees" is better than "company size"
  2. Use clear option values: Use snake_case IDs (1_to_5), not display text ("1-5 employees")
  3. Limit options: The LLM performs better with fewer, more distinct options
  4. Add synonyms in prompts: "Number of employees, team size, or headcount"

URL Templates

  1. Use unique CTA IDs: Each cta_id must be unique and must match between form_config.ctas and the content placeholders
  2. Handle missing values: Design forms to work with partial data
  3. URL encode values: The builder handles encoding automatically
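Several of the configuration rules above can be checked before a config is saved. The sketch below is a hypothetical lint_form_config helper (not part of the codebase) that flags missing extraction prompts, option values that are not snake_case, duplicate cta_id values across a site's forms, and CTA placeholders with no matching field. It cannot verify that each cta_id also appears in the content placeholders, since that content lives outside form_config, so treat its output as warnings rather than hard errors.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def lint_form_config(form_config: dict) -> list[str]:
    """Return human-readable warnings for common form_config mistakes."""
    problems: list[str] = []
    seen_cta_ids: set[str] = set()
    for form_id, form in form_config.get("forms", {}).items():
        fields = form.get("fields", [])
        field_ids = {f["id"] for f in fields}
        for f in fields:
            if not f.get("extraction_prompt"):
                problems.append(f"{form_id}.{f['id']}: missing extraction_prompt")
            for opt in f.get("options", []):
                if not SNAKE_CASE.match(opt["value"]):
                    problems.append(f"{form_id}.{f['id']}: option value {opt['value']!r} is not snake_case")
        for cta in form.get("ctas", []):
            cta_id = cta["cta_id"]
            if cta_id in seen_cta_ids:
                problems.append(f"duplicate cta_id {cta_id!r}")
            seen_cta_ids.add(cta_id)
            for ref in PLACEHOLDER.findall(cta["url_template"]):
                if ref not in field_ids:
                    problems.append(f"{cta_id}: placeholder '{ref}' has no matching field")
    return problems
```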

Conversation Design

  1. Ask specific questions: "How many employees do you have?" extracts better than open-ended questions
  2. Offer suggested answers: Pre-defined answers map directly to options
  3. Confirm extractions: Show collected values to user for verification