Skip to content

Visitor Data Flow

This document describes how visitor data flows from the widget through the backend and into the Client Backoffice.

Data Sources Overview

graph TD subgraph "Data Collection" W[Widget] -->|All Events| PH[PostHog] PH -->|Webhook| CF[Cloudflare Worker] CF -->|Insert/Update| SB[Supabase] end subgraph "Storage Layer" SB --> V[visitors] SB --> VS[visitor_sessions] SB --> C[conversations] SB --> M[messages] SB --> A[accounts] end subgraph "Computed Views" VS --> VSM[visitor_session_metrics] V --> VSM end subgraph "RPC Functions" V --> GVS[get_visitor_stats_with_intent] C --> GVS VSM --> GVS A --> GVS end subgraph "Backoffice UI" GVS --> VT[VisitorsTable] VSM --> VT end

Core Tables

visitors

The primary table for visitor identity and profile data.

Column Type Description
id UUID Primary key (visitor_id)
email string Email if captured during conversation
account_id UUID FK to accounts table (set via enrichment)
first_seen_at timestamp When visitor was first created
enrichment_data JSON Raw enrichment data from providers
site_domain string The site this visitor belongs to

visitor_sessions

Tracks individual browsing sessions for analytics.

Column Type Description
id UUID Primary key
session_id string Browser session identifier
site_domain string Domain where session occurred
conversation_id UUID FK to conversations (if chat occurred)
first_event_at timestamp Start of session
last_activity_at timestamp Last activity in session

conversations

Chat conversations between visitors and the AI agent.

Column Type Description
id UUID Primary key
visitor_id UUID FK to visitors
session_id string Langfuse session ID
total_turns int Number of message exchanges
last_message_at timestamp Time of last message
email_captured bool Whether email was collected
demo_proposed bool Whether demo was offered
cta_clicked bool Whether CTA was clicked
demo_booked bool Whether demo was booked

accounts

Company-level aggregation for enriched visitors.

Column Type Description
id UUID Primary key
company_name string Company name from enrichment
company_domain string Company website domain
enrichment_data JSON Full enrichment payload

Computed View: visitor_session_metrics

This PostgreSQL view aggregates session data per visitor for analytics display.

-- Simplified view definition
CREATE VIEW visitor_session_metrics AS
SELECT
    v.id AS visitor_id,
    v.person_id,
    v.site_domain,
    COUNT(DISTINCT vs.id) AS total_sessions,
    SUM(page_views) AS total_page_views,
    AVG(session_duration) AS avg_session_duration_seconds,
    MAX(vs.last_activity_at) AS last_seen_at
FROM visitors v
LEFT JOIN visitor_sessions vs ON vs.site_domain = extract_root_domain(v.site_domain)
GROUP BY v.id, v.person_id, v.site_domain;

Domain Normalization

The extract_root_domain() function handles domain mismatches between tables:

  • visitor_sessions.site_domain may contain www.example.com
  • visitors.site_domain typically contains example.com

The function strips common prefixes (www., app., etc.) to enable accurate joins.

RPC Function: get_visitor_stats_with_intent

The primary data source for the Visitors table. This RPC function efficiently aggregates:

  • Visitor profile data
  • Conversation metrics (count, messages, timestamps)
  • Account association
  • Behavior signals aggregated across conversations
  • Session metrics including last_seen_at
  • Computed intent score

Parameters

Parameter Type Description
p_domain string Required - Site domain to query
p_date_start timestamp Filter by date range start
p_date_end timestamp Filter by date range end
p_environment string Filter by environment (production/staging)
p_account_id UUID Filter by specific account
p_min_messages int Minimum message count (default: 1)

Return Fields

Field Source Description
visitor_id visitors Visitor identifier
email visitors Captured email
account_id visitors Associated account
account_name accounts Company name
company_domain accounts Company website
conversation_count conversations Number of conversations
total_messages conversations Sum of all messages
last_message_at conversations Most recent message time
last_seen_at visitor_session_metrics Most recent session activity
first_seen_at visitors First visitor creation time
email_asked conversations Any conversation asked for email
email_captured conversations Any conversation captured email
demo_proposed conversations Any conversation proposed demo
cta_clicked conversations Any conversation had CTA click
demo_booked conversations Any conversation booked demo
intent_score computed Behavior (60%) + interest (40%)

Data Flow in the UI

VisitorsTable Component

┌──────────────────────────────────────────────────────────────┐
│                      VisitorsTable                            │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ useVisitors hook                                         │ │
│  │  └─► fetchVisitors()                                     │ │
│  │       └─► supabase.rpc('get_visitor_stats_with_intent')  │ │
│  │            └─► Returns: VisitorSummary[]                 │ │
│  └─────────────────────────────────────────────────────────┘ │
│                           │                                   │
│                           ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Session Metrics (optional columns)                       │ │
│  │  └─► useMultipleVisitorSessionMetricsQuery()             │ │
│  │       └─► supabase.from('visitor_session_metrics')       │ │
│  │            └─► Returns: sessions, pageViews, duration    │ │
│  └─────────────────────────────────────────────────────────┘ │
│                           │                                   │
│                           ▼                                   │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Display Columns                                          │ │
│  │  • Visitor (email or truncated ID)                       │ │
│  │  • Account (company name, clickable)                     │ │
│  │  • Messages (total_messages)                             │ │
│  │  • Last Activity (last_seen_at ?? last_message_at)       │ │
│  │  • Intent (computed score with badge)                    │ │
│  │  • Funnel Stage (computed from behavior flags)           │ │
│  │  • Sessions, Avg Duration, Page Views (from metrics)     │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘

Last Activity Column

The "Last Activity" column shows when a visitor was last active:

  1. Primary: last_seen_at from the RPC (derived from session metrics)
  2. Fallback: last_message_at from conversations
  3. Fallback: first_seen_at from visitor creation

This ensures visitors who browsed but didn't chat still show recent activity.

Sorting

Sort Option Field Used
lastActivity last_seen_at (falls back to last_message_at)
messages total_messages
name account_name
intent intent_score

Enrichment Flow

When a visitor is enriched (via IP lookup or email), the flow is:

sequenceDiagram participant W as Widget participant API as Search API participant E as Enrichment Service participant DB as Supabase W->>API: Chat message with IP API->>E: Lookup company by IP E-->>API: Company data API->>DB: Upsert account API->>DB: Update visitor.account_id Note over DB: Visitor now linked to account

Enrichment data is stored in both:

  • visitors.enrichment_data - Raw enrichment response
  • accounts table - Normalized company information

Session Tracking

Sessions are tracked via PostHog webhooks processed by a Cloudflare Worker (cloudflare-posthog-webhook.js):

sequenceDiagram participant W as Widget participant PH as PostHog participant CF as Cloudflare Worker participant DB as Supabase W->>PH: Send event (rw_widget_impression, etc.) PH->>CF: Webhook with event data CF->>DB: Insert/update visitor_sessions

Event Types Handled

PostHog Event Cloudflare Worker Action
rw_widget_impression Creates visitor_sessions row
$pageleave Updates last_activity_at
rw_message_sent Logs event, updates session duration
rw_email_captured Logs email capture event
rw_cta_clicked Updates conversations.cta_clicked
rw_cta_page_display Updates conversations.cta_page_display
rw_client_form_submitted Updates conversations.demo_booked

The visitor_session_metrics view then aggregates these sessions per visitor.

Key Implementation Notes

Efficient Querying

The RPC function get_visitor_stats_with_intent was created to avoid:

  • N+1 queries for visitor stats
  • URL length issues with large .in() queries
  • Client-side aggregation overhead

Domain Normalization

The extract_root_domain() PostgreSQL function ensures accurate joins even when:

  • Sessions have www. prefix
  • Visitors have root domain
  • Different subdomains exist

Intent Score Calculation

Intent score combines:

  • Behavior signals (60%): email_captured, demo_proposed, cta_clicked, demo_booked
  • Interest score (40%): From conversation content analysis

This is computed in the RPC for efficient server-side calculation.