Visitor Data Flow¶
This document describes how visitor data flows from the widget through the backend and into the Client Backoffice.
Data Sources Overview¶
Core Tables¶
visitors¶
The primary table for visitor identity and profile data.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key (visitor_id) |
email |
string | Email if captured during conversation |
account_id |
UUID | FK to accounts table (set via enrichment) |
first_seen_at |
timestamp | When visitor was first created |
enrichment_data |
JSON | Raw enrichment data from providers |
site_domain |
string | The site this visitor belongs to |
visitor_sessions¶
Tracks individual browsing sessions for analytics.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
session_id |
string | Browser session identifier |
site_domain |
string | Domain where session occurred |
conversation_id |
UUID | FK to conversations (if chat occurred) |
first_event_at |
timestamp | Start of session |
last_activity_at |
timestamp | Last activity in session |
conversations¶
Chat conversations between visitors and the AI agent.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
visitor_id |
UUID | FK to visitors |
session_id |
string | Langfuse session ID |
total_turns |
int | Number of message exchanges |
last_message_at |
timestamp | Time of last message |
email_captured |
bool | Whether email was collected |
demo_proposed |
bool | Whether demo was offered |
cta_clicked |
bool | Whether CTA was clicked |
demo_booked |
bool | Whether demo was booked |
accounts¶
Company-level aggregation for enriched visitors.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
company_name |
string | Company name from enrichment |
company_domain |
string | Company website domain |
enrichment_data |
JSON | Full enrichment payload |
Computed View: visitor_session_metrics¶
This PostgreSQL view aggregates session data per visitor for analytics display.
-- Simplified view definition
CREATE VIEW visitor_session_metrics AS
SELECT
v.id AS visitor_id,
v.person_id,
v.site_domain,
COUNT(DISTINCT vs.id) AS total_sessions,
SUM(page_views) AS total_page_views,
AVG(session_duration) AS avg_session_duration_seconds,
MAX(vs.last_activity_at) AS last_seen_at
FROM visitors v
LEFT JOIN visitor_sessions vs ON vs.site_domain = extract_root_domain(v.site_domain)
GROUP BY v.id, v.person_id, v.site_domain;
Domain Normalization¶
The extract_root_domain() function handles domain mismatches between tables:
visitor_sessions.site_domainmay containwww.example.comvisitors.site_domaintypically containsexample.com
The function strips common prefixes (www., app., etc.) to enable accurate joins.
RPC Function: get_visitor_stats_with_intent¶
The primary data source for the Visitors table. This RPC function efficiently aggregates:
- Visitor profile data
- Conversation metrics (count, messages, timestamps)
- Account association
- Behavior signals aggregated across conversations
- Session metrics including
last_seen_at - Computed intent score
Parameters¶
| Parameter | Type | Description |
|---|---|---|
p_domain |
string | Required - Site domain to query |
p_date_start |
timestamp | Filter by date range start |
p_date_end |
timestamp | Filter by date range end |
p_environment |
string | Filter by environment (production/staging) |
p_account_id |
UUID | Filter by specific account |
p_min_messages |
int | Minimum message count (default: 1) |
Return Fields¶
| Field | Source | Description |
|---|---|---|
visitor_id |
visitors | Visitor identifier |
email |
visitors | Captured email |
account_id |
visitors | Associated account |
account_name |
accounts | Company name |
company_domain |
accounts | Company website |
conversation_count |
conversations | Number of conversations |
total_messages |
conversations | Sum of all messages |
last_message_at |
conversations | Most recent message time |
last_seen_at |
visitor_session_metrics | Most recent session activity |
first_seen_at |
visitors | First visitor creation time |
email_asked |
conversations | Any conversation asked for email |
email_captured |
conversations | Any conversation captured email |
demo_proposed |
conversations | Any conversation proposed demo |
cta_clicked |
conversations | Any conversation had CTA click |
demo_booked |
conversations | Any conversation booked demo |
intent_score |
computed | Behavior (60%) + interest (40%) |
Data Flow in the UI¶
VisitorsTable Component¶
┌──────────────────────────────────────────────────────────────┐
│ VisitorsTable │
├──────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ useVisitors hook │ │
│ │ └─► fetchVisitors() │ │
│ │ └─► supabase.rpc('get_visitor_stats_with_intent') │ │
│ │ └─► Returns: VisitorSummary[] │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Session Metrics (optional columns) │ │
│ │ └─► useMultipleVisitorSessionMetricsQuery() │ │
│ │ └─► supabase.from('visitor_session_metrics') │ │
│ │ └─► Returns: sessions, pageViews, duration │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Display Columns │ │
│ │ • Visitor (email or truncated ID) │ │
│ │ • Account (company name, clickable) │ │
│ │ • Messages (total_messages) │ │
│ │ • Last Activity (last_seen_at ?? last_message_at) │ │
│ │ • Intent (computed score with badge) │ │
│ │ • Funnel Stage (computed from behavior flags) │ │
│ │ • Sessions, Avg Duration, Page Views (from metrics) │ │
│ └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Last Activity Column¶
The "Last Activity" column shows when a visitor was last active:
- Primary:
last_seen_atfrom the RPC (derived from session metrics) - Fallback:
last_message_atfrom conversations - Fallback:
first_seen_atfrom visitor creation
This ensures visitors who browsed but didn't chat still show recent activity.
Sorting¶
| Sort Option | Field Used |
|---|---|
lastActivity |
last_seen_at (falls back to last_message_at) |
messages |
total_messages |
name |
account_name |
intent |
intent_score |
Enrichment Flow¶
When a visitor is enriched (via IP lookup or email), the flow is:
Enrichment data is stored in both:
visitors.enrichment_data- Raw enrichment responseaccountstable - Normalized company information
Session Tracking¶
Sessions are tracked via PostHog webhooks processed by a Cloudflare Worker (cloudflare-posthog-webhook.js):
Event Types Handled¶
| PostHog Event | Cloudflare Worker Action |
|---|---|
rw_widget_impression |
Creates visitor_sessions row |
$pageleave |
Updates last_activity_at |
rw_message_sent |
Logs event, updates session duration |
rw_email_captured |
Logs email capture event |
rw_cta_clicked |
Updates conversations.cta_clicked |
rw_cta_page_display |
Updates conversations.cta_page_display |
rw_client_form_submitted |
Updates conversations.demo_booked |
The visitor_session_metrics view then aggregates these sessions per visitor.
Key Implementation Notes¶
Efficient Querying¶
The RPC function get_visitor_stats_with_intent was created to avoid:
- N+1 queries for visitor stats
- URL length issues with large
.in()queries - Client-side aggregation overhead
Domain Normalization¶
The extract_root_domain() PostgreSQL function ensures accurate joins even when:
- Sessions have
www.prefix - Visitors have root domain
- Different subdomains exist
Intent Score Calculation¶
Intent score combines:
- Behavior signals (60%): email_captured, demo_proposed, cta_clicked, demo_booked
- Interest score (40%): From conversation content analysis
This is computed in the RPC for efficient server-side calculation.