Tracking-Health Detector

A daily watchdog that flags clients whose Rose widget or form-tracking is broken. Surfaced in the backoffice as a colored badge next to the client name on the Clients page, and as a "needs attention" filter.

What it detects

Form detection in the Rose widget is brittle: its strategies (DataLayer, CTAPageForm, ThankYouPage, HubSpot, NetworkObserver, Superform, PostMessage) break silently when a client changes their GTM container, swaps form provider, restructures CTA pages, or tightens cookie consent. The detector watches per-client signals from mv_client_stats_30d and session_events and emits a badge when something looks wrong.

Decision tree

The detector walks each enabled client (client_type IN ('client','onboarding','test')) top-down through this tree. Each leaf is exactly one outcome — no client gets two badges.

flowchart TD
    Start[Client active + enabled]:::start --> Q1{"widget_visitors over 30d >= 20?"}
    Q1 -- "no" --> Q2{"prior 23d widget >= 5<br/>AND last 7d widget = 0?"}
    Q2 -- "yes" --> WS["Widget stopped<br/><b>widget_stopped</b><br/>RED"]:::red
    Q2 -- "no" --> WNIP["Widget not in prod<br/><b>widget_not_in_prod</b><br/>amber"]:::amber
    Q1 -- "yes" --> Q3{"chats over 30d >= 10?"}
    Q3 -- "no" --> WNU["Widget in prod but unused<br/><b>widget_not_used</b><br/>amber"]:::amber
    Q3 -- "yes" --> Q4{"forms over 30d >= 1?"}
    Q4 -- "no" --> FTL["Form tracking lost<br/><b>form_tracking_lost</b><br/>RED"]:::red
    Q4 -- "yes" --> OK["HEALTHY<br/>no alert"]:::ok
    classDef start fill:#eef,stroke:#446,color:#222
    classDef red fill:#fee2e2,stroke:#b91c1c,color:#7f1d1d
    classDef amber fill:#fef3c7,stroke:#b45309,color:#78350f
    classDef ok fill:#dcfce7,stroke:#15803d,color:#14532d
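
The tree above can be sketched as a pure function. This is a minimal illustration, not the detection function from the migrations: the `TrackingSignals` shape and all field names are hypothetical and simply mirror the symbols used in this page.

```typescript
// Hypothetical input shape: field names mirror the symbols in this page,
// not the actual columns used by the SQL detection function.
interface TrackingSignals {
  widgetVisitors30d: number; // distinct widget visitors over 30 days
  widgetPrior23d: number;    // distinct widget visitors, days -30 to -7
  widgetLast7d: number;      // distinct widget visitors, days -7 to 0
  chats30d: number;          // distinct visitors who started a chat
  forms30d: number;          // total client_form_submitted events
}

type Outcome =
  | "widget_stopped"
  | "widget_not_in_prod"
  | "widget_not_used"
  | "form_tracking_lost"
  | "healthy";

// Walks the decision tree top-down; every path returns exactly one outcome.
function classifyTrackingHealth(s: TrackingSignals): Outcome {
  if (s.widgetVisitors30d < 20) {
    // Low widget volume: distinguish a regression from "never really live".
    return s.widgetPrior23d >= 5 && s.widgetLast7d === 0
      ? "widget_stopped"
      : "widget_not_in_prod";
  }
  if (s.chats30d < 10) return "widget_not_used";
  if (s.forms30d < 1) return "form_tracking_lost";
  return "healthy";
}
```

Because each branch returns immediately, a client can never receive two outcomes, which matches the one-badge-per-client rule.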

Outcomes

| Outcome | Color | Meaning | What to do |
| --- | --- | --- | --- |
| Widget not in prod (widget_not_in_prod) | 🟠 amber | Widget barely fires (< 20 distinct visitors over 30d) and the widget_stopped pattern does not apply (fewer than 5 visitors in the prior 23 days, or some activity in the last 7 days). Either never installed, blocked by a CMP, or the loader is broken. | Check the client's site for the Rose loader snippet. Verify the loader URL is reachable. Check CMP / cookie-consent gating. |
| Widget stopped (widget_stopped) | 🔴 red | Widget was firing (>= 5 distinct visitors in the prior 23-day window) and is now silent (0 in the last 7 days). Real regression. | Check the site: did the client redeploy? Did the GTM container change? Did the loader URL break? |
| Widget in prod but unused (widget_not_used) | 🟠 amber | Widget loads at scale (>= 20 visitors / 30d) but fewer than 10 visitors started a chat in 30 days. Could be a content/copy issue, broken chat-init, wrong widget placement, or bot traffic. | Open the site in an incognito browser and verify the widget actually opens and responds. Check copy / trigger config. |
| Form tracking lost (form_tracking_lost) | 🔴 red | Real chats happened (>= 10 over 30d) but zero client_form_submitted events were detected. Form-detection strategies are broken on the client's site. | Inspect form_tracking_pages config. Check whether the GTM container changed, the form provider was swapped, or detection strategies were disabled. |
| Healthy | | Widget loads, chats happen, forms are tracked. No badge. | |
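
As an illustration of how the outcomes could map to backoffice badges, here is a minimal sketch. The `BADGES` table, `badgeFor`, and `needsAttention` are hypothetical names, not the contents of clientColumns.tsx or clientsAttention.ts; the sketch also assumes that any badge, red or amber, counts for the "needs attention" filter, which this page does not confirm.

```typescript
type AlertCode =
  | "widget_not_in_prod"
  | "widget_stopped"
  | "widget_not_used"
  | "form_tracking_lost";

type BadgeColor = "red" | "amber";

interface Badge {
  label: string;
  color: BadgeColor;
}

// Labels and colors copied from the outcomes table.
const BADGES: Record<AlertCode, Badge> = {
  widget_not_in_prod: { label: "Widget not in prod", color: "amber" },
  widget_stopped: { label: "Widget stopped", color: "red" },
  widget_not_used: { label: "Widget in prod but unused", color: "amber" },
  form_tracking_lost: { label: "Form tracking lost", color: "red" },
};

// A null alert stands for a healthy client, which gets no badge.
function badgeFor(alert: AlertCode | null): Badge | null {
  return alert === null ? null : BADGES[alert];
}

// Assumption: every badge, regardless of color, needs attention.
function needsAttention(alert: AlertCode | null): boolean {
  return alert !== null;
}
```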

Inputs

| Symbol | Source | Definition |
| --- | --- | --- |
| widget_visitors_30d | mv_client_stats_30d.widget_visitors | Distinct person_id who saw a widget_impression event on any of the client's domains in the last 30d, with widget_displayed_at > 0 and duration_s > 10 |
| prior 23d widget | session_events aggregate | Distinct person_id with widget_impression over days -30 to -7 |
| last 7d widget | session_events aggregate | Distinct person_id with widget_impression over days -7 to 0 |
| chats_30d | mv_client_stats_30d.conversation_visitors | Distinct visitors who started a chat in the last 30d |
| forms_30d | mv_client_stats_30d.form_submits_total | Total client_form_submitted events over 30d across the client's root_domain |

Demo, former, and disabled clients are excluded from detection.
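
The two windowed inputs can be sketched as distinct-person counts over event timestamps. The `SessionEvent` shape below is hypothetical and is not the session_events schema; the sketch also assumes half-open windows, so an event exactly 7 days old falls in the last-7d window, a boundary this page does not specify.

```typescript
// Hypothetical event shape for illustration only.
interface SessionEvent {
  personId: string;
  eventType: string;
  occurredAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Counts distinct people with a widget_impression in
// [now - fromDaysAgo, now - toDaysAgo), start inclusive, end exclusive.
function distinctWidgetVisitors(
  events: SessionEvent[],
  now: Date,
  fromDaysAgo: number,
  toDaysAgo: number,
): number {
  const start = now.getTime() - fromDaysAgo * DAY_MS;
  const end = now.getTime() - toDaysAgo * DAY_MS;
  const people = new Set<string>();
  for (const e of events) {
    const t = e.occurredAt.getTime();
    if (e.eventType === "widget_impression" && t >= start && t < end) {
      people.add(e.personId);
    }
  }
  return people.size;
}

// Baseline window (days -30 to -7) and recent window (days -7 to 0).
function widgetWindows(
  events: SessionEvent[],
  now: Date,
): { prior23d: number; last7d: number } {
  return {
    prior23d: distinctWidgetVisitors(events, now, 30, 7),
    last7d: distinctWidgetVisitors(events, now, 7, 0),
  };
}
```

Using a set keyed on the person identifier keeps repeat impressions from the same visitor from inflating either count.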

Why these thresholds

  • 20 widget_visitors / 30d: real production tier starts at ~100 visitors. Below 20 is dev / preprod / staff loads — not meaningfully in production. The boundary was picked from the actual visitor distribution (next tier up was ~22 / 97 visitors).
  • prior 23d ≥ 5 + last 7d = 0 for widget_stopped: splitting the baseline window from the recent window keeps the check from contradicting itself. At the floor, the false-positive probability is P(zero in 7d | prior rate 5/23 per day) ≈ 22%. That is noisy, but it catches low-volume regressions worth investigating.
  • chats ≥ 10 for form_tracking_lost: at a 10% chat-to-form conversion rate, 10 chats yield 1.0 expected forms, so P(zero) = e^(-1) ≈ 37%. That is higher than ideal, but the threshold is tight enough to filter out 1-or-2-chat noise while still catching regressions on small onboarding clients.
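
The two false-positive figures can be reproduced with a short calculation. This sketch assumes counts follow a Poisson distribution, which is consistent with the e^(-1) term above but is not stated as the model elsewhere in this page; the function and constant names are illustrative.

```typescript
// P(N = 0) for a Poisson-distributed count with the given expected value.
function poissonZeroProbability(expected: number): number {
  return Math.exp(-expected);
}

// widget_stopped floor: 5 visitors over 23 days, observed for 7 days.
const widgetExpected7d = (5 / 23) * 7; // about 1.52 expected visitors
const widgetStoppedFalsePositive = poissonZeroProbability(widgetExpected7d); // about 0.22

// form_tracking_lost floor: 10 chats at an assumed 10% chat-to-form rate.
const formsExpected = 10 * 0.1; // 1.0 expected forms
const formLostFalsePositive = poissonZeroProbability(formsExpected); // about 0.37
```

Both values are the chance that a healthy client sitting exactly at the threshold still shows zero events, so clients with more volume trip the alert by chance far less often.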

How it runs

  • pg_cron job detect-tracking-health rebuilds public.tracking_health_alerts daily at 03:00 UTC, after mv_client_stats_30d refreshes at 02:00 UTC.
  • Truncate-and-insert: contents always reflect today's verdict. No history table yet.
  • Manual rerun: SELECT public.detect_tracking_health(); (requires service_role or postgres).

Where to look in code

  • supabase/migrations/20260522131345_add_tracking_health_alerts.sql — alert-table schema
  • supabase/migrations/20260522131402_add_tracking_health_cron.sql — detection function + cron schedule
  • supabase/migrations/20260522131510_extend_get_client_stats_30d_with_tracking_health.sql — RPC extension exposing tracking_health_signals
  • frontend/client-backoffice/src/components/clients/clientColumns.tsx — badge rendering
  • frontend/client-backoffice/src/lib/clientsAttention.ts — needs-attention integration

See also