Skip to content

Seeding the Supabase DB

There are two seed paths, both available against local Supabase and against preview branches:

Path Source Speed PII Use when
Synthetic (default) supabase/seed_synthetic/*.sql ~1–3 s none Day-to-day dev, CI smoke tests, onboarding new devs without prod access
Copy (opt-in) scripts/seed_local_db.py → production REST API ~60 s yes (real visitors, emails, chat content) Reproducing a real client bug, validating against real-world data shapes

Targets

just target DB target Source
just seed-local Local Supabase (port 54322) Synthetic
just seed-local-fresh Local Supabase Synthetic, --clear first
just seed-local-copy Local Supabase Prod copy (PII)
just seed-branch Linked preview branch Synthetic + migrations push + seed.sql + MV refresh
just seed-branch-copy Linked preview branch Prod copy (PII)

Run from supabase/:

cd supabase
just seed-local         # local
just seed-branch        # preview branch

What synthetic seeding does

just seed-branch is end-to-end idempotent and brings a fresh preview branch to a fully-usable state:

  1. Applies any missing migrations — Supabase's automated migration replay sometimes silently stops partway. This step diffs the branch's supabase_migrations.schema_migrations against local files and applies whatever's missing via psql. Skips migrations already on the branch (handles the case where production has migrations newer than your worktree).
  2. Runs supabase/seed.sql — creates the two local auth users:

    Email Password Role
    admin@admin.com admin Admin (bypasses domain RLS)
    user@user.com user Regular user (sees granted domains only)
  3. Loads supabase/seed_synthetic/[0-9]*.sql — see layout below.

  4. Refreshes materialized viewsmv_widget_visitors, mv_conversation_visitors, mv_client_stats_30d. Order matters: mv_client_stats_30d depends on the other two.

just seed-local does the same minus the migration-push step (local Supabase manages its own migrations via supabase migration up).

Layout

Files run in numeric-prefix order:

supabase/seed_synthetic/
  README.md
  _truncate.sql                    # reverse-FK-order TRUNCATE used by --clear
  00_clients_and_domains.sql       # 4 fake clients + domains + identity/appearance configs
  01_ai_sections.sql               # 2 ai_section rows per domain
  02_backoffice_user_domains.sql   # grants user@user.com access to all 4 domains
  10_accounts.sql                  # ~20 fake B2B accounts
  11_ip_hash_to_account.sql        # ~20 ip-hash mappings
  20_visitors.sql                  # ~200 visitors (50 per domain)
  21_conversations.sql             # ~100 conversations with funnel flags
  22_visitor_sessions.sql          # ~150 sessions
  23_messages.sql                  # ~500 messages (lorem-style)
  24_message_evaluations.sql       # ~50 evals
  25_visitor_page_views.sql        # ~400 page views
  26_session_events.sql            # ~330 events (widget_impression, widget_opened, cta_clicked)

Synthetic dataset shape

4 fake domains, each owned by its own client:

Client Domain Default language Brand color
Acme acme.com en #0A42C3
Contoso contoso.com en #1E8E3E
Fabrikam fabrikam.com fr #D93025
Demo demo.local en #9334E6

After a full seed + MV refresh, each domain shows roughly:

  • 25 conversations under last_message_at ≤ 8h ago, all environment = 'production', deployment_target = 'widget'.
  • 31 widget visitors (after the 10s-duration filter in mv_widget_visitors).
  • ~80% engagement rate.
  • 20% conversion rate (5 converted persons per client, split between demo_persons and email_only_persons depending on the domain).

That's enough to populate Home, Conversations, Visitors, Accounts, Analytics pages without empty states.

Adding a new table

  1. Pick a numeric prefix that respects FK order.
  2. Use deterministic UUIDs: md5(i::text || '<your-tag>')::uuid so re-runs produce stable IDs.
  3. End every INSERT with ON CONFLICT … DO NOTHING so re-runs are no-ops.
  4. Add a matching TRUNCATE line to _truncate.sql (reverse FK order).
  5. Run cd supabase && just seed-local or just seed-branch to verify.

Example:

-- supabase/seed_synthetic/30_my_new_table.sql
INSERT INTO public.my_new_table (id, site_domain, payload, created_at)
SELECT
    md5(i::text || ':mynewtable')::uuid,
    (ARRAY['acme.com','contoso.com','fabrikam.com','demo.local'])[1 + ((i - 1) % 4)],
    jsonb_build_object('synthetic', true, 'rank', i),
    now() - ((i * 2) || ' minutes')::interval
FROM generate_series(1, 100) AS i
ON CONFLICT (id) DO NOTHING;

Schema drift

If a migration adds a NOT NULL column to a table that's already in seed_synthetic/, the next seed run will error out. Update the seed file in the same PR as the migration. The CI smoke test (scripts/check_synthetic_config_slugs.py and friends — TODO) will catch slug drift in config.configs; column drift is currently caught manually.

The *-copy paths (just seed-local-copy / just seed-branch-copy) are unaffected by drift — they copy whatever's in production.

When NOT to use the synthetic seed

The synthetic data has lorem-style messages, fake company names, and no LLM-classified topics / keywords. So:

  • Reproducing an ixchat eval bug → use *-copy (real conversations).
  • Reproducing a specific client's quirky data (e.g. Pennylane intent distributions) → use *-copy.
  • Performance / load testing → write a dedicated bigger-volume seed; the 200-visitor default isn't enough.

Configs (config.configs)

The 16 config-slug rows are not part of the seed. They are populated by migrations (20260220150940_seed_config_definitions.sql + ~10 follow-ups), so they exist on every fresh branch automatically. JSONB defaults live in schemas/configs/*.schema.json and reach runtime through the generated SLUG_TO_DEFAULTS constant — don't duplicate them in synthetic SQL.

config.client_configs per-domain overrides come from 00_clients_and_domains.sql (identity + appearance, mirroring what create_domain() does internally).

See also

  • Supabase Preview Branches — lifecycle, branch creation, troubleshooting.
  • supabase/seed_synthetic/README.md — quick reference inside the seed directory.
  • docs/plans/seed-synthetic-data.md — original design doc.
  • docs/plans/seed-pgdump-study.md — study of how to speed up the prod-copy path.