Seeding the Supabase DB¶
There are two seed paths, both available against local Supabase and against preview branches:
| Path | Source | Speed | PII | Use when |
|---|---|---|---|---|
| Synthetic (default) | supabase/seed_synthetic/*.sql |
~1–3 s | none | Day-to-day dev, CI smoke tests, onboarding new devs without prod access |
| Copy (opt-in) | scripts/seed_local_db.py → production REST API |
~60 s | yes (real visitors, emails, chat content) | Reproducing a real client bug, validating against real-world data shapes |
Targets¶
just target |
DB target | Source |
|---|---|---|
just seed-local |
Local Supabase (port 54322) | Synthetic |
just seed-local-fresh |
Local Supabase | Synthetic, --clear first |
just seed-local-copy |
Local Supabase | Prod copy (PII) |
just seed-branch |
Linked preview branch | Synthetic + migrations push + seed.sql + MV refresh |
just seed-branch-copy |
Linked preview branch | Prod copy (PII) |
Run from supabase/:
What synthetic seeding does¶
just seed-branch is end-to-end idempotent and brings a fresh preview branch to a fully-usable state:
- Applies any missing migrations — Supabase's automated migration replay sometimes silently stops partway. This step diffs the branch's
supabase_migrations.schema_migrationsagainst local files and applies whatever's missing viapsql. Skips migrations already on the branch (handles the case where production has migrations newer than your worktree). -
Runs
supabase/seed.sql— creates the two local auth users:Email Password Role admin@admin.comadminAdmin (bypasses domain RLS) user@user.comuserRegular user (sees granted domains only) -
Loads
supabase/seed_synthetic/[0-9]*.sql— see layout below. - Refreshes materialized views —
mv_widget_visitors,mv_conversation_visitors,mv_client_stats_30d. Order matters:mv_client_stats_30ddepends on the other two.
just seed-local does the same minus the migration-push step (local Supabase manages its own migrations via supabase migration up).
Layout¶
Files run in numeric-prefix order:
supabase/seed_synthetic/
README.md
_truncate.sql # reverse-FK-order TRUNCATE used by --clear
00_clients_and_domains.sql # 4 fake clients + domains + identity/appearance configs
01_ai_sections.sql # 2 ai_section rows per domain
02_backoffice_user_domains.sql # grants user@user.com access to all 4 domains
10_accounts.sql # ~20 fake B2B accounts
11_ip_hash_to_account.sql # ~20 ip-hash mappings
20_visitors.sql # ~200 visitors (50 per domain)
21_conversations.sql # ~100 conversations with funnel flags
22_visitor_sessions.sql # ~150 sessions
23_messages.sql # ~500 messages (lorem-style)
24_message_evaluations.sql # ~50 evals
25_visitor_page_views.sql # ~400 page views
26_session_events.sql # ~330 events (widget_impression, widget_opened, cta_clicked)
Synthetic dataset shape¶
4 fake domains, each owned by its own client:
| Client | Domain | Default language | Brand color |
|---|---|---|---|
| Acme | acme.com |
en | #0A42C3 |
| Contoso | contoso.com |
en | #1E8E3E |
| Fabrikam | fabrikam.com |
fr | #D93025 |
| Demo | demo.local |
en | #9334E6 |
After a full seed + MV refresh, each domain shows roughly:
- 25 conversations under
last_message_at≤ 8h ago, allenvironment = 'production',deployment_target = 'widget'. - 31 widget visitors (after the 10s-duration filter in
mv_widget_visitors). - ~80% engagement rate.
- 20% conversion rate (5 converted persons per client, split between
demo_personsandemail_only_personsdepending on the domain).
That's enough to populate Home, Conversations, Visitors, Accounts, Analytics pages without empty states.
Adding a new table¶
- Pick a numeric prefix that respects FK order.
- Use deterministic UUIDs:
md5(i::text || '<your-tag>')::uuidso re-runs produce stable IDs. - End every INSERT with
ON CONFLICT … DO NOTHINGso re-runs are no-ops. - Add a matching
TRUNCATEline to_truncate.sql(reverse FK order). - Run
cd supabase && just seed-localorjust seed-branchto verify.
Example:
-- supabase/seed_synthetic/30_my_new_table.sql
INSERT INTO public.my_new_table (id, site_domain, payload, created_at)
SELECT
md5(i::text || ':mynewtable')::uuid,
(ARRAY['acme.com','contoso.com','fabrikam.com','demo.local'])[1 + ((i - 1) % 4)],
jsonb_build_object('synthetic', true, 'rank', i),
now() - ((i * 2) || ' minutes')::interval
FROM generate_series(1, 100) AS i
ON CONFLICT (id) DO NOTHING;
Schema drift¶
If a migration adds a NOT NULL column to a table that's already in seed_synthetic/, the next seed run will error out. Update the seed file in the same PR as the migration. The CI smoke test (scripts/check_synthetic_config_slugs.py and friends — TODO) will catch slug drift in config.configs; column drift is currently caught manually.
The *-copy paths (just seed-local-copy / just seed-branch-copy) are unaffected by drift — they copy whatever's in production.
When NOT to use the synthetic seed¶
The synthetic data has lorem-style messages, fake company names, and no LLM-classified topics / keywords. So:
- Reproducing an ixchat eval bug → use
*-copy(real conversations). - Reproducing a specific client's quirky data (e.g. Pennylane intent distributions) → use
*-copy. - Performance / load testing → write a dedicated bigger-volume seed; the 200-visitor default isn't enough.
Configs (config.configs)¶
The 16 config-slug rows are not part of the seed. They are populated by migrations (20260220150940_seed_config_definitions.sql + ~10 follow-ups), so they exist on every fresh branch automatically. JSONB defaults live in schemas/configs/*.schema.json and reach runtime through the generated SLUG_TO_DEFAULTS constant — don't duplicate them in synthetic SQL.
config.client_configs per-domain overrides come from 00_clients_and_domains.sql (identity + appearance, mirroring what create_domain() does internally).
See also¶
- Supabase Preview Branches — lifecycle, branch creation, troubleshooting.
supabase/seed_synthetic/README.md— quick reference inside the seed directory.docs/plans/seed-synthetic-data.md— original design doc.docs/plans/seed-pgdump-study.md— study of how to speed up the prod-copy path.