ADR: Subdomain Tenant Split via Explicit Domain Registration

Status

Accepted — already applied for landing.cactusinbound.com.

Date

2026-03-26

Context

Matera needs two separate website agents with two separate knowledge bases for two hostnames under the same root domain: matera.eu (main site) and info.matera.eu (information portal). Each hostname should have its own chatbot persona, its own RAG content, and its own conversation history.

The existing ADR (2026-03-12, "Website Boundaries and Host Routing") proposes a general-purpose model with Client → Website → Host Rule concepts, cookie scope contracts, analytics join rules, and a transitional schema. That model is designed for arbitrary combinations of shared and isolated subdomains across any customer. It is more than what this use case requires.

This ADR proposes a minimal path that relies on infrastructure already in place.

What already exists

The current architecture supports per-subdomain tenant isolation without code changes:

Frontend domain resolution (domainMatching.ts):

  • selectDomainMatch() tries an exact host match first, then falls back to the normalized root domain.
  • The matched domain row's domain value is sent as siteName in every API request (ConfigProvider.tsx).
  • If info.matera.eu exists in the public.domains table, the frontend sends siteName: "info.matera.eu". If it does not exist, the frontend falls back to matera.eu.
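The exact-then-root lookup described above can be sketched as follows. This is an illustration in Python, not the code in domainMatching.ts: the names select_domain_match and naive_root_domain are hypothetical, and the "keep the last two labels" normalization is an assumption made for the example.

```python
from typing import Optional


def naive_root_domain(host: str) -> str:
    """Assumed normalization: keep the last two labels of the hostname."""
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def select_domain_match(host: str, registered: set[str]) -> Optional[str]:
    """Return the registered domain that would be sent as siteName, or None."""
    host = host.lower()
    if host in registered:          # 1. an exact host match wins
        return host
    root = naive_root_domain(host)  # 2. otherwise fall back to the root domain
    return root if root in registered else None


registered = {"matera.eu", "info.matera.eu"}
print(select_domain_match("info.matera.eu", registered))  # info.matera.eu
print(select_domain_match("blog.matera.eu", registered))  # matera.eu
```

With only matera.eu registered, the first call would return matera.eu instead, which is the fallback behavior the last bullet describes.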

Backend tenant derivation (rag_instance_manager.py):

  • get_tenant_context(site_name) derives tenant IDs by sanitizing the siteName string. No database lookup is involved.
  • matera.eu → matera_eu, info.matera.eu → info_matera_eu. These are already different tenant IDs.
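A minimal sketch of what such a sanitizer might look like. The function name derive_tenant_id and the exact rule (lowercase, then collapse anything outside a-z and 0-9 into an underscore) are assumptions; the source only shows that dots become underscores.

```python
import re


def derive_tenant_id(site_name: str) -> str:
    """Hypothetical sanitizer: lowercase, then replace each run of
    characters outside [a-z0-9] with a single underscore."""
    return re.sub(r"[^a-z0-9]+", "_", site_name.lower()).strip("_")


print(derive_tenant_id("matera.eu"))       # matera_eu
print(derive_tenant_id("info.matera.eu"))  # info_matera_eu
```

Because the tenant ID is a pure function of siteName, two different siteName values can only collide if they sanitize to the same string.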

Storage isolation (MongoDB, Neo4j):

  • All documents and graph nodes carry a tenantId property. Storage classes filter by the tenant ID set in a per-request contextvar.
  • Different siteName values automatically produce different tenant scopes. No separate databases or instances need to be provisioned.
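The contextvar-based scoping can be illustrated with an in-memory stand-in. TenantScopedStore and current_tenant are hypothetical names, and the list of dicts stands in for MongoDB or Neo4j; the real storage classes are not shown in this ADR.

```python
from contextvars import ContextVar

# Tenant ID for the request currently being handled.
current_tenant: ContextVar[str] = ContextVar("current_tenant")


class TenantScopedStore:
    """In-memory stand-in for a storage class that filters by tenantId."""

    def __init__(self) -> None:
        self._docs: list[dict] = []

    def insert(self, text: str) -> None:
        # Stamp every document with the tenant of the current request.
        self._docs.append({"tenantId": current_tenant.get(), "text": text})

    def find_all(self) -> list[str]:
        # Return only documents that belong to the current tenant.
        tenant = current_tenant.get()
        return [d["text"] for d in self._docs if d["tenantId"] == tenant]


store = TenantScopedStore()

current_tenant.set("matera_eu")
store.insert("Main site pricing page")

current_tenant.set("info_matera_eu")
store.insert("Information portal FAQ")

print(store.find_all())  # ['Information portal FAQ']
```

Both documents live in the same store, yet a request scoped to info_matera_eu never sees the matera_eu document, which is why no separate database needs to be provisioned.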

Origin validation (security.py):

  • validate_origin_tenant() checks that the Origin header matches the siteName. When siteName is info.matera.eu and the request originates from info.matera.eu, validation passes.
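A sketch of an origin check along these lines. The function name origin_matches_site is hypothetical, and accepting subdomains of siteName is an assumption made here so that unregistered subdomains that fall back to the root entry would still validate; the actual rule in security.py may differ.

```python
from urllib.parse import urlsplit


def origin_matches_site(origin: str, site_name: str) -> bool:
    """Hypothetical check: the Origin host must equal siteName or be a
    subdomain of it (the subdomain allowance is an assumption)."""
    host = urlsplit(origin).hostname  # None when the header has no host part
    if host is None:
        return False
    site = site_name.lower()
    return host == site or host.endswith("." + site)


print(origin_matches_site("https://info.matera.eu", "info.matera.eu"))  # True
print(origin_matches_site("https://blog.matera.eu", "matera.eu"))       # True
print(origin_matches_site("https://matera.eu", "info.matera.eu"))       # False
```

Under this rule a request from the main site cannot claim the info.matera.eu tenant, while the registered subdomain validates against its own entry.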

Config resolution (config_factory.py):

  • resolve_config_for_domain() does an exact lookup on the public.domains table. A domain row for info.matera.eu gets its own config overrides via config.client_configs.

Analytics tracking (posthog-provider.ts, cross-domain-tracker.ts, session_events table):

  • The widget sends rw_domain (exact hostname) and rw_root_domain (normalized) as global properties on every PostHog event.
  • The session_events table stores events with a site_domain column set to the exact matched domain.
  • Backoffice analytics RPCs filter conversations and events by site_domain = p_domain. Conversation-based metrics are already domain-scoped.
  • However, rose_client_id and rose_last_active_session cookies are set at root-domain scope (domain=.matera.eu), so visitor identity and cross-subdomain form attribution are shared across all subdomains of the same root.

Decision

1. Register each subdomain as an explicit domain entry

For each subdomain that needs its own knowledge base, add a row to public.domains using the existing create_domain() SQL function:

SELECT create_domain('info.matera.eu', 'matera-info', 'Matera Info', '#brand-color');
SELECT create_domain('matera.eu', 'matera', 'Matera', '#brand-color');

Each domain row gets its own client_id, its own config overrides, and its own tenant scope in MongoDB/Neo4j.

2. No code changes required for tenant isolation

The existing frontend resolution, backend tenant derivation, storage isolation, origin validation, and config resolution all work as-is. The only action is operational: insert domain rows and ingest content.

The backoffice analytics pages should add informational copy to clarify what is domain-scoped vs root-domain-shared (see "Analytics behavior" in Consequences). This is a UX improvement, not a prerequisite for the subdomain split itself.

3. Subdomains without explicit entries keep current behavior

Any subdomain of matera.eu that does not have its own row in public.domains continues to fall back to the matera.eu root domain entry. This is the existing normalized match strategy in domainMatching.ts. No customer behavior changes.

4. Per-subdomain configuration

Each domain entry can have independent config overrides in config.client_configs:

  • Identity (company name, website URL)
  • Appearance (brand color, logo)
  • Chat behavior (suggested questions, greeting, model)
  • Any other config slug
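One way to picture per-domain overrides is a defaults dictionary merged with an exact-match override. The slug names, values, and the resolve_config helper below are all hypothetical; the real lookup goes through public.domains and config.client_configs.

```python
# Hypothetical defaults and overrides; slug names are illustrative only.
DEFAULTS = {
    "company_name": "Matera",
    "brand_color": "#000000",
    "greeting": "Hello!",
}

OVERRIDES = {
    "info.matera.eu": {
        "company_name": "Matera Info",
        "greeting": "Looking for information?",
    },
}


def resolve_config(domain: str) -> dict[str, str]:
    # Exact lookup only: overrides for one domain never leak into another.
    return {**DEFAULTS, **OVERRIDES.get(domain, {})}


print(resolve_config("info.matera.eu")["company_name"])  # Matera Info
print(resolve_config("matera.eu")["company_name"])       # Matera
```

Slugs that a domain does not override keep their default value, so a new subdomain only needs entries for the settings that actually differ.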

5. Knowledge base content is ingested per tenant

RAG content ingestion uses siteName as the tenant key. Content ingested for info.matera.eu is stored under tenant info_matera_eu and is only retrievable by requests with that siteName.

Consequences

Positive

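  • No code changes are needed for tenant isolation. The entire rollout is operational (domain registration and content ingestion); the backoffice copy recommended under "Analytics behavior" is an optional follow-up.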
  • Preserves existing behavior for all other customers and for unregistered subdomains.
  • Each subdomain gets full tenant isolation: separate knowledge base, separate conversations, separate config.
  • Domain rows added now map 1:1 to website entities if the full Website Boundaries model is adopted later.

Analytics behavior

Not all analytics dimensions are scoped the same way. Operators should understand what is domain-specific and what is shared.

Domain-scoped (fully isolated per subdomain):

  • Conversations: The site_domain column on the conversations table carries the exact matched domain. Backoffice analytics RPCs (get_conversations_over_time, get_conversation_funnel, get_dynamic_question_stats, get_conversation_sources, get_page_conversation_stats) all filter by WHERE site_domain = p_domain. Selecting info.matera.eu in the backoffice shows only conversations that originated on that subdomain.
  • Session events: The session_events table stores each event with the exact site_domain from the widget's rw_domain PostHog property. Events on info.matera.eu are tagged site_domain = 'info.matera.eu'.
  • PostHog event properties: Every event carries both rw_domain (exact hostname, e.g. info.matera.eu) and rw_root_domain (normalized, e.g. matera.eu), so PostHog-side filtering by subdomain is possible.

Root-domain-scoped (shared across subdomains):

  • Visitor identity (rose_client_id): The rose_client_id cookie is set at domain=.matera.eu (root scope). A visitor who browses both matera.eu and info.matera.eu has the same rose_client_id on both. Since rose_client_id equals PostHog's distinct_id, PostHog merges activity from both subdomains into one person profile.
  • Cross-subdomain form attribution: The rose_last_active_session cookie is also set at root scope. If a visitor chats on info.matera.eu then submits a form on matera.eu, the form submission is attributed to the info.matera.eu conversation session via cookie fallback. This is the intended behavior for single-domain setups but becomes surprising when subdomains are split.
  • PostHog person profiles: Because distinct_id is shared, PostHog dashboards that group by person (e.g. unique visitors, return visitors) will count a cross-subdomain visitor as one person, not two.
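The difference between the two scopes can be shown with a few made-up events. The event tuples and visitor IDs are hypothetical sample data, not real analytics output.

```python
from collections import Counter

# Hypothetical events: (rose_client_id, site_domain)
events = [
    ("visitor-1", "matera.eu"),
    ("visitor-1", "info.matera.eu"),  # same cookie value on both subdomains
    ("visitor-2", "info.matera.eu"),
]

# Domain-scoped: filtering by site_domain separates the subdomains.
events_per_domain = Counter(domain for _, domain in events)

# Root-scoped: the shared ID means person-level counts merge subdomains.
unique_visitors = len({client_id for client_id, _ in events})

print(events_per_domain["info.matera.eu"])  # 2
print(unique_visitors)                      # 2
```

Counting visitors per subdomain and adding the results would give 3 (one on matera.eu, two on info.matera.eu), but a person-level count reports 2 because visitor-1 carries the same ID on both hosts.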

Backoffice UX recommendation:

Operators should not assume more isolation than actually exists. Recommended copy:

  • Conversations page: Small note — Scoped to selected domain.
  • Analytics page: Banner — Conversation metrics are specific to the selected domain. Traffic and visitor identity may include activity from sibling subdomains of the same root domain.
  • Visitor/funnel cards: Tooltip — Visitor identity is shared at the root-domain level. A visitor on multiple subdomains counts as one visitor.

Negative

  • No cookie isolation between subdomains. Rose cookies (rose_client_id, rose_last_active_session, rose_last_active_session_date) are set at .matera.eu (root domain scope), so visitor identity and session attribution are shared across all subdomains. This is acceptable when the same client owns all subdomains and does not need cross-subdomain privacy boundaries.
  • No website_id grouping key. If cross-subdomain analytics aggregation is needed later (e.g., "show me all Matera conversations"), there is no built-in way to group these domain entries. The client_id foreign key provides a partial grouping, but only if both domains share the same client, which is not the case when each domain row gets its own client_id as described in the Decision.
  • The normalizeDomain() bug (multi-level TLDs like .co.uk normalize incorrectly) still affects fallback resolution. This does not affect Matera (.eu is a single-level TLD) but should be fixed independently.
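The ADR does not say how normalizeDomain() is implemented. As an illustration only, a normalizer that assumes the root domain is always the last two labels reproduces the described symptom; naive_normalize is a hypothetical stand-in, not the actual function.

```python
def naive_normalize(host: str) -> str:
    """Assumed behavior: treat the last two labels as the root domain."""
    return ".".join(host.lower().split(".")[-2:])


print(naive_normalize("info.matera.eu"))      # matera.eu
print(naive_normalize("shop.example.co.uk"))  # co.uk
```

Under that assumption, every subdomain of every .co.uk site would fall back to the same co.uk key. A fix would typically consult a list of known public suffixes rather than counting labels.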

Limitations — when this approach is not enough

  • Cross-subdomain cookie isolation: If two subdomains under the same root must not share tracking cookies or visitor identity, the cookie scope contract from the Website Boundaries ADR is needed.
  • Shared knowledge base across multiple hosts: If two hostnames need to share one knowledge base while a third is isolated, the many-to-one Host → Website relationship from the Website Boundaries ADR is needed.
  • Backoffice website management UX: This approach requires manual SQL to add domains. If operators need a self-service UI for managing subdomain boundaries, the full Website + Host Rule model provides the right abstraction.
  • Per-subdomain unique visitor counts: Because rose_client_id is shared at root-domain scope, there is no way to count unique visitors per subdomain independently. A visitor on both subdomains is one visitor. Achieving per-subdomain visitor isolation requires the cookie scope contract from the Website Boundaries ADR.

Relationship to the Website Boundaries ADR

This ADR is a pragmatic subset. It covers the immediate Matera use case without introducing new architectural concepts.

The Website Boundaries ADR (2026-03-12) remains the architecture target for:

  • general multi-website support across any customer
  • cookie and analytics isolation contracts
  • backoffice domain management UX
  • the website_id persistence key

The two ADRs are compatible. Domain rows created under this minimal approach will map directly to website entities when the full model is implemented.

Alternatives Considered

1. Implement the full Website Boundaries ADR first

Why not chosen:

  • Requires schema changes, cookie scope logic, analytics join rules, and backend origin validation changes.
  • The Matera use case does not need cookie isolation or analytics join contracts.
  • Delays the rollout for architectural work that serves future use cases, not the current one.

2. Use a configuration flag instead of separate domain entries

Why not chosen:

  • The existing domain resolution already supports exact-match-first. Adding a flag would duplicate logic that already works.
  • A flag does not provide tenant isolation — the knowledge base separation comes from different siteName values, which come from different domain rows.