ADR: Website Boundaries and Host Routing¶

Status¶

Proposed

Date¶

2026-03-12

Context¶

Rose currently treats most subdomains as part of the same tenant by normalizing hostnames to a root domain. That has worked for existing customers, but it does not model an important product distinction clearly enough:

some customers consider www.example.com and blog.example.com to be the same website
some customers consider fr.example.com, de.example.com, or info.example.com to be separate websites
existing customers must keep current behavior unless someone explicitly changes it

The current framing of the problem as "subdomain isolation" is too implementation-oriented. The real architectural question is:

What is the boundary of a website in Rose, and how do hostnames route to that boundary?

This boundary affects more than config lookup. It also affects:

backend tenant selection
cookie scope
analytics grouping
origin validation
backoffice domain management

We need a model that preserves current behavior by default, but allows explicit split-out of selected hosts without special-case logic scattered across frontend, backend, and SQL.

Relationship to existing data model¶

The unified config ADR (2026-01-31, accepted) established clients → domains → configs as the data model. This ADR introduces "Website" as a concept that currently maps 1:1 with domains rows, but must be able to diverge when a client needs multiple websites under the same root domain. The implementation plan should define how website boundaries relate to the existing domains and configs tables.

Prerequisite: multi-level TLD normalization bug¶

The current normalizeDomain() function breaks on multi-level TLDs: example.co.uk is normalized to co.uk, which is a public suffix rather than a customer's domain. This must be fixed regardless of which routing model is adopted. The fix is a prerequisite for correct host routing and must be treated as Phase 0 of the implementation plan, before any routing metadata is added to the schema.
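To make the failure mode concrete, here is a minimal sketch of suffix-aware normalization. It is not the existing normalizeDomain() code. PUBLIC_SUFFIXES is a tiny hand-picked sample standing in for the real Public Suffix List, and naive_root is only an assumption about how a "keep the last two labels" rule produces the co.uk result.

```python
from typing import Optional

# Hypothetical sample only; a real fix would load the full Public Suffix List.
PUBLIC_SUFFIXES = frozenset({"com", "eu", "uk", "co.uk"})


def registrable_domain(hostname: str) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest so that
    # "co.uk" is matched before "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None  # unknown suffix: refuse to guess


def naive_root(hostname: str) -> str:
    """Assumed shape of the bug: always keep the last two labels."""
    return ".".join(hostname.lower().split(".")[-2:])
```

With this sample list, naive_root("example.co.uk") yields co.uk, while registrable_domain("www.example.co.uk") yields example.co.uk.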

Existing frontend alignment¶

The frontend config system (domainMatching.ts) already implements an exact-then-normalized-fallback resolution strategy and returns a matchStrategy: 'exact' | 'normalized'. This maps closely to the exact_only / root_fallback model proposed here, which validates the design and reduces the scope of frontend changes needed.

Decision¶

1. Introduce a product-level distinction between Client, Website, and Host Rule¶

We will model three concepts:

Client: the customer account
Website: the isolation boundary for config, backend tenanting, cookies, analytics, and permissions. A website must have a stable internal identity (website_id) even if multiple hosts route to it.
Host rule: a hostname routing rule that maps requests to a website

This means a client may have:

one website with many hostnames
many websites under the same client
a mix of shared and isolated subdomains

2. Host rules define routing behavior¶

Each website can have one or more host rules with a match mode:

root_fallback: this host acts as the shared website for the root domain and unmatched subdomains
exact_only: only this exact hostname resolves to the website

Resolution precedence:

exact host match wins
if no exact match exists, try root_fallback
otherwise the host is unsupported
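The precedence above can be sketched as a small lookup. This is an illustration under stated assumptions, not the planned implementation: HostRule and resolve_website are hypothetical names, rules are held in memory, and the caller is assumed to pass the registrable domain of the request host as root.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HostRule:
    host: str        # hostname the rule is attached to
    match_mode: str  # "root_fallback" or "exact_only"
    website_id: str  # stable identity of the target website


def resolve_website(host: str, root: str, rules: Sequence[HostRule]) -> Optional[str]:
    """Resolve a request host to a website_id, or None if unsupported."""
    host = host.lower()
    # 1. An exact host match wins, whatever the rule's match mode.
    for rule in rules:
        if rule.host == host:
            return rule.website_id
    # 2. Otherwise try the root_fallback rule on the registrable domain.
    for rule in rules:
        if rule.match_mode == "root_fallback" and rule.host == root:
            return rule.website_id
    # 3. Otherwise the host is unsupported.
    return None
```

Returning None rather than raising keeps the "unsupported host" decision with the caller, which may want to respond differently per layer.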

Configuration invariants:

exact host rules must be globally unique
a root_fallback rule is only valid on the registrable domain, i.e. the domain directly registered under a public suffix (e.g. matera.eu, not www.matera.eu; example.co.uk, not co.uk). "Registrable domain" follows the Public Suffix List definition. The TLD normalization fix (see "Prerequisite: multi-level TLD normalization bug") must land before this invariant can be enforced correctly.
there may be at most one root_fallback rule per normalized root domain
an exact host rule may coexist with another website's root_fallback rule; the exact rule still wins
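These invariants lend themselves to a write-time check. The sketch below is one possible reading, not a specification: it treats "globally unique" as "no hostname appears in more than one rule", represents a rule as a hypothetical (host, match_mode, registrable_domain) tuple, and assumes the registrable domain was computed upstream by a Public Suffix List aware helper.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical shape: (host, match_mode, registrable_domain).
Rule = Tuple[str, str, str]


def invariant_violations(rules: List[Rule]) -> List[str]:
    """Return human-readable violations; an empty list means the set is valid."""
    errors: List[str] = []

    # A hostname may appear in at most one rule.
    host_counts = Counter(host for host, _, _ in rules)
    for host, count in host_counts.items():
        if count > 1:
            errors.append(f"duplicate host rule: {host}")

    fallback_roots: Counter = Counter()
    for host, mode, root in rules:
        if mode != "root_fallback":
            continue
        # root_fallback is only valid on the registrable domain itself.
        if host != root:
            errors.append(f"root_fallback not on registrable domain: {host}")
        fallback_roots[root] += 1

    # At most one root_fallback rule per registrable domain.
    for root, count in fallback_roots.items():
        if count > 1:
            errors.append(f"multiple root_fallback rules for: {root}")
    return errors
```

An exact_only rule next to another website's root_fallback rule on the same root produces no violation, which matches the last invariant.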

Examples:

abtasty.com with root_fallback: www.abtasty.com and blog.abtasty.com resolve to the same website by default
blog.abtasty.com with exact_only: blog.abtasty.com can be split into its own website while the rest still fall back to abtasty.com
fr.abtasty.com with exact_only: geography-specific website with its own config and analytics boundary

3. Existing tenants keep today’s behavior by default¶

The default behavior for migrated customers will be:

one website per current tenant
one shared host rule using root_fallback

This preserves the current system semantics unless an operator explicitly creates a new website boundary.

4. Backoffice should speak in website terms, not isolation terms¶

Backoffice configuration should not expose a low-level flag like subdomain_isolation as the primary product language.

Instead, the backoffice should let operators choose whether a host is:

part of a shared website
a separate website

This better matches how customers think about their properties:

some want www and blog together
some want business-unit hosts like get.matera.eu and info.matera.eu separated

Backoffice permissions should also follow the website boundary:

access grants are website-scoped, not host-scoped
host rules inherit access from their website
adding or removing a host rule must not require duplicating per-user permissions when the website boundary itself is unchanged

5. All enforcement layers must key off the resolved website boundary¶

Once a host resolves to a website, that website boundary becomes the source of truth for:

frontend config access
backend origin validation
cookie scope
analytics grouping and joins
tenant-specific backend storage

The routing rule must be implemented consistently across all layers. No layer should re-infer website boundaries using ad hoc root-domain normalization after routing has already happened.

Routing must also be derived from authoritative request metadata, not client-supplied tenant fields:

for browser/widget traffic, the authoritative input is the request host (Host or an equivalent trusted edge-provided forwarded host header)
request-body fields such as siteName are advisory only during migration and must either be removed or checked against the resolved website before use
internal tools may use an explicit host override only through trusted code paths that do not rely on end-user input
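A minimal sketch of selecting the routing host from request metadata only. Which forwarded header is authoritative is still an open question in this ADR, so X-Forwarded-Host is used here purely as a placeholder. The function name, the from_trusted_edge flag, and the assumption that header names arrive lowercased are all hypothetical, and bracketed IPv6 hosts are not handled.

```python
from typing import Mapping, Optional


def authoritative_host(headers: Mapping[str, str], from_trusted_edge: bool) -> Optional[str]:
    """Pick the host used for routing; request-body fields are never consulted."""
    raw = None
    # Honour the forwarded header only when the request came through a
    # trusted edge; otherwise a client could spoof it.
    if from_trusted_edge:
        raw = headers.get("x-forwarded-host")
    raw = raw or headers.get("host")
    if not raw:
        return None
    # Drop an optional port and normalize case.
    return raw.split(":")[0].strip().lower() or None
```

Because the function only sees headers, a body field such as siteName cannot influence routing even by accident.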

6. Behavioral contracts across layers¶

Once a host is resolved to a website, the following contracts apply:

Host → Website resolution:

Exact host match wins.
If no exact match, try root_fallback on the normalized root domain.
Otherwise the host is unsupported.

Backend origin validation:

The backend validation pipeline is: resolve authoritative request host → resolve website → read host-rule match mode → validate Origin against that resolved website.
For exact_only websites: require exact hostname match against the resolved website domain. Sub-subdomains (e.g. evil.info.matera.eu) must be rejected.
For root_fallback websites on a registrable domain that has only one website: same-root subdomain access remains allowed (current behavior).
If a registrable domain contains more than one website, sibling websites do not inherit cross-subdomain trust from the shared root. Origin validation becomes website-specific for every website under that registrable domain, even if one of them still uses root_fallback for host resolution.
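The origin rules above can be sketched as a single predicate. The function and parameter names are hypothetical. For a registrable domain with more than one website, the ADR only says validation becomes website-specific, so this sketch assumes the strictest reading (exact host match) for that case. The implementation plan may choose a different rule.

```python
from urllib.parse import urlsplit


def origin_allowed(origin: str, website_host: str, match_mode: str,
                   websites_on_root: int) -> bool:
    """Validate an Origin header against an already-resolved website.

    website_host is the website's canonical host; for a root_fallback
    website that is the registrable domain itself.
    """
    origin_host = (urlsplit(origin).hostname or "").lower()
    if not origin_host:
        return False  # missing or opaque origin such as "null"
    if origin_host == website_host:
        return True
    # Cross-subdomain trust survives only for a root_fallback website that
    # is the sole website on its registrable domain (current behavior).
    if match_mode == "root_fallback" and websites_on_root == 1:
        return origin_host.endswith("." + website_host)
    # exact_only, or a shared root with several websites: no inherited trust.
    return False
```

The leading dot in the suffix check prevents a lookalike such as evilabtasty.com from passing as a subdomain of abtasty.com.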

Cookie scope:

If a registrable domain routes to exactly one website, Rose-owned cookies may use registrable-domain scope to preserve current cross-subdomain behavior.
If a registrable domain routes to more than one website, Rose-owned cookies must not use registrable-domain scope anywhere on that registrable domain. In the transitional implementation this means hostname-only cookies for every website under that root. Future implementations may use another website-scoped mechanism, but cross-website cookie visibility is not allowed.
This means splitting one host into a separate website disables shared-cookie continuity for the remaining sibling hosts under that registrable domain as well. That is the necessary tradeoff for enforcing website boundaries in the browser and must be communicated before rollout.
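In cookie terms, the rule reduces to whether a Domain attribute is set at all. A minimal sketch, assuming the caller already knows how many websites route under the registrable domain (the function name and parameters are illustrative):

```python
from typing import Optional


def cookie_domain_attribute(registrable_domain: str, websites_on_root: int) -> Optional[str]:
    """Return the Domain attribute for a Rose-owned cookie.

    None means "omit the Domain attribute", which makes the browser treat
    the cookie as host-only, so sibling hosts cannot read it.
    """
    if websites_on_root == 1:
        # Single website on this root: keep cross-subdomain continuity.
        return registrable_domain
    # More than one website on this root: hostname-only cookies everywhere.
    return None
```

Note that the decision depends on the root as a whole, not on the match mode of the website setting the cookie, which is why a single split affects every sibling host.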

Analytics joins:

When comparing a canonical website domain against an observed site_domain from session/visitor data:
If the website uses exact_only: require exact string equality.
If the website uses root_fallback and the registrable domain has only one website: allow root-domain equality (current behavior).
If the registrable domain has more than one website: root-domain equality is no longer sufficient. New writes must carry the resolved website's canonical host or website_id, and reads must match that pre-resolved value exactly.
This rule is directional: the canonical website domain determines the match strategy, not the observed host.
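The directional rule can be written as a predicate whose branches are chosen by the canonical website alone. This is a sketch with hypothetical names. It assumes the observed root is computed by a Public Suffix List aware helper, and relies on the invariant that a root_fallback website's canonical host is its registrable domain.

```python
def site_domain_matches(canonical_host: str, match_mode: str, websites_on_root: int,
                        observed_site_domain: str, observed_root: str) -> bool:
    """Decide whether an observed site_domain belongs to a website.

    The match strategy depends only on the canonical website's match mode
    and on how many websites share its registrable domain, never on the
    observed host.
    """
    if match_mode == "root_fallback" and websites_on_root == 1:
        # Current behavior: root-domain equality is sufficient.
        return observed_root == canonical_host
    # exact_only, or a root shared by several websites: exact equality only.
    return observed_site_domain == canonical_host
```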

Persistence identity:

The long-term persistence key is website_id, not host string.
The requested host may still be stored separately for observability, debugging, and analytics breakdowns.
During a transitional implementation that still uses site_domain, each website must have a canonical host value and every layer must use that same canonical value consistently rather than recomputing it independently.

Backend tenanting:

Two hosts that resolve to the same website share one backend tenant (MongoDB database, Neo4j graph, RAG instance).
Two hosts that resolve to different websites under the same client get separate backend tenants, even if they share a root domain.

7. Transitional implementation may use current domain tables, but the architecture target is Website + Host Rule¶

If we need a low-disruption first step, the current public.domains table can temporarily carry routing metadata such as:

match_mode = 'root_fallback' | 'exact_only'
is_canonical = true | false — marks the single domain row whose domain value is used as the transitional persistence key (the site_domain written to analytics, backend storage, etc.) until website_id becomes first-class. Only needed if multiple domain rows can resolve to the same website during the transitional phase.

That is acceptable as an implementation bridge only for a narrow rollout shape:

each website still has a single persisted canonical host value for writes
request routing is resolved from authoritative host metadata before any tenant-specific logic runs
website-scoped permissions are not expected to span multiple host rows without an explicit website_id
analytics writes on registrable domains with multiple websites use the resolved canonical host consistently rather than relying on read-time root-domain normalization

However, the architecture target remains:

websites as first-class boundaries
hosts as routing rules into websites

This ADR is about the architectural model, not the exact first migration. An implementation plan should follow after this ADR is accepted, covering the transitional schema, migration strategy, and rollout order.

The transitional step should be considered sufficient for the first customer rollout (Matera's get.matera.eu and info.matera.eu split) only if the rollout stays within the constraints above. It is not a general substitute for a first-class websites table. Moving to the full website/host-rule schema should be driven by product needs. The clearest trigger would be a customer needing two hosts to share one website while a third is isolated under the same client, or any case where website-scoped permissions or reporting must span multiple host rows. Both require a many-to-one host→website relationship that a flat domains table cannot express cleanly.

Consequences¶

Positive¶

Preserves current customer behavior by default.
Makes website boundaries explicit instead of implicit.
Supports mixed models under the same client: shared root, isolated subdomain, isolated geography, or any combination.
Gives frontend, backend, analytics, and backoffice one shared routing contract.
Uses product language that matches customer expectations better than a technical flag like subdomain_isolation.

Negative¶

Introduces more architectural concepts than a single boolean flag.
Requires broader changes across frontend, backend, SQL, and backoffice.
Historical analytics and tenant data will remain mixed after deployment. For example, pre-existing conversations for get.matera.eu and info.matera.eu are stored under matera.eu today. After enabling isolation, new data will be correctly scoped, but historical dashboards for the new isolated websites will appear empty unless a separate backfill migration is performed. Stakeholders must understand this tradeoff before rollout.
Splitting one website boundary under a shared registrable domain disables root-domain Rose cookies for all sibling websites under that root, which changes cross-subdomain tracking behavior for the remaining shared hosts as well.
The eventual schema may need a new first-class websites concept rather than staying entirely inside the current domains table.

Operational considerations¶

Backend config resolution (config_factory.py) caches domain lookups with a 60-second TTL. When host rules change in backoffice (e.g. splitting a subdomain into its own website), stale routing may be served for up to 60 seconds. The implementation plan should decide whether to shorten the TTL, add cache invalidation on write, or accept this window for the initial rollout.
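To illustrate the "invalidate on write" option, here is a small TTL cache with explicit invalidation. It does not reflect the actual code in config_factory.py: the class name, the loader callback, and the injectable clock are all assumptions made for the sake of a self-contained sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HostRuleCache:
    """TTL cache for host → website lookups with explicit invalidation."""

    def __init__(self, loader: Callable[[str], Optional[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, host: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, possibly stale relative to the DB
        value = self._loader(host)
        self._entries[host] = (now, value)
        return value

    def invalidate(self, host: Optional[str] = None) -> None:
        """Drop one host, or everything when a change may affect many hosts."""
        if host is None:
            self._entries.clear()
        else:
            self._entries.pop(host, None)
```

Clearing the whole cache is the safer default after a root_fallback change, because such a change can alter the resolution of any unmatched subdomain under that root. In a multi-process backend, invalidation would also have to reach every process, which this in-memory sketch does not address.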
The current site_domain column is used as a foreign key with unique constraints in the accounts and visitors tables. Migrating from site_domain to website_id as the persistence key is a non-trivial schema change that affects data integrity constraints. This should be scoped explicitly in the implementation plan rather than treated as a simple rename.

Neutral¶

A short-term implementation may still use public.domains plus a routing mode enum before a fuller website/host-rule schema exists.
Some customers may never need more than the default shared-root behavior.
This ADR does not itself choose the migration strategy or rollout order; those belong in a separate implementation plan after the architecture is accepted.

Open Questions¶

The following are deferred to the implementation plan but should be resolved before the full website/host-rule schema is adopted:

Cross-website client-level reporting: If a client has multiple websites, can they see aggregated data across all of them? If yes, client_id must remain a viable aggregation key across all data stores (MongoDB, Neo4j, analytics), not just Supabase.
website_id type: Should this be a UUID (decoupled from domain strings) or a derived value? Affects migration complexity and foreign key design.
Authoritative host metadata at the edge: Which exact header or request attribute is treated as authoritative after Cloudflare/Workers forwarding (Host, X-Forwarded-Host, or another trusted header), and how is that normalized consistently across environments?

Alternatives Considered¶

1. Boolean subdomain_isolation on public.domains¶

This is the smallest tactical change, but it is not the best architectural model.

Why not chosen as the architectural model:

it describes an implementation detail, not a product concept
it does not cleanly express "these two exact hosts belong to the same website"
it encourages logic to stay domain-centric instead of website-centric

It may still be a useful transitional implementation detail.

2. Exact-match-only for every hostname¶

Why not chosen:

would change existing customer behavior immediately
would force operators to enumerate every supported hostname
would break the current shared-subdomain model used by existing tenants

3. Keep root-domain normalization and add more exceptions¶