March 2026 - R&D Journal: Routing Boundaries and Agentic Feedback Loops

Context

March formalized the routing problems exposed in February. Rose now had multiple entry points, model paths, host-page states, and remediation paths. The month's research question was whether the platform could decompose routing into stable boundaries instead of adding prompt instructions or feature-specific patches.

The month advances three of the validated 2026 R&D projects: Adaptive Multi-Tenant Conversation Orchestration (the routing boundaries themselves), Closed-Loop Agent Evaluation and Optimization (a repeatable remediation loop), and the Compounding Context Engine for Company, Industry, and Buyer Intelligence (preserving visitor identity across surfaces). No outbound generated-publishing work was done this month, so the Synthesized Knowledge Publishing project does not appear. Customer-facing releases are kept separate from the R&D below.

Adaptive Multi-Tenant Conversation Orchestration

Project and lock

Several March failures looked like chatbot failures but were actually routing-boundary failures, and the lock was recognizing that they belonged to different owners rather than to one prompt. Model and provider choice was leaking into feature code, so there was no single place to reason about answer quality, latency, or cost. Single-page-app navigation and URL normalization were changing the widget's state after the agent had already made a decision, so a lifecycle event could silently invalidate a routing choice. Page-native AI Sections needed source metadata and hidden context, not just the visible button text. And content-gating rules could collide with qualification in the final prompt. The common thread is that "routing" was being treated as one monolithic agent step when it is really several independent decisions — which model, which host-page lifecycle state, which interaction source — each of which needs its own contract. February had shown that the interaction source changes behavior; March's lock was turning that observation into architecture.

This month's work

The hypothesis was that routing becomes more reliable when model selection, host-page lifecycle, and interaction source are separated into explicit boundaries rather than handled by one agent prompt.

The work gave model selection to a resolver, so that which model answers is decided in one place that can weigh quality, latency, provider fallback, and cost, rather than being hard-coded in feature paths. Host-page routing was separated from the agent's own decisions: URL patterns were normalized for page overrides, and the copilot layout was split from the form-assistant mode so a navigation event no longer mutates a state the agent already acted on. Page-native AI Sections were given a headless contract carrying source metadata and hidden context rather than only visible text, and content-gating was reworked so that the instructions it adds to the prompt are data-driven and no longer compete with qualification for room in the final prompt.
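The resolver idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ModelRoute` and `RoutePolicy` types, the model and provider names, the numeric scores, and the "cheapest model that meets the policy" rule are hypothetical and are not taken from Rose's implementation. The point is only that feature code asks one function for a model and never names a provider itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRoute:
    """One candidate answer model. All fields and values are illustrative."""
    name: str
    provider: str
    quality: float        # 0..1, higher is better (hypothetical score)
    latency_ms: float     # typical response latency
    cost_per_call: float  # arbitrary cost units


@dataclass(frozen=True)
class RoutePolicy:
    """Constraints the resolver weighs for a given kind of request."""
    min_quality: float
    max_latency_ms: float


def resolve_model(
    candidates: list[ModelRoute],
    policy: RoutePolicy,
    unavailable_providers: frozenset[str] = frozenset(),
) -> Optional[ModelRoute]:
    """Return the cheapest candidate that meets the policy, or None.

    Skipping unavailable providers is what yields provider fallback:
    the next eligible candidate is chosen without feature code changing.
    """
    eligible = [
        m for m in candidates
        if m.provider not in unavailable_providers
        and m.quality >= policy.min_quality
        and m.latency_ms <= policy.max_latency_ms
    ]
    if not eligible:
        return None
    # Cheapest first; ties broken by higher quality, then lower latency.
    return min(eligible, key=lambda m: (m.cost_per_call, -m.quality, m.latency_ms))


# Hypothetical candidates and policy, for illustration only.
CANDIDATES = [
    ModelRoute("fast-small", "provider-a", quality=0.70, latency_ms=400, cost_per_call=1.0),
    ModelRoute("balanced", "provider-a", quality=0.85, latency_ms=900, cost_per_call=3.0),
    ModelRoute("balanced-alt", "provider-b", quality=0.84, latency_ms=1100, cost_per_call=3.5),
]
POLICY = RoutePolicy(min_quality=0.8, max_latency_ms=1500)

primary = resolve_model(CANDIDATES, POLICY)
fallback = resolve_model(CANDIDATES, POLICY, frozenset({"provider-a"}))
```

With these made-up numbers, `primary` is the `balanced` route and `fallback` is `balanced-alt` once `provider-a` is marked unavailable. A real resolver would presumably use measured quality, cost, and latency per route, which is the measurement this month did not yet produce.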

Results, proof, and next step

The first learning is that model routing has to be resolver-owned: only a single owner lets model quality, latency, provider fallback, and cost be weighed together rather than scattered across feature paths. The second is that browser-lifecycle routing cannot be fixed with prompt changes: the widget-and-host boundary needs its own contract, because the failure happens outside the agent's turn. The negative learning came from the model-routing work itself: enabling a faster or alternate answer model without parity scaffolding is risky, and the month went through revert-and-restore cycles around answer-node routing before the boundary was safe. The remaining uncertainty is that the architecture was defined but the route-by-route comparisons of quality, cost, and latency were not yet measured; no experiment ran in March to quantify the boundaries' effect.

The evidence is the set of architecture decisions recorded this month for model routing, host-page and widget boundaries, and headless AI Sections, together with the implementation history of the routing phases and the answer-node revert-and-restore cycles. The month's changelog entries are product context only.

Next step: connect the routing boundaries to knowledge freshness, qualification, and CRM context without enlarging the prompt or the chat latency, and measure quality, cost, and latency per route.

Closed-Loop Agent Evaluation and Optimization

This project advanced on two fronts in March: a remediation loop that turns observed failures into repeatable fixes, and the first signs of the metric-definition problem that becomes May's measurement architecture. Both are the same lock — closing the gap between what the agent does and what the team can trust about it.

Project and lock

On the remediation side, answer-quality failures arise from many sources — missing knowledge, a bad skill selection, the wrong prompt priority, unsafe routing, or a client-specific product fact — and the lock is that neither extreme works: manual triage does not scale, but fully automated repair is unsafe because a wrong fix ships silently. On the measurement side, a second lock surfaced this month: the same quantity was being computed differently in different places. Visitor counts, impressions, and engagement rates were calculated one way on one dashboard tile and another way on the next, and "conversion" had no single definition — a demo booking and an email capture are not worth the same, and a two-second bounce should not count at all. Until a quantity means one thing everywhere, no later experiment can read it.

This month's work

The hypotheses were that capturing failures from the backoffice and routing them into an agentic development pipeline makes remediation repeatable while keeping a human in the loop, and that pinning down single definitions for the core metrics is the precondition for any later causal measurement.

On remediation, the work built the capture end first: a support form, issue labels, and — critically — the original visitor message context attached to each ticket, so the person fixing an answer sees what the visitor actually said rather than only the failing reply. From there it defined an agentic development pipeline for working those tickets, plus root-cause categories and triage workflows that turn ad hoc fixes into a classified, repeatable process. A first evaluation harness was added alongside — a pipeline that scores the quality of Rose's suggested questions systematically rather than judging them anecdotally — so that a fix could later be validated against a measure rather than an opinion. On measurement, the work began converging the metric definitions: visitor counts, impressions, and engagement rate were moved onto a single calculation shared across every dashboard tile; conversion was redefined as a weighted quantity that credits a demo booking fully and an email capture partially rather than treating them as equal; and sessions shorter than ten seconds were excluded as bounces so the denominator reflects real engagement.
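The unified conversion definition can be sketched as a single shared function. The ten-second bounce threshold and the full credit for a demo booking come from the text above; the `0.5` weight for an email capture is an assumed placeholder, since the journal only says the credit is partial. The `Session` type, the event names, and the choice to drop bounced sessions from both numerator and denominator are likewise illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# From the journal: sessions shorter than ten seconds are excluded as bounces.
BOUNCE_THRESHOLD_SECONDS = 10.0

# Demo booking is credited fully (from the journal). The 0.5 weight for an
# email capture is an ASSUMED placeholder; the journal only says "partially".
CONVERSION_WEIGHTS = {"demo_booking": 1.0, "email_capture": 0.5}


@dataclass(frozen=True)
class Session:
    """Hypothetical session record; field names are illustrative."""
    duration_seconds: float
    conversion: Optional[str] = None  # a key of CONVERSION_WEIGHTS, or None


def weighted_conversion_rate(sessions: list[Session]) -> float:
    """Weighted conversions per engaged session.

    Bounced sessions are removed before anything is counted, so they affect
    neither the numerator nor the denominator (an assumption of this sketch).
    """
    engaged = [s for s in sessions if s.duration_seconds >= BOUNCE_THRESHOLD_SECONDS]
    if not engaged:
        return 0.0
    # Unknown or missing conversion types earn no credit.
    credit = sum(CONVERSION_WEIGHTS.get(s.conversion, 0.0) for s in engaged)
    return credit / len(engaged)


# Illustrative data only: one bounce, one demo booking, one email capture,
# and one engaged session that did not convert.
sessions = [
    Session(2.0),
    Session(45.0, "demo_booking"),
    Session(30.0, "email_capture"),
    Session(120.0),
]
rate = weighted_conversion_rate(sessions)
```

Here three sessions count as engaged and earn 1.5 units of credit between them, so `rate` is 0.5. If every dashboard tile called one function like this instead of computing its own figure, two tiles could not disagree on the same number.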

Results, proof, and next step

The remediation learnings are that fixing an answer needs the original visitor context — a failing reply in isolation is not enough to diagnose, because the same reply can be wrong for different reasons — and that root-cause categories and evaluation datasets are what make skill fixes repeatable. The measurement learning is the one that compounds: the moment two tiles disagreed on the same number, it was clear that metric definitions cannot live independently per surface, and that a conversion needs a single, weighted, bounce-filtered definition before any experiment can trust it. This is the first concrete instance of the metric-drift lock that drives the layered measurement architecture two months later. The remaining uncertainty is that March created the loop structure and began unifying definitions but measured none of its effect — no throughput, completion rate, or recurrence-reduction figure exists yet, and no experiment ran on the new definitions.

The evidence is the architecture decision recorded for the agentic development pipeline, the implementation history of the capture, labeling, and triage workflows, and the changelog entries recording when the metric-unification and weighted-conversion changes became visible. The changelog itself is product context; the retained R&D is the definition-consistency uncertainty it exposed, not the dashboard wiring.

Next step: run the loop at enough volume to measure throughput and recurrence reduction, and carry the unified definitions into a layered analytics substrate that enforces them.

Compounding Context Engine for Company, Industry, and Buyer Intelligence

March's work on this project answered two inbound questions about a visitor: is this the same person across the surfaces a conversion passes through, and who is the company behind them? Identity continuity and company enrichment are both buyer-intelligence substrate, distinct from the conversion-crediting mechanics that belong to the conversation and measurement contracts.

Project and lock

Identity and attribution break easily in embedded third-party flows. A session identifier can be lost when a visitor moves into an embedded calendar, subdomains can mismatch so the same person looks like two, and booking and form providers each emit a differently shaped event. The lock was how much of this could be generalized into a shared identity model rather than solved with a per-client patch each time a new provider or subdomain appeared. A second facet of the lock concerns enrichment: most visitors arrive anonymous, and resolving them to a company depends on third-party IP-to-company providers whose coverage is partial and inconsistent, so no single provider is enough.

This month's work

The hypothesis was that preserving identity needs provider-specific detection on top of a shared session-identity model, and that resolving the company behind a visitor needs several enrichment sources chained rather than one trusted source.

On identity, the work captured conversion signals out of an embedded calendar through cross-frame messaging, added root-domain matching and normalized URL handling so a visitor crossing subdomains stays one identity, and made the post-capture redaction of identifying data configurable so the identity model can be used without over-retaining personal data. Provider-aware tracking handled the differing event shapes, while the shared session model kept the visitor coherent underneath them. On enrichment, the month added an IP-to-company source and chained it with an existing enrichment provider so that one source's miss can be covered by another, and surfaced which source resolved a given account so the provenance of an inferred company is auditable rather than opaque.
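Root-domain matching and URL normalization can be sketched as below. This is one possible reading, not the recorded implementation: the function names are hypothetical, the root domains are assumed to be configured per tenant (which avoids guessing the registrable domain from the hostname), and the normalization rules shown (drop scheme, query, fragment, and trailing slash) are assumptions about what "normalized" means here.

```python
from typing import Optional
from urllib.parse import urlsplit


def same_root_domain(host: str, root_domain: str) -> bool:
    """True if host is the root domain itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    root = root_domain.lower().rstrip(".")
    # The leading dot prevents "notexample.com" from matching "example.com".
    return host == root or host.endswith("." + root)


def identity_scope(url: str, tenant_root_domains: list[str]) -> Optional[str]:
    """Map a page URL to the configured root domain it belongs to, if any.

    Sessions seen on www.example.com and book.example.com resolve to the
    same scope, so the visitor is not split into two identities.
    """
    host = urlsplit(url).hostname or ""
    for root in tenant_root_domains:
        if same_root_domain(host, root):
            return root
    return None


def normalize_url(url: str) -> str:
    """Reduce a URL to lowercase host plus path (assumed normalization rules)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


# Illustrative tenant configuration and URLs.
roots = ["example.com"]
scope_site = identity_scope("https://www.example.com/pricing", roots)
scope_booking = identity_scope("https://book.example.com/demo?slot=3", roots)
```

Both `scope_site` and `scope_booking` resolve to `example.com`, so the two page views can share one session identity. Deriving the root domain automatically instead of configuring it would need a public-suffix list, because a rule like "last two labels" fails on domains such as `example.co.uk`.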

Results, proof, and next step

The first learning is that attribution needs both layers: provider-specific detection for the differing event shapes, and a shared session-identity model so the visitor remains one person across them. The second is that enrichment is a coverage problem best handled by chaining sources with visible provenance, not by trusting a single provider, because any one IP-to-company source resolves only part of the traffic. The negative learning is that host-page URL and subdomain assumptions are unsafe — root-domain matching and normalized URL handling are required, because the naive assumption splits one visitor into several. The remaining uncertainty is that March established feasibility but did not measure provider coverage, enrichment hit-rate, or false-attribution rate; those need broader observation across more providers than March exercised.
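The chaining-with-provenance pattern for enrichment can be sketched as follows. The source names, the lookup tables, and the `EnrichmentResult` type are hypothetical stand-ins; no real provider API is shown, and treating a provider error as a miss is a design assumption of this sketch rather than a recorded decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A source maps an IP address to a company name, or returns None on a miss.
EnrichmentSource = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class EnrichmentResult:
    company: str
    source: str  # provenance: which source resolved this account


def resolve_company(
    ip: str,
    sources: list[tuple[str, EnrichmentSource]],
) -> Optional[EnrichmentResult]:
    """Try each source in order and return the first hit with its provenance."""
    for name, lookup in sources:
        try:
            company = lookup(ip)
        except Exception:
            # Assumption: a failing provider counts as a miss so the chain continues.
            continue
        if company:
            return EnrichmentResult(company=company, source=name)
    return None


# Stub sources backed by dictionaries; real providers would be network calls.
# The addresses are from reserved documentation ranges, the companies invented.
_existing_provider = {"203.0.113.10": "Acme Corp"}
_ip_to_company = {"203.0.113.10": "Acme Corporation", "198.51.100.7": "Globex"}

SOURCES: list[tuple[str, EnrichmentSource]] = [
    ("existing_provider", _existing_provider.get),
    ("ip_to_company", _ip_to_company.get),
]

first_hit = resolve_company("203.0.113.10", SOURCES)
covered_miss = resolve_company("198.51.100.7", SOURCES)
unresolved = resolve_company("192.0.2.1", SOURCES)
```

In this example the first address is resolved by `existing_provider`, the second is a miss there and is covered by `ip_to_company`, and the third stays unresolved. Because each result records its source, the per-source hit-rate named in the remaining uncertainty could later be counted directly from these results.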

The evidence is the implementation history of the cross-frame conversion capture, the root-domain and URL-normalization identity work, the configurable redaction, and the chained IP-to-company enrichment with source display. The changelog records the Calendly tracking and conversion-tracking guides becoming visible; those are product context, while the retained work is the generalized identity-and-enrichment model beneath them.

Next step: widen provider coverage and measure enrichment hit-rate and false-attribution rate before treating the models as general.

Non-R&D / Productization Context

Not retained as R&D:

Client onboarding skill additions and prompt fact corrections.
Visual typography and layout polish.
Routine dependency bumps.
Backoffice table cosmetics and filters using known patterns.
Documentation-only updates.
Dashboard navigation, support-ticket tracking, CTA popup mode, and AI Section preview improvements, except where they support the routing, feedback-loop, or identity locks above.

Research Outcome

March's research outcome was boundary decomposition. The month established that "routing" is not a single agent step: model routing and interaction-source routing belong to Adaptive Multi-Tenant Conversation Orchestration, remediation routing belongs to Closed-Loop Agent Evaluation and Optimization, and visitor-identity continuity belongs to the Compounding Context Engine. Each boundary is now a contract with an owner. What remained outstanding was measurement of every one of them — per-route quality and cost, loop throughput, and provider coverage — none of which March produced, because the month's result was the decomposition itself, not yet a reading from it.