ADR: Agent Factory Orchestration

Status

Accepted

Date

2026-05-07

Context

ADR 2026-03-04 (Agentic Development Pipeline) framed the problem and surveyed the landscape, but proposed a custom dispatcher built on Cloud Workflows + a single shared GCE spot VM running a job-switch shell script. Since then:

OpenAI released Symphony (April 2026), an open spec that solves the orchestration problem we were going to hand-roll: poll an issue tracker, isolate per-issue workspaces, spawn coding agents, retry with backoff. OpenAI reports a 6× increase in merged PRs from internal teams using it.
The reviewer-experience gap became clearer. The original ADR focused on agent → PR. It didn't address how a human reviewer evaluates the PR. Without a stable preview URL pointing at an isolated stack (api + backoffice + widget + cactus + DB), reviewing an agent-authored PR means checking out the branch and running everything locally — friction that defeats the velocity win.
Supabase Branching 2.0 went GA, making per-PR Postgres isolation native via the GitHub integration with zero glue code on our side.
Cost-and-isolation tradeoffs around Neo4j / Mongo / LightRAG crystallized: per-env isolation (separate test / staging / production instances) already exists, and per-PR provisioning of these stores would multiply cost and warmup latency without a clear safety win.

These four shifts are large enough that the original ADR's specific architecture (custom dispatcher + ephemeral DB only + no preview URL) is no longer the right shape. The high-level intent — "Linear ticket → autonomous agent → reviewable PR" — stands and is carried forward.

Decision

Two concerns, deliberately decoupled:

The factory is where the agent runs. The preview is what the agent's PR deploys to. They have separate lifecycles, separate failure modes, and separate roadmaps. Either can be built or replaced without touching the other.

1. Agent runtime (Symphony in a local git worktree)

Use the OpenAI Symphony spec directly, via its Elixir reference implementation, run locally on a dev machine or a single VM. No container, no Cloud Workflows, no GCE pool: those are future extensions, not Phase 1.

Symphony as orchestrator. Symphony claims Linear issues in active states and dispatches one Claude Code subprocess per issue. Concurrency cap: 5 parallel issues in flight on the dev VM.
Workspace = git worktree. after_create hook runs git worktree add ../<issue-id> off develop, then ./bootstrap.py --skip-env. Worktrees give us full isolation between concurrent agents without any container machinery.
Eager Supabase branch. Same after_create hook calls Supabase Mgmt API to create a branch named ix-<issue-id>, runs scripts/seed_local_db.py --branch ix-<issue-id> --max-domains 4 --since 2026-04-01 to populate it, and writes the branch-scoped Supabase keys into the worktree's .env. The agent has DB access from the start and can run migrations / inspect data while developing. Note: Supabase branches are schema-only — a fresh branch is an isolated Postgres instance with our schema, edge functions, and extensions but zero rows (unlike Neon / Xata copy-on-write data branches). The seed_local_db.py step is what makes the branch useful, not optional polish. The marginal cost is small because the script already exists for local dev (just seed); we just call it with --branch.
Symphony owns the branch lifecycle. We disable Supabase's GitHub-integration auto-branch on PR open; instead, when the PR opens we point Cloud Build at the existing ix-<issue-id> branch. One branch per Linear issue, single owner, deleted by before_remove. Avoids the dual-lifecycle reconciliation problem.
before_remove cleans up. git worktree remove + Supabase Mgmt API branch delete. Fast, deterministic, no orphans.
Coding agent is Claude Code CLI with a Max subscription token. No metered LLM cost — circuit breakers around dollars are irrelevant; we keep duration and consecutive-error breakers only.
Auth via ANTHROPIC_AUTH_TOKEN from GCP Secret Manager (scripts/secrets/refresh-claude-token.sh). One secret, refreshed on token expiry.
Tool safety via .claude/settings.json in the worktree: --allowedTools whitelist (e.g. "Read,Write,Edit,Glob,Grep,Bash(git *),Bash(just *)") and a PreToolUse hook blocking rm -rf /, docker-socket, and curl-pipe-sh patterns. Hooks fire even with --dangerously-skip-permissions, which is our headless requirement.
Long-running session discipline: let Claude Code's compaction work; checkpoint-commit per sub-task; --resume for continuation; CLAUDE.md survives compaction.
Rollback discipline: the agent only pushes to its own feature branch and opens a PR with gh pr create --base develop. Never to develop / main. The PR is the only merge path.
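
The workspace hooks described above can be sketched as an ordered plan of commands. This is a minimal sketch under stated assumptions, not Symphony's actual hook interface: `Step`, `after_create_plan`, `before_remove_plan`, and the `agent/<issue-id>` feature-branch name are hypothetical. Only the `git worktree`, `bootstrap.py`, and `seed_local_db.py` invocations come from the text. The Supabase Management API calls are left as comments because their request shape is not described in this ADR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    argv: List[str]  # command and its arguments
    cwd: str         # directory the command runs in


def supabase_branch(issue_id: str) -> str:
    """Branch name following the ix-<issue-id> convention."""
    return f"ix-{issue_id}"


def after_create_plan(issue_id: str, repo_root: str = ".") -> List[Step]:
    """Commands the after_create hook would run, in order."""
    worktree = f"../{issue_id}"
    return [
        # Hypothetical feature-branch name; the ADR only says "off develop".
        Step(["git", "worktree", "add", "-b", f"agent/{issue_id}",
              worktree, "develop"], repo_root),
        Step(["./bootstrap.py", "--skip-env"], worktree),
        # Supabase branch creation (Management API) would happen here, before
        # seeding; the branch-scoped keys are then written to the worktree .env.
        Step(["scripts/seed_local_db.py", "--branch", supabase_branch(issue_id),
              "--max-domains", "4", "--since", "2026-04-01"], worktree),
    ]


def before_remove_plan(issue_id: str, repo_root: str = ".") -> List[Step]:
    """Commands the before_remove hook would run, in order."""
    return [
        Step(["git", "worktree", "remove", f"../{issue_id}"], repo_root),
        # Supabase branch deletion (Management API) would follow here.
    ]
```

Returning a plan rather than executing it keeps the ordering testable without touching git or Supabase; a thin runner could pass each `Step` to `subprocess.run(step.argv, cwd=step.cwd, check=True)`.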

Registering as a first-class Linear Agent

We register Symphony as a Linear Agent (not just an API consumer), so it shows up as a workspace participant — assignable, @-mentionable, and visible in Linear's UI with structured status updates rather than as comments from a service user.

Concretely this requires:

A Linear OAuth Application with webhooks enabled, including the Agent session events category.
OAuth install with actor=app, which creates a dedicated agent user (no billable seat) bound to the app and distinct from any human user. Reads and writes made with LINEAR_API_KEY are attributed to this user.
Scopes: app:assignable (Symphony can be set as the issue assignee) and app:mentionable (Symphony can be @-mentioned in issues, documents, comments), plus the standard read/write scopes for the data we already use (issues, projects, comments).
AgentSession webhook receiver — a small Cloud Function endpoint that handles three event types: assigned, mentioned, follow-up prompt. Lives alongside the existing Linear webhook Cloud Function (reuses HMAC verification, secrets wiring, and deploy pipeline).
Pub/Sub bridge to local Symphony. The Cloud Function does not call Symphony directly. It publishes the verified event to a Pub/Sub topic (linear-agent-events); Symphony — running locally on a dev VM — maintains a long-lived gRPC pull subscription. Outbound connection only, no inbound port to expose, NAT-friendly. Messages queue in Pub/Sub while the dev VM is offline; Symphony catches up on startup. This complements (not replaces) Symphony's poll loop, which remains the source of truth for state reconciliation. The same Pub/Sub bridge works unchanged when Symphony moves to a cloud host in a future phase.
Agent Activities API emission. Symphony's before_run and after_run hooks post structured activities back to Linear:
thought activity within 10 seconds of receiving a delegation (required by spec — Linear shows "agent is thinking…" in the UI)
status updates as the agent progresses (started, opened PR, blocked)
final activity on completion or failure
Identity hygiene. All git commits, PRs, and Linear comments are authored by Symphony's agent user. Reviewers can filter by agent author in Linear and GitHub.

Net-new GCP infra for this layer: one HTTP Cloud Function (~50 lines) + one Pub/Sub topic + subscription. No new repo — both ride the existing Linear-webhooks deployment.
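
The receiver's verify-then-publish flow can be sketched as follows. This is a sketch under assumptions: it assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body, and `handle_webhook` and the injected `publish` callable (standing in for a Pub/Sub publish to linear-agent-events) are hypothetical names, not the existing Cloud Function's code.

```python
import hashlib
import hmac
from typing import Callable


def verify_signature(raw_body: bytes, signature_hex: str, secret: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


def handle_webhook(
    raw_body: bytes,
    signature_hex: str,
    secret: str,
    publish: Callable[[bytes], None],
) -> int:
    """Return an HTTP status; publish only events whose signature verifies."""
    if not verify_signature(raw_body, signature_hex, secret):
        return 401  # reject unverified payloads without publishing
    # The function never calls Symphony directly: it only enqueues the event.
    publish(raw_body)
    return 204
```

Injecting `publish` keeps the handler free of any Pub/Sub client dependency, so the same logic can be exercised locally with a list's `append` method.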

2. Deployment target (per-PR preview environment)

Every PR — agent-authored or human-authored — gets a self-contained preview, deployed on push and torn down on close. Reviewer-facing URL: pr-<N>.preview.userose.ai.

Layer	Strategy	Per-PR isolation
Postgres (Supabase)	Symphony-managed eager branch `ix-<issue-id>`, populated via `scripts/seed_local_db.py --branch …` (existing tooling, ~4 prod domains, FK-aware)	Full
Cloud Run (`api`, `backoffice`)	Per-PR no-traffic tagged revision (`pr-<N>` tag), `--cpu-boost`, `--max-revisions=20`	Full
Frontend `widget` bundle	GCS prefix `gs://rose-pr-previews/<N>/widget/`	Full
Frontend `preprod-ui` (playground)	GCS prefix `gs://rose-pr-previews/<N>/preprod-ui/`	Full
Frontend `client-backoffice`	GCS prefix `gs://rose-pr-previews/<N>/backoffice/` (or served directly from Cloud Run revision, see note)	Full
Cactus / static pages	Cloudflare Pages alias `pr-<N>`	Full
Routing	Cloudflare Worker route `pr-<N>.preview.userose.ai` mapping `/api/`, `/backoffice/`, `/playground/`, `/widget/`, `/*` to the appropriate target	Full
Neo4j	Shared `test` instance, read-only when `PR_TENANT_ID` set	Read-only
MongoDB	Shared `test` instance, read-only when `PR_TENANT_ID` set	Read-only
LightRAG working dir	Shared `test` GCS path, read-only	Read-only
Redis	Shared `test`, key prefix `pr-<N>:` if writes needed	Namespace

Triggered by Cloud Build PR triggers (open + sync + close). Native Cloud Run deployment previews pattern, no third-party platform.
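
The per-PR names in the table all derive from the PR number, which can be made explicit in one place. The sketch below is illustrative: `PreviewResources` and `preview_resources` are hypothetical names, while the hostname, tag, GCS prefixes, and Redis key prefix follow the patterns listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviewResources:
    hostname: str            # reviewer-facing URL host
    cloud_run_tag: str       # no-traffic revision tag on api / backoffice
    widget_prefix: str       # GCS prefix for the widget bundle
    preprod_ui_prefix: str   # GCS prefix for the playground
    backoffice_prefix: str   # GCS prefix for client-backoffice
    redis_key_prefix: str    # namespace for writes to the shared test Redis


def preview_resources(pr_number: int) -> PreviewResources:
    """Derive every per-PR resource name from the PR number."""
    if pr_number <= 0:
        raise ValueError("pr_number must be a positive integer")
    tag = f"pr-{pr_number}"
    bucket = f"gs://rose-pr-previews/{pr_number}"
    return PreviewResources(
        hostname=f"{tag}.preview.userose.ai",
        cloud_run_tag=tag,
        widget_prefix=f"{bucket}/widget/",
        preprod_ui_prefix=f"{bucket}/preprod-ui/",
        backoffice_prefix=f"{bucket}/backoffice/",
        redis_key_prefix=f"{tag}:",
    )
```

Keeping the derivation in a single function means the deploy and teardown triggers cannot disagree about where a given PR's artifacts live.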

Stale-tag GC. Two layers protect us from accumulating stale pr-<N> tags if a teardown trigger ever fails:

--max-revisions=20 per Cloud Run service auto-prunes oldest non-traffic revisions.
A daily Cloud Scheduler job lists all pr-* tags, queries GitHub for PR state, and removes tags whose PR is closed/merged.
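
The selection step of the daily GC job can be sketched as a pure function. This is an assumption-laden sketch: `stale_tags` is a hypothetical name, and the caller is assumed to have already listed the tags and normalized each PR's GitHub state to "open", "closed", or "merged".

```python
import re
from typing import Dict, Iterable, List

PR_TAG = re.compile(r"^pr-(\d+)$")


def stale_tags(tags: Iterable[str], pr_states: Dict[int, str]) -> List[str]:
    """Return the pr-<N> tags whose PR is closed or merged."""
    stale = []
    for tag in tags:
        match = PR_TAG.match(tag)
        if match is None:
            continue  # not a preview tag, never touched by the GC
        state = pr_states.get(int(match.group(1)))
        # Tags with an unknown PR state are kept: the job errs on the side
        # of leaving a preview up rather than deleting one by mistake.
        if state in ("closed", "merged"):
            stale.append(tag)
    return stale
```

Separating selection from deletion lets the job log the candidate list (or run in dry-run mode) before removing anything.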

3. Safety boundaries

Production Supabase keys never reach agent shells or PR-preview Cloud Run revisions. Symphony's after_create hook creates the branch via Mgmt API and writes only the branch-scoped keys into .env. Per MEMORY.md, all IX_ENVIRONMENT values share one Supabase project — branching is the only true isolation. This is the critical safety boundary, not the seed-data shape.
Neo4j / Mongo / LightRAG are read-only for agent runs and PR previews. The agent can chat, retrieve, classify; it cannot ingest. Eliminates cross-PR collisions and protects shared test data.
Branch seeding uses real production data scoped to ~4 test domains. scripts/seed_local_db.py already exists and handles FK ordering, domain filtering, and --since trimming. Same risk model as the existing local-dev seeding (developers already pull this data via just seed). Agents need realistic shapes to validate retrieval / classification end-to-end; synthetic fixtures aren't enough for our stack. Data is not masked; the boundary is branch isolation, not content masking.
tracking_status and other curated client config stay human-approval-gated per existing memory rules.
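
The read-only rule for the shared stores could be enforced with a small guard called before any write path. This is only a sketch: `ReadOnlyStoreError`, `is_read_only`, and `guard_write` are hypothetical names, and it assumes the presence of PR_TENANT_ID in the environment is the signal for an agent run or PR preview, as the table in section 2 suggests.

```python
import os
from typing import Mapping, Optional


class ReadOnlyStoreError(RuntimeError):
    """Raised when a write is attempted against a shared store in PR mode."""


def is_read_only(env: Optional[Mapping[str, str]] = None) -> bool:
    """Shared stores are read-only whenever PR_TENANT_ID is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("PR_TENANT_ID"))


def guard_write(store: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Call at the top of any ingestion or write path for a shared store."""
    if is_read_only(env):
        raise ReadOnlyStoreError(
            f"{store} is read-only while PR_TENANT_ID is set"
        )
```

For example, an ingestion entry point would call `guard_write("neo4j")` before opening a write transaction; retrieval and classification paths never call the guard and are unaffected.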

4. Out of scope (future extensions)

These are explicitly not part of this ADR. They were proposed in the prior ADR; carrying them in here would conflate scaling decisions with the core "Symphony + previews" decision. Each gets its own ADR if and when needed.

Containerized factory (Docker hardening, non-root user, network allow-list, resource limits, read_only: true + tmpfs). Only relevant when we promote Symphony off the dev machine.
GCE spot VM pool / Cloud Workflows orchestration for parallel concurrent agents at scale.
Multi-job dispatch beyond rose-solve-linear-ticket. rose-onboard-client (manual + KB-webhook) and rose-solve-chatbot-issues (daily cron) keep their existing direct triggers; they are not Symphony-driven.
Self-review loop (Sonnet implements / Opus scores). Useful but additive — the PR review process already gates merges. Adopt later if agent quality demands it.
Slack heartbeat / observability beyond Langfuse + Cloud Logging. Symphony's stdout + existing Langfuse traces cover Phase 1.
Worktree promotion to ephemeral cloud workspaces (Cloud Workstations, E2B, etc.).

Consequences

Positive

Two small things instead of one big thing. Factory and preview deploy are independent — either can be replaced without disturbing the other. Symphony could be swapped for a different orchestrator; previews could move to Northflank; neither change cascades.
Phase 1 is tiny. Symphony local + git worktree + four hook scripts + Cloud Build PR triggers + Cloudflare Worker route. No container, no VM pool, no Cloud Workflows.
Reviewer experience is first-class. A stable preview URL per PR lets humans evaluate agent output in seconds, not minutes.
No platform migration. Stays on GCP. Existing per-env isolation for Neo4j/Mongo/LightRAG/Redis, Secret Manager, and Cloud Run are all reused as-is.
Subscription auth removes cost-management complexity. No MAX_COST_USD breaker, no per-run cost tracking, no token-budget tuning.
Safety by construction. Symphony's workspace path validation + Supabase branching + read-only RAG stores layer correctly. Hardest failure mode (agent writes to prod customer data) is structurally prevented.
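
With the dollar breaker gone, the two breakers that remain (duration and consecutive errors) are simple to express. The following is a sketch only: `RunBreaker` is a hypothetical name, and any threshold values passed to it are assumptions, since this ADR does not specify them.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunBreaker:
    max_duration_s: float
    max_consecutive_errors: int
    started_at: float = field(default_factory=time.monotonic)
    consecutive_errors: int = 0

    def record(self, ok: bool) -> None:
        """Record one step outcome; any success resets the error streak."""
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1

    def tripped(self, now: Optional[float] = None) -> Optional[str]:
        """Return the reason the run should stop, or None to keep going."""
        now = time.monotonic() if now is None else now
        if now - self.started_at > self.max_duration_s:
            return "duration"
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "consecutive_errors"
        return None
```

The orchestrating loop would call `record` after each agent step and stop the run as soon as `tripped()` returns a reason.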

Negative

Read-only RAG stores limit agent scope. Agents can't end-to-end test ingestion-touching features without falling back to staging or a manual run.
Symphony is Elixir. New runtime, but Phase 1 keeps it as a single local process; not yet a deployment concern.
Cloud Build YAML + Worker route are net-new infra surface. Small but real to maintain.
No container isolation in Phase 1. Worktrees give path isolation, not process or filesystem isolation. A misbehaving agent could in principle affect the host. Mitigations: PreToolUse hook blocking destructive Bash patterns, --allowedTools whitelist, run on a dedicated dev VM rather than a developer laptop.
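
The PreToolUse mitigation can be sketched as a small pattern check. This sketch assumes the hook receives the pending tool call as JSON with the Bash command under `tool_input.command`, and that exit code 2 blocks the call; both should be checked against the Claude Code hooks documentation. The regexes and the names `blocked_reason` and `decide` are illustrative, not our deployed hook.

```python
import json
import re
import sys
from typing import Optional

BLOCKED_PATTERNS = [
    # rm with both -r and -f flags, targeting exactly "/" or "/*"
    ("rm -rf /", re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+\s+/(\*)?(\s|$)")),
    # any reference to the Docker daemon socket
    ("docker socket", re.compile(r"docker\.sock")),
    # curl output piped straight into a shell
    ("curl | sh", re.compile(r"\bcurl\b[^|;&]*\|\s*(sudo\s+)?(ba|z)?sh\b")),
]


def blocked_reason(command: str) -> Optional[str]:
    """Return the name of the first matching blocked pattern, if any."""
    for name, pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return name
    return None


def decide(payload_json: str) -> int:
    """Return the exit code for the hook: 0 to allow, 2 to block."""
    payload = json.loads(payload_json)
    if payload.get("tool_name") != "Bash":
        return 0  # only Bash commands are inspected
    command = payload.get("tool_input", {}).get("command", "")
    reason = blocked_reason(command)
    if reason is not None:
        print(f"blocked by PreToolUse hook: {reason}", file=sys.stderr)
        return 2
    return 0
```

A hook script would end with `sys.exit(decide(sys.stdin.read()))`. Pattern matching of this kind is a backstop rather than a sandbox, which is why the dedicated dev VM remains part of the mitigation.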

Neutral

Local-first means concurrency is bounded by the host. Fine for single-digit parallel issues; revisit when Symphony needs to run more issues in parallel than one host can handle.
Cron and manual triggers stay where they are. The chatbot-triage and onboarding jobs do not move.

Alternatives Considered

1. Keep the original ADR's custom dispatcher

Rejected. Symphony covers the dispatcher problem with a maintained spec and reference impl. Building our own gives us more code to maintain with no behavioral advantage. The original ADR's value (job patterns, harness hardening, observability) is preserved by carrying those concerns forward into Symphony's hook scripts and the agent's CLI invocation.

2. Migrate to Railway

Rejected. Railway gives us push-to-redeploy + DB branching out of the box, but we lose our existing per-env Neo4j/Mongo/LightRAG/Redis isolation, Secret Manager-native auth flows, and the Cloud Run + Cloudflare Worker routing already in place. Marginal ergonomic gain; large migration cost; ongoing dependency on a smaller provider. The roughly equivalent GCP setup is ~50 lines of Cloud Build YAML + 1 Worker route.

3. Per-PR Neo4j / Mongo / LightRAG instances

Rejected for Phase 1. Provisioning cost (Neo4j AuraDB tier minimums, Mongo Atlas warmup) and operational complexity outweigh the benefit when most agent tasks don't touch ingestion. Phase 2 may revisit with tenantId-namespaced writes against the shared test stores.

4. Synthetic-only `seed.sql` fixtures

Rejected. We considered hand-rolling synthetic fixtures (base.sql, auth-users.sql, content-populated.sql) instead of cloning prod. Two problems: (a) maintenance, since every schema change drifts the fixtures; (b) realism, since agents validating retrieval / classification need real-shaped data with realistic FK distributions, and synthetic data produces false positives. Since scripts/seed_local_db.py --branch already exists with domain filtering and FK ordering, the real-data path is cheaper to operate and more useful.

5. Masked production data clone (Snaplet / Postgres.ai)

Deferred. Adds a paid dependency, masking-correctness audit burden, and weekly pipeline maintenance. The branch isolation we already have is a strong-enough boundary for Phase 1. Revisit if we ever expose preview URLs beyond internal reviewers.

6. Linear API-key-only (no Agent registration)

Rejected. Using a plain LINEAR_API_KEY against a service-user account works for polling and writing comments, but Symphony's actions then appear as comments from an ordinary user rather than from a recognized agent. We lose the assignable/mentionable affordances, the agent-session UI, and the structured activity feed (thought, status updates) that reviewers see directly in Linear. The OAuth actor=app install + Agents SDK costs roughly 50 more lines of webhook receiver and gives us the right identity model.

Build-time strategy

Cloud Build + Dockerfile is slow by default (~15–30 min per PR). Three decisions keep PR previews under 4 min without migrating off GCP:

Pre-baked dependency base image. rose-backend-base:<lockhash> rebuilt only when poetry.lock changes. PR builds become COPY-only (≈30 s).
Tests don't block preview deploys. Two parallel Cloud Build triggers per PR — pr-preview-deploy (no tests, ~2 min) and pr-checks (mypy + tests + lint, async). Reviewer gets the URL fast; CI status arrives separately.
BuildKit cache + --cache-from on Cloud Build, reusing intermediate layers across PRs.
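
The `<lockhash>` part of the base image tag can be derived directly from the lockfile contents, so the tag changes only when poetry.lock does. The sketch below is one way to do it: the choice of SHA-256, the 12-character truncation, and the name `base_image_tag` are assumptions, not the existing build's behavior.

```python
import hashlib
from pathlib import Path


def base_image_tag(lockfile: Path, length: int = 12) -> str:
    """Tag for the pre-baked base image, keyed on the lockfile's content hash."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"rose-backend-base:{digest[:length]}"
```

A build step would compute the tag, check whether that image already exists in the registry, and rebuild the base only when it does not; every PR build then starts from the existing tag.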

Realistic target: 2–4 min git push → live preview URL. Railway-class without the migration. Northflank is the fallback if the gap proves unacceptable.