Factory

The Rose Factory is an autonomous agent pipeline that runs coding tasks in Docker containers. It dispatches Claude Code agents to solve Linear tickets, onboard clients, and triage chatbot issues, then self-reviews the result, runs E2E verification, and opens PRs.

Architecture

```mermaid
flowchart TB
    subgraph Triggers
        LIN["Linear webhook (ai-ready label)"]
        CRON["Cloud Scheduler (daily/weekly)"]
        CLI["Manual CLI (rose-factory)"]
    end
    subgraph "GCP Cloud Workflows"
        WF["factory.yaml orchestration"]
    end
    subgraph "Docker Container"
        EP["entrypoint.sh clone → install → dispatch"]
        RUN["run.py job dispatcher"]
        subgraph Jobs
            SLT["solve-linear-ticket"]
            OC["onboard-client"]
            SCI["solve-chatbot-issues"]
        end
        subgraph Phases
            IMPL["Implementation (Sonnet)"]
            SR["Self-Review (Opus reviews, Sonnet fixes)"]
            E2E["E2E Verification (Sonnet + agent-browser)"]
        end
        RUN --> Jobs
        SLT --> IMPL --> SR --> E2E
    end
    subgraph Outputs
        PR["GitHub PR (with scores + screenshots)"]
        SLACK["Slack notification"]
        LINEAR["Linear comment (summary + screenshots)"]
    end
    Triggers --> WF --> EP --> RUN
    E2E --> PR & SLACK & LINEAR
```

Jobs

solve-linear-ticket

The primary job. Takes a Linear ticket identifier and produces a PR.

Pipeline phases:

| Phase | Model | What it does |
|---|---|---|
| Implementation | Sonnet | Reads ticket, implements code, runs tests, commits |
| Self-Review | Opus → Sonnet loop | Opus scores against rubric (6 criteria, 0-5 each). If any score <4, Sonnet fixes and Opus re-reviews. Max 3 iterations |
| E2E Verification | Sonnet + browser | Starts dev servers, navigates playground/backoffice with agent-browser, takes screenshots, writes report |

Usage:

```bash
# Via Docker (recommended)
rose-factory solve-linear-ticket --ticket IX-123

# Without Docker (direct execution)
rose-factory solve-linear-ticket --ticket IX-123 --no-docker
```

onboard-client

Onboards a new client with config, skills, eval dataset, and accuracy tuning.

Modes:

| Mode | What it does |
|---|---|
| `full` | Complete onboarding: config + skills + eval + tuning |
| `eval-only` | Re-run evaluation for existing client |
| `update-skills` | Update skills only (e.g., after KB changes) |

```bash
rose-factory onboard-client --domain acme.com --mode full
```

solve-chatbot-issues

Daily triage job. Scans Sentry and Langfuse for chatbot failures, implements fixes, and opens PRs.

```bash
rose-factory solve-chatbot-issues
```

Self-Review

The review loop (self-review/review-loop.py) runs after implementation:

1. Opus reviews the branch diff against rubric.md.
2. Opus returns scores as JSON for six criteria: correctness, code_quality, test_coverage, type_safety, security, documentation.
3. Pass (all scores >= 4): the loop creates the PR with the scores.
4. Fail (any score < 4): Sonnet fixes the issues, then Opus re-reviews.
5. After 3 failed reviews: the PR is created anyway and flagged for human review.

Scores are saved to /tmp/self-review/review-{n}.json and included in the PR body.
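The control flow above can be sketched in Python. This is a minimal sketch, not the contents of review-loop.py: `review` and `fix` are hypothetical callables standing in for the Opus review and the Sonnet fix step, and skipping the fix after the final failed review is an assumption.

```python
import json
from pathlib import Path
from typing import Callable

# The six rubric criteria from this section; each is scored 0-5.
CRITERIA = (
    "correctness",
    "code_quality",
    "test_coverage",
    "type_safety",
    "security",
    "documentation",
)
PASS_THRESHOLD = 4
MAX_ITERATIONS = 3


def passes(scores: dict[str, int]) -> bool:
    """A review passes only if every criterion scores at least PASS_THRESHOLD."""
    return all(scores[name] >= PASS_THRESHOLD for name in CRITERIA)


def review_loop(
    review: Callable[[], dict[str, int]],  # hypothetical: runs the Opus review
    fix: Callable[[dict[str, int]], None],  # hypothetical: runs the Sonnet fix
    out_dir: Path = Path("/tmp/self-review"),
) -> tuple[bool, dict[str, int]]:
    """Return (passed, last_scores); passed=False means flag the PR for humans."""
    out_dir.mkdir(parents=True, exist_ok=True)
    scores: dict[str, int] = {}
    for n in range(1, MAX_ITERATIONS + 1):
        scores = review()
        # Persist every round, matching the review-{n}.json naming above.
        (out_dir / f"review-{n}.json").write_text(json.dumps(scores))
        if passes(scores):
            return True, scores
        if n < MAX_ITERATIONS:
            fix(scores)  # only fix when another review will follow
    return False, scores
```

Either way the caller opens the PR; the boolean only decides whether the PR is flagged for human review.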

E2E Verification

The E2E phase (e2e-verify/verify.py) visually verifies the implementation:

1. Starts three dev servers: backend (:8080), playground (:3001), backoffice (:3002).
2. Configures the playground with the local API URL and API key.
3. Launches a Claude agent with agent-browser to navigate and take screenshots.
4. Invokes the test-e2e skill to map branch changes to verification tests.
5. Writes a report to .factory/screenshots/{ticket}/report.md.
6. Commits screenshots, posts results to the PR and Linear, and notifies Slack.
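Before the agent starts navigating, the three dev servers have to be accepting connections. A minimal readiness check, assuming a plain TCP connect is a good enough signal (verify.py may well use HTTP health checks instead), could look like this; `wait_for_port` and `servers_ready` are hypothetical names.

```python
import socket
import time

# Ports listed in this section.
DEV_SERVERS = {"backend": 8080, "playground": 3001, "backoffice": 3002}


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not listening yet; retry shortly
    return False


def servers_ready(timeout: float = 60.0) -> dict[str, bool]:
    """Check each dev server in turn; any False would mean E2E cannot proceed."""
    return {name: wait_for_port(port, timeout=timeout) for name, port in DEV_SERVERS.items()}
```

A `False` entry lines up with the `e2e-skipped` notification described under Slack Notifications (servers failed, no screenshots).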

If E2E fails, run.py reads the report, asks Sonnet to fix the issues, and re-runs verification (up to 2 attempts).
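The fix-and-retry behavior can be sketched as below. It assumes "up to 2 attempts" means two fix-and-reverify rounds after the initial run; `verify` and `fix` are hypothetical callables, and the real run.py may count attempts differently.

```python
from pathlib import Path
from typing import Callable

MAX_FIX_ATTEMPTS = 2  # assumption: counts fix rounds, not the initial verification


def verify_with_fixes(
    verify: Callable[[], bool],  # hypothetical: runs e2e-verify/verify.py, True on pass
    fix: Callable[[str], None],  # hypothetical: asks Sonnet to fix, given the report
    report_path: Path,  # e.g. .factory/screenshots/{ticket}/report.md
) -> bool:
    """Run E2E verification; on failure, feed the report to the fixer and retry."""
    if verify():
        return True
    for _ in range(MAX_FIX_ATTEMPTS):
        # The report from the failed run tells the fixer what went wrong.
        report = report_path.read_text() if report_path.exists() else ""
        fix(report)
        if verify():
            return True
    return False
```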

Directory Structure

```
factory/
├── Dockerfile                  # Multi-stage: cache warmer + final image
├── docker-compose.yml          # Local dev Docker config
├── entrypoint.sh               # Container entry: clone → install deps → dispatch
├── run.py                      # Job dispatcher with circuit breaker
├── rose-factory                # CLI tool (typer) for local/Docker execution
├── settings.json               # Claude Code permissions for headless mode
├── e2e-verify/
│   └── verify.py               # E2E verification with screenshots
├── orchestration/
│   ├── factory.yaml            # GCP Cloud Workflows definition
│   └── notify.py               # Slack notification helper
├── self-review/
│   ├── review-loop.py          # Opus review → Sonnet fix loop
│   └── rubric.md               # Scoring rubric (6 criteria, 0-5)
└── seeds/                      # Supabase seed files for test scenarios
    ├── base.sql
    ├── auth-users.sql
    ├── content-populated.sql
    └── multi-tenant.sql
```

Docker Image

The factory runs in a self-contained Docker container with:

- Node 22 (Bookworm base)
- Python 3.12 (via uv)
- Claude Code CLI + linearis (Linear CLI)
- Chromium + agent-browser (headless browser automation)
- Supabase CLI (DB branching)
- Google Cloud CLI (secret management, .env download)
- Poetry + npm (pre-warmed dependency caches)

The container is ephemeral: clone → work → push → destroy.

```bash
# Build the image
rose-factory build

# Build without cache
rose-factory build --no-cache
```

GCP Cloud Workflows

The workflow defined in orchestration/factory.yaml drives factory runs on GCP:

1. Receives the trigger payload (job type, ticket ID, etc.).
2. Notifies Slack: "Factory started".
3. Runs the Cloud Run Job with environment overrides.
4. On success: notifies Slack "Factory done".
5. On failure: notifies Slack with the error, then re-raises it.

Deploy the workflow:

```bash
gcloud workflows deploy factory --source=factory/orchestration/factory.yaml
```

Trigger manually:

```bash
gcloud workflows run factory --data='{
  "job": "solve-linear-ticket",
  "ticket_identifier": "IX-123",
  "branch": "feature/ix-123"
}'
```
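The workflow's "environment overrides" step presumably turns a trigger payload like the one above into container environment variables. The sketch below builds a request body in the shape used by the Cloud Run Admin API v2 `jobs.run` method (`overrides.containerOverrides[].env`), assuming each payload key maps to its uppercased name, which matches the variables in the Configuration section (`ticket_identifier` to `TICKET_IDENTIFIER`). It is illustrative only; `payload_to_overrides` is a hypothetical helper, and factory.yaml itself is not shown here.

```python
import json


def payload_to_overrides(payload: dict[str, str]) -> dict:
    """Map a trigger payload to a Cloud Run jobs.run request body.

    Assumption: each payload key becomes an env var of the same name, uppercased.
    """
    env = [{"name": key.upper(), "value": str(value)} for key, value in payload.items()]
    return {"overrides": {"containerOverrides": [{"env": env}]}}


payload = {
    "job": "solve-linear-ticket",
    "ticket_identifier": "IX-123",
    "branch": "feature/ix-123",
}
print(json.dumps(payload_to_overrides(payload), indent=2))
```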

Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `JOB` | Yes | Job name: `solve-linear-ticket`, `onboard-client`, `solve-chatbot-issues` |
| `TICKET_ID` | Per job | Linear ticket ID (solve-linear-ticket) |
| `TICKET_IDENTIFIER` | Per job | Linear ticket identifier, e.g., `IX-123` |
| `BRANCH` | Per job | Git branch name for the work |
| `CLIENT_DOMAIN` | Per job | Client domain (onboard-client) |
| `MODE` | No | Onboard mode: `full`, `eval-only`, `update-skills` |
| `MAX_TURNS` | No | Max Claude turns (default: 100) |
| `MAX_DURATION_SECONDS` | No | Circuit breaker limit in seconds (default: 7200) |

Secrets (GCP Secret Manager)

| Secret | Purpose |
|---|---|
| `CLAUDE_AUTH_TOKEN` | Claude Code Max subscription auth |
| `FACTORY_GITHUB_TOKEN` | GitHub access for clone/push/PR |
| `LINEAR_API_KEY` | Linear API for ticket context and comments |
| `FACTORY_SLACK_WEBHOOK_URL` | Slack notifications |
| `LANGFUSE_SECRET_KEY` | Langfuse observability |
| `LANGFUSE_PUBLIC_KEY` | Langfuse observability |
| `LANGFUSE_HOST` | Langfuse host URL |
| `SENTRY_AUTH_TOKEN` | Sentry MCP for error analysis |
| `SUPABASE_ACCESS_TOKEN` | Supabase MCP for DB queries |
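Missing secrets tend to surface late, for example when the first push or Slack call fails. Assuming the secrets reach the container as environment variables with the same names (how they are actually injected is not specified here), a fail-fast check could be as simple as this hypothetical helper:

```python
import os
from collections.abc import Mapping

# Secret names from the Secret Manager table.
SECRETS = (
    "CLAUDE_AUTH_TOKEN",
    "FACTORY_GITHUB_TOKEN",
    "LINEAR_API_KEY",
    "FACTORY_SLACK_WEBHOOK_URL",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_HOST",
    "SENTRY_AUTH_TOKEN",
    "SUPABASE_ACCESS_TOKEN",
)


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of secrets that are unset or empty; never the values."""
    return [name for name in SECRETS if not env.get(name)]
```

Calling `missing_secrets()` at container start and exiting with the returned names points at the gap without writing secret values to the logs.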

MCP Servers

The factory configures three MCP servers for the Claude agent:

- Linear: ticket context, comments, status updates
- Sentry: error analysis for chatbot triage
- Supabase: database queries (read-only)

Circuit Breaker

run.py monitors elapsed time while Claude runs. If MAX_DURATION_SECONDS is exceeded, the agent is killed and Slack is notified. The default limit is 7200 seconds (2 hours).

Slack Notifications

The factory sends notifications for key events:

| Event | Emoji | When |
|---|---|---|
| `started` | | Job begins |
| `completed` | | Job finishes successfully |
| `failed` | | Job fails or circuit breaker triggers |
| `review-passed` | | Self-review passes |
| `review-failed` | | Self-review fails after max iterations |
| `e2e-verified` | | E2E screenshots captured |
| `e2e-skipped` | :next_track_button: | E2E skipped (servers failed, no screenshots) |
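A notification can be posted to a Slack incoming webhook as a JSON body with a `text` field. The sketch below is not notify.py: the message format and the `build_payload`/`post_to_slack` names are assumptions, and emoji are omitted.

```python
import json
import urllib.request

# Event names from the table above.
EVENTS = {
    "started",
    "completed",
    "failed",
    "review-passed",
    "review-failed",
    "e2e-verified",
    "e2e-skipped",
}


def build_payload(event: str, job: str, detail: str = "") -> dict[str, str]:
    """Build an incoming-webhook body; the text format here is an assumption."""
    if event not in EVENTS:
        raise ValueError(f"unknown event {event!r}")
    text = f"[factory] {job}: {event}"
    if detail:
        text += f" ({detail})"
    return {"text": text}


def post_to_slack(webhook_url: str, payload: dict[str, str]) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In the factory the webhook URL would come from the `FACTORY_SLACK_WEBHOOK_URL` secret.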