
Factory

The Rose Factory is an autonomous agent pipeline that runs coding tasks in Docker containers. It dispatches Claude Code agents to solve Linear tickets, onboard clients, and triage chatbot issues — then self-reviews, runs E2E verification, and opens PRs.

Architecture

flowchart TB
    subgraph Triggers
        LIN["Linear webhook<br/>(ai-ready label)"]
        CRON["Cloud Scheduler<br/>(daily/weekly)"]
        CLI["Manual CLI<br/>(rose-factory)"]
    end
    subgraph "GCP Cloud Workflows"
        WF["factory.yaml<br/>orchestration"]
    end
    subgraph "Docker Container"
        EP["entrypoint.sh<br/>clone → install → dispatch"]
        RUN["run.py<br/>job dispatcher"]
        subgraph Jobs
            SLT["solve-linear-ticket"]
            OC["onboard-client"]
            SCI["solve-chatbot-issues"]
        end
        subgraph Phases
            IMPL["Implementation<br/>(Sonnet)"]
            SR["Self-Review<br/>(Opus reviews, Sonnet fixes)"]
            E2E["E2E Verification<br/>(Sonnet + agent-browser)"]
        end
        RUN --> Jobs
        SLT --> IMPL --> SR --> E2E
    end
    subgraph Outputs
        PR["GitHub PR<br/>(with scores + screenshots)"]
        SLACK["Slack notification"]
        LINEAR["Linear comment<br/>(summary + screenshots)"]
    end
    Triggers --> WF --> EP --> RUN
    E2E --> PR & SLACK & LINEAR

Jobs

solve-linear-ticket

The primary job. Takes a Linear ticket identifier and produces a PR.

Pipeline phases:

| Phase | Model | What it does |
|---|---|---|
| Implementation | Sonnet | Reads the ticket, implements the code, runs tests, commits |
| Self-Review | Opus → Sonnet loop | Opus scores the diff against a rubric (6 criteria, 0-5 each). If any score is below 4, Sonnet fixes the issues and Opus re-reviews. Max 3 iterations |
| E2E Verification | Sonnet + agent-browser | Starts the dev servers, navigates the playground and backoffice with agent-browser, takes screenshots, writes a report |

Usage:

# Via Docker (recommended)
rose-factory solve-linear-ticket --ticket IX-123

# Without Docker (direct execution)
rose-factory solve-linear-ticket --ticket IX-123 --no-docker

onboard-client

Onboards a new client with config, skills, eval dataset, and accuracy tuning.

Modes:

| Mode | What it does |
|---|---|
| full | Complete onboarding: config + skills + eval + tuning |
| eval-only | Re-run evaluation for an existing client |
| update-skills | Update skills only (e.g., after KB changes) |

Usage:

rose-factory onboard-client --domain acme.com --mode full
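The three modes select subsets of the onboarding steps listed in the table. A minimal sketch of that mapping, assuming the steps run in the order given for `full`; the `ONBOARD_STEPS` name, the step labels, and `steps_for_mode` are hypothetical and not taken from the factory's code:

```python
# Hypothetical mapping of onboard-client modes to the steps each one runs.
# Step order follows the "full" row of the table: config, skills, eval, tuning.
ONBOARD_STEPS: dict[str, tuple[str, ...]] = {
    "full": ("config", "skills", "eval", "tuning"),
    "eval-only": ("eval",),
    "update-skills": ("skills",),
}


def steps_for_mode(mode: str) -> tuple[str, ...]:
    """Return the onboarding steps for a mode, rejecting unknown modes early."""
    try:
        return ONBOARD_STEPS[mode]
    except KeyError:
        valid = ", ".join(ONBOARD_STEPS)
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
```

Failing fast on an unknown mode avoids starting a long agent run with a typo in `--mode`.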

solve-chatbot-issues

Daily triage job. Scans Sentry and Langfuse for chatbot failures, creates fixes, opens PRs.

rose-factory solve-chatbot-issues
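run.py acts as the job dispatcher for these three jobs. A minimal routing sketch, assuming run.py selects the job from the JOB environment variable described under Configuration; the handler functions here are hypothetical placeholders that return strings instead of launching Claude Code:

```python
from typing import Callable, Mapping


# Hypothetical handlers; the real run.py job implementations are not shown in this doc.
def solve_linear_ticket(env: Mapping[str, str]) -> str:
    return f"solving {env['TICKET_IDENTIFIER']} on {env['BRANCH']}"


def onboard_client(env: Mapping[str, str]) -> str:
    return f"onboarding {env['CLIENT_DOMAIN']} (mode={env.get('MODE')})"


def solve_chatbot_issues(env: Mapping[str, str]) -> str:
    return "triaging chatbot issues"


# Job name -> handler, keyed by the names accepted by the CLI.
JOBS: dict[str, Callable[[Mapping[str, str]], str]] = {
    "solve-linear-ticket": solve_linear_ticket,
    "onboard-client": onboard_client,
    "solve-chatbot-issues": solve_chatbot_issues,
}


def dispatch(env: Mapping[str, str]) -> str:
    """Route to the handler named by JOB, exiting with a message for unknown jobs."""
    job = env.get("JOB")
    if job not in JOBS:
        raise SystemExit(f"JOB must be one of {sorted(JOBS)}, got {job!r}")
    return JOBS[job](env)
```

Passing the environment in as a mapping, rather than reading `os.environ` inside each handler, keeps the routing testable without a container.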

Self-Review

The review loop (self-review/review-loop.py) runs after implementation:

  1. Opus reviews the branch diff against rubric.md
  2. Returns scores as JSON: correctness, code_quality, test_coverage, type_safety, security, documentation
  3. Pass (all >= 4): creates PR with scores
  4. Fail (any < 4): Sonnet fixes the issues, then Opus re-reviews
  5. After 3 failed iterations: creates the PR anyway, flagged for human review

Scores are saved to /tmp/self-review/review-{n}.json and included in the PR body.
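The loop above can be sketched as follows. This is not review-loop.py itself: `review` and `fix` are hypothetical callables standing in for the Opus and Sonnet calls, the 1-based `{n}` in the file name is an assumption, and the sketch assumes no fix is attempted after the final review, since no re-review would follow it:

```python
import json
from pathlib import Path
from typing import Callable

# The six rubric criteria returned by the reviewer, per the list above.
CRITERIA = ("correctness", "code_quality", "test_coverage",
            "type_safety", "security", "documentation")
PASS_THRESHOLD = 4
MAX_ITERATIONS = 3


def review_loop(
    review: Callable[[], dict[str, int]],   # stands in for the Opus review call
    fix: Callable[[dict[str, int]], None],  # stands in for the Sonnet fix call
    out_dir: Path = Path("/tmp/self-review"),
) -> tuple[bool, dict[str, int]]:
    """Return (passed, last_scores); passed=False means open the PR flagged for humans."""
    out_dir.mkdir(parents=True, exist_ok=True)
    scores: dict[str, int] = {}
    for n in range(1, MAX_ITERATIONS + 1):
        scores = review()
        missing = set(CRITERIA) - set(scores)
        if missing:
            raise ValueError(f"review is missing criteria: {sorted(missing)}")
        # Persist every round so the PR body can include the full history.
        (out_dir / f"review-{n}.json").write_text(json.dumps(scores, indent=2))
        if all(scores[c] >= PASS_THRESHOLD for c in CRITERIA):
            return True, scores
        if n < MAX_ITERATIONS:
            fix(scores)
    return False, scores
```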

E2E Verification

The E2E phase (e2e-verify/verify.py) visually verifies the implementation:

  1. Starts three dev servers: backend (:8080), playground (:3001), backoffice (:3002)
  2. Configures the playground with the local API URL and API key
  3. Launches a Claude agent with agent-browser to navigate and take screenshots
  4. Invokes the test-e2e skill to map branch changes to verification tests
  5. Writes a report to .factory/screenshots/{ticket}/report.md
  6. Commits screenshots, posts results to PR and Linear, notifies Slack
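Before the browser agent starts, the three dev servers need to be accepting connections. A minimal readiness check over the ports listed above, assuming a plain TCP connect is a good enough signal; whether verify.py polls this way is an assumption, and `wait_for_port` and `servers_ready` are hypothetical names:

```python
import socket
import time

# Ports from the list above; the keys are just labels.
DEV_SERVERS = {"backend": 8080, "playground": 3001, "backoffice": 3002}


def wait_for_port(port: int, host: str = "127.0.0.1",
                  timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def servers_ready(timeout: float = 60.0) -> dict[str, bool]:
    """Check each dev server in turn; any False could map to an e2e-skipped notice."""
    return {name: wait_for_port(port, timeout=timeout) for name, port in DEV_SERVERS.items()}
```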

If E2E fails, run.py reads the report, asks Sonnet to fix the issues, and re-runs verification (up to 2 attempts).
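The fix-and-retry behavior can be sketched like this, assuming "up to 2 attempts" means two fix-then-reverify rounds after the initial run (the exact counting in run.py is an assumption). `verify` and `fix` are hypothetical callables standing in for verify.py and the Sonnet fix step; the report path follows the layout given above:

```python
from pathlib import Path
from typing import Callable

MAX_FIX_ATTEMPTS = 2  # assumed to count fix rounds, not including the first run


def verify_with_retries(
    ticket: str,
    verify: Callable[[], bool],   # stands in for e2e-verify/verify.py; True on success
    fix: Callable[[str], None],   # stands in for asking Sonnet to fix, given the report
    screenshots_root: Path = Path(".factory/screenshots"),
) -> bool:
    report_path = screenshots_root / ticket / "report.md"
    if verify():
        return True
    for _ in range(MAX_FIX_ATTEMPTS):
        # Feed the latest report to the fixer; an absent report yields an empty prompt.
        report = report_path.read_text() if report_path.exists() else ""
        fix(report)
        if verify():
            return True
    return False
```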

Directory Structure

factory/
├── Dockerfile                  # Multi-stage: cache warmer + final image
├── docker-compose.yml          # Local dev Docker config
├── entrypoint.sh               # Container entry: clone → install deps → dispatch
├── run.py                      # Job dispatcher with circuit breaker
├── rose-factory                # CLI tool (typer) for local/Docker execution
├── settings.json               # Claude Code permissions for headless mode
├── e2e-verify/
│   └── verify.py               # E2E verification with screenshots
├── orchestration/
│   ├── factory.yaml            # GCP Cloud Workflows definition
│   └── notify.py               # Slack notification helper
├── self-review/
│   ├── review-loop.py          # Opus review → Sonnet fix loop
│   └── rubric.md               # Scoring rubric (6 criteria, 0-5)
└── seeds/                      # Supabase seed files for test scenarios
    ├── base.sql
    ├── auth-users.sql
    ├── content-populated.sql
    └── multi-tenant.sql

Docker Image

The factory runs in a self-contained Docker container with:

  • Node 22 (Bookworm base)
  • Python 3.12 (via uv)
  • Claude Code CLI + linearis (Linear CLI)
  • Chromium + agent-browser (headless browser automation)
  • Supabase CLI (DB branching)
  • Google Cloud CLI (secret management, .env download)
  • Poetry + npm (pre-warmed dependency caches)

The container is ephemeral: clone → work → push → destroy.

# Build the image
rose-factory build

# Build without cache
rose-factory build --no-cache

GCP Cloud Workflows

The orchestration/factory.yaml workflow coordinates factory runs on GCP:

  1. Receives trigger payload (job type, ticket ID, etc.)
  2. Notifies Slack: "Factory started"
  3. Runs the Cloud Run Job with environment overrides
  4. On success: notifies Slack "Factory done"
  5. On failure: notifies Slack with the error, then re-raises it
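The control flow of these steps is shown below in Python rather than Cloud Workflows YAML, purely for illustration. `run_job` and `notify` are hypothetical stand-ins for the Cloud Run Job execution and the Slack call, and the payload-to-environment mapping (upper-casing each key, so `ticket_identifier` becomes TICKET_IDENTIFIER) is an assumption based on the variable names under Configuration:

```python
from typing import Any, Callable


def run_factory(
    payload: dict[str, Any],
    run_job: Callable[[dict[str, str]], None],  # stands in for the Cloud Run Job execution
    notify: Callable[[str], None],              # stands in for the Slack notification
) -> None:
    # Assumed mapping: {"job": ..., "ticket_identifier": ...} -> JOB, TICKET_IDENTIFIER
    overrides = {key.upper(): str(value) for key, value in payload.items()}
    notify("Factory started")
    try:
        run_job(overrides)
    except Exception as exc:
        notify(f"Factory failed: {exc}")
        raise  # re-raise so the caller (here, the workflow) still sees the failure
    notify("Factory done")
```

Re-raising after notifying keeps the execution marked as failed instead of silently succeeding once Slack has been told.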

Deploy the workflow:

gcloud workflows deploy factory --source=factory/orchestration/factory.yaml

Trigger manually:

gcloud workflows run factory --data='{
  "job": "solve-linear-ticket",
  "ticket_identifier": "IX-123",
  "branch": "feature/ix-123"
}'

Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| JOB | Yes | Job name: solve-linear-ticket, onboard-client, or solve-chatbot-issues |
| TICKET_ID | Per job | Linear ticket ID (solve-linear-ticket) |
| TICKET_IDENTIFIER | Per job | Linear ticket identifier, e.g., IX-123 |
| BRANCH | Per job | Git branch name for the work |
| CLIENT_DOMAIN | Per job | Client domain (onboard-client) |
| MODE | No | Onboarding mode: full, eval-only, or update-skills |
| MAX_TURNS | No | Maximum Claude turns (default: 100) |
| MAX_DURATION_SECONDS | No | Circuit breaker limit in seconds (default: 7200) |
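A minimal sketch of loading these variables with their documented defaults. Which "Per job" variables each job requires is inferred from the descriptions above and is an assumption (in particular, BRANCH is assumed to belong to solve-linear-ticket only); `FactoryConfig`, `REQUIRED_BY_JOB`, and `load_config` are hypothetical names:

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumed per-job requirements, inferred from the "Per job" rows above.
REQUIRED_BY_JOB: dict[str, tuple[str, ...]] = {
    "solve-linear-ticket": ("TICKET_ID", "TICKET_IDENTIFIER", "BRANCH"),
    "onboard-client": ("CLIENT_DOMAIN",),
    "solve-chatbot-issues": (),
}


@dataclass(frozen=True)
class FactoryConfig:
    job: str
    max_turns: int
    max_duration_seconds: int


def load_config(env: Mapping[str, str] = os.environ) -> FactoryConfig:
    """Validate JOB and its required variables, then apply the documented defaults."""
    job = env.get("JOB", "")
    if job not in REQUIRED_BY_JOB:
        raise ValueError(f"JOB must be one of {sorted(REQUIRED_BY_JOB)}, got {job!r}")
    missing = [name for name in REQUIRED_BY_JOB[job] if not env.get(name)]
    if missing:
        raise ValueError(f"{job} requires: {', '.join(missing)}")
    return FactoryConfig(
        job=job,
        max_turns=int(env.get("MAX_TURNS", "100")),
        max_duration_seconds=int(env.get("MAX_DURATION_SECONDS", "7200")),
    )
```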

Secrets (GCP Secret Manager)

| Secret | Purpose |
|---|---|
| CLAUDE_AUTH_TOKEN | Claude Code Max subscription auth |
| FACTORY_GITHUB_TOKEN | GitHub access for clone/push/PR |
| LINEAR_API_KEY | Linear API for ticket context and comments |
| FACTORY_SLACK_WEBHOOK_URL | Slack notifications |
| LANGFUSE_SECRET_KEY | Langfuse observability |
| LANGFUSE_PUBLIC_KEY | Langfuse observability |
| LANGFUSE_HOST | Langfuse host URL |
| SENTRY_AUTH_TOKEN | Sentry MCP for error analysis |
| SUPABASE_ACCESS_TOKEN | Supabase MCP for DB queries |

MCP Servers

The factory configures three MCP servers for the Claude agent:

  • Linear — ticket context, comments, status updates
  • Sentry — error analysis for chatbot triage
  • Supabase — database queries (read-only)

Circuit Breaker

run.py monitors elapsed time during Claude execution. If MAX_DURATION_SECONDS is exceeded, the agent is killed and Slack is notified. Default: 2 hours.
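A minimal sketch of such a breaker around a child process, assuming the agent runs as a subprocess. This is not run.py's code: `run_with_circuit_breaker` is a hypothetical name and `on_trip` stands in for the Slack "failed" notification:

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_circuit_breaker(
    cmd: Sequence[str],
    max_duration_seconds: float = 7200,
    on_trip: Callable[[str], None] = print,  # stands in for the Slack notification
    poll_interval: float = 1.0,
) -> Optional[int]:
    """Run cmd; kill it if it exceeds the limit. Returns the exit code, or None if tripped."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        elapsed = time.monotonic() - start
        if elapsed > max_duration_seconds:
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
            on_trip(f"circuit breaker: killed after {elapsed:.0f}s "
                    f"(limit {max_duration_seconds:.0f}s)")
            return None
        time.sleep(poll_interval)
    return proc.returncode
```

`subprocess.run(cmd, timeout=...)` would also kill the child on timeout; an explicit poll loop just leaves room for periodic logging. Note that `kill()` signals only the direct child: if the agent spawns its own processes (dev servers, browsers), a real implementation would likely need to terminate the whole process group.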

Slack Notifications

The factory sends notifications for key events:

| Event | Emoji | When |
|---|---|---|
| started | 🏭 | Job begins |
| completed | ✅ | Job finishes successfully |
| failed | 🚨 | Job fails or circuit breaker triggers |
| review-passed | ✅ | Self-review passes |
| review-failed | ⚠ | Self-review fails after max iterations |
| e2e-verified | 🖼 | E2E screenshots captured |
| e2e-skipped | ⏭ | E2E skipped (servers failed, no screenshots) |
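Slack incoming webhooks accept a JSON POST with a `text` field, which is enough to send these events. A minimal sketch, assuming the webhook URL comes from the FACTORY_SLACK_WEBHOOK_URL secret; the message format and the `build_payload` and `send` names are hypothetical, not notify.py's actual interface:

```python
import json
import urllib.request

# Emoji per event, copied from the table above.
EVENT_EMOJI = {
    "started": "🏭",
    "completed": "✅",
    "failed": "🚨",
    "review-passed": "✅",
    "review-failed": "⚠",
    "e2e-verified": "🖼",
    "e2e-skipped": "⏭",
}


def build_payload(event: str, detail: str) -> dict[str, str]:
    """Build an incoming-webhook body; unknown events raise KeyError."""
    return {"text": f"{EVENT_EMOJI[event]} {event}: {detail}"}


def send(webhook_url: str, event: str, detail: str) -> None:
    """POST the payload as JSON to the webhook URL."""
    body = json.dumps(build_payload(event, detail)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()
```

Keeping payload construction separate from the HTTP call lets the message format be tested without a live webhook.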