Factory
The Rose Factory is an autonomous agent pipeline that runs coding tasks in Docker containers. It dispatches Claude Code agents to solve Linear tickets, onboard clients, and triage chatbot issues — then self-reviews, runs E2E verification, and opens PRs.
Architecture
Jobs
solve-linear-ticket
The primary job. Takes a Linear ticket identifier and produces a PR.
Pipeline phases:
| Phase | Model | What it does |
|---|---|---|
| Implementation | Sonnet | Reads ticket, implements code, runs tests, commits |
| Self-Review | Opus → Sonnet loop | Opus scores against rubric (6 criteria, 0-5 each). If any score <4, Sonnet fixes and Opus re-reviews. Max 3 iterations |
| E2E Verification | Sonnet + browser | Starts dev servers, navigates playground/backoffice with agent-browser, takes screenshots, writes report |
Usage:
    # Via Docker (recommended)
    rose-factory solve-linear-ticket --ticket IX-123

    # Without Docker (direct execution)
    rose-factory solve-linear-ticket --ticket IX-123 --no-docker
onboard-client
Onboards a new client with config, skills, eval dataset, and accuracy tuning.
Modes:
| Mode | What it does |
|---|---|
| `full` | Complete onboarding: config + skills + eval + tuning |
| `eval-only` | Re-run evaluation for existing client |
| `update-skills` | Update skills only (e.g., after KB changes) |
solve-chatbot-issues
Daily triage job. Scans Sentry and Langfuse for chatbot failures, creates fixes, opens PRs.
Self-Review
The review loop (self-review/review-loop.py) runs after implementation:
- Opus reviews the branch diff against `rubric.md`
- Returns scores as JSON: `correctness`, `code_quality`, `test_coverage`, `type_safety`, `security`, `documentation`
- Pass (all >= 4): creates PR with scores
- Fail (any < 4): Sonnet fixes the issues, then Opus re-reviews
- After 3 failures: creates PR anyway, flagged for human review
Scores are saved to /tmp/self-review/review-{n}.json and included in the PR body.
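The pass/fail logic above can be sketched roughly as follows. This is a minimal illustration, not the contents of `review-loop.py`: the `review` and `fix` callables are hypothetical stand-ins for the Opus and Sonnet calls, and skipping the fix after the final failed review is an assumption.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List

CRITERIA = ("correctness", "code_quality", "test_coverage",
            "type_safety", "security", "documentation")
PASS_THRESHOLD = 4   # every criterion must score >= 4 to pass
MAX_ITERATIONS = 3   # after 3 failed reviews the PR is opened anyway

Scores = Dict[str, int]


def failing_criteria(scores: Scores) -> List[str]:
    """Return the criteria scoring below the pass threshold."""
    return [c for c in CRITERIA if scores.get(c, 0) < PASS_THRESHOLD]


def review_loop(review: Callable[[], Scores],
                fix: Callable[[List[str]], None],
                out_dir: Path) -> dict:
    """Run review/fix cycles and report whether the branch passed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    scores: Scores = {}
    for n in range(1, MAX_ITERATIONS + 1):
        scores = review()  # hypothetical: Opus scores the branch diff
        (out_dir / f"review-{n}.json").write_text(json.dumps(scores))
        failed = failing_criteria(scores)
        if not failed:
            return {"passed": True, "iterations": n, "scores": scores}
        if n < MAX_ITERATIONS:
            fix(failed)  # hypothetical: Sonnet fixes the flagged criteria
    # Out of iterations: the PR is still created, flagged for a human.
    return {"passed": False, "iterations": MAX_ITERATIONS,
            "scores": scores, "needs_human_review": True}
```

Injecting `review` and `fix` as callables keeps the loop testable without calling any model.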
E2E Verification
The E2E phase (e2e-verify/verify.py) visually verifies the implementation:
- Starts three dev servers: backend (`:8080`), playground (`:3001`), backoffice (`:3002`)
- Configures the playground with the local API URL and API key
- Launches a Claude agent with `agent-browser` to navigate and take screenshots
- Invokes the `test-e2e` skill to map branch changes to verification tests
- Writes a report to `.factory/screenshots/{ticket}/report.md`
- Commits screenshots, posts results to PR and Linear, notifies Slack
If E2E fails, run.py reads the report, asks Sonnet to fix the issues, and re-runs verification (up to 2 attempts).
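A rough sketch of that retry behavior, under stated assumptions: the three callables are hypothetical placeholders for the real verification run, report reader, and Sonnet fix step, and "up to 2 attempts" is read here as two verification runs in total.

```python
from typing import Callable

MAX_E2E_ATTEMPTS = 2  # assumption: two verification runs in total


def verify_with_retries(run_verify: Callable[[], bool],
                        read_report: Callable[[], str],
                        fix_from_report: Callable[[str], None],
                        max_attempts: int = MAX_E2E_ATTEMPTS) -> bool:
    """Re-run E2E verification after a fix pass until it passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if run_verify():
            return True
        if attempt < max_attempts:
            # Feed the failure report to the fixer before trying again.
            fix_from_report(read_report())
    return False
```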
Directory Structure

    factory/
    ├── Dockerfile              # Multi-stage: cache warmer + final image
    ├── docker-compose.yml      # Local dev Docker config
    ├── entrypoint.sh           # Container entry: clone → install deps → dispatch
    ├── run.py                  # Job dispatcher with circuit breaker
    ├── rose-factory            # CLI tool (typer) for local/Docker execution
    ├── settings.json           # Claude Code permissions for headless mode
    ├── e2e-verify/
    │   └── verify.py           # E2E verification with screenshots
    ├── orchestration/
    │   ├── factory.yaml        # GCP Cloud Workflows definition
    │   └── notify.py           # Slack notification helper
    ├── self-review/
    │   ├── review-loop.py      # Opus review → Sonnet fix loop
    │   └── rubric.md           # Scoring rubric (6 criteria, 0-5)
    └── seeds/                  # Supabase seed files for test scenarios
        ├── base.sql
        ├── auth-users.sql
        ├── content-populated.sql
        └── multi-tenant.sql
Docker Image
The factory runs in a self-contained Docker container with:
- Node 22 (Bookworm base)
- Python 3.12 (via uv)
- Claude Code CLI + linearis (Linear CLI)
- Chromium + agent-browser (headless browser automation)
- Supabase CLI (DB branching)
- Google Cloud CLI (secret management, .env download)
- Poetry + npm (pre-warmed dependency caches)
The container is ephemeral: clone → work → push → destroy.
GCP Cloud Workflows
The workflow defined in orchestration/factory.yaml coordinates factory runs on GCP:
- Receives trigger payload (job type, ticket ID, etc.)
- Notifies Slack: "Factory started"
- Runs the Cloud Run Job with environment overrides
- On success: notifies Slack "Factory done"
- On failure: notifies Slack with error, re-raises
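The control flow of these steps can be mirrored in Python as a rough analogy. The real definition is a Cloud Workflows YAML file; `run_job` and `notify` below are hypothetical callables standing in for the Cloud Run Job execution and the Slack notification step.

```python
from typing import Callable


def run_factory(payload: dict,
                run_job: Callable[[dict], None],
                notify: Callable[[str], None]) -> None:
    """Notify on start, run the job, then notify on success or failure."""
    notify("Factory started")
    try:
        run_job(payload)  # hypothetical: launches the Cloud Run Job
    except Exception as exc:
        notify(f"Factory failed: {exc}")
        raise  # re-raise so the run is still recorded as failed
    notify("Factory done")
```

Re-raising after the failure notification keeps the error visible to whatever triggered the run instead of swallowing it.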
Deploy the workflow:
Trigger manually:
    gcloud workflows run factory --data='{
      "job": "solve-linear-ticket",
      "ticket_identifier": "IX-123",
      "branch": "feature/ix-123"
    }'
Configuration
Environment Variables
| Variable | Required | Description |
|---|---|---|
| `JOB` | Yes | Job name: solve-linear-ticket, onboard-client, solve-chatbot-issues |
| `TICKET_ID` | Per job | Linear ticket ID (solve-linear-ticket) |
| `TICKET_IDENTIFIER` | Per job | Linear ticket identifier, e.g., IX-123 |
| `BRANCH` | Per job | Git branch name for the work |
| `CLIENT_DOMAIN` | Per job | Client domain (onboard-client) |
| `MODE` | No | Onboard mode: full, eval-only, update-skills |
| `MAX_TURNS` | No | Max Claude turns (default: 100) |
| `MAX_DURATION_SECONDS` | No | Circuit breaker duration (default: 7200) |
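As an illustration of how these variables might be read and validated, here is a minimal sketch. The `FactoryConfig` class and `load_config` function are hypothetical names, not part of `run.py`; the defaults come from the table above, and per-job requirements are deliberately not enforced.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

JOBS = ("solve-linear-ticket", "onboard-client", "solve-chatbot-issues")
MODES = ("full", "eval-only", "update-skills")


@dataclass(frozen=True)
class FactoryConfig:
    job: str
    ticket_id: Optional[str]
    ticket_identifier: Optional[str]
    branch: Optional[str]
    client_domain: Optional[str]
    mode: Optional[str]
    max_turns: int
    max_duration_seconds: int


def load_config(env: Mapping[str, str] = os.environ) -> FactoryConfig:
    """Build a config from environment variables, applying documented defaults."""
    job = env.get("JOB")
    if job not in JOBS:
        raise ValueError(f"JOB must be one of {JOBS}, got {job!r}")
    mode = env.get("MODE")
    if mode is not None and mode not in MODES:
        raise ValueError(f"MODE must be one of {MODES}, got {mode!r}")
    return FactoryConfig(
        job=job,
        ticket_id=env.get("TICKET_ID"),
        ticket_identifier=env.get("TICKET_IDENTIFIER"),
        branch=env.get("BRANCH"),
        client_domain=env.get("CLIENT_DOMAIN"),
        mode=mode,
        max_turns=int(env.get("MAX_TURNS", "100")),
        max_duration_seconds=int(env.get("MAX_DURATION_SECONDS", "7200")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.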
Secrets (GCP Secret Manager)
| Secret | Purpose |
|---|---|
| `CLAUDE_AUTH_TOKEN` | Claude Code Max subscription auth |
| `FACTORY_GITHUB_TOKEN` | GitHub access for clone/push/PR |
| `LINEAR_API_KEY` | Linear API for ticket context and comments |
| `FACTORY_SLACK_WEBHOOK_URL` | Slack notifications |
| `LANGFUSE_SECRET_KEY` | Langfuse observability |
| `LANGFUSE_PUBLIC_KEY` | Langfuse observability |
| `LANGFUSE_HOST` | Langfuse host URL |
| `SENTRY_AUTH_TOKEN` | Sentry MCP for error analysis |
| `SUPABASE_ACCESS_TOKEN` | Supabase MCP for DB queries |
MCP Servers
The factory configures three MCP servers for the Claude agent:
- Linear — ticket context, comments, status updates
- Sentry — error analysis for chatbot triage
- Supabase — database queries (read-only)
Circuit Breaker
run.py monitors elapsed time during Claude execution. If MAX_DURATION_SECONDS is exceeded, the agent is killed and Slack is notified. Default: 2 hours.
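A time-based breaker of this kind could look roughly like the sketch below. It assumes the agent runs as a child process and is polled at a fixed interval; `run_with_circuit_breaker` and the `on_trip` callback (a stand-in for the Slack notification) are hypothetical names, not the actual `run.py` code.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def run_with_circuit_breaker(cmd: List[str],
                             max_duration_seconds: float = 7200,
                             poll_interval: float = 0.1,
                             on_trip: Optional[Callable[[float], None]] = None
                             ) -> Optional[int]:
    """Run cmd; return its exit code, or None if the breaker killed it."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        elapsed = time.monotonic() - start
        if elapsed > max_duration_seconds:
            proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie
            if on_trip is not None:
                on_trip(elapsed)  # hypothetical: send the Slack notification
            return None
        time.sleep(poll_interval)
    return proc.returncode


if __name__ == "__main__":
    # Demo: a 5-second child process is cut off after roughly half a second.
    code = run_with_circuit_breaker(
        [sys.executable, "-c", "import time; time.sleep(5)"],
        max_duration_seconds=0.5,
        on_trip=lambda s: print(f"breaker tripped after {s:.1f}s"),
    )
    print("exit code:", code)
```

`time.monotonic()` is used rather than wall-clock time so that system clock adjustments cannot trip or delay the breaker.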
Slack Notifications
The factory sends notifications for key events:
| Event | Emoji | When |
|---|---|---|
| `started` | | Job begins |
| `completed` | | Job finishes successfully |
| `failed` | | Job fails or circuit breaker triggers |
| `review-passed` | | Self-review passes |
| `review-failed` | | Self-review fails after max iterations |
| `e2e-verified` | | E2E screenshots captured |
| `e2e-skipped` | :next_track_button: | E2E skipped (servers failed, no screenshots) |
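For illustration, a notification helper along these lines could post events to a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names and message layout here are assumptions, not the contents of `notify.py`; the emoji is passed in by the caller because only the `e2e-skipped` shortcode is given in the table above.

```python
import json
import urllib.request
from typing import Optional


def build_message(event: str, detail: str, emoji: Optional[str] = None) -> dict:
    """Build a Slack webhook payload; the '[event] detail' layout is an assumption."""
    prefix = f"{emoji} " if emoji else ""
    return {"text": f"{prefix}[{event}] {detail}"}


def send(webhook_url: str, message: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

In the factory, `webhook_url` would presumably come from the `FACTORY_SLACK_WEBHOOK_URL` secret, for example `send(url, build_message("e2e-skipped", "servers failed", ":next_track_button:"))`.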
Related
- ADR: Agentic Development Pipeline — design decisions and rationale
- Linear: IX-2148 — tracking ticket