Onboarding: Where Human Time Goes, and How to Automate It¶

Internal doc. Audience: Rose team. Purpose: make explicit what costs human time in a client onboarding today, who does each part, and the plan to push that work into agents (a configuration agent, a knowledge agent, a playground, and the onboarding agent that ties them together).

Source of truth for the current manual flow is the rose-onboard-client skill and the skills it calls. This doc reframes that flow as a plan and links the existing designs that already cover parts of it:

Knowledge ingestion automation → ADR: Productized Website Scraping Pipeline

Skill development + eval loop → Prompt & Skill Workflow, Skill System

Autonomous agent runtime (worktree, preview, lifecycle) → ADR: Agent Factory Orchestration

TL;DR¶

Claude already does most of the doing — scrape, ingest, generate config, generate skills, run evals — through the onboarding skills. What costs human time now is reviewing: checking Claude's output before it ships, and relaying/reviewing client feedback by proxy. Staff is the reviewer and orchestrator, not the doer.

So the human time is: judging knowledge quality (is the pricing right? are competitors right? did a bad page poison an answer?), validating generated config/skills, the long tail of small per-client tweaks, a few tricky verified setups (conversion tracking, lookup tables, bespoke per-jurisdiction skills), and acting as the middle layer between the client and Claude.

The plan: automate scraping and key-knowledge ingestion (in progress — see the scraping ADR), then move the review loop to the client directly via a playground + a configuration agent callable from the frontend, so the client reviews answers, pricing, competitors, and invalidates bad content itself, instead of staff doing it by proxy.

The biggest time sinks (ranked)¶

Where the hours actually go, most to least:

| Rank | Time sink | Who | Why it's slow |
|---|---|---|---|
| 1 | Conversion / form-tracking setup + diagnosis | Staff | Most fragile step. Live-page browser inspection, pick the right detection strategy (native / iframe / SPA / GTM), confirm `rw_client_form_submitted` fires. Forms break detection constantly; needs human approval (`tracking_status`). |
| 2 | Long tail of small per-client tweaks | Staff | Onboarding never ends in one pass: endless little changes (reword a question, fix a CTA, adjust one skill line). Each is a manual round-trip. Collectively the bulk of ongoing time. |
| 3 | Knowledge review (pricing, competitors, bad pages) | Staff judging Claude output | High-judgment: stale prices and wrong competitor scope are the classic failures. A bad source page poisons answers. |
| 4 | Custom per-client features | Staff (real engineering) | One-off features built for a single client: no reuse, not generatable. Each needs design + build + eval. Examples: per-jurisdiction answer routing (iBanFirst), themed/labelled FAQ (Roundtable), custom lookup tables (PayFit CCN, countries, integrations, plan tiers, feature matrix). |
| 5 | Skills + eval prompt-engineering loop | Staff judging Claude output | Write → run `rose-eval` → read transcripts → fix → re-run. The custom features above are the heaviest cases. |
| 6 | Client feedback relay | Staff ↔ client | Staff is a manual middle layer: client flags wrong answer → staff interprets → points Claude at fix → re-runs → relays back. |

Claude already does the doing in all of these — the time is the human review + relay loop around it, plus the fragile/bespoke setups that need live verification. Details per phase below.

The pieces¶

Onboarding decomposes into four big pieces. Each is at a different stage.

| Piece | What it is | State |
|---|---|---|
| Knowledge agent | Scrape + ingest site/docs into RAG, plus create key knowledge (pricing, competitors) | Partly manual; productization designed in the scraping ADR |
| Cropping | Trim/clean scraped pages so the KB isn't poisoned | Mostly done (per-site cleanup + RAG-friendly checks) |
| Playground | Client-facing surface to run questions and review answers | To build |
| Onboarding agent (+ knowledge agent) | Generate config + skills, run evals, drive client review | Config exists as skills; automate into a frontend-callable agent |

Today vs target (diagram)¶

Red = human time (staff). Green = automated (Claude / pipeline). Blue = client. Today the client only touches the very end, through staff; the target moves the whole review loop to the client.

flowchart TB
  subgraph TODAY["TODAY: staff drives, client reviews by proxy"]
    direction TB
    T_scrape["Scrape site + docs (Claude runs skill)"]:::auto
    T_crop["Crop / clean KB"]:::auto
    T_know["Review knowledge pricing · competitors · bad pages"]:::human
    T_cfg["Generate config (Claude runs skill)"]:::auto
    T_cfgrev["Review config"]:::human
    T_skill["Generate skills + eval (Claude runs skill)"]:::auto
    T_skillrev["Review skills / read transcripts"]:::human
    T_custom["Custom per-client features per-jurisdiction · themed FAQ · lookup tables"]:::human
    T_track["Conversion + form-tracking setup & diagnosis (FRAGILE)"]:::human
    T_tweak["Long tail: small per-client tweaks"]:::human
    T_relay["Relay client feedback ↔ Claude"]:::human
    T_client["Client: 'this answer is wrong'"]:::client
    T_scrape --> T_crop --> T_know --> T_cfg --> T_cfgrev --> T_skill --> T_skillrev --> T_custom --> T_track --> T_tweak
    T_client -->|via staff| T_relay --> T_skill
  end
  subgraph TARGET["TARGET: client drives via playground + frontend config agent"]
    direction TB
    G_pipe["Auto scrape + ingest + key knowledge (productized pipeline)"]:::auto
    G_cfg["Config agent generates config + skills + questions"]:::auto
    G_play["Playground: run buyer questions, show answers"]:::auto
    G_review["Client reviews answers, pricing, competitors"]:::client
    G_invalidate["Client invalidates bad page/content"]:::client
    G_tweak["Client asks tweak in plain language → agent applies"]:::client
    G_gate["Verified setups keep human sign-off (tracking, lookup tables)"]:::human
    G_pipe --> G_cfg --> G_play --> G_review --> G_invalidate --> G_pipe
    G_review --> G_tweak --> G_cfg
    G_cfg -.-> G_gate
  end
  TODAY -.->|automate the review/relay loop| TARGET
  classDef human fill:#ffd9d9,stroke:#c0392b,color:#7b241c;
  classDef auto fill:#d9f2d9,stroke:#27ae60,color:#1e7d34;
  classDef client fill:#d9e8ff,stroke:#2e6fc0,color:#1c4a85;

The red nodes are what we're paying for. The target collapses them: the pipeline absorbs the knowledge work, the config agent absorbs generation + small tweaks, and the client absorbs the review that staff does by proxy today. The only human gate that stays is sign-off on the fragile verified setups.

What takes human time today (and who does it)¶

The current flow, per rose-onboard-client. Claude runs each skill; the human time is review, not execution. "Staff" = Rose onboarding engineer (reviewer / orchestrator). "Client" = the customer being onboarded.

Across all phases the staff job is the same shape: kick off the skill, read what Claude produced, judge if it's correct/safe, send it back or sign off.

1. Knowledge ingestion — CLAUDE does it, STAFF reviews (heavy)¶

Scrape the site (rose-scrape-website): build a curated URL list (blind --follow poisons the KB with /tag/, /author/, pagination, PR noise), run the scraper, audit quality, write a per-site cleanup script.
Scrape internal docs (rose-scrape-internal-docs): convert client decks / PDFs / pricing sheets to markdown, strip sensitive info (margins, commissions, employee names, "do not share" notes), invent nothing.
Cropping / cleanup: the per-site cleanup-script work + RAG-friendly checks. Mostly automated now, but still needs a human eye on the audit.
Key knowledge (pricing, competitors): today these come out of scraping + manual curation. Getting them right is high-judgment: stale prices and wrong competitor scope are the classic failure modes.
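The curated-URL-list step can be sketched as a filter over candidate URLs. This is a minimal illustration, not what rose-scrape-website actually does: the function names and the exact patterns are assumptions, built only from the noise categories named above (/tag/, /author/, pagination, PR pages).

```python
import re
from urllib.parse import urlparse

# Hypothetical noise patterns, derived from the examples named in the text.
NOISE_PATH_PATTERNS = [
    re.compile(r"/(tag|author)/"),   # taxonomy and author archives
    re.compile(r"/page/\d+/?$"),     # path-style pagination
    re.compile(r"/press/"),          # PR noise (assumed path, varies per site)
]
QUERY_PAGINATION = re.compile(r"(^|&)page=\d+")


def is_noise(url: str) -> bool:
    """Return True if the URL looks like tag/author/pagination/PR noise."""
    parsed = urlparse(url)
    if QUERY_PAGINATION.search(parsed.query):
        return True
    return any(p.search(parsed.path) for p in NOISE_PATH_PATTERNS)


def curate(urls: list[str]) -> list[str]:
    """Keep candidate URLs in order, dropping noise and exact duplicates."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if url in seen or is_noise(url):
            continue
        seen.add(url)
        kept.append(url)
    return kept
```

A filter like this only removes the obvious noise; the human audit of what remains is still the judgment step.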

Time sink: judging whether the ingested knowledge is correct and safe. The productized pipeline that replaces the ad-hoc local workflow is designed in the scraping ADR (Firecrawl map/crawl + bounded LLM include/exclude decisions + GCS-versioned corpus + the existing document-loader into LightRAG/Neo4j/MongoDB).

2. Config generation — CLAUDE does it, STAFF reviews (medium)¶

rose-create-client-config reads the KB + browses the site and writes the configs + curated content:

identity, CTAs, engagement (suggested questions per language), analytics taxonomy, qualification signals, competitors, appearance, curated content.
Staff judgment: does the generated config match reality? Are languages right? Are CTAs pointing at the real demo form?

Not auto-generated: the 5 ROI metrics (monthly_traffic, monthly_demos, demos_to_deals_rate, acv, rose_cost). These are client CRM/billing numbers — staff has to ask the client. rose_cost is billing-only.
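As a rough sketch of the gap this leaves, the five metrics can be modelled as optional inputs with a check for what still has to be asked of the client. The field names come from the list above; the class, the helper, and the reading that rose_cost is filled from billing rather than asked of the client are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoiInputs:
    """The five ROI metrics; None means the value has not been provided yet."""
    monthly_traffic: Optional[int] = None
    monthly_demos: Optional[int] = None
    demos_to_deals_rate: Optional[float] = None  # unit (fraction vs percent) is an assumption
    acv: Optional[float] = None
    rose_cost: Optional[float] = None  # assumed to come from billing, not the client


# Assumption: these four are the ones staff has to request from the client.
CLIENT_SUPPLIED = ("monthly_traffic", "monthly_demos", "demos_to_deals_rate", "acv")


def missing_from_client(roi: RoiInputs) -> list[str]:
    """List the client-supplied metrics that are still empty."""
    return [name for name in CLIENT_SUPPLIED if getattr(roi, name) is None]
```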

3. Skills generation + eval — CLAUDE does it, STAFF reviews (medium/heavy)¶

rose-create-client-skills (see Skill System and the Prompt & Skill Workflow):

Compare client needs against all global skills, decide which (if any) skill overrides are needed (pricing behavior, competitor framing, redirects).
Write the smallest skill that works, then trim.
Build an eval dataset and run rose-eval on staging, iterate on rose-chat.

Time sink: prompt-engineering loop — write, run, read transcripts, fix, re-run.

4. Client review — CLIENT, today routed through STAFF¶

Today the client reviews after Claude + staff have done everything, and the feedback comes back through staff: the client says "this answer is wrong", staff interprets it, points Claude at the fix, re-runs, relays the result. Staff is a manual relay between the client and Claude. The client should be validating pricing, competitors, and answer quality directly — but there's no self-serve surface for it yet.

5. Custom per-client features — STAFF (real engineering, biggest hidden cost)¶

The category the "generate config + skills" model doesn't capture at all: one-off features built for a single client. They aren't a config value or a generated skill — they're bespoke behavior that only ever benefits that one client, with no reuse across the fleet. Each one is a real design + build + eval cycle, and they recur as clients ask for things the generic flow can't do.

Examples (the point is the category, not the list):

Per-jurisdiction answer routing (iBanFirst) — answers must be scoped to the visitor's country (coverage, regulation, availability differ), so a custom skill routes the answer by jurisdiction instead of giving one global reply.
Themed / labelled FAQ (Roundtable) — FAQ knowledge organized by theme + labels so the agent surfaces the right grouped answer.
Custom lookup tables — any deterministic coverage list the client supplies: supported CCN / IDCC (e.g. PayFit), countries, integrations, plan tiers, feature matrix, etc. Build the YAML mapping, import the list, smoke-test the lookup renders in the answer-node prompt. PayFit CCN is just one example — the mechanism is generic.

These need real prompt-engineering + eval iteration, not generation, and they're the heaviest single-client time sink.
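To make the lookup-table mechanism concrete, here is a minimal sketch of a deterministic lookup plus the smoke test that its result renders as text. The table contents, function names, and rendered wording are all hypothetical; the real YAML schema and the answer-node prompt format are not shown in this doc, so the mapping below stands in for an already-parsed YAML file.

```python
from typing import Mapping, Optional

# Hypothetical, already-parsed mapping (stand-in for the client's YAML list).
# Keys are normalised codes; values are the deterministic answer to surface.
SUPPORTED_COUNTRIES: Mapping[str, str] = {
    "fr": "supported",
    "de": "supported",
    "br": "not supported",
}


def lookup(table: Mapping[str, str], raw_key: str) -> Optional[str]:
    """Deterministic lookup: normalise the key and never guess on a miss."""
    return table.get(raw_key.strip().lower())


def render_for_prompt(label: str, table: Mapping[str, str], raw_key: str) -> str:
    """Render one lookup result as a line that could be placed in a prompt."""
    key = raw_key.strip()
    value = lookup(table, key)
    if value is None:
        return f"{label} '{key}': no entry in the client-supplied list"
    return f"{label} '{key}': {value}"
```

The explicit miss branch is the design point: a deterministic list should say "no entry" rather than let the model improvise coverage.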

6. The long tail of small changes — STAFF (underestimated)¶

Onboarding isn't one pass — it's a steady stream of little tweaks per client (reword a question, fix a CTA label, adjust one skill line, nudge a redirect). Individually tiny, collectively the bulk of ongoing human time. Each one is a manual round-trip.

7. Conversion / form-tracking setup + diagnosis — STAFF (most fragile)¶

Set up form/conversion tracking and verify it actually fires on the live site. The most fragile part of onboarding: rose-form-tracking-diagnosis has to inspect the live page with a browser, figure out which of several detection strategies applies (native submit, iframe-embedded forms, SPA, GTM-injected), and confirm rw_client_form_submitted actually fires. Forms break detection constantly (new-editor HubSpot forms always iframe, no postMessage; cache plugins externalize inline scripts; A/B-split redirects change the destination). High-judgment, easy to get wrong, and tracking_status requires explicit human approval (never written autonomously). Must be checked against the real page, not just config.
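The approval rule can be sketched as a small gate: a status only becomes approved when the event was observed on the live page and a named human signed off. The four strategies are the ones listed above; the class names and the status strings are illustrative assumptions, not the real `tracking_status` values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DetectionStrategy(Enum):
    NATIVE = "native"
    IFRAME = "iframe"
    SPA = "spa"
    GTM = "gtm"


@dataclass
class TrackingCheck:
    strategy: DetectionStrategy
    event_fired_on_live_page: bool     # was rw_client_form_submitted observed?
    approved_by: Optional[str] = None  # human approver; an agent never sets this


def resolve_tracking_status(check: TrackingCheck) -> str:
    """Return an illustrative status; the real status values are not shown here."""
    if not check.event_fired_on_live_page:
        return "needs_diagnosis"
    if check.approved_by is None:
        return "pending_human_approval"
    return "approved"
```

Automation can fill in `strategy` and `event_fired_on_live_page`; only a person fills in `approved_by`.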

Where the human time actually concentrates¶

Claude does the doing. The bottleneck is the review/relay loop:

Staff reviewing Claude's generated knowledge / config / skills before ship.
Staff relaying client feedback into the next Claude iteration.
Custom per-client features — real one-off engineering, no reuse.
The long tail of small per-client tweaks, and the fragile tracking setup.

Automating the doing further (scraping, key knowledge) helps, but the bigger win is removing staff from the review/relay loop — see Future work.

Linear / brief handling — STAFF¶

Read the parent onboarding ticket, extract the brief (domain, languages, CTA, CRM, qualification questions, redirects, tracking pages, ROI numbers), save it for downstream skills. Manual but light.

Future work¶

Already moving: automate scraping + key-knowledge ingestion¶

We're making sure scraping works automatically with the latest changes, and extending that to key knowledge — pricing and competitors — so those are ingested automatically instead of hand-curated. Cropping is mostly done. The target architecture is the Productized Website Scraping Pipeline ADR: an explicit Cloud Run state machine (map → triage → sample → decide → scrape → preflight → cleanup → validate → publish), backoffice-inspectable run state, and GCS-versioned corpora — no autonomous agent loop, and no running arbitrary per-client cleanup code in production.
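For readers who have not opened the ADR, an explicit state machine of this shape might look like the sketch below. Only the stage names and their order come from the text; the `Run` record, the status strings, and the stop-on-failure behaviour are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    MAP = "map"
    TRIAGE = "triage"
    SAMPLE = "sample"
    DECIDE = "decide"
    SCRAPE = "scrape"
    PREFLIGHT = "preflight"
    CLEANUP = "cleanup"
    VALIDATE = "validate"
    PUBLISH = "publish"


ORDER = list(Stage)  # Enum iteration preserves definition order


@dataclass
class Run:
    stage: Stage = Stage.MAP
    status: str = "running"  # "running" | "failed" | "published" (assumed values)
    history: list[str] = field(default_factory=list)


def advance(run: Run, ok: bool) -> Run:
    """Record the outcome of the current stage and move to the next one."""
    if run.status != "running":
        raise ValueError(f"run already {run.status}")
    run.history.append(f"{run.stage.value}:{'ok' if ok else 'failed'}")
    if not ok:
        run.status = "failed"      # stop here so the run state stays inspectable
        return run
    idx = ORDER.index(run.stage)
    if idx == len(ORDER) - 1:
        run.status = "published"
    else:
        run.stage = ORDER[idx + 1]
    return run
```

Because every transition is recorded in `history`, a failed run shows exactly which stage broke, which is the property a backoffice view needs.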

What remains: put the client in the loop¶

The remaining human time is the review/relay loop — and the judgment in it is something the client is better placed to make than staff. The plan is to cut staff out of the middle and give the client that loop directly:

Put the client in the loop: stop reviewing knowledge by proxy.
Generate questions for the client in a playground: auto-produce the questions a buyer would ask (and the ones we're unsure about).
Run them automatically and let the client review: show the client the agent's answers and let them flag what's wrong, instead of staff reading every transcript.
Let the client look at pricing and competitors: the two highest-risk knowledge areas, reviewed by the people who actually know them.
Let the client invalidate pages/content: when an answer is wrong because of a bad source page, the client can mark that page/content as bad so it stops feeding answers. (The scraping ADR already has an artifact/tombstone model to drop removed documents from the index.)
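The invalidate action can be pictured as a tombstone on a document rather than a delete. The sketch below is an assumption about the general shape only; the ADR's actual artifact/tombstone model is not reproduced in this doc, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    documents: dict[str, str]  # doc id (e.g. page URL) -> text
    tombstones: dict[str, str] = field(default_factory=dict)  # doc id -> reason

    def invalidate(self, doc_id: str, reason: str) -> None:
        """Mark a document as bad without deleting it, so the action is auditable."""
        if doc_id not in self.documents:
            raise KeyError(f"unknown document: {doc_id}")
        self.tombstones[doc_id] = reason

    def indexable(self) -> dict[str, str]:
        """Documents that should still feed answers on the next ingest."""
        return {k: v for k, v in self.documents.items() if k not in self.tombstones}
```

Keeping the text and the reason around means a wrongly invalidated page can be restored, and staff can see why the client flagged it.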

How: a configuration agent callable from the frontend¶

A big part of this is already done as configuration (the configs, curated content, skills). The next step is to wrap it in an agent that does configuration — generating dynamic questions, running them, applying client feedback — and call that agent from the frontend so the client drives it.

Mapping to the four pieces:

Knowledge agent — automate scrape + ingest + key-knowledge (pricing, competitors) per the scraping ADR; expose "invalidate this page/content" so bad sources can be dropped.
Cropping — mostly done; keep it automatic.
Playground — the client-facing surface: generate questions, run them, collect client verdicts, surface pricing/competitor review.
Onboarding agent (+ knowledge agent) — orchestrates the above. Today's rose-onboard-client + rose-create-client-config + rose-create-client-skills logic becomes an agent callable from the frontend, with the client in the loop instead of staff running every step by hand. The Agent Factory ADR covers the runtime/isolation patterns this agent can reuse.

The long tail in the automated world¶

The four-piece plan handles first-pass generation + client review. Three tail items need their own answer:

Small per-customer changes → the same frontend config agent should take a plain-language ask from the client ("change this question", "fix this CTA") and apply it, so the steady stream of tweaks stops being a staff round-trip.
Custom per-client features (per-jurisdiction routing, themed FAQ, lookup tables, …) → the hardest to automate, because each is bespoke. Push the repeatable shapes into reusable mechanisms (lookup tables are already one), so "custom feature" shrinks to "fill in a client's data" instead of "build new behavior". What stays genuinely one-off stays staff engineering.
Tricky verified setups (conversion / form tracking) → generation alone isn't enough; these need a verify step against the live site, and tracking_status keeps a human approval gate. Automate the generation + the verification check; keep the sign-off.

Open questions¶

ROI metrics still need a client input path (form in the playground?).
Who owns the "invalidate page" action end-to-end — does it re-trigger ingest?
How much config generation can the frontend agent do unattended vs. staff sign-off (e.g. tracking_status already requires human approval)?