Onboarding: Where Human Time Goes, and How to Automate It
Internal doc. Audience: Rose team. Purpose: make explicit what costs human time in a client onboarding today, who does each part, and the plan to push that work into agents (a configuration agent, a knowledge agent, a playground, and the onboarding agent that ties them together).
Source of truth for the current manual flow is the rose-onboard-client skill and the skills it calls. This doc reframes that flow as a plan and links the existing designs that already cover parts of it:
- Knowledge ingestion automation → ADR: Productized Website Scraping Pipeline
- Skill development + eval loop → Prompt & Skill Workflow, Skill System
- Autonomous agent runtime (worktree, preview, lifecycle) → ADR: Agent Factory Orchestration
TL;DR
Claude already does most of the doing — scrape, ingest, generate config, generate skills, run evals — through the onboarding skills. What costs human time now is reviewing: checking Claude's output before it ships, and relaying/reviewing client feedback by proxy. Staff is the reviewer and orchestrator, not the doer.
So the human time is: judging knowledge quality (is the pricing right? are competitors right? did a bad page poison an answer?), validating generated config/skills, the long tail of small per-client tweaks, a few tricky verified setups (conversion tracking, lookup tables, bespoke per-jurisdiction skills), and acting as the middle layer between the client and Claude.
The plan: automate scraping and key-knowledge ingestion (in progress — see the scraping ADR), then move the review loop to the client directly via a playground + a configuration agent callable from the frontend, so the client reviews answers, pricing, competitors, and invalidates bad content itself, instead of staff doing it by proxy.
The biggest time sinks (ranked)
Where the hours actually go, most to least:
| Rank | Time sink | Who | Why it's slow |
|---|---|---|---|
| 1 | Conversion / form-tracking setup + diagnosis | Staff | Most fragile step. Live-page browser inspection, pick the right detection strategy (native / iframe / SPA / GTM), confirm rw_client_form_submitted fires. Forms break detection constantly; needs human approval (tracking_status). |
| 2 | Long tail of small per-client tweaks | Staff | Onboarding never ends in one pass — endless little changes (reword a question, fix a CTA, adjust one skill line). Each is a manual round-trip. Collectively the bulk of ongoing time. |
| 3 | Knowledge review (pricing, competitors, bad pages) | Staff judging Claude output | High-judgment: stale prices and wrong competitor scope are the classic failures. A bad source page poisons answers. |
| 4 | Custom per-client features | Staff (real engineering) | One-off features built for a single client — no reuse, not generatable. Each needs design + build + eval. Examples: per-jurisdiction answer routing (iBanFirst), themed/labelled FAQ (Roundtable), custom lookup tables (PayFit CCN, countries, integrations, plan tiers, feature matrix). |
| 5 | Skills + eval prompt-engineering loop | Staff judging Claude output | Write → run rose-eval → read transcripts → fix → re-run. The custom features above are the heaviest cases. |
| 6 | Client feedback relay | Staff ↔ client | Staff is a manual middle layer: client flags wrong answer → staff interprets → points Claude at fix → re-runs → relays back. |
Claude already does the doing in all of these — the time is the human review + relay loop around it, plus the fragile/bespoke setups that need live verification. Details per phase below.
The pieces
Onboarding decomposes into four big pieces. Each is at a different stage.
| Piece | What it is | State |
|---|---|---|
| Knowledge agent | Scrape + ingest site/docs into RAG, plus create key knowledge (pricing, competitors) | Partly manual; productization designed in the scraping ADR |
| Cropping | Trim/clean scraped pages so the KB isn't poisoned | Mostly done (per-site cleanup + RAG-friendly checks) |
| Playground | Client-facing surface to run questions and review answers | To build |
| Onboarding agent (+ knowledge agent) | Generate config + skills, run evals, drive client review | Config exists as skills; automate into a frontend-callable agent |
Today vs target (diagram)
Red = human time (staff). Green = automated (Claude / pipeline). Blue = client. Today the client only touches the very end, through staff; the target moves the whole review loop to the client.
The red nodes are what we're paying for. The target collapses them: the pipeline absorbs the knowledge work, the config agent absorbs generation + small tweaks, and the client absorbs the review that staff does by proxy today. The only human gate that stays is sign-off on the fragile verified setups.
What takes human time today (and who does it)
The current flow, per rose-onboard-client. Claude runs each skill; the human
time is review, not execution. "Staff" = Rose onboarding engineer (reviewer /
orchestrator). "Client" = the customer being onboarded.
Across all phases the staff job is the same shape: kick off the skill, read what Claude produced, judge if it's correct/safe, send it back or sign off.
1. Knowledge ingestion: CLAUDE does it, STAFF reviews (heavy)
- Scrape the site (rose-scrape-website): build a curated URL list (a blind --follow poisons the KB with /tag/, /author/, pagination, PR noise), run the scraper, audit quality, write a per-site cleanup script.
- Scrape internal docs (rose-scrape-internal-docs): convert client decks / PDFs / pricing sheets to markdown, strip sensitive info (margins, commissions, employee names, "do not share" notes), invent nothing.
- Cropping / cleanup: the per-site cleanup-script work + RAG-friendly checks. Mostly automated now, but still needs a human eye on the audit.
- Key knowledge (pricing, competitors): today these come out of scraping + manual curation. Getting them right is high-judgment: stale prices and wrong competitor scope are the classic failure modes.
Time sink: judging whether the ingested knowledge is correct and safe. The productized pipeline that replaces the ad-hoc local workflow is designed in the scraping ADR (Firecrawl map/crawl + bounded LLM include/exclude decisions + GCS-versioned corpus + the existing document-loader into LightRAG/Neo4j/MongoDB).
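The curated-URL step above can be sketched as a small filter. This is an illustrative sketch, not the logic in rose-scrape-website: the function names, the noise set, and the pagination patterns are assumptions, seeded only with the failure cases named above (/tag/, /author/, pagination).

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumption: path segments treated as noise, taken from the examples in this doc.
NOISE_SEGMENTS = {"tag", "author"}
# Assumption: paginated listings look like /page/<n> or carry a ?page= query.
PAGINATED_PATH = re.compile(r"/page/\d+/?$")


def is_noise(url: str) -> bool:
    """Return True for URLs that would likely pollute the knowledge base."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.lower().split("/") if s]
    if any(s in NOISE_SEGMENTS for s in segments):
        return True
    if PAGINATED_PATH.search(parsed.path.lower()):
        return True
    return "page" in parse_qs(parsed.query)


def curate(urls):
    """Drop noise and exact duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen or is_noise(url):
            continue
        seen.add(url)
        kept.append(url)
    return kept
```

A filter like this only narrows the list. The judgment call on what counts as noise for a given site still belongs to the reviewer.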
2. Config generation: CLAUDE does it, STAFF reviews (medium)
rose-create-client-config reads the KB + browses the site and writes the
configs + curated content:
- identity, CTAs, engagement (suggested questions per language), analytics taxonomy, qualification signals, competitors, appearance, curated content.
- Staff judgment: does the generated config match reality? Are languages right? Are CTAs pointing at the real demo form?
Not auto-generated: the 5 ROI metrics (monthly_traffic, monthly_demos,
demos_to_deals_rate, acv, rose_cost). These are client CRM/billing numbers —
staff has to ask the client. rose_cost is billing-only.
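Because these five values cannot be generated, a simple completeness check can tell staff what is still outstanding. The sketch below assumes the brief is a flat dict keyed by the metric names above. That shape, and the function name, are assumptions for illustration.

```python
# The five ROI metric names listed above.
ROI_METRICS = (
    "monthly_traffic",
    "monthly_demos",
    "demos_to_deals_rate",
    "acv",
    "rose_cost",
)


def missing_roi_metrics(brief: dict) -> list:
    """Return the ROI metrics that are absent or None, in canonical order."""
    return [name for name in ROI_METRICS if brief.get(name) is None]
```

For example, `missing_roi_metrics({"monthly_traffic": 12000, "acv": 30000})` returns `["monthly_demos", "demos_to_deals_rate", "rose_cost"]`.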
3. Skills generation + eval: CLAUDE does it, STAFF reviews (medium/heavy)
rose-create-client-skills (see Skill System and the
Prompt & Skill Workflow):
- Compare client needs against all global skills, decide which (if any) skill overrides are needed (pricing behavior, competitor framing, redirects).
- Write the smallest skill that works, then trim.
- Build an eval dataset and run rose-eval on staging, iterate on rose-chat.
Time sink: prompt-engineering loop — write, run, read transcripts, fix, re-run.
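The write, run, fix, re-run loop has roughly the following shape. This is a hedged sketch: `run_eval` and `revise` are hypothetical stand-ins for running rose-eval and for the human (or Claude) edit. They are not real interfaces of those skills.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EvalResult:
    # Hypothetical shape: one entry per failing eval case.
    failures: List[str] = field(default_factory=list)


def iterate_skill(
    skill: str,
    run_eval: Callable[[str], EvalResult],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Run eval, revise on failure, stop when clean or out of rounds.

    Returns (skill_text, rounds_used, passed). If passed is False, the
    returned text is the last revision and has not been re-evaluated.
    """
    for round_no in range(1, max_rounds + 1):
        result = run_eval(skill)
        if not result.failures:
            return skill, round_no, True
        skill = revise(skill, result.failures)
    return skill, max_rounds, False
```

The round cap is the design point: an unbounded loop hides the cost, while a bounded one makes "still failing after N rounds" an explicit hand-off to staff.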
4. Client review: CLIENT, today routed through STAFF
Today the client reviews after Claude + staff have done everything, and the feedback comes back through staff: the client says "this answer is wrong", staff interprets it, points Claude at the fix, re-runs, relays the result. Staff is a manual relay between the client and Claude. The client should be validating pricing, competitors, and answer quality directly — but there's no self-serve surface for it yet.
5. Custom per-client features: STAFF (real engineering, biggest hidden cost)
The category the "generate config + skills" model doesn't capture at all: one-off features built for a single client. They aren't a config value or a generated skill — they're bespoke behavior that only ever benefits that one client, with no reuse across the fleet. Each one is a real design + build + eval cycle, and they recur as clients ask for things the generic flow can't do.
Examples (the point is the category, not the list):
- Per-jurisdiction answer routing (iBanFirst) — answers must be scoped to the visitor's country (coverage, regulation, availability differ), so a custom skill routes the answer by jurisdiction instead of giving one global reply.
- Themed / labelled FAQ (Roundtable) — FAQ knowledge organized by theme + labels so the agent surfaces the right grouped answer.
- Custom lookup tables — any deterministic coverage list the client supplies: supported CCN / IDCC (e.g. PayFit), countries, integrations, plan tiers, feature matrix, etc. Build the YAML mapping, import the list, smoke-test the lookup renders in the answer-node prompt. PayFit CCN is just one example — the mechanism is generic.
These need real prompt-engineering + eval iteration, not generation, and they're the heaviest single-client time sink.
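The lookup-table mechanism is the most reusable of the three examples, so a sketch helps show why. The block below assumes a table is a plain key-to-value mapping and that the result is rendered as one line of prompt text. The table contents, the function names, and the rendered format are all made up for illustration. The real mechanism uses a YAML mapping and the answer-node prompt.

```python
from typing import Dict

# Illustrative data only: not a real client's coverage list.
COUNTRIES: Dict[str, str] = {
    "FR": "supported",
    "DE": "supported",
    "BR": "not supported",
}


def normalize(key: str) -> str:
    """Make lookups tolerant of case and stray whitespace."""
    return key.strip().upper()


def render_lookup(table_name: str, table: Dict[str, str], key: str) -> str:
    """Render a deterministic lookup result as a line for the prompt.

    A miss is rendered explicitly so the model is told not to guess.
    """
    norm = normalize(key)
    value = table.get(norm)
    if value is None:
        return f"[{table_name}] {norm}: no entry, do not guess"
    return f"[{table_name}] {norm}: {value}"
```

The smoke test mentioned above then reduces to asserting that a known key renders the expected line and that an unknown key renders the explicit miss.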
6. The long tail of small changes: STAFF (underestimated)
Onboarding isn't one pass — it's a steady stream of little tweaks per client (reword a question, fix a CTA label, adjust one skill line, nudge a redirect). Individually tiny, collectively the bulk of ongoing human time. Each one is a manual round-trip.
7. Conversion / form-tracking setup + diagnosis: STAFF (most fragile)
Set up form/conversion tracking and verify it actually fires on the live site.
The most fragile part of onboarding: rose-form-tracking-diagnosis has to
inspect the live page with a browser, figure out which of several detection
strategies applies (native submit, iframe-embedded forms, SPA, GTM-injected),
and confirm rw_client_form_submitted actually fires. Forms break detection
constantly (new-editor HubSpot forms always iframe, no postMessage; cache
plugins externalize inline scripts; A/B-split redirects change the destination).
High-judgment, easy to get wrong, and tracking_status requires explicit human
approval (never written autonomously). Must be checked against the real page,
not just config.
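The diagnosis described above is essentially "observe the live page, pick a strategy, confirm the event". A minimal sketch follows. The observation fields and the priority order are assumptions, not what rose-form-tracking-diagnosis actually does. Only the four strategy names and the rw_client_form_submitted event come from this doc.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Strategy(Enum):
    NATIVE = "native"
    IFRAME = "iframe"
    SPA = "spa"
    GTM = "gtm"


@dataclass(frozen=True)
class PageObservation:
    # Hypothetical facts gathered by inspecting the live page in a browser.
    form_in_iframe: bool = False
    injected_by_gtm: bool = False
    client_side_routing: bool = False


def pick_strategy(obs: PageObservation) -> Strategy:
    """Assumed priority: iframe first, then GTM, then SPA, else native."""
    if obs.form_in_iframe:
        return Strategy.IFRAME
    if obs.injected_by_gtm:
        return Strategy.GTM
    if obs.client_side_routing:
        return Strategy.SPA
    return Strategy.NATIVE


def ready_for_review(fired_events: Iterable[str]) -> bool:
    """True only if the conversion event was seen on the live page.

    This makes the setup ready for human review. It never approves anything.
    """
    return "rw_client_form_submitted" in set(fired_events)
```

Keeping strategy selection separate from the "did it fire" check mirrors the text: the first can be automated, the second must be verified against the real page, and approval stays with a human.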
Where the human time actually concentrates
Claude does the doing. The bottleneck is the review/relay loop:
- Staff reviewing Claude's generated knowledge / config / skills before ship.
- Staff relaying client feedback into the next Claude iteration.
- Custom per-client features — real one-off engineering, no reuse.
- The long tail of small per-client tweaks, and the fragile tracking setup.
Automating the doing further (scraping, key knowledge) helps, but the bigger win is removing staff from the review/relay loop — see Future work.
Linear / brief handling: STAFF
Read the parent onboarding ticket, extract the brief (domain, languages, CTA, CRM, qualification questions, redirects, tracking pages, ROI numbers), save it for downstream skills. Manual but light.
Future work
Already moving: automate scraping + key-knowledge ingestion
We're making sure scraping works automatically with the latest changes, and extending that to key knowledge — pricing and competitors — so those are ingested automatically instead of hand-curated. Cropping is mostly done. The target architecture is the Productized Website Scraping Pipeline ADR: an explicit Cloud Run state machine (map → triage → sample → decide → scrape → preflight → cleanup → validate → publish), backoffice-inspectable run state, and GCS-versioned corpora — no autonomous agent loop, and no running arbitrary per-client cleanup code in production.
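The stage list above reads naturally as an explicit, ordered state machine. The sketch below models only the happy-path order given in this doc. The ADR may define failure, retry, or terminal states that are not shown here, and the function names are illustrative.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    MAP = "map"
    TRIAGE = "triage"
    SAMPLE = "sample"
    DECIDE = "decide"
    SCRAPE = "scrape"
    PREFLIGHT = "preflight"
    CLEANUP = "cleanup"
    VALIDATE = "validate"
    PUBLISH = "publish"


# Enum members iterate in definition order, which is the pipeline order.
ORDER = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None after PUBLISH."""
    idx = ORDER.index(stage)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def advance(current: Stage, requested: Stage) -> Stage:
    """Allow only the single legal forward transition."""
    if next_stage(current) is not requested:
        raise ValueError(f"illegal transition: {current.value} -> {requested.value}")
    return requested
```

Rejecting every transition except the next one is what makes run state inspectable: a run is always at exactly one named stage, which is the contrast with an autonomous agent loop.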
What remains: put the client in the loop
The remaining human time is the review/relay loop — and the judgment in it is something the client is better placed to make than staff. The plan is to cut staff out of the middle and give the client that loop directly:
- Put the client in the loop: stop reviewing knowledge by proxy.
- Generate questions for the client in a playground: auto-produce the questions a buyer would ask (and the ones we're unsure about).
- Run them and let the client review automatically: show the client the agent's answers and let them flag what's wrong, instead of staff reading every transcript.
- Let the client look at pricing and competitors: the two highest-risk knowledge areas, reviewed by the people who actually know them.
- Let the client invalidate pages/content: when an answer is wrong because of a bad source page, the client can mark that page/content as bad so it stops feeding answers. (The scraping ADR already has the artifact/tombstone model, <!-- nullified -->, to drop removed documents from the index.)
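To make the invalidation idea concrete, here is a minimal sketch of a tombstone. It assumes the marker replaces the document body and that the indexer skips any document whose body is the marker. Both are assumptions, since this doc only says the ADR uses `<!-- nullified -->` to drop removed documents. The corpus is modelled as an in-memory dict for illustration.

```python
from typing import Dict

TOMBSTONE = "<!-- nullified -->"


def invalidate(corpus: Dict[str, str], doc_id: str) -> Dict[str, str]:
    """Return a new corpus version with doc_id replaced by a tombstone."""
    if doc_id not in corpus:
        raise KeyError(f"unknown document: {doc_id}")
    updated = dict(corpus)  # copy, so the previous version stays intact
    updated[doc_id] = TOMBSTONE
    return updated


def indexable(corpus: Dict[str, str]) -> Dict[str, str]:
    """Documents that should still feed answers."""
    return {doc_id: body for doc_id, body in corpus.items() if body.strip() != TOMBSTONE}
```

Writing a new version rather than deleting in place fits a versioned corpus: the invalidation is itself an auditable change that can be reverted.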
How: a configuration agent callable from the frontend
A big part of this is already done as configuration (the configs, curated content, skills). The next step is to wrap it in an agent that does configuration — generating dynamic questions, running them, applying client feedback — and call that agent from the frontend so the client drives it.
Mapping to the four pieces:
- Knowledge agent: automate scrape + ingest + key-knowledge (pricing, competitors) per the scraping ADR; expose "invalidate this page/content" so bad sources can be dropped.
- Cropping: mostly done; keep it automatic.
- Playground: the client-facing surface: generate questions, run them, collect client verdicts, surface pricing/competitor review.
- Onboarding agent (+ knowledge agent): orchestrates the above. Today's rose-onboard-client + rose-create-client-config + rose-create-client-skills logic becomes an agent callable from the frontend, with the client in the loop instead of staff running every step by hand. The Agent Factory ADR covers the runtime/isolation patterns this agent can reuse.
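The playground's "collect client verdicts" step could carry data shaped roughly like this. Every name below is hypothetical. The playground is not built yet, so this is one possible shape, not a spec.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Verdict:
    question: str
    answer: str
    ok: bool          # client's judgment of the answer
    topic: str        # assumed tags, e.g. "pricing", "competitors", "general"
    note: str = ""    # free-text reason when the client flags an answer


def flagged(verdicts: Iterable[Verdict]) -> List[Verdict]:
    """Answers the client marked as wrong, in review order."""
    return [v for v in verdicts if not v.ok]


def flagged_by_topic(verdicts: Iterable[Verdict]) -> Counter:
    """Count flagged answers per topic to show where knowledge is weakest."""
    return Counter(v.topic for v in verdicts if not v.ok)
```

Tagging verdicts by topic lets the two highest-risk areas, pricing and competitors, surface first without staff reading every transcript.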
The long tail in the automated world
The four-piece plan handles first-pass generation + client review. Three tail items need their own answer:
- Small per-customer changes → the same frontend config agent should take a plain-language ask from the client ("change this question", "fix this CTA") and apply it, so the steady stream of tweaks stops being a staff round-trip.
- Custom per-client features (per-jurisdiction routing, themed FAQ, lookup tables, …) → the hardest to automate, because each is bespoke. Push the repeatable shapes into reusable mechanisms (lookup tables are already one), so "custom feature" shrinks to "fill in a client's data" instead of "build new behavior". What stays genuinely one-off stays staff engineering.
- Tricky verified setups (conversion / form tracking) → generation alone isn't enough; these need a verify step against the live site, and tracking_status keeps a human approval gate. Automate the generation + the verification check; keep the sign-off.
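The split between "the agent applies small tweaks" and "tracking_status keeps a human gate" can be expressed as a guarded config update. In the sketch below, the editable field names and the `approved_by` parameter are assumptions for illustration. Only tracking_status and its approval requirement come from this doc.

```python
from typing import Any, Dict, Optional

# From this doc: tracking_status is never written without human approval.
HUMAN_GATED = {"tracking_status"}
# Assumption: fields a client could change through the frontend agent.
CLIENT_EDITABLE = {"suggested_questions", "cta_label"}


def apply_change(
    config: Dict[str, Any],
    field: str,
    value: Any,
    approved_by: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a new config with one field changed, enforcing the gates."""
    if field in HUMAN_GATED:
        if not approved_by:
            raise PermissionError(f"{field} requires explicit human approval")
    elif field not in CLIENT_EDITABLE:
        raise KeyError(f"{field} is not editable through the agent")
    return {**config, field: value}
```

An allowlist is the conservative choice here: anything not explicitly marked editable falls back to a staff round-trip rather than being silently applied.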
Open questions
- ROI metrics still need a client input path (form in the playground?).
- Who owns the "invalidate page" action end-to-end — does it re-trigger ingest?
- How much config generation can the frontend agent do unattended vs. staff sign-off (e.g. tracking_status already requires human approval)?