Measuring Rose Performance¶
A practical guide to evaluating Rose's impact on your site — what we already know, what you can measure on your own traffic, and how to interpret each method honestly.
TL;DR¶
Rose works. Across our customer base we have multiple converging proof points:
- Live engagement data on every deployment shows visitors interacting with Rose, asking questions, and converting inside conversations (qualified leads, captured emails, demos booked).
- Engaged-vs-non-engaged conversion gaps measured per client (your account manager can pull yours) consistently show Rose-engaged visitors converting at substantially higher rates than non-engaged visitors.
- Several client A/B tests, run rigorously, have shown lifts up to ~20% on mid-funnel conversion. We can share methodology details.
- A cross-portfolio analysis is in flight that will pool results across all Rose deployments and publish a credible interval for the typical lift.
Rose performance is best understood through a portfolio of reads at different rigor levels and timelines, not a single number. This page lists every method available, with two goals: make the existing evidence easy to find, and explain honestly which methods are right for your team and your traffic.
Rose's own A/B test framework is in development
Rose is building an integrated A/B test framework with two parts: (1) per-customer experiments with sample-size calculators, randomised page-load assignment, and Wilson confidence intervals on the results, and (2) a cross-portfolio analysis that pools results across all Rose deployments. Both are in active development and not yet exposed to customers.
A/B testing is one of several methods on this page, not the only one. You can also rely on the live engagement reads (#1, #2), the in-flight portfolio reference (#7), or run an A/B test today using your own platform (Optimizely, AB Tasty, GrowthBook, Statsig, VWO, or similar) — the methodology is the same regardless of which tool splits the traffic. Talk to your account manager; we have set up A/B tests with several clients already.
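For readers unfamiliar with the Wilson interval mentioned above, the following is a minimal sketch of the standard Wilson score interval for a conversion rate. It is not Rose's implementation; the function name `wilson_interval` and the example counts are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(conversions: int, visitors: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (conversion rate)."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = conversions / visitors
    denom = 1 + z**2 / visitors
    centre = (p_hat + z**2 / (2 * visitors)) / denom
    half_width = z * sqrt(
        p_hat * (1 - p_hat) / visitors + z**2 / (4 * visitors**2)
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical arm: 120 conversions out of 6,000 visitors (2% observed).
low, high = wilson_interval(conversions=120, visitors=6000)
print(f"observed 2.00%, 95% interval {low:.2%} to {high:.2%}")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside 0–100% and behaves sensibly at the low conversion rates typical of B2B SaaS.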
What We Already Know About Rose's Impact¶
Before listing the methods you can run on your own traffic, here is what the existing evidence already shows. None of this requires you to run a test — it is available today.
| Evidence | What it shows | How to get it |
|---|---|---|
| Engagement dashboard (live on every deployment) | Visitors are seeing Rose, asking real questions, completing in-chat conversions (emails captured, demos booked, qualified leads forwarded). | Rose backoffice → Analytics. See #1. |
| Engaged-vs-non-engaged conversion comparison (per client) | Rose-engaged visitors convert at substantially higher rates than non-engaged visitors on the same site. Correlational, not yet causal, but the direction is consistent across clients. | Ask your account manager. See #2. |
| Client A/B test results (lifts up to ~+20% on mid-funnel conversion) | Several clients have run rigorous A/B tests and measured real lifts. Methodology and anonymised results available on request. | Ask your account manager. See #7. |
| Cross-client portfolio analysis (in flight) | A pooled estimate of typical Rose lift across all deployments, with a credible interval. Will be published quarterly once available. | Coming. See #7. |
If you want to verify these on your own traffic, the methods below give you the tools. If you want to confirm Rose is performing for your account specifically before designing a deeper test, request the engaged-vs-non-engaged comparison from your account manager — it is usually the fastest meaningful read.
Why "Prove Rose Works" Takes Different Forms¶
The proof points above are already strong. This page lists eight methods rather than handing you a single "+X% lift" number because a fully rigorous, isolated, single-client A/B test on bottom-funnel conversion takes a long time at typical SaaS traffic. That holds for any vendor, any tool, and any testing platform. Three properties of B2B SaaS traffic drive this:
- Low baseline conversion rates. Free signup or demo-request conversion is typically 1–5% of visitors. The lower the baseline, the more visitors you need to detect a lift.
- Modest absolute traffic. Most SaaS sites have a few hundred to a few thousand qualified visitors per day. Splitting that traffic 50/50 for a test halves your speed.
- Small relative lifts. A realistic conversational AI lift on a bottom-funnel metric is in the 3–10% range. A 10% lift is already a strong result. Detecting a 5% relative lift on a 2% baseline takes roughly 300,000 visitors per arm to be statistically conclusive.
Concretely: a site with 600 daily Rose-eligible visitors and a 2% signup conversion rate needs roughly 2.5 to 3 years of testing to prove a 5% relative lift on signup conversion with classical methods. Detecting a 10% lift on the same site still takes ~9 months. This is not a Rose limitation — it is a property of the math, and it applies identically to any vendor.
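The arithmetic behind these figures can be sketched with the textbook sample-size formula for a two-sided two-proportion z-test. The 5% significance level and 80% power are assumptions (the page does not state which values were used), and the function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_arm(baseline: float, relative_lift: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per arm to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_arm: int, daily_visitors: int, arms: int = 2) -> float:
    """Days needed to fill every arm when traffic is split evenly."""
    return n_per_arm * arms / daily_visitors


for lift in (0.05, 0.10):
    n = visitors_per_arm(baseline=0.02, relative_lift=lift)
    days = duration_days(n, daily_visitors=600)
    print(f"{lift:.0%} lift: {n:,} visitors per arm, about {days / 365:.1f} years")
```

Under those assumptions the sketch gives roughly 315,000 visitors per arm for a 5% lift and roughly 81,000 for a 10% lift, which at 600 visitors per day is close to three years and nine months respectively.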
The right response is to layer evidence: use the live engagement data and the engagement-vs-non-engagement comparison to read direction quickly, run mid-funnel A/B tests on metrics with high enough baselines to converge in weeks, and use cross-client and portfolio results for context. Each method below explains what it can and cannot claim.
Picking the Right Conversion to Measure¶
B2B SaaS sites typically have several conversion actions at different funnel depths. Rose can affect any of them, but how fast you can detect that effect depends entirely on the baseline rate of the action you measure. A 5% lift on a 2% baseline takes years to prove; the same 5% lift on a 30% baseline takes weeks.
We strongly recommend picking a primary metric with the highest baseline that still maps to revenue for your business. Use lower-baseline metrics only as secondary reads.
Metric families and what they cost to measure¶
Rose treats every business-relevant action you have configured (demo request, contact form, pricing inquiry, trial start, free signup, custom CRM event) as a conversion — one tracked event in the same place. You do not have to pick a single funnel stage; you can layer several. What matters for measurement speed is the baseline rate of whichever conversion(s) you focus on.
| Metric family | What it includes | Typical baseline range | Recommended method | Order-of-magnitude time at 600 daily visitors |
|---|---|---|---|---|
| Attention | Time on page, scroll depth (continuous metrics) | n/a (continuous) | #4 Time-on-page | 2–4 weeks |
| Mid-funnel intent | Pricing page reached, multi-page session, return visit within 7 days, high-value page visit | 10–50% | #5 Mid-funnel intent | 2–6 weeks |
| A tracked conversion | Demo request, contact form, pricing form, trial start, qualified-lead webhook, any other conversion event Rose tracks for you | 1–8% (depends which one) | #6 High-baseline conversion | 1–6 months |
| Free signup (specifically) | Free account created | 1–3% | #8 Full signup-rate | 9 months to 3 years |
| Paid customer (revenue) | Subscription started, deal closed | 0.1–1% | Not testable per-client at typical SaaS traffic | n/a |
Numbers are illustrative and depend heavily on your site's traffic mix and category. Use them to choose, not to forecast.
The "A tracked conversion" row is deliberately broad: in Rose, demo requests, form submissions, trial starts and the like are all the same kind of event. Pick whichever subset matters for your business — or combine them into a composite "any conversion in session" metric, which pushes the baseline higher and the timeline shorter.
How to choose your primary metric¶
- List the conversion actions you actually care about. Demo, signup, contact form, trial start, pricing inquiry, etc.
- For each, estimate the baseline rate (conversion / visitor) from your last 30 days of analytics.
- Pick the action with the highest baseline that still maps to revenue. Demo requests usually win for sales-led SaaS. Trial starts usually win for product-led SaaS. Free signup is rarely the right primary because the baseline is too low.
- Move lower-baseline actions to secondary metrics, reported alongside but not used to gate the verdict.
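The selection steps above can be sketched in a few lines. The action names, counts, and the `maps_to_revenue` flag are hypothetical inputs you would fill in from your own analytics; nothing here is a Rose API.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    conversions_30d: int    # conversions over the last 30 days
    maps_to_revenue: bool   # your own judgment, not something Rose computes


def choose_primary(actions: list[Action],
                   visitors_30d: int) -> tuple[Action, list[Action]]:
    """Return (primary, secondaries): the highest-baseline revenue-mapped action first."""
    candidates = [a for a in actions if a.maps_to_revenue]
    if not candidates:
        raise ValueError("no action maps to revenue")
    ranked = sorted(candidates,
                    key=lambda a: a.conversions_30d / visitors_30d,
                    reverse=True)
    return ranked[0], ranked[1:]


# Hypothetical site with 18,000 visitors in the last 30 days.
actions = [
    Action("free_signup", 360, True),         # 2.0% baseline
    Action("demo_request", 540, True),        # 3.0% baseline
    Action("newsletter_signup", 1500, False)  # high baseline, no revenue link
]
primary, secondaries = choose_primary(actions, visitors_30d=18_000)
print("primary:", primary.name)
print("secondary:", [a.name for a in secondaries])
```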
A common mistake is picking the action closest to revenue (paid customer, free signup) as the primary metric, then concluding "we can't measure Rose" when the timeline turns out to be years. The fix is to measure higher up the funnel, where Rose's effect propagates first and where detection is fast.
What if Rose changes which conversions visitors take?¶
Rose can shift visitors between conversion paths — e.g. a visitor who would have filled the demo form now books directly via the chat, or a visitor who would have filled the pricing form now starts a trial instead. Measuring a single conversion in isolation can miss this re-routing.
Mitigations:
- Track a composite metric: "any conversion within the session" (demo OR contact OR pricing-form OR trial). Higher baseline, robust to re-routing.
- Report secondary conversions alongside the primary: even if the primary moves, the secondaries show whether visitors went elsewhere.
- Layer in #2 (engagement-to-conversion funnel) to see Rose-engaged conversion patterns separately from non-engaged.
If you suspect Rose is re-routing conversions, talk to your account manager — we can set up a composite primary metric in the experiment configuration.
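If you compute the composite metric yourself, a minimal sketch looks like the following. The event names and the `(session_id, arm)` / `(session_id, event_name)` record shapes are assumptions for illustration, not the format of Rose's forwarded events.

```python
from collections import defaultdict

# Hypothetical event names; substitute the conversions you actually track.
CONVERSION_EVENTS = {"demo_request", "contact_form", "pricing_form", "trial_start"}


def composite_rate_by_arm(sessions, events):
    """Share of sessions per arm with at least one conversion of any kind.

    sessions: iterable of (session_id, arm)
    events:   iterable of (session_id, event_name)
    """
    converted = {sid for sid, name in events if name in CONVERSION_EVENTS}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for sid, arm in sessions:
        totals[arm] += 1
        hits[arm] += sid in converted  # a session counts once, however many conversions
    return {arm: hits[arm] / totals[arm] for arm in totals}


sessions = [("s1", "rose"), ("s2", "rose"), ("s3", "control"), ("s4", "control")]
events = [("s1", "demo_request"), ("s1", "trial_start"), ("s3", "page_view")]
print(composite_rate_by_arm(sessions, events))
```

Counting each session once is what makes the metric robust to re-routing: a visitor who books in chat instead of filling the demo form still counts as one conversion.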
The Evidence Tiering¶
Rose performance can be measured at multiple rigor levels. Each level answers a slightly different question. We recommend layering several — not picking one.
| Method | Timeline | Rigor | What it proves |
|---|---|---|---|
| Engagement dashboard | Live | Low | Visitors are interacting with Rose |
| Engagement-to-conversion funnel | Live | Medium | Rose-engaged visitors convert better |
| Pre / post comparison with peer control | Needs ≥90 days each side | Not recommended for proof | Listed for completeness; results can be biased in either direction by confounders |
| Time-on-page A/B test | 2–4 weeks | High | Rose holds visitor attention longer |
| Mid-funnel intent A/B test | 3–6 weeks | High | Rose moves visitors deeper into the funnel (pricing page reached, return visits, multi-page sessions) |
| Conversion A/B test on a high-baseline action | 4–8 weeks | High | Rose lifts demo requests, pricing-page actions, or another mid-funnel metric |
| Other-client A/B test results | Available today; portfolio analysis in flight | High | Independent A/B tests run with prior clients have shown lifts up to ~20%; portfolio-wide analysis coming |
| Full signup-rate A/B test | 9 months to 3 years | Highest | Rose lifts your free signup conversion specifically |
We walk through each below.
1. Engagement Dashboard¶
What it measures: raw activity — impressions, messages sent, conversations completed, CTA clicks, qualified leads captured.
Where to find it: Rose backoffice → Analytics in the left navigation. Each tab covers a different slice:
- Conversations — widget displayed, conversation started, qualified leads, captured emails, demos booked, plus a small funnel chart.
- Questions — what visitors actually ask the agent.
- Devices, Pages, Topics — segmentation views.
Date range and environment selectors are at the top. Available immediately after install, no setup required.
What it proves: Rose is being seen, used, and producing structured outputs (qualified leads, captured emails, booked demos). It does not prove counterfactual lift — i.e. it cannot tell you whether the same outcomes would have happened without Rose.
When to use it: as your baseline operational view. Useful for sanity-checking that the integration is healthy and that your visitors are engaging with the agent.
Honest framing for stakeholders:
"Rose held X conversations and captured Y qualified leads last month. These are real engaged interactions and real captured prospects. The methods below let us go further and isolate how many of these were incremental to our normal funnel."
2. Engagement-to-Conversion Funnel¶
What it measures: the conversion rate of visitors who engaged with Rose versus visitors who did not, broken down by funnel stage. This is a useful leading indicator: engaged visitors should convert more often than non-engaged visitors, and tracking that gap over time shows whether Rose is reaching the right people.
How to get it:
- Fastest path — ask your account manager. Your account manager typically already has the comparison between conversion rates with and without a Rose interaction, computed from your forwarded events plus your own conversion data. Request it whenever you need a current read.
- Partial in dashboard: Rose backoffice → Analytics → Conversations tab shows the Rose-side funnel (widget displayed → conversation started → email captured / demo booked). This covers conversions that happen inside the Rose conversation, but not the full comparison against your downstream conversions (signups, trials, CRM-tracked deals).
- Self-serve: if you want to compute it yourself, forward Rose engagement events into your CRM or analytics tool (see Event Forwarding and Webhooks), join with your downstream conversion data, and compare engaged vs non-engaged conversion rates.
We are planning to surface the full uplift comparison natively in the dashboard, alongside the Rose A/B test framework noted above.
Important caveat: the gap between engaged and non-engaged conversion rates is correlational, not causal. Engaged visitors are self-selected — they are more interested to begin with — so they would convert at a higher rate even without Rose. Use the gap as a leading indicator of intent, not as proof that Rose lifts conversion. The A/B tests below (#4, #5, #6, #8) are the only methods that isolate the causal effect.
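For the self-serve route, the comparison itself is a simple join once the data is in one place. The sketch below assumes you have already exported three sets of visitor IDs (all visitors, Rose-engaged visitors, converted visitors); how you obtain them depends on your CRM or analytics tool and is not shown.

```python
def conversion_rates(visitors: set[str], engaged: set[str],
                     converted: set[str]) -> tuple[float, float]:
    """Return (engaged_rate, non_engaged_rate) over the same visitor population."""
    engaged_visitors = visitors & engaged
    other_visitors = visitors - engaged
    if not engaged_visitors or not other_visitors:
        raise ValueError("both groups must be non-empty")
    engaged_rate = len(engaged_visitors & converted) / len(engaged_visitors)
    other_rate = len(other_visitors & converted) / len(other_visitors)
    return engaged_rate, other_rate


# Hypothetical data: 1,000 visitors, 100 engaged, 26 conversions in total.
visitors = {f"v{i}" for i in range(1000)}
engaged = {f"v{i}" for i in range(100)}
converted = {f"v{i}" for i in range(8)} | {f"v{i}" for i in range(100, 118)}

engaged_rate, other_rate = conversion_rates(visitors, engaged, converted)
print(f"engaged {engaged_rate:.1%} vs non-engaged {other_rate:.1%}")
```

The output is a correlational gap only; the self-selection caveat above applies to it in full.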
When to use it: to validate that engaged conversations have downstream value (they should). To identify which intent signals predict conversion. To inform qualification rules. To set realistic expectations before designing an A/B test.
Honest framing for stakeholders:
"Visitors who engage with Rose convert at X% vs Y% across the rest of our traffic — a meaningful gap that is consistent with results other Rose clients are seeing. Part of this gap is self-selection (engaged visitors were higher intent to begin with) and part is Rose's effect on those visitors. The A/B tests below are the methods that separate the two."
3. Pre / Post Comparison With Peer Control¶
We do not recommend this method as proof of Rose's effect.
We list it because it is widely used in the industry and you will see it in other vendors' case studies. The method is fundamentally unreliable for causal claims, in either direction. Read this section to understand why, and to recognise when others are using it.
What it measures: how your conversion metrics changed in the period after Rose was enabled versus the equivalent period before, optionally corrected for industry trend using a benchmark of similar SaaS companies.
Why we don't trust it:
- Confounders dominate. Anything else that changed during the window — a pricing update, a paid-campaign launch or pause, an SEO change, a product release, seasonality, a competitor move, a refreshed landing page — is silently rolled into the "Rose effect". The method has no way to separate them.
- The story can be told either way. A pre/post analysis that finds a positive shift can almost always be matched by a different framing that finds a negative shift. Pick a different window, a different peer cohort, or a different metric definition and you get a different number. This makes the method easy to abuse in marketing and impossible to defend under scrutiny.
- Noise dominates on low-baseline metrics. Week-to-week conversion variance on a 2% baseline is typically ±10% relative or more. A real 5% Rose lift sits inside the noise band and is not separable from it without a randomised test.
- Peer control is a partial fix at best. Peer benchmarks have their own noise and their own confounders. Subtracting noisy from noisy does not produce signal.
What it can still be useful for:
- A sanity check that Rose did not catastrophically harm a top-of-funnel metric (large negative effects are easier to spot than modest positive ones).
- A directional sign-check on high-baseline intermediate metrics (engagement, time-on-site) where noise is small relative to plausible effects.
- Detecting confounders themselves — e.g. noticing your paid spend doubled in the post-Rose window, which would have invalidated the comparison anyway.
Required window if you do run it: at least 90 days before and 90 days after Rose was enabled, with no major site or marketing changes in either window. Confirm window cleanliness with your marketing team before running.
Method (technical): difference-in-differences. Compare the change in your metric (post-Rose minus pre-Rose) to the change in a peer cohort of similar SaaS sites over the same window. Residual after subtraction is attributed to Rose, with the heavy caveats above.
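The subtraction is simple enough to write out. The sketch below works in percentage points of conversion rate, which is an assumption (the method can also be applied to relative changes), and the rates in the example are invented.

```python
def diff_in_diff(own_pre: float, own_post: float,
                 peer_pre: float, peer_post: float) -> float:
    """Residual change after subtracting the peer cohort's change."""
    own_change = own_post - own_pre
    peer_change = peer_post - peer_pre
    return own_change - peer_change


# Hypothetical rates: your signup rate went 2.0% -> 2.3%, peers went 2.1% -> 2.2%.
residual = diff_in_diff(own_pre=0.020, own_post=0.023,
                        peer_pre=0.021, peer_post=0.022)
print(f"residual: {residual * 100:+.2f} percentage points")
```

The calculation is trivial; the difficulty is entirely in the inputs. Every confounder listed above ends up inside `own_change`, and the residual inherits it.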
Honest framing for stakeholders:
"Pre/post analysis is suggestive at best. Our signup rate moved X% in the 90 days after Rose, peer cohort moved Y%, residual is Z%. But marketing also launched a new paid campaign in the same window, so we can't attribute the residual cleanly to Rose. We're treating this as a sanity check, not as proof. The randomised tests below are the only methods that can give a defensible answer."
4. Time-on-Page A/B Test¶
What it measures: does Rose increase the average time visitors spend on the page, compared to a control arm with no Rose?
Why we use this instead of bounce rate: Rose does not block or slow your page load — it is loaded last and asynchronously, so page rendering is not affected. However, Rose itself needs a short bootstrap window before it is visible to the visitor. A visitor who leaves the page in that window was technically "exposed" to the Rose arm but never actually saw Rose, which biases a naive bounce-rate comparison. Time-on-page is a continuous metric, has higher statistical power than binary bounce/no-bounce, and is not affected by the bootstrap window when measured over the full session.
Bounce, done right: the experiment dashboard also reports a Bounce rate (after Rose seen). This is the single-page rate computed only over sessions where Rose actually appeared (a widget impression fired), so the bootstrap-window bias above is removed by construction rather than estimated away. Alongside it, a per-variant "time to first Rose" gauge shows how large that bootstrap window is; when it differs markedly between the two arms, the dashboard flags the bounce comparison as harder to read.
Why it converges fast: continuous metrics carry more information per visitor than binary ones. Detecting a 5% relative lift in time-on-page typically takes 5–10× fewer visitors than detecting a 5% relative lift in a binary conversion rate. At typical SaaS traffic this means 2–4 weeks rather than 9 months.
Setup: randomised 50/50 split at page load. One arm sees Rose, one arm does not. Compare median (or trimmed mean) time-on-page between arms. Use robust statistics — time-on-page is heavy-tailed.
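One way to compare heavy-tailed time-on-page between arms is a percentile bootstrap on the difference in medians, sketched below. This is a generic technique, not necessarily what the Rose dashboard computes; the function names, the 10% trim, and the synthetic log-normal data are all assumptions.

```python
import random
from statistics import median


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def bootstrap_median_diff(control, treatment, n_boot=2000,
                          confidence=0.95, seed=0):
    """Point estimate and percentile-bootstrap interval for median(treatment) - median(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))      # resample with replacement
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(median(t) - median(c))
    diffs.sort()
    lo_idx = int((1 - confidence) / 2 * n_boot)
    hi_idx = int((1 + confidence) / 2 * n_boot) - 1
    return median(treatment) - median(control), (diffs[lo_idx], diffs[hi_idx])


# Synthetic seconds-on-page, log-normal to mimic a heavy right tail.
data_rng = random.Random(42)
control = [data_rng.lognormvariate(3.5, 1.0) for _ in range(300)]
treatment = [data_rng.lognormvariate(3.7, 1.0) for _ in range(300)]

point, (low, high) = bootstrap_median_diff(control, treatment, n_boot=1000)
print(f"median difference {point:.1f}s, 95% interval {low:.1f}s to {high:.1f}s")
print(f"trimmed means: {trimmed_mean(control):.1f}s vs {trimmed_mean(treatment):.1f}s")
```

The median and trimmed mean are used instead of the plain mean because a handful of tabs left open for hours would otherwise dominate the comparison.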
What it proves: Rose holds visitor attention longer on the pages it triggers on. A real causal effect on engagement depth.
Limitations: attention is not revenue. Longer time on page is a leading indicator, not a conversion. Pair with #5 or #6 for the conversion claim.
When to use it: as the first rigorous causal read in a structured measurement plan. Most clients can complete this in 2–4 weeks and use the result as evidence of an upstream Rose effect while higher-rigor tests run in parallel.
5. Mid-Funnel Intent A/B Test¶
What it measures: does Rose move more visitors past a mid-funnel intent threshold — not just attention (covered in #4), and not yet a hard conversion (covered in #6)?
Why it complements time-on-page: time-on-page measures whether Rose holds attention. This test measures whether that attention turns into directed behaviour — visiting a page that matters, coming back, or going deeper into the site.
Candidate intent metrics (pick one as primary, based on your funnel):
- Pricing page reached in the session. Baseline typically 10–25% on B2B SaaS sites. Strong correlation with downstream conversion. Built into the experiment dashboard: tell Rose which URLs count as your pricing page (Analytics settings → Intent pages, e.g. `/pricing`, `/pricing/*`, `/tarifs`) and the pricing-page-reached rate is reported per arm automatically, with a readiness clock and a duration forecast at setup.
- Return-visit rate within 7 days. Baseline typically 15–30%. Captures Rose's effect on memorability, not just same-session engagement.
- Multi-page session rate (≥2 pages viewed). Baseline typically 30–50%. Easiest to detect but weakest causal story.
- Specific high-value page visited (integrations, customer page, comparison page). Choose based on where your buyers research.
Setup: 50/50 split at page load. Both arms eligible for Rose triggering logic, only one arm sees Rose. Compare the chosen intent metric between arms. Primary metric defined and locked before the test starts.
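If you are wiring the split yourself rather than using a testing platform, a common pattern is deterministic hash-based bucketing, sketched below. This is a generic approach and an assumption on our part, not a description of how Rose or any named platform assigns visitors; `assign_arm` and the experiment key are hypothetical.

```python
import hashlib


def assign_arm(visitor_id: str, experiment: str,
               treatment_share: float = 0.5) -> str:
    """Deterministically assign a visitor to 'rose' or 'control' at page load."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "rose" if bucket < treatment_share else "control"


# The same visitor always lands in the same arm for a given experiment.
print(assign_arm("visitor-123", "mid-funnel-intent-test"))
print(assign_arm("visitor-123", "mid-funnel-intent-test"))
```

Because assignment depends only on the visitor ID and the experiment key, it happens before any engagement and stays stable across page loads, which is what "randomised at page load" requires.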
What it proves: Rose moves visitors into the part of your funnel that historically correlates with conversion. Cleaner causal story than the engagement-to-conversion correlation in method #2.
How it differs from #6: #6 tests a hard conversion (demo request, form submit) at the bottom of the funnel. #5 tests an intent signal upstream of that. #5 reaches significance faster because intent baselines are higher than conversion baselines.
When to use it: when you want to demonstrate Rose's effect on prospect quality and intent, not just engagement, and you have an intent metric in your analytics that maps to revenue downstream. Typically the second rigorous read after time-on-page.
6. Conversion A/B Test on a High-Baseline Action¶
What it measures: does Rose lift a real conversion action that has a high enough baseline to be detectable — typically demo requests, pricing-page form submissions, contact-form submissions, or trial starts.
Why it works: these actions usually have baselines in the 5–15% range, several times higher than free-signup conversion on most SaaS sites. At typical traffic, lifts are detectable in 4–8 weeks.
Setup: 50/50 split at page load, primary metric defined and locked before the test starts. Sample size and test duration calculated upfront. No mid-test changes to the metric or the cutoff.
What it proves: Rose lifts the chosen mid-funnel conversion action with statistical confidence. This is the strongest single-client causal claim achievable in a reasonable timeline.
Choosing the action: pick the highest-baseline action that maps to revenue. For most B2B SaaS, demo request is a good default. For PLG companies, use "trial started" (not "account created"). Avoid free signup as the primary metric unless your traffic and conversion rate are both high.
Cost: the holdout arm forgoes whatever lift Rose provides during the test. Worth calculating before starting. The Rose A/B test framework (in development) will surface a "demos at risk" estimate alongside the duration; in the meantime your account manager can compute it for you.
When to use it: as the centrepiece of a rigorous client-level proof, typically run in parallel with #4 and #5 and after the directional read from #2.
7. Other-Client A/B Test Results and Rose Portfolio Analysis¶
This is not a test you run on your own site — it is context from other Rose deployments that you can use as a prior for what to expect.
Existing client A/B tests¶
Several Rose clients have run independent A/B tests on their own sites (or have asked Rose to set them up). Multiple have shown positive lifts, with the strongest at ~+20% on mid-funnel conversion in a clean comparison. We can share methodology details and anonymised results on request — talk to your account manager.
These are individual studies, not a guarantee that your site will see the same lift. They are useful as a prior: "Rose has been measured at lifts up to +20% across several rigorous client tests" is a defensible statement, "Rose lifts your signups by 20%" is not.
Rose portfolio analysis (in flight)¶
Rose is preparing a cross-client portfolio analysis as part of the broader A/B test framework noted above. It will pool conversion data across all Rose deployments using a hierarchical model that separates per-client noise from systematic effects. The output will be a distribution of lifts across clients — e.g. "median +X%, Y% of clients positive, 95% credible interval [low, high]" — that any new client can use as context.
This is in active development and will be published quarterly once available. It will give you a defensible reference point for what a typical Rose deployment looks like, without requiring you to run a multi-month test on your own site.
When this matters for you: ask your account manager to share existing client A/B results now, and to flag you when the portfolio analysis is published. Use them as prior context alongside whatever measurement you run on your own site.
8. Full Signup-Rate A/B Test¶
This is the same kind of randomised A/B test as methods #4, #5 and #6 — same 50/50 split at page load, same statistical method, same testing tool. The only difference is the metric: free signup completion, instead of an intermediate metric. The setup is mechanically identical; the timeline is much longer because the baseline is much lower.
What it measures: Rose's effect on your free signup conversion rate, with full statistical rigor and no shortcuts.
Why it takes so long: see the sample-size arithmetic in the section on why proof takes different forms, above. Required sample size scales roughly as 1 / (baseline × lift²). At a 2% baseline and 600 visitors per day:
- Detecting a 10% relative lift needs ~80,000 visitors per arm — roughly 9 months.
- Detecting a more realistic 5% relative lift needs ~300,000 visitors per arm — roughly 2.5 to 3 years.
Setup: 50/50 split at page load, locked metric (free signup completion per visitor), locked sample size, no peeking. Run to completion. Read the result once.
What it proves: Rose lifts your free signup conversion with the highest possible confidence.
Cost: the holdout arm forgoes the lift over the full test duration. Concretely: if Rose really lifts free signup by 5% on a 600/day site with a 2% baseline, the holdout costs roughly 300 signups over the ~3-year test window.
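The holdout cost is straightforward to estimate before starting. In this sketch the function name and the 50% holdout share are assumptions, and the lift you plug in is of course the unknown the test is trying to measure.

```python
def holdout_cost(daily_visitors: float, baseline: float, relative_lift: float,
                 test_days: float, holdout_share: float = 0.5) -> float:
    """Conversions forgone in the holdout arm if the assumed lift is real."""
    holdout_visitors = daily_visitors * holdout_share * test_days
    return holdout_visitors * baseline * relative_lift


# 600 visitors/day, 2% baseline, 5% relative lift, three-year test.
forgone = holdout_cost(daily_visitors=600, baseline=0.02,
                       relative_lift=0.05, test_days=3 * 365)
print(f"about {forgone:.0f} signups forgone")
```

For a full three years this comes to about 330 signups, in line with the rough figure of 300 quoted above.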
When to use it: when free signup is the primary metric your business cares about AND the cost of the holdout is acceptable to you AND no faster read addresses the same question. In practice this is rare. We recommend #6 instead for almost all SaaS clients.
Decision Framework¶
Pick the highest-rigor method that fits your decision timeline.
Method #3 (pre/post) is not on this diagram because we do not recommend it for proof. See method #3.
How Rose Compares to Industry Practice¶
Most conversational AI and revenue-tooling vendors do not run rigorous causal tests. The norm in this market is:
- Aggregate testimonials — "84% of customers report improved conversion".
- Influenced pipeline attribution — any deal touched by the tool gets credited.
- Forrester / IDC Total Economic Impact studies — modeled three-year NPV from 3–5 customer interviews.
- Single-client case studies — pre/post numbers with no control group, no peer benchmark.
These are persuasive but not rigorous. Rose offers the lower-rigor versions of these for parity with the market, and the higher-rigor methods (#4 through #8) for customers who want to measure rather than be marketed to. We disclose the rigor level of every number we present.
If another vendor offers you a "proven 35% lift" figure with no methodology link, ask them:
- How many visitors per arm? Over how many days?
- What was the primary metric, fixed before the test?
- Was the assignment randomised at page load or after engagement?
- Was there a single planned analysis, or did the test run until significance appeared?
The answers are rarely satisfying.
Stakeholder Objections and Responses¶
Common pushbacks and honest responses. Each leads with what we already know, then explains what additional rigor costs.
"We want proof Rose lifts our signups before committing."¶
We already have three converging signals: (a) engagement data showing your visitors are interacting with Rose and converting in-chat, (b) the engaged-vs-non-engaged conversion gap your account manager can pull, and (c) several other client A/B tests showing lifts up to ~20% on mid-funnel metrics. To prove a realistic 5% lift specifically on your signup conversion with full statistical rigor takes roughly 3 years at typical SaaS traffic — same for any vendor — and we will not pretend otherwise. We can prove direction in weeks on intermediate metrics (time-on-page, mid-funnel intent, demo requests) and combine that with the existing evidence. Which timeline matches your decision?
"Our A/B tool says Rose is hurting conversion."¶
Most off-the-shelf A/B tools declare winners on very small samples and do not gate verdicts behind minimum sample sizes or symmetric exposure. Before drawing a conclusion, three checks: (1) was assignment randomised at page load or post-trigger? (2) was the primary metric fixed before the test? (3) has the test reached its pre-planned sample size? If any answer is no, the verdict is not reliable. We can re-run with proper guardrails.
"Our competitors guarantee +X% lift."¶
Ask them for the methodology that produced that number. In every case we've seen, it traces to a customer survey, an influenced-pipeline calculation, or a pre/post case study with no peer control — not a randomised test. We will match or beat any rigorous methodology you can point us to, and we will tell you when something cannot be proved.
"Competitor X showed a +30% lift in a pre/post case study."¶
Pre/post comparisons cannot isolate the tool from everything else that happened in the same window — pricing changes, paid campaigns, product launches, seasonality. The same data can usually be reframed to show the opposite result. We list pre/post in our methodology page (#3) but explicitly do not recommend it as proof. Compare on randomised tests, not pre/post.
"What if Rose isn't doing anything?"¶
The engagement data on your account already says otherwise (visitors are using it, asking questions, converting in-chat), but that is a fair question about incremental lift. The answer comes fast: engagement reads update live, and time-on-page and mid-funnel intent A/B tests resolve in two to six weeks. If three sequential reads come back null, that is real signal and you should cancel. The risk of waiting a few weeks for the first signal is small; the risk of acting on a Hublead-style "99%" verdict drawn from only 400 visitors is large.
"We can't justify a 50% holdout."¶
A smaller control split (e.g. 90/10) is supported but slows the test. We typically recommend 50/50 for the duration of one rigorous test, then 100% Rose afterward with periodic 5–10% holdouts for ongoing validation.
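How much an uneven split slows the test can be approximated from the variance of a difference between two arms. The sketch assumes both arms have similar per-visitor variance (reasonable when the lift is small); the function name is illustrative.

```python
def total_sample_inflation(control_share: float) -> float:
    """Total visitors needed, relative to a 50/50 split, for the same precision.

    The variance of the difference between arms is proportional to
    1 / (share * (1 - share)), which is smallest at share = 0.5.
    """
    if not 0 < control_share < 1:
        raise ValueError("control_share must be strictly between 0 and 1")
    return 0.25 / (control_share * (1 - control_share))


for share in (0.5, 0.2, 0.1):
    print(f"{share:.0%} control: {total_sample_inflation(share):.2f}x the traffic")
```

Under that assumption a 90/10 split needs a little under three times the total traffic of a 50/50 split, so a test planned for six weeks would take roughly four months.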
"How long do we run before deciding?"¶
The sample size is fixed before the test starts based on your baseline, traffic, and the smallest lift you care about. Your A/B testing tool (or our account team) can compute it; Rose's own framework will surface it natively when it ships. Read the result once at the end, not daily.
What Good Proof Looks Like¶
A trustworthy lift claim has all of the following:
- A single primary metric, defined before the test started.
- A pre-planned sample size, calculated from baseline rate, traffic, and minimum detectable effect.
- Random assignment at page load, not after engagement.
- No mid-test changes to the metric, the cutoff, or the audience.
- A confidence or credible interval on the effect size, not just a "winner".
- An effect size in absolute and relative terms, not just a p-value or probability.
A Rose proof always includes all six. Verdicts that omit any of them are flagged as "directional" in our reporting, not "conclusive".
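The six criteria can be treated as a checklist when reviewing any lift claim, including a competitor's. The sketch below is illustrative only; the field names are ours and it does not reflect how Rose's reporting is implemented.

```python
from dataclasses import dataclass, fields


@dataclass
class LiftClaim:
    primary_metric_predefined: bool
    sample_size_preplanned: bool
    randomised_at_page_load: bool
    no_mid_test_changes: bool
    interval_reported: bool
    absolute_and_relative_effect: bool


def verdict_label(claim: LiftClaim) -> tuple[str, list[str]]:
    """Return ('conclusive' or 'directional', list of missing criteria)."""
    missing = [f.name for f in fields(claim) if not getattr(claim, f.name)]
    return ("conclusive" if not missing else "directional"), missing


claim = LiftClaim(True, True, True, True, False, True)
label, missing = verdict_label(claim)
print(label, missing)
```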
Getting Started¶
- Decide what decision you are trying to make. ("Should we keep Rose?" "Should we expand to other pages?")
- Decide your timeline.
- Pick the highest-rigor method in the tiering table that fits.
- Contact your Rose account manager to scope the measurement plan.
- Layer in #1 (engagement dashboard) and #2 (engagement-to-conversion funnel) in parallel — both are ready immediately. Avoid relying on #3 (pre/post) as proof; treat it as a sanity check only.
Your account manager can also prepare a written measurement proposal listing every method, its timeline, its cost (in holdout terms), and the question it answers, tailored to your site's traffic and baseline.