Observability¶

Rose currently uses several observability systems:

- Google Cloud Logging for backend application logs
- Sentry for error tracking and uptime monitoring (see Sentry Uptime Monitoring)
- PostHog for product analytics
- Langfuse for LLM tracing and prompt observability
- Superlog for an experimental OpenTelemetry test

Sentry Uptime Monitoring¶

Sentry polls the backend deep status probe (GET /status on ixsearch_api) to detect outages in any of the widget's runtime dependencies — MongoDB, Redis, Neo4j, Supabase, Azure OpenAI, OpenAI, Cohere, Langfuse.

How the endpoint works¶

| Endpoint | Purpose | Auth | Probes |
| --- | --- | --- | --- |
| `GET /health` | Liveness (Cloud Run probe) | none | none |
| `GET /ping` | Ultra-light readiness | none | none |
| `GET /ready` | Cloud Run readiness, used by the load balancer | none | MongoDB ping, Redis ping |
| `GET /status` | Deep status probe (Sentry uptime) | `X-Status-Token` header | MongoDB `dbStats`, Redis `DBSIZE`, Neo4j `RETURN 1`, Supabase REST query, Azure OpenAI `/openai/models`, OpenAI `/v1/models`, Cohere reach, Langfuse reach |

Response shape (200 all-ok, 503 if any critical service is down):

{
  "status": "ok",
  "version": "1.407.0",
  "environment": "production",
  "public_ip": "34.x.x.x",
  "timestamp": "2026-05-26T17:43:06.796311+00:00",
  "services": {
    "mongodb": {"status": "ok", "latency_ms": 157, "detail": "collections=14"},
    "redis": {"status": "ok", "latency_ms": 70, "detail": "keys=64036"},
    "neo4j": {"status": "ok", "latency_ms": 470, "detail": null},
    "supabase": {"status": "ok", "latency_ms": 90, "detail": null},
    "azure_openai": {"status": "ok", "latency_ms": 200, "detail": null},
    "openai": {"status": "ok", "latency_ms": 1366, "detail": null},
    "cohere": {"status": "ok", "latency_ms": 168, "detail": null},
    "langfuse": {"status": "ok", "latency_ms": 135, "detail": null}
  }
}

A failure in any critical service (mongodb, redis, neo4j, supabase, azure_openai, openai) flips the global status to down and triggers HTTP 503. A failure in a non-critical service (cohere, langfuse) leaves the response at HTTP 200 but shows up as degraded/down in the body, so Sentry can alert on partial regressions via body-match rules.
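
The critical/non-critical rule above can be sketched as a small pure function. This is an illustration only, not the code in health.py: the function name is hypothetical, and the global "degraded" label for a non-critical failure is an assumption (the section only says such failures keep HTTP 200 and are visible in the body).

```python
from typing import Dict, Tuple

# Critical set as listed in this section; cohere and langfuse are non-critical.
CRITICAL_SERVICES = {"mongodb", "redis", "neo4j", "supabase", "azure_openai", "openai"}


def aggregate_status(services: Dict[str, str]) -> Tuple[str, int]:
    """Map per-service statuses to (global status, HTTP code).

    Assumption: a failing non-critical service yields a global "degraded"
    status with HTTP 200; the real handler may name this state differently.
    """
    failing = {name for name, status in services.items() if status != "ok"}
    if failing & CRITICAL_SERVICES:
        return "down", 503
    if failing:
        return "degraded", 200
    return "ok", 200


all_ok = {name: "ok" for name in CRITICAL_SERVICES | {"cohere", "langfuse"}}
print(aggregate_status(all_ok))                        # ('ok', 200)
print(aggregate_status({**all_ok, "cohere": "down"}))  # ('degraded', 200)
print(aggregate_status({**all_ok, "redis": "down"}))   # ('down', 503)
```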

Code: backend/packages/ixweb/ixweb/routes/health.py.

Authentication¶

/status is not public. It is guarded by a shared-secret header, so only callers that hold the token (in practice, Sentry) can reach it.

| Header form | Notes |
| --- | --- |
| `X-Status-Token: <secret>` | preferred |
| `Authorization: Bearer <secret>` | accepted (for Sentry compatibility) |

A missing or wrong token returns HTTP 404 (not 401) to avoid leaking the endpoint's existence to scrapers. The comparison uses hmac.compare_digest, which is timing-safe.
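
A minimal sketch of that check, assuming the headers arrive as a plain mapping. The function names are hypothetical and the real route may be structured differently; only hmac.compare_digest, the two header forms, and the 404 response come from this section.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], secret: str) -> bool:
    """Return True if either accepted header form carries the shared secret."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    candidate = lowered.get("x-status-token")
    if candidate is None:
        auth = lowered.get("authorization", "")
        if auth.startswith("Bearer "):
            candidate = auth[len("Bearer "):]
    if not candidate:
        return False
    # compare_digest avoids leaking, via timing, where the strings differ.
    return hmac.compare_digest(candidate.encode(), secret.encode())


def rejection_code(headers: Mapping[str, str], secret: str) -> Optional[int]:
    """Return 404 for a missing or wrong token, None when the probe may run."""
    return None if is_authorized(headers, secret) else 404
```

Returning 404 for both "no token" and "wrong token" keeps the two cases indistinguishable from an unknown route.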

The secret is stored in GCP Secret Manager under the name STATUS_PROBE_TOKEN in project inboundx. The backend fetches it via ixinfra.utils.secret_manager.get_secret, which checks env first then falls back to Secret Manager (cached per process via @lru_cache). Cloud Run's default service account already has roles/secretmanager.secretAccessor at project level, so no per-secret IAM binding is required.
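
The env-first lookup with a per-process cache can be sketched as follows. This is not the ixinfra implementation: Secret Manager is replaced by an in-memory dictionary so the example runs without GCP credentials, and every name other than get_secret, STATUS_PROBE_TOKEN, and lru_cache is hypothetical.

```python
import os
from functools import lru_cache
from typing import Dict, List

# Stand-in for Secret Manager so the sketch runs without GCP credentials.
_FAKE_SECRET_MANAGER: Dict[str, str] = {"STATUS_PROBE_TOKEN": "v1-token"}
fetch_calls: List[str] = []  # records each simulated Secret Manager round trip


def _fetch_from_secret_manager(name: str) -> str:
    fetch_calls.append(name)
    return _FAKE_SECRET_MANAGER[name]


@lru_cache(maxsize=None)
def get_secret(name: str) -> str:
    """Check the environment first, then fall back to Secret Manager."""
    value = os.environ.get(name)
    if value:
        return value
    return _fetch_from_secret_manager(name)


os.environ.pop("STATUS_PROBE_TOKEN", None)
first = get_secret("STATUS_PROBE_TOKEN")
_FAKE_SECRET_MANAGER["STATUS_PROBE_TOKEN"] = "v2-token"  # simulate a rotation
second = get_secret("STATUS_PROBE_TOKEN")
print(first, second, len(fetch_calls))  # v1-token v1-token 1
```

The second call still returns the old value because the cache lives as long as the process, which is why a rotated token only takes effect after the workers restart.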

Rotating the token¶

gcloud secrets versions add STATUS_PROBE_TOKEN \
  --data-file=<(openssl rand -hex 32) \
  --project=inboundx

The backend caches the token via @lru_cache for the lifetime of each worker process. Restart the Cloud Run service (or trigger a new revision) to pick up the new version. Then update the corresponding Sentry monitor's header value.

Configuring a Sentry Uptime Monitor¶

1. Read the token out of Secret Manager:

   gcloud secrets versions access latest \
     --secret=STATUS_PROBE_TOKEN --project=inboundx

2. In Sentry: Alerts → Uptime Monitors → Create Monitor.
3. Fill in:
   - URL: pick per environment
     - production: https://api.userose.ai/status
     - staging: https://api-staging.userose.ai/status
     - test: https://api-test.userose.ai/status
   - Method: GET
   - Interval: 1–5 minutes
   - Timeout: 10 seconds (each backend probe is capped at 2 s; nine in parallel + network overhead fits comfortably)
   - Headers:
     - Name: X-Status-Token
     - Value: paste the token from step 1
   - Expected status code: 200
   - Body match (optional): "status":"ok" to alert when the endpoint returns 200 but a non-critical dependency is degraded
4. Alert routing: wire to the same channel as other backend Sentry alerts.
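
To sanity-check a monitor configuration before saving it in Sentry, the same request can be reproduced from a script. The sketch below uses only the standard library; the function names are hypothetical, and the "up"/"degraded"/"down" labels are this example's own, not Sentry's. It parses the JSON and reads the top-level status, because the body also contains one "status" field per service.

```python
import json
import urllib.error
import urllib.request


def build_probe_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET /status request with the shared-secret header."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/status",
        method="GET",
        headers={"X-Status-Token": token},
    )


def evaluate(status_code: int, body: str) -> str:
    """Classify a response using the expected-status and body-match rules."""
    if status_code != 200:
        return "down"
    top_level = json.loads(body).get("status")
    return "up" if top_level == "ok" else "degraded"


def probe(base_url: str, token: str, timeout_s: float = 10.0) -> str:
    """Send the request; network errors and timeouts propagate to the caller."""
    request = build_probe_request(base_url, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return evaluate(response.status, response.read().decode())
    except urllib.error.HTTPError as err:  # urlopen raises on 4xx/5xx
        return evaluate(err.code, err.read().decode())
```

For example, `probe("https://api-staging.userose.ai", token)` would exercise the staging endpoint with the token read in step 1.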

Local testing¶

The local backend respects the same token. Set it in backend/.env.local:

echo 'STATUS_PROBE_TOKEN=<value-from-secret-manager>' >> backend/.env.local

Restart the backend with just dev <environment>, then preview the response in the playground at http://localhost:3001/status. Paste the same token into the page (it is stored in localStorage under rose:status_probe_token) and pick the endpoint with the API endpoint selector at the top.

Adding a new probe¶

In backend/packages/ixweb/ixweb/routes/health.py:

1. Add _probe_<service>() -> ServiceStatus that wraps the check in asyncio.wait_for(..., timeout=STATUS_PROBE_TIMEOUT_S) and returns ServiceStatus(status, latency_ms, detail).
2. Add the name to the names list and the coroutine to probes inside _run_status_probes.
3. If the service is widget-critical, add its name to the CRITICAL_SERVICES set so a failure flips HTTP to 503.
4. Use _http_auth_probe for endpoints requiring a key, _http_reach_probe for unauthenticated reachability.

Superlog OpenTelemetry Test¶

Superlog is currently wired as a test for native OpenTelemetry traces, logs, and metrics. It is not the primary production observability system and should be treated as removable experiment code until the team explicitly decides to keep it.

The current Superlog test sends OTLP/HTTP data to:

https://intake.superlog.sh

The browser uses an inline sl_public_ ingest token. This token is public and write-only, similar to a Sentry DSN or PostHog project token.

What It Does¶

Backend ixsearch_api:

- Initializes OpenTelemetry trace, metric, and log providers.
- Exports traces, metrics, and logs to Superlog over OTLP/HTTP.
- Instruments FastAPI, HTTPX, Requests, and Python logging.
- Adds chat route metrics:
  - chat.queries
  - chat.query.duration
- Adds chat latency and per-LLM-call metrics (see Chat Latency & LLM-Call Metrics).
- Adds trace spans around chat query handling.

Client backoffice:

- Initializes browser OpenTelemetry providers before PostHog and Sentry.
- Exports browser traces, metrics, and logs to Superlog over OTLP/HTTP.
- Instruments document load and fetch calls.
- Propagates trace headers only to the configured first-party integrations API origin from VITE_INTEGRATIONS_API_BASE_URL.
- Registers the browser logger provider with @opentelemetry/api-logs.

Developer tooling:

- Adds Superlog and OTel style skills under .agents/skills/.
- Links those skills under .claude/skills/.
- Updates skills-lock.json.

Chat Latency & LLM-Call Metrics (Grafana Cloud)¶

Native OTel metrics emitted by the Website Agent and fanned out to Grafana Cloud (alongside Superlog). They answer "how fast is Rose, end-to-end and per model call?" — latency lives here; token/cost analytics stay in Langfuse.

Measurement tiers¶

| Tier | What | Where measured |
| --- | --- | --- |
| A. Whole-turn | request → first/last token, spanning every node (intent → retrieval → answer/redirect/booking) | `ixchat/chatbot.py` (graph stream anchor) |
| A′. Request-boundary | HTTP-in → response done; includes pre-graph preamble | `ixsearch_api/routes/chat.py` |
| B. Per-LLM-call | one record per model invocation, by use case (node) | `ixchat/nodes/streaming_utils.py` + `ixllm.metrics.timed_llm_call` |
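
As an illustration of what the whole-turn tier measures, the sketch below times a token stream from the start of the request to the first and last token. It is a stand-alone example with a fake stream; measure_stream and fake_answer are hypothetical names and do not reflect how chatbot.py anchors its timers.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_stream(
    stream: AsyncIterator[str],
) -> Tuple[Optional[float], float, str]:
    """Return (time to first token, time to last token, full text) in seconds."""
    started = time.perf_counter()
    ttft: Optional[float] = None
    parts = []
    async for token in stream:
        if ttft is None:
            ttft = time.perf_counter() - started  # first token observed
        parts.append(token)
    duration = time.perf_counter() - started  # stream exhausted: last token
    return ttft, duration, "".join(parts)


async def fake_answer() -> AsyncIterator[str]:
    for token in ("Hello", ", ", "world"):
        await asyncio.sleep(0.01)  # stands in for model latency
        yield token


ttft, duration, text = asyncio.run(measure_stream(fake_answer()))
print(f"ttft={ttft:.3f}s duration={duration:.3f}s text={text!r}")
```

An empty stream leaves the first-token value unset, which matches the idea that TTFT only exists for replies that actually stream something.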

Instrument inventory¶

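Names carry the rose_chat_ prefix and no unit suffix, because Grafana Cloud's OTLP→Prometheus normalization appends _seconds / _total and the _bucket/_sum/_count series. The tier-A′ chat.query.duration is the one instrument in the table without the prefix. Explicit second-scale histogram buckets are set via Views in observability.py: the default OTel boundaries (0, 5, 10, 25, ..., 10000) suit millisecond-scale values, so second-scale latencies would nearly all fall into the lowest buckets.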
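
The need for explicit buckets can be checked with a few lines of arithmetic. The sketch below places second-scale latencies into the OpenTelemetry SDK's default explicit-bucket boundaries and into a second-scale set; the second-scale boundaries are hypothetical and are not the values configured in observability.py.

```python
from bisect import bisect_left
from typing import Sequence

# Default explicit-bucket boundaries from the OpenTelemetry SDK specification.
DEFAULT_OTEL_BOUNDS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750,
                       1000, 2500, 5000, 7500, 10000]
# Hypothetical second-scale boundaries, for illustration only.
SECOND_SCALE_BOUNDS = [0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]


def bucket_index(value: float, bounds: Sequence[float]) -> int:
    """Index of the bucket whose inclusive upper bound first reaches value."""
    return bisect_left(bounds, value)


latencies_s = [0.3, 0.8, 3.2]
print([bucket_index(v, DEFAULT_OTEL_BOUNDS) for v in latencies_s])  # [1, 1, 1]
print([bucket_index(v, SECOND_SCALE_BOUNDS) for v in latencies_s])  # [2, 3, 5]
```

With the defaults, all three latencies share the (0, 5] bucket, so any percentile computed from them is just an interpolation inside one bucket.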

| Instrument | Type | Unit | Tier | Notes |
| --- | --- | --- | --- | --- |
| `rose_chat_replies_total` | Counter | `1` | A | silent-Rose canary: one per reply leaving the graph, by `outcome` (`success`/`empty`/`error`) and `site` |
| `rose_chat_inflight` | UpDownCounter | `1` | A | in-flight chat requests (live concurrency), by `site` |
| `rose_chat_answer_ttft` | Histogram | `s` | A | whole-turn time to first token (streaming only) |
| `rose_chat_answer_duration` | Histogram | `s` | A | whole-turn end-to-end, request → last token (TTLT; streaming + non-streaming) |
| `chat.query.duration` | Histogram | `s` | A′ | request-boundary end-to-end (both endpoints) |
| `rose_chat_llm_call_duration` | Histogram | `s` | B | per model call; `_count` = call rate |
| `rose_chat_llm_call_ttft` | Histogram | `s` | B | per-call first-token latency (streaming calls) |
| `rose_chat_llm_tokens` | Counter | `tokens` | B | best-effort token usage (see caveat) |

Temporality & per-instance series¶

All instruments export with cumulative temporality (the OTel default). Do not switch to delta — Grafana Cloud / Mimir's OTLP gateway rejects delta counters/histograms with HTTP 400 (invalid temporality and type combination), which silently drops every metric batch and leaves dashboards empty.

The search API runs 2–3 Cloud Run instances. Each stamps a unique service.instance.id (observability.py:_resource), so every instance is its own Prometheus series — this, not delta, is what keeps cross-instance counters from collapsing into one bouncing series (which would make rate()/increase() read each dip as a counter reset). Consequence: always aggregate across instances in queries — sum(rose_chat_inflight) for the fleet total, sum(rate(rose_chat_replies_total[5m])), sum by (le, …) (rate(..._bucket[5m])) before histogram_quantile.
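
The effect of collapsing instances into one series can be reproduced with a toy calculation. The sketch below uses a simplified version of Prometheus's reset handling (no extrapolation or scrape alignment) and made-up sample values for two instances.

```python
from typing import List


def increase(samples: List[float]) -> float:
    """Simplified increase(): any drop is treated as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


# Cumulative reply counts scraped from two instances (made-up values).
instance_a = [100, 110, 120, 130]
instance_b = [10, 12, 14, 16]

# Separate series per service.instance.id: sum the per-series increases.
print(increase(instance_a) + increase(instance_b))  # 36.0

# One shared series: successive samples alternate between the instances,
# so every dip from A's value to B's value looks like a reset.
collapsed = [100, 12, 120, 16]
print(increase(collapsed))  # 136.0
```

The true increase over the window is 36 replies; the collapsed series reports 136 because each apparent reset adds the full post-reset value.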

Tagging plan¶

Discipline: bounded, low-cardinality only. Never session_id, raw IDs, turn_number, or exception messages. Resource attrs already carry service.name, deployment.environment.name, service.version, vcs.ref.head.revision — do not duplicate env/version on metrics.

Whole-turn (A) attributes — chosen to join with rose_chat_replies_total:

| Attribute | Example | Source |
| --- | --- | --- |
| `site` | `mayday.fr` | `state["site_name"]` (matches the canary's `site`) |
| `outcome` | `success` / `empty` / `error` | mirrors `rose_chat_replies_total` |
| `response_node` | `answer_writer` / `redirect_handler` / `booking_handler` | node that produced the answer |
| `streaming` | `true` / `false` | streaming vs non-streaming entry point |

Per-LLM-call (B) attributes — OTel GenAI semantic conventions:

| Attribute | Example | Source |
| --- | --- | --- |
| `app.gen_ai.use_case` | `answer_writer`, `intent_classifier` | the node / route use case (`ResolvedChatHandle.use_case`) |
| `gen_ai.request.model` | `gpt-5.4`, `gpt-4.1-mini` | route policy model |
| `gen_ai.provider.name` | `openai` / `azure` / `cerebras` | route policy provider |
| `outcome` | `success` / `error` | call result |
| `error.type` | `timeout` / `rate_limited` / `upstream_5xx` | only on `outcome=error` (short, bounded) |
| `token_type` | `input` / `output` | `rose_chat_llm_tokens` only |
`token_type`	`input` / `output`	`rose_chat_llm_tokens` only

Dots become _ as Grafana labels (gen_ai_request_model, app_gen_ai_use_case). Per-call metrics are deliberately not tagged with site (avoids site × model × node series blowup) — site-level latency lives in tier A.
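
Both points can be illustrated in a few lines. The label conversion below is a simplified stand-in for the OTLP-to-Prometheus normalization (it only replaces characters outside letters, digits, and underscores), and the site, model, and node counts are hypothetical numbers chosen to show the multiplication.

```python
import re


def to_prometheus_label(attribute: str) -> str:
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", attribute)


print(to_prometheus_label("gen_ai.request.model"))  # gen_ai_request_model
print(to_prometheus_label("app.gen_ai.use_case"))   # app_gen_ai_use_case

# Hypothetical fleet sizes, only to show how label combinations multiply.
n_sites, n_models, n_nodes = 40, 4, 6
print(n_models * n_nodes)            # 24 label combinations without site
print(n_sites * n_models * n_nodes)  # 960 label combinations with site
```

Each histogram multiplies those combinations again by its bucket count, which is why per-call metrics stay free of the site attribute.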

Why call-site instrumentation (not a callback handler)¶

A LangChain callback handler is not used for per-call metrics because:

- Several aux nodes pass config={"callbacks": []} (a Langfuse Omit-bug dodge), so a graph-config handler would be stripped on exactly those calls.
- No-fallback use cases (redirect_handler, booking_handler) stream through a raw client, and wrapping it in a proxy risks breaking astream_events.
- Langfuse traces via OTEL spans, not the callbacks list.

Instead: astream_accumulate (one chokepoint for all streaming answer calls) takes use_case/request_model/provider kwargs, and structured ainvoke sites are wrapped with ixllm.metrics.timed_llm_call(...).
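
A call-site wrapper of this kind can be sketched as an async context manager. The real signature of ixllm.metrics.timed_llm_call is not shown in this document, so the version below is a hypothetical shape: it appends to an in-memory list instead of recording to OTel histograms, and its error classification is a placeholder.

```python
import asyncio
import time
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, List

recorded: List[Dict[str, object]] = []  # stand-in for the OTel histogram


def classify_error(exc: BaseException) -> str:
    """Map an exception to a short, bounded label (placeholder mapping)."""
    if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
        return "timeout"
    return "other"


@asynccontextmanager
async def timed_call_sketch(
    use_case: str, request_model: str, provider: str
) -> AsyncIterator[None]:
    """Time the wrapped block and record one entry, on success or failure."""
    started = time.perf_counter()
    attrs: Dict[str, object] = {
        "app.gen_ai.use_case": use_case,
        "gen_ai.request.model": request_model,
        "gen_ai.provider.name": provider,
        "outcome": "error",  # overwritten below when the block succeeds
    }
    try:
        yield
        attrs["outcome"] = "success"
    except Exception as exc:
        attrs["error.type"] = classify_error(exc)
        raise
    finally:
        attrs["duration_s"] = time.perf_counter() - started
        recorded.append(attrs)


async def demo() -> None:
    async with timed_call_sketch("intent_classifier", "gpt-4.1-mini", "openai"):
        await asyncio.sleep(0.01)  # stands in for the model's ainvoke call


asyncio.run(demo())
print(recorded[0]["outcome"])  # success
```

Because the wrapper sits at the call site, it still records calls that pass an empty callbacks list.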

Token caveat: reliable token usage is unavailable at call sites — with_structured_output(...) returns a parsed object with no usage_metadata, and streamed chunks only carry usage when stream_usage is enabled. rose_chat_llm_tokens is best-effort; full token/cost analytics stay in Langfuse.
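
The best-effort behaviour can be sketched as a helper that returns nothing when usage is missing. The function below is hypothetical and is not the extract_usage in ixllm.metrics; it assumes the response object may expose a usage_metadata mapping with input_tokens and output_tokens keys, as LangChain message objects do when usage is reported.

```python
from types import SimpleNamespace
from typing import Any, Dict, Mapping, Optional


def extract_usage_sketch(response: Any) -> Optional[Dict[str, int]]:
    """Return token counts keyed by token_type, or None if unavailable."""
    usage = getattr(response, "usage_metadata", None)
    if not isinstance(usage, Mapping):
        return None  # e.g. a parsed structured-output object
    return {
        "input": int(usage.get("input_tokens", 0)),
        "output": int(usage.get("output_tokens", 0)),
    }


with_usage = SimpleNamespace(usage_metadata={"input_tokens": 120, "output_tokens": 45})
parsed_object = SimpleNamespace(intent="booking")  # no usage_metadata attribute

print(extract_usage_sketch(with_usage))     # {'input': 120, 'output': 45}
print(extract_usage_sketch(parsed_object))  # None
```

Callers would skip the counter update on None rather than record zeros, so missing usage does not look like a zero-token call.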

Example PromQL (Grafana Explore)¶

# p95 whole-turn TTFT per site
histogram_quantile(0.95, sum by (le, site) (rate(rose_chat_answer_ttft_seconds_bucket[5m])))

# p95 per-call latency by model and use case (node)
histogram_quantile(0.95, sum by (le, gen_ai_request_model, app_gen_ai_use_case)
  (rate(rose_chat_llm_call_duration_seconds_bucket[5m])))

# token throughput by model and type
sum by (gen_ai_request_model, token_type) (rate(rose_chat_llm_tokens_total[5m]))

Code map¶

| File | Role |
| --- | --- |
| `backend/packages/ixchat/ixchat/metrics.py` | tier-A histograms + `record_answer()` |
| `backend/packages/ixllm/ixllm/metrics.py` | tier-B instruments + `timed_llm_call`, `record_*`, `extract_usage` |
| `backend/packages/ixchat/ixchat/nodes/streaming_utils.py` | per-call emit for streaming answer calls |
| `backend/apps/api/search/ixsearch_api/.../observability.py` | bucket `View`s on the `MeterProvider` |
| `backend/apps/api/search/ixsearch_api/.../routes/chat.py` | tier-A′ `chat.query.duration` (both endpoints) |

How To Remove Superlog¶

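Remove the code and dependencies in one PR. Do not remove Langfuse, Sentry, PostHog, or Google Cloud Logging as part of this cleanup; those are separate systems. Note that the Grafana Cloud chat metrics described above share the same OpenTelemetry bootstrap (observability.py hosts their bucket Views) and the same opentelemetry-* dependencies, so apply the deletions below in full only if those metrics are being retired as well.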

Backend¶

Remove the IXSearch API bootstrap:

- Delete backend/apps/api/search/ixsearch_api/ixsearch_api/observability.py.
- In backend/apps/api/search/ixsearch_api/ixsearch_api/app.py, remove the import `from .observability import init_observability`.
- In the same file, remove the `init_observability(app, service_version=API_VERSION)` startup call.

Remove route-level OTel usage from backend/apps/api/search/ixsearch_api/ixsearch_api/routes/chat.py:

- Remove the import `from opentelemetry import metrics, trace`.
- Remove the import `from opentelemetry.trace import Status, StatusCode`.
- Remove the module-scope `_tracer`, `_meter`, `_chat_queries`, and `_chat_query_duration` declarations.
- Remove the `@_tracer.start_as_current_span(...)` wrapper if no other tracing system replaces it.
- Remove the `_chat_queries.add(...)`, `_chat_query_duration.record(...)`, `span.record_exception(...)`, and `span.set_status(...)` calls.

Remove backend dependencies from backend/apps/api/search/ixsearch_api/pyproject.toml:

- opentelemetry-api
- opentelemetry-sdk
- opentelemetry-exporter-otlp-proto-http
- opentelemetry-instrumentation-fastapi
- opentelemetry-instrumentation-httpx
- opentelemetry-instrumentation-logging
- opentelemetry-instrumentation-requests
- opentelemetry-semantic-conventions

Regenerate backend lockfiles after editing dependencies:

cd backend
poetry lock

Frontend¶

Remove the client-backoffice browser bootstrap:

- Delete frontend/client-backoffice/src/observability.ts.
- In frontend/client-backoffice/src/main.tsx, remove the import `import { initObservability } from './observability';`.
- In the same file, remove the `initObservability();` call.

Remove client-backoffice dependencies from frontend/client-backoffice/package.json:

- @opentelemetry/api
- @opentelemetry/api-logs
- @opentelemetry/exporter-logs-otlp-http
- @opentelemetry/exporter-metrics-otlp-http
- @opentelemetry/exporter-trace-otlp-http
- @opentelemetry/instrumentation
- @opentelemetry/instrumentation-document-load
- @opentelemetry/instrumentation-fetch
- @opentelemetry/resources
- @opentelemetry/sdk-logs
- @opentelemetry/sdk-metrics
- @opentelemetry/sdk-trace-base
- @opentelemetry/sdk-trace-web
- @opentelemetry/semantic-conventions

Regenerate frontend lockfiles after editing dependencies:

cd frontend
npm install
cd client-backoffice
npm install --package-lock-only --workspaces=false

Skills¶

Remove the Superlog-specific skills and symlinks if the experiment is fully removed:

- .agents/skills/superlog-onboard/
- .agents/skills/otel-*-style/
- .agents/skills/otel-instrument-feature/
- .claude/skills/superlog-onboard
- .claude/skills/otel-*-style
- .claude/skills/otel-instrument-feature
- the matching entries in skills-lock.json

Keep unrelated observability skills that predate Superlog.

Verification After Removal¶

Run the checks that correspond to touched files:

cd backend
just mypy apps/api/search/ixsearch_api/ixsearch_api/app.py apps/api/search/ixsearch_api/ixsearch_api/routes/chat.py

cd frontend
just lint

For a full confidence check, also run:

cd backend
just test

cd frontend
just check

Before merging, search for remaining Superlog references:

rg "Superlog|superlog|intake\.superlog\.sh|sl_public_|initObservability|init_observability"

Expected result after complete removal: only historical docs, changelog entries, or PR notes should remain.