---
title: "feat: Pipeline health checks and observability"
type: feat
status: active
date: 2026-04-24
---

# Pipeline Health Checks and Observability

## Overview

There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.

## Problem Frame

- No visibility into whether the last pipeline run succeeded
- No alerting when motion count drops unexpectedly
- No detection when SVD components flip or drift
- No visibility into embedding coverage (% of motions with embeddings)
- LLM enrichment failures are silent (motions just lack layman_explanation)

## Requirements Trace

- R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover >X% of motions
- R2. Health check detects SVD stability (no sudden axis flips)
- R3. Health check reports missing layman_explanations
- R4. Optional: Streamlit page or API endpoint showing health metrics
- R5. All health checks are testable and tested

## Scope Boundaries

**Included:**

- Health check module with individual check functions
- CLI runner for health checks
- Tests for each check
- Optional Streamlit health page

**Excluded:**

- Real alerting (PagerDuty, Slack) — just script exit codes for now
- Long-term metrics storage (Prometheus, etc.)
- Fixing the issues the health check finds

## Key Technical Decisions

- **Pure functions for checks** — Each check is a function that takes DB/config and returns (status, message, details). This makes them testable without side effects.
- **Composable runner** — A runner executes all checks and aggregates results into a report.
- **Exit codes** — 0 = all healthy, 1 = any warning, 2 = any critical. Suitable for cron/CI.

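
The three decisions above can be sketched together in a few lines. This is a minimal illustration, not the planned implementation: the `HealthStatus`, `HealthCheck`, `HealthReport`, and `run_checks` names follow the plan, while the field types, the pure `(total, embedded)` signature of `check_embedding_coverage`, the choice of WARNING for below-threshold coverage, and the handling of a crashing check as CRITICAL are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable, Iterable


class HealthStatus(IntEnum):
    # Ordered by severity, so max() picks the worst status and the
    # integer value can double as the process exit code (0/1/2).
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # An empty check list is treated as healthy.
        return max((c.status for c in self.checks), default=HealthStatus.OK)


def check_embedding_coverage(
    total: int, embedded: int, min_coverage: float = 0.95
) -> HealthCheck:
    # Pure function: takes already-fetched counts, touches no database.
    name = "embedding_coverage"
    if total == 0:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    coverage = embedded / total
    # Assumption: below-threshold coverage is a WARNING, not CRITICAL.
    status = HealthStatus.OK if coverage >= min_coverage else HealthStatus.WARNING
    return HealthCheck(
        name,
        status,
        f"{coverage:.1%} of motions have embeddings",
        {"total": total, "embedded": embedded},
    )


def run_checks(checks: Iterable[Callable[[], HealthCheck]]) -> HealthReport:
    results = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:
            # Assumption: a check that raises is recorded as CRITICAL
            # instead of aborting the whole run.
            name = getattr(check, "__name__", "unknown")
            results.append(
                HealthCheck(name, HealthStatus.CRITICAL, f"check raised: {exc}")
            )
    return HealthReport(tuple(results))
```

Under these assumptions a CLI entry point could end with `sys.exit(int(report.status))`, which yields the 0/1/2 exit codes directly because the enum values mirror them.
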
## Context & Research

### Relevant Code and Patterns

- `pipeline/run_pipeline.py` — orchestrates all pipeline stages
- `database.py` — DB queries for motion counts, embeddings, vote counts
- `analysis/svd_labels.py` — SVD component stability logic
- `scripts/` — existing diagnostic scripts (drift analysis, etc.)

### Institutional Learnings

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md` — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md` — metrics must be derived from canonical sources, not hardcoded

## Implementation Units

- [ ] U1. **Create health check core module**

**Goal:** Define the check interface and runner.

**Requirements:** R1–R3 foundation

**Dependencies:** None

**Files:**

- Create: `health/__init__.py`
- Create: `health/core.py`
- Create: `health/checks.py`
- Create: `tests/test_health_core.py`

**Approach:**

- `HealthStatus` enum: OK, WARNING, CRITICAL
- `HealthCheck` dataclass: name, status, message, details
- `run_checks(checks)` → `HealthReport` with aggregate status
- `check_*` functions are pure: accept data, return HealthCheck

**Execution note:** Test-first — write `test_health_core.py` with failing tests for the interface before implementing.

**Test scenarios:**

- Happy path: All OK checks → report status OK
- Error path: One CRITICAL check → report status CRITICAL
- Edge case: Empty check list → report status OK
- Integration: Check function signature is pure (no DB access in core)

**Verification:**

- `uv run pytest tests/test_health_core.py -v` passes

---

- [ ] U2. **Implement data freshness checks**

**Goal:** Verify the DB has recent motions and votes.

**Requirements:** R1

**Dependencies:** U1

**Files:**

- Modify: `health/checks.py`
- Create: `tests/test_health_checks.py`

**Approach:**

- `check_motion_freshness(db, max_age_days=7)` — count motions newer than threshold
- `check_vote_coverage(db)` — % of motions with votes
- `check_embedding_coverage(db, min_coverage=0.95)` — % of motions with fused embeddings

**Execution note:** Test-first — use mocked DB or test fixtures with known data.

**Test scenarios:**

- Happy path: Recent motions exist, coverage > 95% → OK
- Warning path: Motions are 10 days old → WARNING
- Critical path: No motions in last 30 days → CRITICAL
- Edge case: Empty database → CRITICAL with clear message

**Verification:**

- Tests pass with mocked database
- Manual run against real DB produces accurate report

---

- [ ] U3. **Implement SVD stability check**

**Goal:** Detect if SVD components have flipped or drifted significantly.

**Requirements:** R2

**Dependencies:** U1, U2

**Files:**

- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**

- `check_svd_stability(db, reference_themes)` — compare current SVD_THEMES to canonical config
- `check_axis_flip(db)` — verify right-wing parties are on the right side (reuse existing validation logic)
- Use `analysis/config.py` SVD_THEMES as canonical reference

**Execution note:** Test-first — mock the DB to return known SVD components and test flip detection.

**Test scenarios:**

- Happy path: SVD components match canonical themes → OK
- Warning path: Minor label drift → WARNING
- Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
- Edge case: No SVD data in DB → CRITICAL

**Verification:**

- Tests pass
- Manual verification against real DB confirms no false alarms

---

- [ ] U4. **Implement LLM enrichment check**

**Goal:** Surface motions missing layman explanations.

**Requirements:** R3

**Dependencies:** U1, U2

**Files:**

- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**

- `check_llm_coverage(db, max_missing=100)` — count motions without layman_explanation
- `check_llm_quality(db)` — spot-check a sample of explanations for non-empty, reasonable length

**Test scenarios:**

- Happy path: <5% missing explanations → OK
- Warning path: 5–15% missing → WARNING
- Critical path: >15% missing → CRITICAL
- Edge case: All explanations are empty strings → WARNING

**Verification:**

- Tests pass with mocked data

---

- [ ] U5. **Create CLI runner**

**Goal:** Run all checks from command line with appropriate exit codes.

**Requirements:** R1–R4

**Dependencies:** U1–U4

**Files:**

- Create: `scripts/health_check.py`
- Create: `tests/scripts/test_health_check.py`

**Approach:**

- `python scripts/health_check.py` → prints report, exits 0/1/2
- Optional flags: `--check motion-freshness`, `--format json`, `--threshold-days 7`

**Test scenarios:**

- Happy path: All OK → exit 0, human-readable output
- Error path: One warning → exit 1
- Critical path: One critical → exit 2
- Edge case: JSON format outputs valid JSON

**Verification:**

- `uv run python scripts/health_check.py` runs without error
- Exit codes match expectations

---

- [ ] U6. **Add Streamlit health page (optional)**

**Goal:** Visual health dashboard in the app.

**Requirements:** R4

**Dependencies:** U1–U5

**Files:**

- Create: `pages/3_Health.py`

**Approach:**

- Run all checks on page load
- Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
- Use `st.metric` for key numbers
- Color-code: green/yellow/red

**Test expectation:** none — Streamlit page, tested manually.

**Verification:**

- Page loads without error
- Metrics update when DB changes

---

## System-Wide Impact

- **Interaction graph:** Health checks read from DB but do not write. Safe to run concurrently with pipeline.
- **Error propagation:** Check failures are captured in report, not raised as exceptions.
- **Unchanged invariants:** No changes to pipeline, DB schema, or UI behavior.

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| False alarms (like trajectories diagnostic) | Verify against canonical DB state, not intermediary artifacts |
| Slow checks on large DB | Add query timeouts; cache results |
| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |

## Documentation / Operational Notes

- Add health check to deployment runbook (run before/after pipeline)
- Consider scheduling in CI or cron

## Sources & References

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`
- `analysis/config.py` — canonical SVD themes
- `database.py` — DB schema and queries
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`