| title | type | status | date |
|---|---|---|---|
| feat: Pipeline health checks and observability | feat | active | 2026-04-24 |
Pipeline Health Checks and Observability
Overview
There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.
Problem Frame
- No visibility into whether the last pipeline run succeeded
- No alerting when motion count drops unexpectedly
- No detection when SVD components flip or drift
- No visibility into embedding coverage (% of motions with embeddings)
- LLM enrichment failures are silent (motions just lack layman_explanation)
Requirements Trace
- R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover >X% of motions
- R2. Health check detects SVD stability (no sudden axis flips)
- R3. Health check reports missing layman_explanations
- R4. Optional: Streamlit page or API endpoint showing health metrics
- R5. All health checks are testable and tested
Scope Boundaries
Included:
- Health check module with individual check functions
- CLI runner for health checks
- Tests for each check
- Optional Streamlit health page
Excluded:
- Real alerting (PagerDuty, Slack) — just script exit codes for now
- Long-term metrics storage (Prometheus, etc.)
- Fixing the issues the health check finds
Key Technical Decisions
- Pure functions for checks — Each check is a function that takes DB/config and returns (status, message, details). This makes them testable without side effects.
- Composable runner — A runner executes all checks and aggregates results into a report.
- Exit codes — 0 = all healthy, 1 = at least one warning and no critical, 2 = at least one critical. Suitable for cron/CI.
Context & Research
Relevant Code and Patterns
- `pipeline/run_pipeline.py`: orchestrates all pipeline stages
- `database.py`: DB queries for motion counts, embeddings, vote counts
- `analysis/svd_labels.py`: SVD component stability logic
- `scripts/`: existing diagnostic scripts (drift analysis, etc.)
Institutional Learnings
- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`: diagnostic scripts can produce false alarms if they don't verify against canonical DB state
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`: metrics must be derived from canonical sources, not hardcoded
Implementation Units
- U1. Create health check core module
Goal: Define the check interface and runner.
Requirements: R1–R3 foundation
Dependencies: None
Files:
- Create: `health/__init__.py`
- Create: `health/core.py`
- Create: `health/checks.py`
- Create: `tests/test_health_core.py`
Approach:
- `HealthStatus` enum: `OK`, `WARNING`, `CRITICAL`
- `HealthCheck` dataclass: `name`, `status`, `message`, `details`
- `run_checks(checks)` → `HealthReport` with aggregate status
- `check_*` functions are pure: accept data, return `HealthCheck`
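The interface above might look roughly like the following minimal sketch. It assumes each check is passed to `run_checks` as a zero-argument callable (for example, a check with its DB bound via `functools.partial`). The `exit_code` property is a hypothetical convenience that is not named in this plan. Making `HealthStatus` an `IntEnum` lets `max()` choose the most severe status, and lets the member values double as the 0/1/2 exit codes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable, Iterable


class HealthStatus(IntEnum):
    # Ordered by severity so max() returns the worst status.
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # An empty report counts as healthy (the "Empty check list" scenario).
        return max((c.status for c in self.checks), default=HealthStatus.OK)

    @property
    def exit_code(self) -> int:
        # Hypothetical helper: status values double as exit codes 0/1/2.
        return int(self.status)


def run_checks(checks: Iterable[Callable[[], HealthCheck]]) -> HealthReport:
    """Execute each zero-argument check and aggregate the results."""
    return HealthReport(checks=tuple(check() for check in checks))
```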
Execution note: Test-first — write test_health_core.py with failing tests for the interface before implementing.
Test scenarios:
- Happy path: All OK checks → report status OK
- Error path: One CRITICAL check → report status CRITICAL
- Edge case: Empty check list → report status OK
- Integration: Check function signature is pure (no DB access in core)
Verification:
- `uv run pytest tests/test_health_core.py -v` passes
- U2. Implement data freshness checks
Goal: Verify the DB has recent motions and votes.
Requirements: R1
Dependencies: U1
Files:
- Modify: `health/checks.py`
- Create: `tests/test_health_checks.py`
Approach:
- `check_motion_freshness(db, max_age_days=7)`: count motions newer than the threshold
- `check_vote_coverage(db)`: % of motions with votes
- `check_embedding_coverage(db, min_coverage=0.95)`: % of motions with fused embeddings
Execution note: Test-first — use mocked DB or test fixtures with known data.
Test scenarios:
- Happy path: Recent motions exist, coverage > 95% → OK
- Warning path: Motions are 10 days old → WARNING
- Critical path: No motions in last 30 days → CRITICAL
- Edge case: Empty database → CRITICAL with clear message
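One way to keep these checks testable without a database is to separate the query step from the classification step. The DB-backed `check_motion_freshness` would fetch the newest motion date, and a pure helper would classify it. The sketch below covers only the classification step. `evaluate_freshness`, `evaluate_coverage` and `critical_age_days` are hypothetical names. The 30-day critical threshold comes from the scenarios above. Treating below-threshold coverage as a WARNING rather than CRITICAL is an assumption.

```python
from __future__ import annotations

from datetime import date
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate_freshness(
    latest_motion: Optional[date],
    today: date,
    max_age_days: int = 7,
    critical_age_days: int = 30,  # hypothetical, from the "last 30 days" scenario
) -> tuple[HealthStatus, str]:
    """Classify the age of the newest motion (pure: no DB access)."""
    if latest_motion is None:
        return HealthStatus.CRITICAL, "No motions in the database"
    age = (today - latest_motion).days
    if age > critical_age_days:
        return HealthStatus.CRITICAL, f"No motions in the last {critical_age_days} days"
    if age > max_age_days:
        return HealthStatus.WARNING, f"Newest motion is {age} days old (limit {max_age_days})"
    return HealthStatus.OK, f"Newest motion is {age} days old"


def evaluate_coverage(
    covered: int, total: int, min_coverage: float = 0.95
) -> tuple[HealthStatus, str]:
    """Classify the share of motions that have fused embeddings (or votes)."""
    if total == 0:
        return HealthStatus.CRITICAL, "No motions to measure coverage against"
    ratio = covered / total
    if ratio < min_coverage:
        # Assumption: low coverage is a warning, not a critical failure.
        return HealthStatus.WARNING, f"Coverage {ratio:.1%} is below {min_coverage:.0%}"
    return HealthStatus.OK, f"Coverage {ratio:.1%}"
```

Because `today` is passed in rather than read from the clock, the "10 days old" and "30 days" scenarios can be tested with fixed dates.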
Verification:
- Tests pass with mocked database
- Manual run against real DB produces accurate report
- U3. Implement SVD stability check
Goal: Detect if SVD components have flipped or drifted significantly.
Requirements: R2
Dependencies: U1, U2
Files:
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`
Approach:
- `check_svd_stability(db, reference_themes)`: compare current `SVD_THEMES` to the canonical config
- `check_axis_flip(db)`: verify right-wing parties are on the right side (reuse existing validation logic)
- Use `SVD_THEMES` from `analysis/config.py` as the canonical reference
Execution note: Test-first — mock the DB to return known SVD components and test flip detection.
Test scenarios:
- Happy path: SVD components match canonical themes → OK
- Warning path: Minor label drift → WARNING
- Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
- Edge case: No SVD data in DB → CRITICAL
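Axis flips are a real risk because SVD singular vectors are only determined up to sign, so recomputing the decomposition can return a mirrored axis. The sketch below shows both checks as pure functions over pre-fetched data. Several details are assumptions: themes are represented as a `{component_index: label}` dict, "right side" means a higher score on the first component, and a flip is detected by comparing the mean right-wing score with the mean of all other parties. In the real check, the right-wing party set and the score source would come from the existing validation logic.

```python
from __future__ import annotations

from enum import IntEnum
from statistics import mean


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def check_theme_drift(
    current: dict[int, str], reference: dict[int, str]
) -> tuple[HealthStatus, list[int]]:
    """Compare per-component theme labels with the canonical reference."""
    if not current:
        # No SVD data at all is critical, per the edge-case scenario.
        return HealthStatus.CRITICAL, []
    drifted = sorted(k for k, label in reference.items() if current.get(k) != label)
    return (HealthStatus.WARNING if drifted else HealthStatus.OK), drifted


def check_axis_flip(scores: dict[str, float], right_wing: set[str]) -> HealthStatus:
    """Assumes 'right side' means a higher score on the first component."""
    right = [s for party, s in scores.items() if party in right_wing]
    others = [s for party, s in scores.items() if party not in right_wing]
    if not right or not others:
        return HealthStatus.CRITICAL  # nothing to compare, treat as missing data
    # A sign-flipped axis puts the right-wing mean below everyone else's.
    return HealthStatus.OK if mean(right) > mean(others) else HealthStatus.CRITICAL
```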
Verification:
- Tests pass
- Manual verification against real DB confirms no false alarms
- U4. Implement LLM enrichment check
Goal: Surface motions missing layman explanations.
Requirements: R3
Dependencies: U1, U2
Files:
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`
Approach:
- `check_llm_coverage(db, max_missing=100)`: count motions without `layman_explanation`
- `check_llm_quality(db)`: spot-check a sample of explanations for non-empty, reasonable length
Test scenarios:
- Happy path: <5% missing explanations → OK
- Warning path: 5–15% missing → WARNING
- Critical path: >15% missing → CRITICAL
- Edge case: All explanations are empty strings → WARNING
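The scenarios above use percentage bands, while the approach lists a count-based `max_missing=100`. This sketch follows the percentage bands and should be read as one possible reconciliation, not a settled choice. It assumes `None` marks a missing explanation, so empty strings count as present for coverage but get flagged by the quality check. That combination produces the "all empty strings → WARNING" outcome. `evaluate_llm_coverage`, `evaluate_llm_quality` and `min_length=20` are hypothetical.

```python
from __future__ import annotations

from enum import IntEnum
from typing import Optional, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate_llm_coverage(
    explanations: Sequence[Optional[str]],
    warn_pct: float = 5.0,
    critical_pct: float = 15.0,
) -> tuple[HealthStatus, float]:
    """None means the motion has no layman_explanation at all."""
    if not explanations:
        # Assumption: no motions at all is critical, mirroring the empty-DB case.
        return HealthStatus.CRITICAL, 100.0
    missing_pct = 100.0 * sum(e is None for e in explanations) / len(explanations)
    if missing_pct > critical_pct:
        return HealthStatus.CRITICAL, missing_pct
    if missing_pct >= warn_pct:
        return HealthStatus.WARNING, missing_pct
    return HealthStatus.OK, missing_pct


def evaluate_llm_quality(
    explanations: Sequence[Optional[str]], min_length: int = 20
) -> tuple[HealthStatus, int]:
    """Count explanations that exist but are empty or implausibly short."""
    present = [e for e in explanations if e is not None]
    too_short = sum(len(e.strip()) < min_length for e in present)
    return (HealthStatus.WARNING if too_short else HealthStatus.OK), too_short
```

For a large table, `check_llm_quality` could pass a random sample to `evaluate_llm_quality` instead of every row, which matches the "spot-check" intent.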
Verification:
- Tests pass with mocked data
- U5. Create CLI runner
Goal: Run all checks from command line with appropriate exit codes.
Requirements: R1–R4
Dependencies: U1–U4
Files:
- Create: `scripts/health_check.py`
- Create: `tests/scripts/test_health_check.py`
Approach:
- `python scripts/health_check.py` → prints report, exits 0/1/2
- Optional flags: `--check motion-freshness`, `--format json`, `--threshold-days 7`
Test scenarios:
- Happy path: All OK → exit 0, human-readable output
- Warning path: One warning → exit 1
- Critical path: One critical → exit 2
- Edge case: JSON format outputs valid JSON
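A minimal `argparse` sketch of the runner follows. The flag names come from the approach above. The check registry, its `(args) -> (status, message)` signature, and the JSON shape are hypothetical. `main` returns the exit code instead of calling `sys.exit` itself, so tests can call it directly. The script entry point would then be `sys.exit(main(None, REGISTRY))`.

```python
from __future__ import annotations

import argparse
import json
from enum import IntEnum
from typing import Callable, Optional, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Hypothetical check signature: parsed CLI args in, (status, message) out.
Check = Callable[[argparse.Namespace], tuple[HealthStatus, str]]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run pipeline health checks.")
    parser.add_argument("--check", help="comma-separated check names (default: all)")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--threshold-days", type=int, default=7)
    return parser


def main(argv: Optional[Sequence[str]], registry: dict[str, Check]) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    names = args.check.split(",") if args.check else list(registry)
    unknown = [n for n in names if n not in registry]
    if unknown:
        parser.error(f"unknown check(s): {', '.join(unknown)}")  # exits with status 2
    results = {name: registry[name](args) for name in names}
    worst = max((status for status, _ in results.values()), default=HealthStatus.OK)
    if args.format == "json":
        print(json.dumps({
            "status": worst.name,
            "checks": [
                {"name": n, "status": s.name, "message": m}
                for n, (s, m) in results.items()
            ],
        }, indent=2))
    else:
        for n, (s, m) in results.items():
            print(f"[{s.name}] {n}: {m}")
        print(f"Overall: {worst.name}")
    return int(worst)
```

One design caveat: `argparse` exits with status 2 on usage errors, which collides with the CRITICAL exit code. If cron or CI must tell the two apart, usage errors could be mapped to a distinct code.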
Verification:
- `uv run python scripts/health_check.py` runs without error
- Exit codes match expectations
- U6. Add Streamlit health page (optional)
Goal: Visual health dashboard in the app.
Requirements: R4
Dependencies: U1–U5
Files:
- Create: `pages/3_Health.py`
Approach:
- Run all checks on page load
- Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
- Use `st.metric` for key numbers
- Color-code: green/yellow/red
Test expectation: none — Streamlit page, tested manually.
Verification:
- Page loads without error
- Metrics update when DB changes
System-Wide Impact
- Interaction graph: Health checks read from the DB but never write to it, so they are safe to run concurrently with the pipeline.
- Error propagation: Check failures are captured in report, not raised as exceptions.
- Unchanged invariants: No changes to pipeline, DB schema, or UI behavior.
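One way to keep a crashing check from aborting the whole report is to wrap each call and convert any exception into a CRITICAL result. This is a sketch; `run_safely` is a hypothetical helper name. Catching `Exception` rather than `BaseException` is a deliberate choice: `KeyboardInterrupt` and `SystemExit` still propagate, so the script can still be stopped normally.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str


def run_safely(name: str, check: Callable[[], HealthCheck]) -> HealthCheck:
    """Run one check; record any exception as a CRITICAL result instead of raising."""
    try:
        return check()
    except Exception as exc:  # broad on purpose: one broken check must not hide the rest
        return HealthCheck(name, HealthStatus.CRITICAL, f"check raised {type(exc).__name__}: {exc}")
```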
Risks & Dependencies
| Risk | Mitigation |
|---|---|
| False alarms (like trajectories diagnostic) | Verify against canonical DB state, not intermediary artifacts |
| Slow checks on large DB | Add query timeouts; cache results |
| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |
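This plan does not name the database engine. As an illustration of the "query timeouts" mitigation, the sketch below assumes SQLite through Python's standard `sqlite3` module. It uses `Connection.set_progress_handler`, which aborts a running statement (raising `sqlite3.OperationalError`) when the handler returns a non-zero value. `query_with_timeout` and its defaults are hypothetical. Other engines have their own timeout mechanisms, which should be used instead.

```python
import sqlite3
import time


def query_with_timeout(conn, sql, params=(), timeout_s=5.0, check_every=10_000):
    """Run a read-only query, aborting it once timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s

    def over_budget():
        # Called every `check_every` SQLite VM instructions; non-zero aborts the query.
        return 1 if time.monotonic() >= deadline else 0

    conn.set_progress_handler(over_budget, check_every)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler for later queries
```

A timed-out query raises `sqlite3.OperationalError`, which the report can record as a CRITICAL (or WARNING) result rather than letting it hang a cron job.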
Documentation / Operational Notes
- Add health check to deployment runbook (run before/after pipeline)
- Consider scheduling in CI or cron
Sources & References
- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`
- `analysis/config.py`: canonical SVD themes
- `database.py`: DB schema and queries
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`