---
title: "feat: Pipeline health checks and observability"
type: feat
status: active
date: 2026-04-24
---

# Pipeline Health Checks and Observability

## Overview

There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.

## Problem Frame

- No visibility into whether the last pipeline run succeeded
- No alerting when motion count drops unexpectedly
- No detection when SVD components flip or drift
- No visibility into embedding coverage (% of motions with embeddings)
- LLM enrichment failures are silent (motions just lack layman_explanation)

## Requirements Trace

- R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover >X% of motions (X is configurable; U2 defaults to 95%)
- R2. Health check detects SVD stability (no sudden axis flips)
- R3. Health check reports missing layman_explanations
- R4. Optional: Streamlit page or API endpoint showing health metrics
- R5. All health checks are testable and tested

## Scope Boundaries

**Included:**
- Health check module with individual check functions
- CLI runner for health checks
- Tests for each check
- Optional Streamlit health page

**Excluded:**
- Real alerting (PagerDuty, Slack) — just script exit codes for now
- Long-term metrics storage (Prometheus, etc.)
- Fixing the issues the health check finds

## Key Technical Decisions

- **Pure functions for checks** — Each check is a function that takes DB/config and returns (status, message, details). This makes them testable without side effects.
- **Composable runner** — A runner executes all checks and aggregates results into a report.
- **Exit codes** — 0 = all checks healthy, 1 = at least one warning and no critical, 2 = at least one critical. The most severe status wins. Suitable for cron/CI.

## Context & Research

### Relevant Code and Patterns

- `pipeline/run_pipeline.py` — orchestrates all pipeline stages
- `database.py` — DB queries for motion counts, embeddings, vote counts
- `analysis/svd_labels.py` — SVD component stability logic
- `scripts/` — existing diagnostic scripts (drift analysis, etc.)

### Institutional Learnings

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md` — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md` — metrics must be derived from canonical sources, not hardcoded

## Implementation Units

- [ ] U1. **Create health check core module**

**Goal:** Define the check interface and runner.

**Requirements:** R1–R3 foundation

**Dependencies:** None

**Files:**
- Create: `health/__init__.py`
- Create: `health/core.py`
- Create: `health/checks.py`
- Create: `tests/test_health_core.py`

**Approach:**
- `HealthStatus` enum: OK, WARNING, CRITICAL
- `HealthCheck` dataclass: name, status, message, details
- `run_checks(checks)` → `HealthReport` with aggregate status
- `check_*` functions are pure: accept data, return HealthCheck
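
The interface above could be sketched roughly as follows. This is a minimal illustration, not the final `health/core.py`. It assumes that `checks` is an iterable of zero-argument callables, that `details` is a plain dict, and that the aggregate status is the most severe individual status; none of these are fixed by the plan.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable, Iterable


class HealthStatus(IntEnum):
    """Ordered by severity so that max() picks the worst status."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    """Result of a single check."""
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    """Collected results plus an aggregate status."""
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # An empty report counts as OK, matching the empty-list test scenario.
        return max((c.status for c in self.checks), default=HealthStatus.OK)


def run_checks(checks: Iterable[Callable[[], HealthCheck]]) -> HealthReport:
    """Call each check (assumed here to take no arguments) and collect the results."""
    return HealthReport(checks=tuple(check() for check in checks))
```

Using an `IntEnum` with values 0, 1, 2 is a design choice of this sketch: the aggregate status can then double as the process exit code.
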

**Execution note:** Test-first — write `test_health_core.py` with failing tests for the interface before implementing.

**Test scenarios:**
- Happy path: All OK checks → report status OK
- Error path: One CRITICAL check → report status CRITICAL
- Edge case: Empty check list → report status OK
- Integration: Check functions are pure (no DB access in `health/core.py`)

**Verification:**
- `uv run pytest tests/test_health_core.py -v` passes

---

- [ ] U2. **Implement data freshness checks**

**Goal:** Verify the DB has recent motions and votes.

**Requirements:** R1

**Dependencies:** U1

**Files:**
- Modify: `health/checks.py`
- Create: `tests/test_health_checks.py`

**Approach:**
- `check_motion_freshness(db, max_age_days=7)` — count motions newer than threshold
- `check_vote_coverage(db)` — % of motions with votes
- `check_embedding_coverage(db, min_coverage=0.95)` — % of motions with fused embeddings
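
A sketch of the threshold logic behind two of these checks, under stated assumptions. Because the `db` interface is not described here, the functions take already-fetched values instead of `db`. The `critical_age_days=30` parameter is hypothetical and taken from the test scenarios, and mapping below-threshold coverage to WARNING is a guess. `HealthStatus` and `HealthCheck` are minimal stand-ins for the U1 types.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def check_motion_freshness(newest_motion: Optional[date], today: date,
                           max_age_days: int = 7,
                           critical_age_days: int = 30) -> HealthCheck:
    """Classify the age of the newest motion; None stands for an empty table."""
    name = "motion-freshness"
    if newest_motion is None:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    age = (today - newest_motion).days
    if age > critical_age_days:
        status = HealthStatus.CRITICAL
    elif age > max_age_days:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status, f"newest motion is {age} days old", {"age_days": age})


def check_embedding_coverage(total_motions: int, with_embeddings: int,
                             min_coverage: float = 0.95) -> HealthCheck:
    """Compare the share of motions that have embeddings against min_coverage."""
    name = "embedding-coverage"
    if total_motions == 0:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    coverage = with_embeddings / total_motions
    # Assumption: falling below the threshold is a WARNING, not a CRITICAL.
    status = HealthStatus.OK if coverage >= min_coverage else HealthStatus.WARNING
    return HealthCheck(name, status, f"{coverage:.1%} of motions have embeddings",
                       {"coverage": coverage})
```

Passing `today` in explicitly keeps the function deterministic, so the tests do not depend on the wall clock.
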

**Execution note:** Test-first — use mocked DB or test fixtures with known data.

**Test scenarios:**
- Happy path: Recent motions exist, coverage > 95% → OK
- Warning path: Motions are 10 days old → WARNING
- Critical path: No motions in last 30 days → CRITICAL
- Edge case: Empty database → CRITICAL with clear message

**Verification:**
- Tests pass with mocked database
- Manual run against real DB produces accurate report

---

- [ ] U3. **Implement SVD stability check**

**Goal:** Detect if SVD components have flipped or drifted significantly.

**Requirements:** R2

**Dependencies:** U1, U2

**Files:**
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**
- `check_svd_stability(db, reference_themes)` — compare current SVD_THEMES to canonical config
- `check_axis_flip(db)` — verify right-wing parties are on the right side (reuse existing validation logic)
- Use `analysis/config.py` SVD_THEMES as canonical reference
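
Singular vectors are only determined up to sign, so a recomputed SVD can legitimately mirror an axis from one run to the next. One way the flip check could work is sketched below. It is not the existing validation logic, which is not shown in this plan. It assumes per-party scores on the axis are already loaded into a dict, and the anchor party names are placeholders. `HealthStatus` and `HealthCheck` are minimal stand-ins for the U1 types.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from statistics import mean
from typing import Collection, Mapping


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def check_axis_flip(party_scores: Mapping[str, float],
                    left_anchors: Collection[str],
                    right_anchors: Collection[str]) -> HealthCheck:
    """Flag a flip when right-wing anchors do not score higher than left-wing anchors."""
    name = "svd-axis-flip"
    left = [party_scores[p] for p in left_anchors if p in party_scores]
    right = [party_scores[p] for p in right_anchors if p in party_scores]
    if not left or not right:
        # Covers the "no SVD data in DB" edge case.
        return HealthCheck(name, HealthStatus.CRITICAL, "no SVD scores for the anchor parties")
    gap = mean(right) - mean(left)
    if gap > 0:
        return HealthCheck(name, HealthStatus.OK, "axis orientation as expected", {"gap": gap})
    return HealthCheck(name, HealthStatus.CRITICAL,
                       "axis flip: right-wing anchors score left of left-wing anchors",
                       {"gap": gap})
```

Comparing group means rather than individual parties makes the sketch tolerant of a single party drifting across the center.
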

**Execution note:** Test-first — mock the DB to return known SVD components and test flip detection.

**Test scenarios:**
- Happy path: SVD components match canonical themes → OK
- Warning path: Minor label drift → WARNING
- Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
- Edge case: No SVD data in DB → CRITICAL

**Verification:**
- Tests pass
- Manual verification against real DB confirms no false alarms

---

- [ ] U4. **Implement LLM enrichment check**

**Goal:** Surface motions missing layman explanations.

**Requirements:** R3

**Dependencies:** U1, U2

**Files:**
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**
- `check_llm_coverage(db, max_missing=100)` — count motions without layman_explanation
- `check_llm_quality(db)` — spot-check a sample of explanations for non-empty, reasonable length
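
A possible shape for these two checks, as a sketch only. It follows the percentage thresholds from the test scenarios (5% and 15%) rather than the absolute `max_missing` count in the signature above, and it takes a list of explanation values instead of `db`. The names `warn_fraction`, `critical_fraction`, and `min_chars` are hypothetical. `HealthStatus` and `HealthCheck` are minimal stand-ins for the U1 types.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def check_llm_coverage(explanations: Sequence[Optional[str]],
                       warn_fraction: float = 0.05,
                       critical_fraction: float = 0.15) -> HealthCheck:
    """Classify the share of motions whose explanation is missing (None)."""
    name = "llm-coverage"
    total = len(explanations)
    if total == 0:
        # Assumption: an empty table is reported by the freshness check, not here.
        return HealthCheck(name, HealthStatus.OK, "no motions to check")
    missing = sum(1 for text in explanations if text is None)
    fraction = missing / total
    if fraction > critical_fraction:
        status = HealthStatus.CRITICAL
    elif fraction >= warn_fraction:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status, f"{missing} of {total} motions lack an explanation",
                       {"missing": missing, "fraction": fraction})


def check_llm_quality(sample: Sequence[str], min_chars: int = 20) -> HealthCheck:
    """Warn when any sampled explanation is blank or shorter than min_chars."""
    name = "llm-quality"
    too_short = sum(1 for text in sample if len(text.strip()) < min_chars)
    status = HealthStatus.WARNING if too_short else HealthStatus.OK
    return HealthCheck(name, status,
                       f"{too_short} of {len(sample)} sampled explanations are blank or very short",
                       {"too_short": too_short})
```

Keeping "missing" (None) separate from "present but empty" lets the empty-string edge case land on WARNING through the quality check, as the test scenarios expect.
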

**Test scenarios:**
- Happy path: <5% missing explanations → OK
- Warning path: 5–15% missing → WARNING
- Critical path: >15% missing → CRITICAL
- Edge case: All explanations are empty strings → WARNING

**Verification:**
- Tests pass with mocked data

---

- [ ] U5. **Create CLI runner**

**Goal:** Run all checks from command line with appropriate exit codes.

**Requirements:** R1–R4

**Dependencies:** U1–U4

**Files:**
- Create: `scripts/health_check.py`
- Create: `tests/scripts/test_health_check.py`

**Approach:**
- `python scripts/health_check.py` → prints report, exits 0/1/2
- Optional flags: `--check motion-freshness`, `--format json`, `--threshold-days 7`
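
The runner could be built on `argparse` along these lines. This is a hedged sketch: the flag names come from the list above, but the `registry` argument (check name mapped to a callable returning a status and a message), the repeatable `--check`, and the JSON layout are assumptions made for illustration.

```python
import argparse
import json
from enum import IntEnum


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run pipeline health checks")
    parser.add_argument("--check", action="append",
                        help="run only the named check (may be repeated)")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--threshold-days", type=int, default=7)
    return parser


def main(argv, registry) -> int:
    """Run the selected checks, print a report, and return the exit code.

    registry is a hypothetical mapping: name -> callable(threshold_days)
    returning a (HealthStatus, message) pair. Unknown names raise KeyError here.
    """
    args = build_parser().parse_args(argv)
    names = args.check or list(registry)
    results = {name: registry[name](args.threshold_days) for name in names}
    # The most severe status decides the exit code: 0, 1, or 2.
    worst = max((status for status, _ in results.values()), default=HealthStatus.OK)
    if args.format == "json":
        payload = {
            "status": worst.name,
            "checks": {n: {"status": s.name, "message": m} for n, (s, m) in results.items()},
        }
        print(json.dumps(payload))
    else:
        for name, (status, message) in results.items():
            print(f"[{status.name}] {name}: {message}")
    return int(worst)
```

In the script itself this would be wired up with `sys.exit(main(sys.argv[1:], registry))`. Note that `argparse` also exits with status 2 on a usage error, which overlaps with the critical exit code, so a cron or CI wrapper cannot tell the two apart from the code alone.
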

**Test scenarios:**
- Happy path: All OK → exit 0, human-readable output
- Error path: One warning → exit 1
- Critical path: One critical → exit 2
- Edge case: JSON format outputs valid JSON

**Verification:**
- `uv run python scripts/health_check.py` runs without error
- Exit codes match expectations

---

- [ ] U6. **Add Streamlit health page (optional)**

**Goal:** Visual health dashboard in the app.

**Requirements:** R4

**Dependencies:** U1–U5

**Files:**
- Create: `pages/3_Health.py`

**Approach:**
- Run all checks on page load
- Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
- Use `st.metric` for key numbers
- Color-code: green/yellow/red

**Test expectation:** none — Streamlit page, tested manually.

**Verification:**
- Page loads without error
- Metrics update when DB changes

---

## System-Wide Impact

- **Interaction graph:** Health checks read from DB but do not write. Safe to run concurrently with pipeline.
- **Error propagation:** Check failures are captured in report, not raised as exceptions.
- **Unchanged invariants:** No changes to pipeline, DB schema, or UI behavior.
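
The error-propagation rule could be enforced with a small wrapper around each check, sketched below. The name `safe_check` is hypothetical, and reporting an unexpected exception as CRITICAL is an assumption rather than a decision recorded in this plan. `HealthStatus` and `HealthCheck` are minimal stand-ins for the U1 types.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def safe_check(name: str, fn: Callable[[], HealthCheck]) -> Callable[[], HealthCheck]:
    """Wrap a check so that an exception becomes a CRITICAL result instead of propagating."""
    def wrapper() -> HealthCheck:
        try:
            return fn()
        except Exception as exc:  # deliberately broad: the report must always be produced
            return HealthCheck(name, HealthStatus.CRITICAL,
                               f"check raised {type(exc).__name__}: {exc}",
                               {"exception": type(exc).__name__})
    return wrapper
```

With this in place, one unreachable API or failed query shows up as a single CRITICAL entry while the remaining checks still run.
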

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| False alarms (like trajectories diagnostic) | Verify against canonical DB state, not intermediary artifacts |
| Slow checks on large DB | Add query timeouts; cache results |
| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |

## Documentation / Operational Notes

- Add health check to deployment runbook (run before/after pipeline)
- Consider scheduling in CI or cron

## Sources & References

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`
- `analysis/config.py` — canonical SVD themes
- `database.py` — DB schema and queries
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`