---
title: "feat: Pipeline health checks and observability"
type: feat
status: active
date: 2026-04-24
---

# Pipeline Health Checks and Observability

## Overview

There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.

## Problem Frame

- No visibility into whether the last pipeline run succeeded
- No alerting when motion count drops unexpectedly
- No detection when SVD components flip or drift
- No visibility into embedding coverage (% of motions with embeddings)
- LLM enrichment failures are silent (motions just lack layman_explanation)

## Requirements Trace

- R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover at least a configurable share of motions (default 95%, see U2)
- R2. Health check detects SVD stability (no sudden axis flips)
- R3. Health check reports motions missing a `layman_explanation`
- R4. Optional: Streamlit page or API endpoint showing health metrics
- R5. All health checks are testable and tested

## Scope Boundaries

**Included:**
- Health check module with individual check functions
- CLI runner for health checks
- Tests for each check
- Optional Streamlit health page

**Excluded:**
- Real alerting (PagerDuty, Slack) — just script exit codes for now
- Long-term metrics storage (Prometheus, etc.)
- Fixing the issues the health check finds

## Key Technical Decisions

- **Pure functions for checks:** Each check is a function that takes DB/config and returns a `HealthCheck` result (name, status, message, details), so it can be tested without side effects.
- **Composable runner:** A runner executes all checks and aggregates results into a report.
- **Exit codes:** 0 = all healthy, 1 = at least one warning and no critical, 2 = at least one critical. The most severe status wins, which makes the script suitable for cron/CI.

## Context & Research

### Relevant Code and Patterns

- `pipeline/run_pipeline.py` — orchestrates all pipeline stages
- `database.py` — DB queries for motion counts, embeddings, vote counts
- `analysis/svd_labels.py` — SVD component stability logic
- `scripts/` — existing diagnostic scripts (drift analysis, etc.)

### Institutional Learnings

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md` — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md` — metrics must be derived from canonical sources, not hardcoded

## Implementation Units

- [ ] U1. **Create health check core module**

**Goal:** Define the check interface and runner.

**Requirements:** R1–R3 foundation

**Dependencies:** None

**Files:**
- Create: `health/__init__.py`
- Create: `health/core.py`
- Create: `health/checks.py`
- Create: `tests/test_health_core.py`

**Approach:**
- `HealthStatus` enum: OK, WARNING, CRITICAL
- `HealthCheck` dataclass: name, status, message, details
- `run_checks(checks)` → `HealthReport` with aggregate status
- `check_*` functions are pure: accept data, return HealthCheck
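
The interface above could be sketched roughly as follows. This is a minimal illustration, not the final `health/core.py`: the choice of `IntEnum` (so severity can be compared with `max`), the `HealthReport.status` property, and the assumption that `run_checks` receives zero-argument callables are all hypothetical details not fixed by this plan.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable, Iterable


class HealthStatus(IntEnum):
    """Ordered by severity so max() returns the worst status."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # An empty report counts as OK, matching the empty-list test scenario.
        return max((c.status for c in self.checks), default=HealthStatus.OK)


def run_checks(checks: Iterable[Callable[[], HealthCheck]]) -> HealthReport:
    """Call each zero-argument check (an assumed calling convention) and collect results."""
    return HealthReport(checks=tuple(check() for check in checks))


# Usage with two stand-in checks that need no database.
report = run_checks([
    lambda: HealthCheck("api", HealthStatus.OK, "reachable"),
    lambda: HealthCheck("embeddings", HealthStatus.WARNING, "coverage below target"),
])
print(report.status.name)  # WARNING
```

Ordering the enum by severity keeps the aggregation to a single `max` call and lets the CLI reuse the integer value as its exit code.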

**Execution note:** Test-first — write `test_health_core.py` with failing tests for the interface before implementing.

**Test scenarios:**
- Happy path: All OK checks → report status OK
- Error path: One CRITICAL check → report status CRITICAL
- Edge case: Empty check list → report status OK
- Integration: `health/core.py` performs no DB access; check functions receive their inputs as arguments

**Verification:**
- `uv run pytest tests/test_health_core.py -v` passes

---

- [ ] U2. **Implement data freshness checks**

**Goal:** Verify the DB has recent motions and votes.

**Requirements:** R1

**Dependencies:** U1

**Files:**
- Modify: `health/checks.py`
- Create: `tests/test_health_checks.py`

**Approach:**
- `check_motion_freshness(db, max_age_days=7)` — count motions newer than threshold
- `check_vote_coverage(db)` — % of motions with votes
- `check_embedding_coverage(db, min_coverage=0.95)` — % of motions with fused embeddings
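
A sketch of the decision logic behind the freshness and coverage checks, under stated assumptions. The `database.py` query API is not shown in this plan, so these hypothetical `evaluate_*` helpers take already-fetched values; the planned `check_*(db, ...)` functions would query the DB and delegate to something like them. The 30-day critical threshold is taken from the test scenarios below, and mapping low embedding coverage to WARNING (rather than CRITICAL) is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum
from typing import Optional


class HealthStatus(IntEnum):  # stand-in for the U1 enum
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:  # stand-in for the U1 dataclass
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def evaluate_motion_freshness(newest: Optional[date], today: date,
                              max_age_days: int = 7,
                              critical_age_days: int = 30) -> HealthCheck:
    """Classify the age of the newest motion; None means the table is empty."""
    name = "motion_freshness"
    if newest is None:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    age = (today - newest).days
    if age > critical_age_days:
        status = HealthStatus.CRITICAL
    elif age > max_age_days:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status, f"newest motion is {age} days old",
                       {"newest_motion": newest.isoformat(), "age_days": age})


def evaluate_embedding_coverage(total: int, with_embeddings: int,
                                min_coverage: float = 0.95) -> HealthCheck:
    """Compare the share of motions with fused embeddings to min_coverage."""
    name = "embedding_coverage"
    if total == 0:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    coverage = with_embeddings / total
    # Assumption: falling below the threshold is a WARNING, not a CRITICAL.
    status = HealthStatus.OK if coverage >= min_coverage else HealthStatus.WARNING
    return HealthCheck(name, status, f"{coverage:.1%} of motions have embeddings",
                       {"total": total, "with_embeddings": with_embeddings})


stale = evaluate_motion_freshness(date(2026, 4, 14), today=date(2026, 4, 24))
print(stale.status.name, stale.message)  # WARNING newest motion is 10 days old
```

Passing `today` explicitly keeps the function deterministic, which makes the warning and critical paths easy to cover in `tests/test_health_checks.py` without mocking the clock.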

**Execution note:** Test-first — use mocked DB or test fixtures with known data.

**Test scenarios:**
- Happy path: Recent motions exist, coverage > 95% → OK
- Warning path: Motions are 10 days old → WARNING
- Critical path: No motions in last 30 days → CRITICAL
- Edge case: Empty database → CRITICAL with clear message

**Verification:**
- Tests pass with mocked database
- Manual run against real DB produces accurate report

---

- [ ] U3. **Implement SVD stability check**

**Goal:** Detect if SVD components have flipped or drifted significantly.

**Requirements:** R2

**Dependencies:** U1, U2

**Files:**
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**
- `check_svd_stability(db, reference_themes)`: compare the SVD components currently in the DB against the canonical themes
- `check_axis_flip(db)`: verify right-wing parties still fall on the right-hand side of the axis (reuse existing validation logic)
- Use `SVD_THEMES` from `analysis/config.py` as the canonical reference
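
One possible shape for the flip detection, as a sketch only. The existing validation logic is not reproduced in this plan, so this is a swapped-in heuristic: compare the mean axis score of a set of reference right-wing parties with that of reference left-wing parties. The helper name `evaluate_axis_flip`, the placeholder party names, and the convention that "right" means a higher score are all assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from statistics import mean
from typing import Iterable, Mapping


class HealthStatus(IntEnum):  # stand-in for the U1 enum
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:  # stand-in for the U1 dataclass
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def evaluate_axis_flip(scores: Mapping[str, float],
                       left_refs: Iterable[str],
                       right_refs: Iterable[str]) -> HealthCheck:
    """Flag a flip when reference right-wing parties score below left-wing ones."""
    name = "axis_flip"
    left = [scores[p] for p in left_refs if p in scores]
    right = [scores[p] for p in right_refs if p in scores]
    if not left or not right:
        # Covers the "no SVD data in DB" edge case.
        return HealthCheck(name, HealthStatus.CRITICAL,
                           "no SVD scores for reference parties")
    gap = mean(right) - mean(left)
    details = {"left_mean": mean(left), "right_mean": mean(right), "gap": gap}
    if gap <= 0:
        return HealthCheck(name, HealthStatus.CRITICAL,
                           "axis flip detected: right-wing parties on left", details)
    return HealthCheck(name, HealthStatus.OK, "axis orientation as expected", details)


# Placeholder party names and scores, purely illustrative.
scores = {"left_a": -0.8, "left_b": -0.5, "right_a": 0.6, "right_b": 0.9}
ok = evaluate_axis_flip(scores, ["left_a", "left_b"], ["right_a", "right_b"])
flipped = evaluate_axis_flip({p: -s for p, s in scores.items()},
                             ["left_a", "left_b"], ["right_a", "right_b"])
print(ok.status.name, flipped.status.name)  # OK CRITICAL
```

Negating every score, as in the `flipped` example, mirrors what an SVD sign flip does to a component, so the same fixture can serve both the happy path and the critical path in tests.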

**Execution note:** Test-first — mock the DB to return known SVD components and test flip detection.

**Test scenarios:**
- Happy path: SVD components match canonical themes → OK
- Warning path: Minor label drift → WARNING
- Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
- Edge case: No SVD data in DB → CRITICAL

**Verification:**
- Tests pass
- Manual verification against real DB confirms no false alarms

---

- [ ] U4. **Implement LLM enrichment check**

**Goal:** Surface motions missing layman explanations.

**Requirements:** R3

**Dependencies:** U1, U2

**Files:**
- Modify: `health/checks.py`
- Modify: `tests/test_health_checks.py`

**Approach:**
- `check_llm_coverage(db, max_missing=100)` — count motions without layman_explanation
- `check_llm_quality(db)` — spot-check a sample of explanations for non-empty, reasonable length
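
A sketch of how the two enrichment checks might classify their inputs. Note that the signature above takes an absolute `max_missing`, while the test scenarios below are phrased as percentages; this illustration follows the percentages. The `evaluate_*` names, the `min_chars` threshold, and the choice to count only `None` as "missing" (leaving empty strings to the quality check) are assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional, Sequence


class HealthStatus(IntEnum):  # stand-in for the U1 enum
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:  # stand-in for the U1 dataclass
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


def evaluate_llm_coverage(explanations: Sequence[Optional[str]],
                          warn_fraction: float = 0.05,
                          critical_fraction: float = 0.15) -> HealthCheck:
    """Classify the share of motions whose layman_explanation is None."""
    name = "llm_coverage"
    total = len(explanations)
    if total == 0:
        # Assumption: an empty table is treated as CRITICAL, as in U2.
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    missing = sum(1 for text in explanations if text is None)
    fraction = missing / total
    if fraction > critical_fraction:
        status = HealthStatus.CRITICAL
    elif fraction >= warn_fraction:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status, f"{missing} of {total} motions lack an explanation",
                       {"missing": missing, "total": total})


def evaluate_llm_quality(sample: Sequence[str], min_chars: int = 40) -> HealthCheck:
    """Warn when sampled explanations are blank or shorter than min_chars."""
    name = "llm_quality"
    suspect = sum(1 for text in sample if len(text.strip()) < min_chars)
    status = HealthStatus.WARNING if suspect else HealthStatus.OK
    return HealthCheck(name, status, f"{suspect} of {len(sample)} sampled explanations look too short",
                       {"suspect": suspect, "sampled": len(sample)})


rows = ["A plain-language summary of what this motion asks the government to do."] * 90 + [None] * 10
print(evaluate_llm_coverage(rows).status.name)  # WARNING
```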

**Test scenarios:**
- Happy path: <5% missing explanations → OK
- Warning path: 5–15% missing → WARNING
- Critical path: >15% missing → CRITICAL
- Edge case: All explanations are empty strings → WARNING

**Verification:**
- Tests pass with mocked data

---

- [ ] U5. **Create CLI runner**

**Goal:** Run all checks from command line with appropriate exit codes.

**Requirements:** R1–R4

**Dependencies:** U1–U4

**Files:**
- Create: `scripts/health_check.py`
- Create: `tests/scripts/test_health_check.py`

**Approach:**
- `python scripts/health_check.py` → prints report, exits 0/1/2
- Optional flags: `--check motion-freshness`, `--format json`, `--threshold-days 7`
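
A minimal sketch of how the runner could wire the flags above to exit codes using `argparse` and `json` from the standard library. The `CHECKS` registry, its two stand-in checks, and the `main(argv, checks)` signature are hypothetical; the real script would register the functions from `health/checks.py`.

```python
import argparse
import json
from enum import IntEnum


class HealthStatus(IntEnum):  # stand-in for the U1 enum; values double as exit codes
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Hypothetical stand-in checks: each takes threshold_days, returns (status, message).
def _motion_freshness(threshold_days):
    return HealthStatus.OK, f"newest motion is within {threshold_days} days"


def _embedding_coverage(threshold_days):
    return HealthStatus.WARNING, "coverage below target"


CHECKS = {"motion-freshness": _motion_freshness,
          "embedding-coverage": _embedding_coverage}


def main(argv=None, checks=CHECKS) -> int:
    parser = argparse.ArgumentParser(description="Run pipeline health checks")
    parser.add_argument("--check", action="append", choices=sorted(checks),
                        help="run only this check (repeatable); default: all")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--threshold-days", type=int, default=7)
    args = parser.parse_args(argv)

    results = []
    for name in args.check or list(checks):
        status, message = checks[name](args.threshold_days)
        results.append({"name": name, "status": status.name, "message": message})
    # The most severe status decides the exit code; no checks means OK.
    overall = max((HealthStatus[r["status"]] for r in results), default=HealthStatus.OK)

    if args.format == "json":
        print(json.dumps({"status": overall.name, "checks": results}, indent=2))
    else:
        for r in results:
            print(f"[{r['status']}] {r['name']}: {r['message']}")
        print(f"overall: {overall.name}")
    return int(overall)


# In scripts/health_check.py the entry point would be: raise SystemExit(main())
exit_code = main(["--check", "motion-freshness"])
```

Returning the exit code from `main` instead of calling `sys.exit` inside it lets `tests/scripts/test_health_check.py` assert on the value directly.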

**Test scenarios:**
- Happy path: All OK → exit 0, human-readable output
- Warning path: One warning → exit 1
- Critical path: One critical → exit 2
- Edge case: JSON format outputs valid JSON

**Verification:**
- `uv run python scripts/health_check.py` runs without error
- Exit codes match expectations

---

- [ ] U6. **Add Streamlit health page (optional)**

**Goal:** Visual health dashboard in the app.

**Requirements:** R4

**Dependencies:** U1–U5

**Files:**
- Create: `pages/3_Health.py`

**Approach:**
- Run all checks on page load
- Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
- Use `st.metric` for key numbers
- Color-code: green/yellow/red
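
A rough sketch of the rendering step, assuming the page receives an overall status name, a mapping of metric labels to display values, and a list of per-check rows. The `render_health` helper and its argument shapes are hypothetical. It takes the `streamlit` module as a parameter so the layout logic can be exercised without a running app; `st.title`, `st.metric`, `st.success`, `st.warning`, and `st.error` are existing Streamlit calls, and the last three give the green/yellow/red colouring.

```python
def _banner(st, status):
    """Pick the Streamlit call whose colour matches the status name."""
    return {"OK": st.success, "WARNING": st.warning, "CRITICAL": st.error}[status]


def render_health(st, overall, metrics, checks):
    """Render overall status, key numbers, then one coloured line per check.

    st      -- the streamlit module (or a test double with the same methods)
    overall -- "OK", "WARNING" or "CRITICAL"
    metrics -- mapping of label -> display value, e.g. {"Embedding coverage": "96.0%"}
    checks  -- iterable of (name, status, message) tuples
    """
    st.title("Pipeline health")
    _banner(st, overall)(f"Overall status: {overall}")
    for label, value in metrics.items():
        st.metric(label, value)
    for name, status, message in checks:
        _banner(st, status)(f"{name}: {message}")


# In pages/3_Health.py this might be called as (values are illustrative):
#   import streamlit as st
#   render_health(st, "WARNING", {"Embedding coverage": "93.0%"},
#                 [("embedding_coverage", "WARNING", "coverage below target")])
```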

**Test expectation:** none — Streamlit page, tested manually.

**Verification:**
- Page loads without error
- Metrics update when DB changes

---

## System-Wide Impact

- **Interaction graph:** Health checks read from DB but do not write. Safe to run concurrently with pipeline.
- **Error propagation:** Check failures are captured in report, not raised as exceptions.
- **Unchanged invariants:** No changes to pipeline, DB schema, or UI behavior.
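
The error-propagation rule above could be enforced with a small wrapper around each check. This is a sketch under assumptions: the `guarded` helper name is hypothetical, and treating any unexpected exception as CRITICAL is one reasonable policy rather than something this plan specifies.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class HealthStatus(IntEnum):  # stand-in for the U1 enum
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:  # trimmed stand-in for the U1 dataclass
    name: str
    status: HealthStatus
    message: str


def guarded(name: str, check: Callable[[], HealthCheck]) -> HealthCheck:
    """Run a check; convert any exception into a CRITICAL result instead of raising."""
    try:
        return check()
    except Exception as exc:  # deliberately broad: the report must always be produced
        return HealthCheck(name, HealthStatus.CRITICAL,
                           f"check raised {type(exc).__name__}: {exc}")


def failing_check() -> HealthCheck:
    raise ConnectionError("database unreachable")


result = guarded("motion_freshness", failing_check)
print(result.status.name, "-", result.message)
# CRITICAL - check raised ConnectionError: database unreachable
```

With this in place, one unreachable dependency shows up as a single CRITICAL line in the report while the remaining checks still run.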

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| False alarms (like the trajectories diagnostic) | Verify against canonical DB state, not intermediate artifacts |
| Slow checks on large DB | Add query timeouts; cache results |
| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |

## Documentation / Operational Notes

- Add health check to deployment runbook (run before/after pipeline)
- Consider scheduling in CI or cron

## Sources & References

- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`
- `analysis/config.py` — canonical SVD themes
- `database.py` — DB schema and queries
- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`