title: feat: Pipeline health checks and observability
type: feat
status: active
date: 2026-04-24

Pipeline Health Checks and Observability

Overview

There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.

Problem Frame

  • No visibility into whether the last pipeline run succeeded
  • No alerting when motion count drops unexpectedly
  • No detection when SVD components flip or drift
  • No visibility into embedding coverage (% of motions with embeddings)
  • LLM enrichment failures are silent (motions just lack layman_explanation)

Requirements Trace

  • R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover at least a configurable share of motions (default 95%, see min_coverage in U2)
  • R2. Health check detects SVD stability (no sudden axis flips)
  • R3. Health check reports missing layman_explanations
  • R4. Optional: Streamlit page or API endpoint showing health metrics
  • R5. All health checks are testable and tested

Scope Boundaries

Included:

  • Health check module with individual check functions
  • CLI runner for health checks
  • Tests for each check
  • Optional Streamlit health page

Excluded:

  • Real alerting (PagerDuty, Slack) — just script exit codes for now
  • Long-term metrics storage (Prometheus, etc.)
  • Fixing the issues the health check finds

Key Technical Decisions

  • Pure functions for checks — Each check is a function that takes DB/config and returns (status, message, details). This makes them testable without side effects.
  • Composable runner — A runner executes all checks and aggregates results into a report.
  • Exit codes — 0 = all healthy, 1 = at least one warning and no critical, 2 = at least one critical. The most severe status wins. Suitable for cron/CI.

Context & Research

Relevant Code and Patterns

  • pipeline/run_pipeline.py — orchestrates all pipeline stages
  • database.py — DB queries for motion counts, embeddings, vote counts
  • analysis/svd_labels.py — SVD component stability logic
  • scripts/ — existing diagnostic scripts (drift analysis, etc.)

Institutional Learnings

  • docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
  • docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md — metrics must be derived from canonical sources, not hardcoded

Implementation Units

  • U1. Create health check core module

Goal: Define the check interface and runner.

Requirements: R1–R3 foundation

Dependencies: None

Files:

  • Create: health/__init__.py
  • Create: health/core.py
  • Create: health/checks.py
  • Create: tests/test_health_core.py

Approach:

  • HealthStatus enum: OK, WARNING, CRITICAL
  • HealthCheck dataclass: name, status, message, details
  • run_checks(checks) → HealthReport with aggregate status
  • check_* functions are pure: accept data, return HealthCheck
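The interface above could look roughly like the following. This is a minimal sketch, not the final design: it assumes run_checks receives already-computed HealthCheck results, and it uses an IntEnum so the most severe status can be picked with max(). The HealthReport fields are hypothetical, since the plan only names the type.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class HealthStatus(IntEnum):
    """Ordered by severity so max() returns the worst status (assumption)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    # Hypothetical shape: aggregate status plus the individual results.
    status: HealthStatus
    checks: tuple[HealthCheck, ...]


def run_checks(checks: list[HealthCheck]) -> HealthReport:
    """Aggregate check results; an empty list is reported as OK."""
    worst = max((c.status for c in checks), default=HealthStatus.OK)
    return HealthReport(status=worst, checks=tuple(checks))
```

Because the aggregation only looks at HealthCheck values, the core module needs no DB access, which keeps the U1 tests free of fixtures.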

Execution note: Test-first — write test_health_core.py with failing tests for the interface before implementing.

Test scenarios:

  • Happy path: All OK checks → report status OK
  • Error path: One CRITICAL check → report status CRITICAL
  • Edge case: Empty check list → report status OK
  • Integration: Check function signature is pure (no DB access in core)

Verification:

  • uv run pytest tests/test_health_core.py -v passes

  • U2. Implement data freshness checks

Goal: Verify the DB has recent motions and votes.

Requirements: R1

Dependencies: U1

Files:

  • Modify: health/checks.py
  • Create: tests/test_health_checks.py

Approach:

  • check_motion_freshness(db, max_age_days=7) — count motions newer than threshold
  • check_vote_coverage(db) — % of motions with votes
  • check_embedding_coverage(db, min_coverage=0.95) — % of motions with fused embeddings
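As an illustration of how these checks can stay pure, the sketch below takes values that a thin DB layer would fetch (newest motion date, row counts) instead of the db handle itself. That split is an assumption, as are the critical_age_days parameter (derived from the 30-day scenario below) and the choice to report low embedding coverage as WARNING.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import IntEnum
from typing import Any, Optional


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def check_motion_freshness(latest_motion: Optional[date], today: date,
                           max_age_days: int = 7,
                           critical_age_days: int = 30) -> HealthCheck:
    """Classify the age of the newest motion; None means an empty table."""
    name = "motion-freshness"
    if latest_motion is None:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    age = (today - latest_motion).days
    details = {"age_days": age}
    if age > critical_age_days:
        status = HealthStatus.CRITICAL
    elif age > max_age_days:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status, f"newest motion is {age} days old", details)


def check_embedding_coverage(total_motions: int, with_embeddings: int,
                             min_coverage: float = 0.95) -> HealthCheck:
    """Compare the share of motions with fused embeddings to min_coverage."""
    name = "embedding-coverage"
    if total_motions == 0:
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    coverage = with_embeddings / total_motions
    # Severity below the threshold is an assumption; the plan does not fix it.
    status = HealthStatus.OK if coverage >= min_coverage else HealthStatus.WARNING
    return HealthCheck(name, status, f"coverage {coverage:.1%}", {"coverage": coverage})
```

Passing today explicitly keeps the freshness check deterministic in tests.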

Execution note: Test-first — use mocked DB or test fixtures with known data.

Test scenarios:

  • Happy path: Recent motions exist, coverage > 95% → OK
  • Warning path: Motions are 10 days old → WARNING
  • Critical path: No motions in last 30 days → CRITICAL
  • Edge case: Empty database → CRITICAL with clear message

Verification:

  • Tests pass with mocked database
  • Manual run against real DB produces accurate report

  • U3. Implement SVD stability check

Goal: Detect if SVD components have flipped or drifted significantly.

Requirements: R2

Dependencies: U1, U2

Files:

  • Modify: health/checks.py
  • Modify: tests/test_health_checks.py

Approach:

  • check_svd_stability(db, reference_themes) — compare the SVD component labels currently stored in the DB against the canonical reference_themes
  • check_axis_flip(db) — verify right-wing parties are on the right side (reuse existing validation logic)
  • Use analysis/config.py SVD_THEMES as canonical reference
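One way the axis-flip check could work is sketched below. It does not reproduce the existing validation logic, which is not shown here. It assumes the caller supplies each party's score on the left-right component plus two hypothetical reference sets, and it flags a flip when the right-wing reference parties do not score higher on average than the left-wing ones.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from statistics import mean
from typing import Any


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def check_axis_flip(scores: dict[str, float],
                    right_parties: set[str],
                    left_parties: set[str]) -> HealthCheck:
    """Flag a flip when reference right-wing parties do not sit right of left-wing ones."""
    name = "svd-axis-flip"
    right = [scores[p] for p in right_parties if p in scores]
    left = [scores[p] for p in left_parties if p in scores]
    if not right or not left:
        # Covers the "no SVD data in DB" edge case.
        return HealthCheck(name, HealthStatus.CRITICAL,
                           "no SVD scores for reference parties")
    gap = mean(right) - mean(left)
    if gap <= 0:
        return HealthCheck(name, HealthStatus.CRITICAL,
                           "axis flip: right-wing parties are on the left",
                           {"gap": gap})
    return HealthCheck(name, HealthStatus.OK, "axis orientation as expected",
                       {"gap": gap})
```

Comparing group means rather than individual parties tolerates a single party drifting across the origin without raising a false alarm.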

Execution note: Test-first — mock the DB to return known SVD components and test flip detection.

Test scenarios:

  • Happy path: SVD components match canonical themes → OK
  • Warning path: Minor label drift → WARNING
  • Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
  • Edge case: No SVD data in DB → CRITICAL

Verification:

  • Tests pass
  • Manual verification against real DB confirms no false alarms

  • U4. Implement LLM enrichment check

Goal: Surface motions missing layman explanations.

Requirements: R3

Dependencies: U1, U2

Files:

  • Modify: health/checks.py
  • Modify: tests/test_health_checks.py

Approach:

  • check_llm_coverage(db, max_missing=100) — count motions without layman_explanation
  • check_llm_quality(db) — spot-check a sample of explanations for non-empty, reasonable length
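A possible shape for the coverage check is sketched here. Note that it classifies by the share of motions missing an explanation, using the 5% and 15% bands from the test scenarios below, rather than the absolute max_missing count in the signature above; which of the two the final check uses is still open. The warn_ratio and critical_ratio parameter names are hypothetical, and the counts are assumed to come from a separate DB query.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def check_llm_coverage(total_motions: int, missing: int,
                       warn_ratio: float = 0.05,
                       critical_ratio: float = 0.15) -> HealthCheck:
    """Classify the share of motions that lack a layman_explanation."""
    name = "llm-coverage"
    if total_motions == 0:
        # Assumption: an empty table is treated as CRITICAL, as in U2.
        return HealthCheck(name, HealthStatus.CRITICAL, "no motions in database")
    ratio = missing / total_motions
    if ratio > critical_ratio:
        status = HealthStatus.CRITICAL
    elif ratio >= warn_ratio:
        status = HealthStatus.WARNING
    else:
        status = HealthStatus.OK
    return HealthCheck(name, status,
                       f"{missing} of {total_motions} motions lack an explanation",
                       {"missing_ratio": ratio})
```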

Test scenarios:

  • Happy path: <5% missing explanations → OK
  • Warning path: 5–15% missing → WARNING
  • Critical path: >15% missing → CRITICAL
  • Edge case: All explanations are empty strings → WARNING

Verification:

  • Tests pass with mocked data

  • U5. Create CLI runner

Goal: Run all checks from command line with appropriate exit codes.

Requirements: R1–R4

Dependencies: U1–U4

Files:

  • Create: scripts/health_check.py
  • Create: tests/scripts/test_health_check.py

Approach:

  • python scripts/health_check.py → prints report, exits 0/1/2
  • Optional flags: --check motion-freshness, --format json, --threshold-days 7
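The exit-code behavior can be sketched as below. This is an illustrative outline under stated assumptions: results are passed in as (name, status, message) tuples instead of being computed from the DB, only the --format flag is wired up, and the helper names exit_code and render are hypothetical.

```python
import argparse
import json
from enum import IntEnum
from typing import Iterable, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


Result = tuple[str, HealthStatus, str]  # (name, status, message)


def exit_code(statuses: Iterable[HealthStatus]) -> int:
    """Most severe status wins: 0 = healthy, 1 = warning, 2 = critical."""
    return int(max(statuses, default=HealthStatus.OK))


def render(results: Sequence[Result], fmt: str) -> str:
    """Format results as JSON or as one human-readable line per check."""
    if fmt == "json":
        return json.dumps(
            [{"name": n, "status": s.name, "message": m} for n, s, m in results]
        )
    return "\n".join(f"[{s.name}] {n}: {m}" for n, s, m in results)


def main(argv: Sequence[str], results: Sequence[Result]) -> int:
    parser = argparse.ArgumentParser(prog="health_check.py")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    args = parser.parse_args(argv)
    print(render(results, args.format))
    return exit_code(s for _, s, _ in results)
```

In the real script, the entry point would gather results from the check functions and finish with sys.exit(main(sys.argv[1:], results)), so cron and CI see the status directly.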

Test scenarios:

  • Happy path: All OK → exit 0, human-readable output
  • Error path: One warning → exit 1
  • Critical path: One critical → exit 2
  • Edge case: JSON format outputs valid JSON

Verification:

  • uv run python scripts/health_check.py runs without error
  • Exit codes match expectations

  • U6. Add Streamlit health page (optional)

Goal: Visual health dashboard in the app.

Requirements: R4

Dependencies: U1–U5

Files:

  • Create: pages/3_Health.py

Approach:

  • Run all checks on page load
  • Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
  • Use st.metric for key numbers
  • Color-code: green/yellow/red

Test expectation: none — Streamlit page, tested manually.

Verification:

  • Page loads without error
  • Metrics update when DB changes

System-Wide Impact

  • Interaction graph: Health checks read from DB but do not write. Safe to run concurrently with pipeline.
  • Error propagation: Check failures are captured in report, not raised as exceptions.
  • Unchanged invariants: No changes to pipeline, DB schema, or UI behavior.
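The error-propagation rule could be enforced with a small wrapper like the one below. It is a sketch only: the run_safely name is hypothetical, and mapping an unexpected exception to CRITICAL is an assumption about the desired severity.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


def run_safely(name: str, check: Callable[[], HealthCheck]) -> HealthCheck:
    """Run one check; convert any exception into a CRITICAL result."""
    try:
        return check()
    except Exception as exc:  # deliberately broad: one broken check must not abort the report
        return HealthCheck(
            name,
            HealthStatus.CRITICAL,
            f"check raised {type(exc).__name__}: {exc}",
        )
```

With this wrapper, a dropped DB connection in one check still yields a complete report and a meaningful exit code instead of a traceback.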

Risks & Dependencies

| Risk | Mitigation |
| --- | --- |
| False alarms (like trajectories diagnostic) | Verify against canonical DB state, not intermediary artifacts |
| Slow checks on large DB | Add query timeouts; cache results |
| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |

Documentation / Operational Notes

  • Add health check to deployment runbook (run before/after pipeline)
  • Consider scheduling in CI or cron

Sources & References

  • docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md
  • analysis/config.py — canonical SVD themes
  • database.py — DB schema and queries
  • docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md