title	type	status	date
feat: Pipeline health checks and observability	feat	active	2026-04-24

Pipeline Health Checks and Observability

Overview

There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.

Problem Frame

No visibility into whether the last pipeline run succeeded
No alerting when motion count drops unexpectedly
No detection when SVD components flip or drift
No visibility into embedding coverage (% of motions with embeddings)
LLM enrichment failures are silent (motions just lack layman_explanation)

Requirements Trace

R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover at least a configurable share of motions (U2 proposes a default of 95%)
R2. Health check detects SVD stability (no sudden axis flips)
R3. Health check reports missing layman_explanations
R4. Optional: Streamlit page or API endpoint showing health metrics
R5. All health checks are testable and tested

Scope Boundaries

Included:

Health check module with individual check functions
CLI runner for health checks
Tests for each check
Optional Streamlit health page

Excluded:

Real alerting (PagerDuty, Slack) — just script exit codes for now
Long-term metrics storage (Prometheus, etc.)
Fixing the issues the health check finds

Key Technical Decisions

Pure functions for checks: Each check is a function that takes DB/config input and returns a HealthCheck result (name, status, message, details). This makes checks testable without side effects.
Composable runner: A runner executes all checks and aggregates their results into a single report.
Exit codes: 0 = all healthy, 1 = at least one warning and no critical, 2 = at least one critical. Suitable for cron/CI.

Context & Research

Relevant Code and Patterns

pipeline/run_pipeline.py — orchestrates all pipeline stages
database.py — DB queries for motion counts, embeddings, vote counts
analysis/svd_labels.py — SVD component stability logic
scripts/ — existing diagnostic scripts (drift analysis, etc.)

Institutional Learnings

docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md — metrics must be derived from canonical sources, not hardcoded

Implementation Units

U1. Create health check core module

Goal: Define the check interface and runner.

Requirements: R1–R3 foundation

Dependencies: None

Files:

Create: health/__init__.py
Create: health/core.py
Create: health/checks.py
Create: tests/test_health_core.py

Approach:

HealthStatus enum: OK, WARNING, CRITICAL
HealthCheck dataclass: name, status, message, details
run_checks(checks) → HealthReport with aggregate status
check_* functions are pure: accept data, return HealthCheck
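
The interface above could be sketched roughly as follows. This is a minimal sketch, not the final module: using IntEnum so that the worst status can be picked with max(), making the dataclasses frozen, and having run_checks accept already-computed HealthCheck results (rather than callables) are all assumptions made here for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Iterable


class HealthStatus(IntEnum):
    """Ordered by severity so max() returns the worst status (assumption)."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    """Result of a single check."""
    name: str
    status: HealthStatus
    message: str
    details: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    """Aggregated results of one health check run."""
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # An empty report counts as OK, matching the edge case in the test scenarios.
        return max((c.status for c in self.checks), default=HealthStatus.OK)


def run_checks(checks: Iterable[HealthCheck]) -> HealthReport:
    """Collect check results into a report; does no I/O itself."""
    return HealthReport(checks=tuple(checks))
```

Because the statuses are ordered integers, the report status can double as the process exit code (0, 1, or 2) without a separate mapping.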

Execution note: Test-first — write test_health_core.py with failing tests for the interface before implementing.

Test scenarios:

Happy path: All OK checks → report status OK
Error path: One CRITICAL check → report status CRITICAL
Edge case: Empty check list → report status OK
Integration: Check function signature is pure (no DB access in core)

Verification:

uv run pytest tests/test_health_core.py -v passes

U2. Implement data freshness checks

Goal: Verify the DB has recent motions and votes.

Requirements: R1

Dependencies: U1

Files:

Modify: health/checks.py
Create: tests/test_health_checks.py

Approach:

check_motion_freshness(db, max_age_days=7) — count motions newer than threshold
check_vote_coverage(db) — % of motions with votes
check_embedding_coverage(db, min_coverage=0.95) — % of motions with fused embeddings
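
One way to keep these checks testable is to split each into a thin DB query and a pure evaluation step. The sketch below shows only the evaluation step. The names evaluate_motion_freshness and evaluate_embedding_coverage, the critical_age_days parameter (set to 30 to match the critical-path scenario), and the choice of WARNING for below-threshold coverage are assumptions, since the plan does not fix them.

```python
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import NamedTuple, Optional


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


class HealthCheck(NamedTuple):
    name: str
    status: HealthStatus
    message: str
    details: dict


def evaluate_motion_freshness(
    latest_motion_at: Optional[datetime],
    now: datetime,
    max_age_days: int = 7,
    critical_age_days: int = 30,  # hypothetical parameter
) -> HealthCheck:
    """Classify the age of the newest motion; None means the DB has no motions."""
    name = "motion-freshness"
    if latest_motion_at is None:
        return HealthCheck(name, HealthStatus.CRITICAL, "No motions found in the database", {})
    age_days = (now - latest_motion_at).days
    details = {"age_days": age_days}
    if age_days > critical_age_days:
        return HealthCheck(name, HealthStatus.CRITICAL, f"Newest motion is {age_days} days old", details)
    if age_days > max_age_days:
        return HealthCheck(name, HealthStatus.WARNING, f"Newest motion is {age_days} days old", details)
    return HealthCheck(name, HealthStatus.OK, f"Newest motion is {age_days} days old", details)


def evaluate_embedding_coverage(
    total_motions: int, with_embeddings: int, min_coverage: float = 0.95
) -> HealthCheck:
    """Compare the share of motions that have embeddings against min_coverage."""
    name = "embedding-coverage"
    if total_motions == 0:
        return HealthCheck(name, HealthStatus.CRITICAL, "No motions found in the database", {})
    coverage = with_embeddings / total_motions
    details = {"coverage": coverage, "total_motions": total_motions}
    # Assumption: falling below the threshold is a WARNING, not a CRITICAL.
    status = HealthStatus.OK if coverage >= min_coverage else HealthStatus.WARNING
    return HealthCheck(name, status, f"{coverage:.1%} of motions have embeddings", details)
```

With this split, check_motion_freshness(db, max_age_days=7) would only need to fetch the newest motion timestamp and delegate, and tests can pass a fixed now value instead of mocking the clock.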

Execution note: Test-first — use mocked DB or test fixtures with known data.

Test scenarios:

Happy path: Recent motions exist, coverage > 95% → OK
Warning path: Motions are 10 days old → WARNING
Critical path: No motions in last 30 days → CRITICAL
Edge case: Empty database → CRITICAL with clear message

Verification:

Tests pass with mocked database
Manual run against real DB produces accurate report

U3. Implement SVD stability check

Goal: Detect if SVD components have flipped or drifted significantly.

Requirements: R2

Dependencies: U1, U2

Files:

Modify: health/checks.py
Modify: tests/test_health_checks.py

Approach:

check_svd_stability(db, reference_themes) — compare current SVD_THEMES to canonical config
check_axis_flip(db) — verify right-wing parties are on the right side (reuse existing validation logic)
Use analysis/config.py SVD_THEMES as canonical reference
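
Singular vectors are only defined up to sign, so a rerun of the SVD can mirror an axis without any change in the underlying data. A flip check can therefore compare where known anchor parties land. The sketch below is a stand-in for the existing validation logic, which is not shown in this plan: the function name, the anchor-party arguments, and the placeholder party names in the usage example are all hypothetical.

```python
from enum import IntEnum
from statistics import mean
from typing import Mapping, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate_axis_flip(
    party_scores: Mapping[str, float],
    right_anchor_parties: Sequence[str],
    left_anchor_parties: Sequence[str],
):
    """Return (status, message, details) for one left-right SVD axis.

    party_scores maps party name to its score on the axis. The axis counts
    as flipped when the right-wing anchors do not score higher on average
    than the left-wing anchors.
    """
    right = [party_scores[p] for p in right_anchor_parties if p in party_scores]
    left = [party_scores[p] for p in left_anchor_parties if p in party_scores]
    if not right or not left:
        # Covers the "no SVD data in DB" edge case.
        return HealthStatus.CRITICAL, "No SVD scores found for the anchor parties", {}
    right_mean, left_mean = mean(right), mean(left)
    details = {"right_mean": right_mean, "left_mean": left_mean}
    if right_mean > left_mean:
        return HealthStatus.OK, "Right-wing anchors are on the right side of the axis", details
    return HealthStatus.CRITICAL, "Axis flip detected: right-wing anchors are on the left", details


# Usage with placeholder party names (not taken from the real data):
scores = {"Right A": 0.8, "Right B": 0.5, "Left A": -0.6}
status, message, details = evaluate_axis_flip(scores, ["Right A", "Right B"], ["Left A"])
```

Comparing group means rather than individual signs keeps the check tolerant of a single party drifting across zero, which would fit the WARNING scenario better than the CRITICAL one.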

Execution note: Test-first — mock the DB to return known SVD components and test flip detection.

Test scenarios:

Happy path: SVD components match canonical themes → OK
Warning path: Minor label drift → WARNING
Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
Edge case: No SVD data in DB → CRITICAL

Verification:

Tests pass
Manual verification against real DB confirms no false alarms

U4. Implement LLM enrichment check

Goal: Surface motions missing layman explanations.

Requirements: R3

Dependencies: U1, U2

Files:

Modify: health/checks.py
Modify: tests/test_health_checks.py

Approach:

check_llm_coverage(db, max_missing=100) — count motions without layman_explanation
check_llm_quality(db) — spot-check a sample of explanations for non-empty, reasonable length
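
The evaluation logic for both checks could look roughly like this. Note that the signature above takes an absolute max_missing while the test scenarios in this unit use percentages; this sketch follows the percentages (5% and 15%) and treats that choice as an assumption to be confirmed. Treating only None as missing, the min_chars value, and returning CRITICAL for an empty input are also assumptions.

```python
from enum import IntEnum
from typing import Optional, Sequence


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def evaluate_llm_coverage(
    explanations: Sequence[Optional[str]],
    warn_ratio: float = 0.05,
    critical_ratio: float = 0.15,
):
    """Return (status, message, details) based on the share of missing explanations."""
    total = len(explanations)
    if total == 0:
        return HealthStatus.CRITICAL, "No motions to check", {}
    missing = sum(1 for e in explanations if e is None)
    ratio = missing / total
    details = {"missing": missing, "total": total, "ratio": ratio}
    message = f"{missing} of {total} motions lack layman_explanation"
    if ratio < warn_ratio:
        return HealthStatus.OK, message, details
    if ratio <= critical_ratio:
        return HealthStatus.WARNING, message, details
    return HealthStatus.CRITICAL, message, details


def evaluate_llm_quality(sample: Sequence[Optional[str]], min_chars: int = 20):
    """Flag explanations in the sample that are present but blank or very short."""
    present = [e for e in sample if e is not None]
    poor = [e for e in present if len(e.strip()) < min_chars]
    details = {"checked": len(present), "poor": len(poor)}
    if poor:
        return HealthStatus.WARNING, f"{len(poor)} explanations are blank or too short", details
    return HealthStatus.OK, "Sampled explanations look reasonable", details
```

Keeping empty strings out of the coverage count and in the quality check is what lets the "all explanations are empty strings" edge case come out as WARNING rather than CRITICAL.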

Test scenarios:

Happy path: <5% missing explanations → OK
Warning path: 5–15% missing → WARNING
Critical path: >15% missing → CRITICAL
Edge case: All explanations are empty strings → WARNING

Verification:

Tests pass with mocked data

U5. Create CLI runner

Goal: Run all checks from command line with appropriate exit codes.

Requirements: R1–R4

Dependencies: U1–U4

Files:

Create: scripts/health_check.py
Create: tests/scripts/test_health_check.py

Approach:

uv run python scripts/health_check.py → prints the report and exits with 0, 1, or 2
Optional flags: --check motion-freshness, --format json, --threshold-days 7
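
A rough shape for the runner, using argparse from the standard library, is sketched below. The two registered checks are stubs standing in for the real functions in health/checks.py, and the check name embedding-coverage, the text format name, and the decision to let --check repeat are assumptions beyond the flags listed above.

```python
import argparse
import json
from enum import IntEnum


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def _stub_motion_freshness(threshold_days):
    # Placeholder: a real check would query the DB.
    return HealthStatus.OK, f"newest motion is within {threshold_days} days"


def _stub_embedding_coverage(threshold_days):
    # Placeholder result chosen to demonstrate a non-zero exit code.
    return HealthStatus.WARNING, "embedding coverage below target"


CHECKS = {
    "motion-freshness": _stub_motion_freshness,
    "embedding-coverage": _stub_embedding_coverage,  # hypothetical name
}


def main(argv=None, checks=CHECKS):
    """Run the selected checks, print a report, and return the exit code."""
    parser = argparse.ArgumentParser(description="Run pipeline health checks")
    parser.add_argument("--check", action="append", choices=sorted(checks),
                        help="run only this check (may be repeated)")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--threshold-days", type=int, default=7)
    args = parser.parse_args(argv)

    results = []
    worst = HealthStatus.OK
    for name in args.check or list(checks):
        status, message = checks[name](args.threshold_days)
        worst = max(worst, status)
        results.append({"name": name, "status": status.name, "message": message})

    if args.format == "json":
        print(json.dumps({"status": worst.name, "checks": results}))
    else:
        for r in results:
            print(f"[{r['status']}] {r['name']}: {r['message']}")
    # 0 = all healthy, 1 = worst is a warning, 2 = worst is critical.
    return int(worst)
```

In scripts/health_check.py this would be wired up with raise SystemExit(main()) under the usual if __name__ == "__main__": guard. Having main return the code instead of calling sys.exit keeps the exit-code scenarios testable without spawning a subprocess.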

Test scenarios:

Happy path: All OK → exit 0, human-readable output
Warning path: One warning → exit 1
Critical path: One critical → exit 2
Edge case: JSON format outputs valid JSON

Verification:

uv run python scripts/health_check.py runs without error
Exit codes match expectations

U6. Add Streamlit health page (optional)

Goal: Visual health dashboard in the app.

Requirements: R4

Dependencies: U1–U5

Files:

Create: pages/3_Health.py

Approach:

Run all checks on page load
Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
Use st.metric for key numbers
Color-code: green/yellow/red

Test expectation: none — Streamlit page, tested manually.

Verification:

Page loads without error
Metrics update when DB changes

System-Wide Impact

Interaction graph: Health checks read from DB but do not write. Safe to run concurrently with pipeline.
Error propagation: Check failures are captured in report, not raised as exceptions.
Unchanged invariants: No changes to pipeline, DB schema, or UI behavior.
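
The error-propagation rule could be enforced by a small wrapper in the runner, sketched here under assumptions: the name run_safely is hypothetical, checks are taken to return a (status, message, details) tuple, and mapping an unexpected exception to CRITICAL (rather than WARNING) is a choice the plan does not make explicitly.

```python
from enum import IntEnum


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def run_safely(name, check_fn, *args, **kwargs):
    """Run one check and return (name, status, message, details).

    Any exception raised by the check is captured as a CRITICAL result so
    that one broken check cannot abort the rest of the report.
    """
    try:
        status, message, details = check_fn(*args, **kwargs)
    except Exception as exc:
        return name, HealthStatus.CRITICAL, f"check raised {type(exc).__name__}: {exc}", {}
    return name, status, message, details


def _failing_check():
    # Simulates a check whose DB query fails.
    raise RuntimeError("db unreachable")


result = run_safely("motion-freshness", _failing_check)
```

Here result carries a CRITICAL status and the exception text in its message, so the failure still shows up in the report and in the exit code.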

Risks & Dependencies

Risk	Mitigation
False alarms (like trajectories diagnostic)	Verify against canonical DB state, not intermediary artifacts
Slow checks on large DB	Add query timeouts; cache results
Check drift from codebase changes	Health checks are tested; tests fail if logic breaks

Documentation / Operational Notes

Add health check to deployment runbook (run before/after pipeline)
Consider scheduling in CI or cron

Sources & References

docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md
analysis/config.py — canonical SVD themes
database.py — DB schema and queries
docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md
