docs: add improvement roadmap, research notes, and solution docs

- Add 2026-04-24 ROADMAP with 5 phases / 17 items
- Add detailed implementation plans for P1-001 through P4-005
- Add research artifacts and solution docs from ledger merge
- Add test for SVD component 1 compass alignment
3 months ago · c85a367a8e
parent ad7286ddc8
commit c85a367a8e
35 changed files with 3549 additions and 0 deletions
--- a/docs/plans/2026-04-05-001-make-modules-import-safe-plan.md
+++ b/docs/plans/2026-04-05-001-make-modules-import-safe-plan.md
@@ -0,0 +1,108 @@
+Title: Make modules import-safe (duckdb/plotly)
+
+Why
+- Enable lightweight unit tests and imports in environments without heavy runtime deps (duckdb, plotly) without changing runtime behaviour when those deps are present.
+
+Scope
+- Primary focus: library modules that are commonly imported by tests or other modules (not CLI scripts that are only executed).
+- Initial rollout: small, reviewable batches. Do not push or change remote branches.
+
+Non-goals
+- Remove duckdb/plotly dependency from runtime environments.
+- Refactor functionality beyond import-time safety.
+
+Approach
+- Two safe patterns (apply conservatively):
+  1) Pattern A — module-level guard
+     ```py
+     try:
+         import duckdb
+     except Exception:  # pragma: no cover
+         duckdb = None  # type: ignore
+     ```
+     Use when multiple functions in the module call duckdb and adding the guard is least invasive.
+
+  2) Pattern B — function-local import (preferred for DB helpers)
+     Move `import duckdb` into the function that uses it and raise a clear RuntimeError when invoked without duckdb:
+     ```py
+     def open_conn(path):
+         try:
+             import duckdb
+         except Exception:
+             raise RuntimeError("duckdb is required for open_conn") from None
+         return duckdb.connect(path)
+     ```
+
+Targets (first batch — high impact)
+- `database.py` — Pattern B (move DB imports into helpers; provide clear RuntimeError when called without duckdb). Tests import `database.py` so make this robust.
+- `app.py` — Pattern B (app modules often get imported during test runs; delay duckdb until handlers that need it).
+- `pipeline/svd_pipeline.py` — Pattern A (guard top-level import; pipeline code is heavy and module-level guard is fine).
+
+Expanded target list (subsequent batches)
+- `pipeline/text_pipeline.py`, `pipeline/fusion.py` — Pattern A
+- `pipeline/extract_mp_votes.py` — Pattern B
+- `similarity/compute.py`, `summarizer.py` — Pattern A or B after inspection
+- scripts under `scripts/` only if tests import them (prefer moving to `main()`)
+
+Step-by-step rollout
+1) Prepare patches for batch 1 (three files). Create one patch file per file so changes are atomic and easy to revert.
+2) Apply edits in a feature branch or local commit. Run focused tests:
+   - `pytest tests/test_database_audit.py -q`
+   - `pytest tests/test_political_compass.py -q` (if applicable)
+3) If focused tests pass, run full suite in .venv:
+   - `.venv/bin/python -m pytest tests/ -q`
+4) If failures occur, inspect tracebacks for missing duckdb at runtime and either revert the specific change or convert Pattern A ↔ Pattern B as needed.
+5) Repeat for next batches until all targeted modules are covered.
+
+Verification
+- After each file change, run the focused tests that touch the module. After the batch, run full test suite (local `.venv` recommended):
+  - `.venv/bin/python -m pytest tests/ -q` — expect no new failures.
+- Confirm importability in an environment without duckdb (simulate by temporarily renaming `.venv`, or run in an environment where duckdb is not installed):
+  - `python -c "import analysis; print('ok')"` — should not raise ImportError for guarded modules.
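Review note: the missing-dependency path can also be exercised from inside a test, without touching `.venv`. The sketch relies on standard Python behaviour: a `None` entry in `sys.modules` makes `import` of that name raise `ImportError`. The helper `module_unavailable` and the checker `pattern_b_error` are hypothetical names for illustration; `open_conn` is the Pattern B example from this plan.

```python
import sys
from contextlib import contextmanager


@contextmanager
def module_unavailable(name):
    """Make `import <name>` fail inside the block (hypothetical test helper)."""
    missing = object()
    saved = sys.modules.get(name, missing)
    sys.modules[name] = None  # a None entry makes the import raise ImportError
    try:
        yield
    finally:
        # Restore the previous state so later tests see the real module again.
        if saved is missing:
            del sys.modules[name]
        else:
            sys.modules[name] = saved


def open_conn(path):
    """Pattern B from this plan: function-local import with a clear error."""
    try:
        import duckdb
    except Exception:
        raise RuntimeError("duckdb is required for open_conn") from None
    return duckdb.connect(path)


def pattern_b_error():
    """Return the error message raised when duckdb is blocked, or None."""
    with module_unavailable("duckdb"):
        try:
            open_conn(":memory:")
        except RuntimeError as exc:
            return str(exc)
    return None
```

Because the block is restored in `finally`, the check behaves the same whether or not duckdb is installed on the machine running the tests.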
+
+Rollback strategy
+- Make one file change per commit. If tests fail, revert the last commit and open an issue with the failure trace.
+
+Patch previews (examples)
+- Pattern A (top-level guard):
+  - replace `import duckdb` with:
+    ```py
+    try:
+        import duckdb
+    except Exception:  # pragma: no cover
+        duckdb = None  # type: ignore
+    ```
+
+- Pattern B (function-local import + clear error):
+  - before:
+    ```py
+    import duckdb
+
+    def open_conn(path):
+        return duckdb.connect(path)
+    ```
+  - after:
+    ```py
+    def open_conn(path):
+        try:
+            import duckdb
+        except Exception:
+            raise RuntimeError("duckdb is required for open_conn") from None
+        return duckdb.connect(path)
+    ```
+
+Risks & mitigations
+- Risk: hiding missing dependency until runtime. Mitigation: when using Pattern B raise descriptive RuntimeError at call site so failures are explicit.
+- Risk: tests that intentionally require duckdb may break if we change behavior incorrectly. Mitigation: run focused tests that import duckdb intentionally and keep those files unchanged.
+
+Owner & next actions for me
+- I can generate the exact patch diffs for batch 1 (three files) and present them for review before applying. This is recommended to keep the change small and reviewable.
+- Reply with:
+  - `prepare` — I will create the patch diffs for `database.py`, `app.py`, `pipeline/svd_pipeline.py` and show them to you (no files modified yet), or
+  - `apply` — I will apply the first batch now and run focused tests locally.
+
+Notes
+- I already applied import-guards in several `analysis/` modules (trajectory, explorer_data, clustering, political_axis) during earlier review; this plan continues that conservative approach.
+
+References
+- Examples: `analysis/explorer_data.py`, `analysis/trajectory.py`, `analysis/visualize.py`
--- a/docs/plans/2026-04-05-002-refactor-svd-axis-labels-plan.md
+++ b/docs/plans/2026-04-05-002-refactor-svd-axis-labels-plan.md
@@ -0,0 +1,134 @@
+---
+title: "Enforce left-right orientation across all SVD axis labels"
+type: refactor
+status: active
+date: 2026-04-05
+origin: docs/superpowers/specs/2026-04-05-svd-axis-labels-design.md
+---
+
+# Enforce Left-Right Orientation Across All SVD Axis Labels
+
+## Overview
+
+Update SVD component labels in `analysis/config.py` so all 10 axes consistently reflect left-right positioning, and add validation tests to ensure canonical right-wing parties (PVV, FVD, JA21, SGP) appear on the right side after flip computation. The flip mechanism already works; this plan focuses on label consistency and test coverage.
+
+## Problem Frame
+
+SVD axis labels do not consistently reflect left-right positioning. Some axes describe dimensions like "populist vs mainstream" or "pragmatism vs ideology" without framing how right/conservative and left/progressive parties cluster on each pole. The repo convention (AGENTS.md) requires right-wing parties to appear on the RIGHT side of all axes, and labels should reflect this orientation.
+
+## Requirements Trace
+
+- R1. All 10 SVD component labels consistently frame the dimension in left-right terms
+- R2. Canonical right-wing parties (PVV, FVD, JA21, SGP) appear on the right side after flip computation
+- R3. Backward compatibility preserved for existing callers of `get_svd_label`, `get_fallback_labels`, `compute_flip_direction`
+- R4. Unit tests validate flip behavior and label correctness
+
+## Scope Boundaries
+
+- In scope: `analysis/config.py` SVD_THEMES labels, `tests/test_svd_labels.py` additions
+- Out of scope: `analysis/political_axis.py` party sets (follow-up), UI changes, flip logic changes (already works)
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `analysis/config.py` — defines `SVD_THEMES` with 10 components, each with `label`, `explanation`, `positive_pole`, `negative_pole`, `flip`
+- `analysis/svd_labels.py` — imports `CANONICAL_RIGHT`/`CANONICAL_LEFT` from config, exports aliases, `compute_flip_direction` uses them
+- `explorer.py:2680-2690` — dynamically computes flip for all 10 components at runtime, overwrites static `flip` values
+
+### Key Technical Decisions
+
+- **Keep flip mechanism as-is**: `compute_flip_direction` already uses canonical party sets to force right-wing parties to the right. No changes needed.
+- **Update labels, not flip logic**: The work is in `SVD_THEMES` label text — reframing each component's label to reflect left-right positioning while preserving the underlying voting pattern description.
+- **Preserve explanation text**: The `explanation` field can remain detailed and nuanced; only the `label` and pole descriptions need left-right framing.
+
+## Implementation Units
+
+- [ ] **Unit 1: Update SVD_THEMES labels for left-right consistency**
+
+**Goal:** Reframe all 10 SVD component labels to consistently reflect left-right positioning.
+
+**Requirements:** R1, R3
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `analysis/config.py`
+
+**Approach:**
+- For each component (1-10), update the `label` field to frame the dimension in left-right terms
+- Update `positive_pole` and `negative_pole` to explicitly mention which parties cluster on each side and their left/right positioning
+- Preserve the `explanation` text (it's already detailed and accurate)
+- Keep `flip` values as-is (they're overwritten at runtime anyway)
+
+**Patterns to follow:**
+- Component 1 label pattern: "Rechts kabinetsbeleid versus links oppositiebeleid" — this is the model
+- Component 3 label pattern: "Verzorgingsstaat versus bezuinigingen en marktwerking" — economic left-right
+- Component 6 label pattern: "Migratie en cultuur versus klimaat en progressieve inclusie" — cultural left-right (GAL-TAN)
+
+**Test scenarios:**
+- Test expectation: none — this is a label text update, no behavioral change. Verification is manual review of label text.
+
+**Verification:**
+- All 10 component labels explicitly reference left/right positioning or conservative/progressive framing
+- `positive_pole` and `negative_pole` descriptions mention party clusters and their left/right orientation
+
+- [ ] **Unit 2: Add validation test for canonical right-on-right**
+
+**Goal:** Add a test that verifies canonical right-wing parties appear on the right side after flip computation.
+
+**Requirements:** R2, R4
+
+**Dependencies:** Unit 1 (labels updated, flip logic unchanged)
+
+**Files:**
+- Modify: `tests/test_svd_labels.py`
+
+**Approach:**
+- Add `test_canonical_right_on_right` that:
+  1. Creates synthetic party scores where canonical right parties have negative values (on the left)
+  2. Asserts `compute_flip_direction` returns `True` for all components 1-10
+  3. Creates synthetic scores where canonical right parties have positive values (on the right)
+  4. Asserts `compute_flip_direction` returns `False` for all components
+- Add `test_all_canonical_parties_used` that verifies `CANONICAL_RIGHT` and `CANONICAL_LEFT` from config contain the expected parties (PVV, FVD, JA21, SGP for right; SP, PvdA, GL, etc. for left)
+
+**Execution note:** Test-first — write failing test, then verify it passes after Unit 1.
+
+**Patterns to follow:**
+- Existing test style in `tests/test_svd_labels.py` (synthetic dict-based party scores, assert on boolean flip result)
+- `test_auto_flip_computation_for_all_components` already tests flip for all 10 components — new test should follow same pattern but explicitly use `CANONICAL_RIGHT`/`CANONICAL_LEFT` from config
+
+**Test scenarios:**
+- Happy path: Canonical right parties on right side → `compute_flip_direction` returns `False` for all components
+- Happy path: Canonical right parties on left side → `compute_flip_direction` returns `True` for all components
+- Edge case: Mixed placement (some right parties on left, some on right) → flip based on majority mean
+- Edge case: No canonical parties present → returns `False` (existing behavior, verify unchanged)
+
+**Verification:**
+- `pytest tests/test_svd_labels.py -q` passes with no regressions
+- New tests explicitly validate canonical right-on-right behavior
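Review note: a minimal sketch of the shape `test_canonical_right_on_right` could take. The plan does not show the signature of `compute_flip_direction`, so `flip_needed` is a hypothetical stand-in that flips when the canonical right parties average lower than the canonical left parties. The party sets are illustrative subsets of the ones named in this plan; the real sets live in `analysis/config.py`.

```python
from statistics import mean

# Assumption: illustrative subsets; the real sets are defined in analysis/config.py.
CANONICAL_RIGHT = frozenset({"PVV", "FVD", "JA21", "SGP"})
CANONICAL_LEFT = frozenset({"SP", "PvdA", "GL"})


def flip_needed(party_scores):
    """Hypothetical stand-in for compute_flip_direction (real signature unknown)."""
    right = [s for p, s in party_scores.items() if p in CANONICAL_RIGHT]
    left = [s for p, s in party_scores.items() if p in CANONICAL_LEFT]
    if not right or not left:
        return False  # no canonical parties to orient against
    return mean(right) < mean(left)


def synthetic_scores(right_value):
    """Place every canonical right party at right_value and every left party opposite."""
    scores = {party: right_value for party in CANONICAL_RIGHT}
    scores.update({party: -right_value for party in CANONICAL_LEFT})
    return scores


def test_canonical_right_on_right():
    # Right parties on the negative side: the axis must flip.
    assert flip_needed(synthetic_scores(-1.0)) is True
    # Right parties already on the positive side: no flip.
    assert flip_needed(synthetic_scores(1.0)) is False
    # No canonical parties present: default to no flip.
    assert flip_needed({"VVD": 0.3}) is False
```

In the real test the same assertions would be repeated for components 1-10 against `compute_flip_direction`.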
+
+## System-Wide Impact
+
+- **Interaction graph:** `explorer.py` dynamically computes flip at runtime — no changes needed there. Labels flow from `config.py` → `svd_labels.py` → UI rendering.
+- **Unchanged invariants:** `compute_flip_direction` logic unchanged. Public API (`get_svd_label`, `get_fallback_labels`, `compute_flip_direction`) unchanged. Static `flip` values in `SVD_THEMES` still overwritten at runtime.
+- **API surface parity:** Labels change text but not structure. Callers expecting string labels continue to work.
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Label changes may not capture nuance of non-left-right axes | Preserve detailed `explanation` text; labels are shorthand, explanations carry full context |
+| Tests may pass but labels still feel off | Manual review of all 10 labels before committing |
+| `political_axis.py` still uses different party sets | Document as follow-up; out of scope for this plan |
+
+## Documentation / Operational Notes
+
+- Update or reference `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md` if label convention changes materially
+- No rollout or monitoring impacts — label text change only
+
+## Sources & References
+
+- **Origin document:** [docs/superpowers/specs/2026-04-05-svd-axis-labels-design.md](docs/superpowers/specs/2026-04-05-svd-axis-labels-design.md)
+- Related code: `analysis/config.py`, `analysis/svd_labels.py`, `tests/test_svd_labels.py`
+- Convention reference: `AGENTS.md` (right-wing parties must appear on RIGHT side)
--- a/docs/plans/2026-04-05-003-fix-svd-pole-labels-plan.md
+++ b/docs/plans/2026-04-05-003-fix-svd-pole-labels-plan.md
@@ -0,0 +1,61 @@
+---
+title: "Add semantic left_pole/right_pole labels to SVD_THEMES"
+type: fix
+status: active
+date: 2026-04-05
+origin: docs/superpowers/specs/2026-04-05-svd-axis-labels-design.md
+---
+
+# Add Semantic Left/Right Pole Labels to SVD_THEMES
+
+## Problem
+
+The `positive_pole`/`negative_pole` labels in `SVD_THEMES` describe the raw SVD math poles, not the semantic left/right after flip. When the axis flips at runtime (to ensure right-wing parties appear on the right), the pole labels are swapped but still describe the raw SVD orientation — resulting in labels like "← PVV en FVD" appearing on the left side when they should be on the right.
+
+## Solution
+
+Add `left_pole` and `right_pole` fields to each `SVD_THEMES` entry that describe what's on the left and right sides after flip. Update rendering code to use these semantic labels directly.
+
+## Implementation Units
+
+- [ ] **Unit 1: Add left_pole/right_pole to SVD_THEMES in config.py**
+
+**Goal:** Add semantic pole labels to all 10 SVD components.
+
+**Files:**
+- Modify: `analysis/config.py`
+
+**Approach:**
+- For each component, add `left_pole` and `right_pole` fields based on the existing `positive_pole`/`negative_pole` and the `flip` value
+- When `flip=True`: `left_pole` = `positive_pole`, `right_pole` = `negative_pole`
+- When `flip=False`: `left_pole` = `negative_pole`, `right_pole` = `positive_pole`
+- Keep `positive_pole`/`negative_pole` for backward compatibility
+
+- [ ] **Unit 2: Update explorer.py rendering to use left_pole/right_pole**
+
+**Goal:** Use semantic pole labels in all rendering functions.
+
+**Files:**
+- Modify: `explorer.py` (lines 967-970, 1087-1090, 1252-1253, 2806-2807)
+
+**Approach:**
+- Replace the positive/negative swap logic with direct `left_pole`/`right_pole` usage
+- `left_label = theme.get("left_pole", pos_pole if flip else neg_pole)` (backward compat fallback)
+- `right_label = theme.get("right_pole", neg_pole if flip else pos_pole)`
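Review note: the derivation rule from Unit 1 and the fallback from Unit 2 fit in one small helper. This is a sketch only; `semantic_poles` is a hypothetical name, and the theme dicts are assumed to carry the `positive_pole`, `negative_pole` and `flip` keys described in this plan.

```python
def semantic_poles(theme):
    """Return (left_label, right_label) for one SVD_THEMES entry.

    Prefers explicit left_pole/right_pole; otherwise derives them from the
    raw poles and the flip flag, as described in this plan.
    """
    flip = theme.get("flip", False)
    pos_pole = theme["positive_pole"]
    neg_pole = theme["negative_pole"]
    left_label = theme.get("left_pole", pos_pole if flip else neg_pole)
    right_label = theme.get("right_pole", neg_pole if flip else pos_pole)
    return left_label, right_label
```

Entries that predate the new fields keep rendering through the fallback, which is what preserves backward compatibility for older configs.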
+
+- [ ] **Unit 3: Update tests**
+
+**Goal:** Add tests for left_pole/right_pole fields.
+
+**Files:**
+- Modify: `tests/test_svd_labels.py`
+- Modify: `tests/test_explorer_chart.py`
+
+**Approach:**
+- Test that all 10 SVD_THEMES entries have `left_pole` and `right_pole` fields
+- Test that rendering functions use left_pole/right_pole correctly
+
+## Scope Boundaries
+
+- In scope: `analysis/config.py` SVD_THEMES, `explorer.py` rendering, tests
+- Out of scope: `analysis/political_axis.py`, `analysis/projections.py` (uses positive_pole/negative_pole for motion projection, not UI labels)
--- a/docs/plans/2026-04-05-004-feat-motion-semantic-drift-plan.md
+++ b/docs/plans/2026-04-05-004-feat-motion-semantic-drift-plan.md
@@ -0,0 +1,347 @@
+---
+title: "Motion semantic drift analysis over time"
+type: feat
+status: active
+date: 2026-04-05
+origin: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md
+---
+
+# Motion Semantic Drift Analysis Over Time
+
+## Overview
+
+Add a new analysis script that tracks how the semantic content of motions on each SVD axis evolves across annual windows (2016-2024). The script produces a markdown report with charts showing axis stability, semantic drift timelines, party voting trajectories, and cross-ideological voting patterns. This is Phase 1 (script + report); a future phase will integrate this into the Streamlit explorer.
+
+## Problem Frame
+
+The SVD explorer shows where parties and motions sit on axes at a point in time, but doesn't reveal how the semantic content evolves. Users can't answer: did "right-wing" motions become more extreme over time? Are the SVD axes themselves stable across windows? Do left-wing parties increasingly vote for right-wing motions? (see origin: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md)
+
+## Requirements Trace
+
+- R1. Compute cosine similarity between SVD component vectors (or motion projection patterns) across all annual windows
+- R2. Generate a stability heatmap showing which axes are comparable across time
+- R3. Detect axis reordering across windows
+- R4. Flag unstable axes
+- R5. For each stable axis, compute average fused embedding centroid of top N motions per window
+- R6. Track semantic drift using cosine distance between consecutive window centroids
+- R7. Identify inflection points where drift accelerated (threshold-based)
+- R8. Show example motions before/after inflection points
+- R9. For each party, compute voting centroid per window along each stable axis
+- R10. Track party trajectories over time
+- R11. Detect cross-ideological voting patterns
+- R12. Show concrete examples of parties voting against ideological alignment
+- R13. Script produces markdown report with embedded charts
+- R14. Report includes: stability heatmap, drift timelines, party trajectories, inflection analysis
+- R15. Script is parameterized: `--db`, `--windows`, `--top-n`, `--output`
+
+## Scope Boundaries
+
+- Annual windows only (2016-2024); quarterly windows too sparse
+- Script + report only — no UI/explorer integration in this phase
+- No statistical significance testing beyond basic change-point detection
+- SVD component vectors (V^T matrix) not currently stored — must be added to pipeline or computed indirectly
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `scripts/generate_svd_json.py` — script structure pattern: `main(argv) -> int`, argparse, ROOT path setup, logger
+- `scripts/svd_diagnostics.py` — generates markdown + JSON report from SVD analysis
+- `analysis/explorer_data.py` — DuckDB data loading patterns (read_only, try/finally, vector parsing), `load_mp_vectors_by_party_for_window()` for date-aware party normalization
+- `analysis/trajectory.py` — existing cross-window drift computation using `_procrustes_align_windows()`
+- `pipeline/svd_pipeline.py` — SVD computation; V^T available as `Vt` variable before scaling
+- `tests/test_analysis.py` — test patterns: `tmp_path` fixture, `_setup_svd_vectors()` helper, class-based tests
+- `analysis/config.py` — `CANONICAL_RIGHT`/`CANONICAL_LEFT` for cross-ideological voting detection
+
+### Key Technical Decisions
+
+- **matplotlib for static charts** — no matplotlib usage exists in codebase; this introduces a new dependency. Alternative: Plotly static image export (already in stack). Decision: use matplotlib for markdown-embedded PNGs; simpler for static reports.
+- **V^T storage via dedicated entity_type** — store raw V^T matrix as `entity_type='vt_matrix'` row in `svd_vectors`. Historical windows won't have V^T; motion-ranking correlation fallback is the primary approach for this phase.
+- **Axis stability via motion projection patterns with Procrustes alignment** — since V^T may not be available for historical windows, compute axis stability indirectly. First apply Procrustes alignment (reuse `_procrustes_align_windows()` from `analysis/trajectory.py`) to motion vectors across windows, then correlate top-N motion rankings per component. This handles SVD sign ambiguity and rotation.
+- **Threshold-based change-point detection** — simple drift rate threshold (no new dependencies). Detect when consecutive drift exceeds 2× median drift rate.
+- **Stability threshold** — cosine similarity > 0.7 classifies axes as stable. Default parameterized via `--stability-threshold` with 0.7 as default. Distribution of similarity values reported in output for sensitivity assessment.
+- **Cross-ideological voting** — use `CANONICAL_RIGHT` from `analysis.config` to identify right-wing motions (high positive loading on axis 1), then detect left-wing parties voting "voor" on those motions. Axis polarity determined per-window using canonical party scores, not global constants.
+
+## Open Questions
+
+### Resolved During Planning
+
+- **Charting library**: matplotlib for static PNG embedding in markdown. Add to `pyproject.toml`.
+- **Change-point detection**: Simple threshold on drift rate (2× median). No new dependencies.
+- **Party-motion linkage**: Use `mp_votes` table — party voted "voor" on motion. This measures voting alignment, not sponsorship.
+- **Axis stability approach**: Two-tier — (a) if V^T available, use cosine similarity; (b) fallback: Procrustes-align motion vectors, then correlate top-N motion rankings per component across windows.
+- **Top N for centroids**: Default N=20, parameterized via `--top-n`. Test during execution.
+
+### Deferred to Implementation
+
+- Exact optimal N for top motions per axis — will test N=10, 20, 50 during execution and pick the one with clearest signal
+- Cross-ideological voting threshold — provisional: party voting "voor" on motions where canonical opposite-wing parties have high absolute loadings; will calibrate against baseline
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    scripts/motion_drift.py                       │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│  1. Load Data                                                    │
+│     ├── fused_embeddings (per window, per motion)                │
+│     ├── svd_vectors (motion projections per window)              │
+│     ├── mp_votes (party voting records)                          │
+│     └── motions (text for examples)                              │
+│                                                                  │
+│  2. Axis Stability                                               │
+│     ├── Procrustes-align motion vectors across windows           │
+│     ├── Option A: cosine similarity of V^T vectors (if stored)   │
+│     ├── Option B: correlate top-N motion rankings per component  │
+│     └── Output: stability heatmap (window × component matrix)    │
+│                                                                  │
+│  3. Semantic Drift                                               │
+│     ├── For each stable axis:                                     │
+│     │   ├── Get top N motions by |loading| per window            │
+│     │   ├── Compute fused embedding centroid per window          │
+│     │   └── Cosine distance between consecutive windows          │
+│     └── Output: drift timeline per axis + inflection points      │
+│                                                                  │
+│  4. Party Voting Analysis                                        │
+│     ├── For each party (with date-aware name normalization):     │
+│     │   ├── Get motions party voted "voor" on per window         │
+│     │   └── Compute voting centroid along each stable axis       │
+│     ├── Cross-ideological detection (per-window axis polarity):  │
+│     │   ├── Left parties voting "voor" on right-wing motions     │
+│     │   └── Right parties voting "voor" on left-wing motions     │
+│     └── Output: party trajectory plots + cross-voting examples   │
+│                                                                  │
+│  5. Report Generation                                            │
+│     ├── Markdown with embedded matplotlib PNGs                   │
+│     ├── Axis stability heatmap                                   │
+│     ├── Semantic drift timelines                                 │
+│     ├── Party trajectory plots                                   │
+│     └── Inflection point analysis with motion examples           │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Implementation Units
+
+- [ ] **Unit 1: Add matplotlib dependency and script scaffolding**
+
+**Goal:** Set up the new script with proper structure and dependencies.
+
+**Requirements:** R15
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `pyproject.toml` (add matplotlib)
+- Create: `scripts/motion_drift.py`
+- Test: `tests/test_motion_drift.py`
+
+**Approach:**
+- Add `matplotlib>=3.8` to `pyproject.toml` dependencies
+- Create `scripts/motion_drift.py` following established script pattern: `main(argv) -> int`, argparse with `--db`, `--windows`, `--top-n`, `--output`, ROOT path setup, module logger
+- Add schema validation at startup: check for required tables (`svd_vectors`, `fused_embeddings`, `mp_votes`, `motions`)
+- Create minimal `tests/test_motion_drift.py` with import test, argument parsing test, and schema validation test using in-memory DuckDB fixture
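Review note: a sketch of the scaffolding described in this unit, using only `argparse`. The flag names and the default values 20 and 0.7 come from this plan; the `--windows` format, the default paths, and the `existing_tables` parameter (a stand-in for the real DuckDB table lookup) are assumptions for illustration.

```python
import argparse
import sys

REQUIRED_TABLES = ("svd_vectors", "fused_embeddings", "mp_votes", "motions")


def build_parser():
    parser = argparse.ArgumentParser(description="Motion semantic drift analysis")
    parser.add_argument("--db", default="data/motions.db", help="DuckDB file (assumed default)")
    parser.add_argument("--windows", nargs="*", default=None,
                        help="window ids; all annual windows when omitted (assumed format)")
    parser.add_argument("--top-n", type=int, default=20, help="motions per axis and window")
    parser.add_argument("--stability-threshold", type=float, default=0.7)
    parser.add_argument("--output", default="reports/motion_drift",
                        help="output directory (assumed default)")
    return parser


def missing_tables(existing_tables):
    """Return the required tables that are absent, in a stable order."""
    present = set(existing_tables)
    return [name for name in REQUIRED_TABLES if name not in present]


def main(argv=None, existing_tables=()):
    """Entry point; existing_tables stands in for a real schema query."""
    args = build_parser().parse_args(argv)
    missing = missing_tables(existing_tables)
    if missing:
        print(f"{args.db}: missing tables: {', '.join(missing)}", file=sys.stderr)
        return 1
    return 0
```

Keeping `missing_tables` free of database calls lets the schema check be unit-tested without a DuckDB fixture; the fixture is then only needed for the lookup itself.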
+
+**Patterns to follow:**
+- `scripts/generate_svd_json.py` — script structure, argparse, entry point
+- `scripts/svd_diagnostics.py` — report generation pattern
+- `tests/test_analysis.py` — `tmp_path` fixture, `_setup_svd_vectors()` helper
+
+**Test scenarios:**
+- Happy path: `main(["--help"])` exits with code 0 and prints usage
+- Happy path: `main(["--db", "data/motions.db", "--output", "/tmp/test"])` runs without error
+- Edge case: `main(["--db", "nonexistent.db"])` handles missing database gracefully (exit code 1)
+- Edge case: database with missing tables produces clear error message
+
+**Verification:**
+- `uv run python scripts/motion_drift.py --help` shows all arguments
+- `uv run python -m pytest tests/test_motion_drift.py -q` passes
+
+- [ ] **Unit 2: Axis stability analysis**
+
+**Goal:** Compute axis stability across annual windows and generate stability heatmap.
+
+**Requirements:** R1, R2, R3, R4
+
+**Dependencies:** Unit 1
+
+**Files:**
+- Create: `analysis/motion_drift.py` (core analysis module)
+- Modify: `scripts/motion_drift.py` (call axis stability)
+- Test: `tests/test_motion_drift.py`
+
+**Approach:**
+- Create `analysis/motion_drift.py` with `compute_axis_stability(db_path, windows)` function
+- Two-tier approach:
+  - Try loading V^T from `svd_vectors` where `entity_type='vt_matrix'` (if stored by pipeline)
+  - Fallback: apply Procrustes alignment to motion vectors across windows (reuse `_procrustes_align_windows()` from `analysis/trajectory.py`), then for each window get top N motions per component by absolute score and compute pairwise cosine similarity of motion ranking vectors
+- Generate stability heatmap as matplotlib figure (window × component matrix, color-coded by similarity)
+- Return stability report: which axes are stable (similarity > 0.7), which are reordered (high similarity to different component index), which are unstable (low similarity to any component)
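Review note: the stable / reordered / unstable classification in the last bullet can be sketched as follows. The input is assumed to be an already aligned similarity matrix for one pair of windows, where `sim[i][j]` compares component `i` in the earlier window with component `j` in the later one. `classify_axes` is a hypothetical name; the 0.7 threshold is the default named in this plan.

```python
def classify_axes(sim, threshold=0.7):
    """Label each component of the earlier window as stable, reordered or unstable.

    sim[i][j] is the similarity between component i (earlier window) and
    component j (later window); alignment is assumed to have been applied.
    """
    labels = {}
    for i, row in enumerate(sim):
        best_j = max(range(len(row)), key=lambda j: row[j])
        if row[best_j] <= threshold:
            labels[i] = "unstable"   # no component in the later window matches well
        elif best_j == i:
            labels[i] = "stable"     # best match keeps the same index
        else:
            labels[i] = "reordered"  # best match moved to another index
    return labels
```

Only the axes labelled `stable` would feed the drift and party analyses in the later units.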
+
+**Patterns to follow:**
+- `analysis/explorer_data.py` — DuckDB loading patterns, vector parsing
+- `analysis/trajectory.py` — `_procrustes_align_windows()` for cross-window comparison
+
+**Test scenarios:**
+- Happy path: `compute_axis_stability` returns stability matrix for 3+ windows with synthetic data
+- Happy path: stability matrix is symmetric and values are in [-1, 1]
+- Happy path: Procrustes alignment corrects sign flips between windows
+- Edge case: single window returns empty stability report (no comparison possible)
+- Edge case: windows with no motion vectors handled gracefully (warning logged, skipped)
+- Integration: run against real `data/motions.db` annual windows, verify heatmap is generated
+
+**Verification:**
+- Stability heatmap PNG generated with correct dimensions (windows × components)
+- Stability report identifies at least some axes as stable (similarity > 0.7)
+
+- [ ] **Unit 3: Semantic drift analysis**
+
+**Goal:** Compute semantic drift timelines for stable axes and detect inflection points.
+
+**Requirements:** R5, R6, R7, R8
+
+**Dependencies:** Unit 2 (needs stable axis list)
+
+**Files:**
+- Modify: `analysis/motion_drift.py` (add drift functions)
+- Modify: `scripts/motion_drift.py` (call drift analysis)
+- Test: `tests/test_motion_drift.py`
+
+**Approach:**
+- Add `compute_semantic_drift(db_path, stable_axes, windows, top_n)` function
+- For each stable axis:
+  - Get top N motions per window by absolute SVD loading
+  - Compute average fused embedding centroid per window
+  - Compute cosine distance between consecutive window centroids
+  - Detect inflection points: where drift rate exceeds 2× median drift rate
+- For each inflection point, extract example motions (top 3 before/after by loading)
+- Generate drift timeline plot per axis (line chart with inflection point markers)
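Review note: the drift and inflection steps reduce to a few lines of arithmetic. The sketch assumes the per-window centroids have already been computed as plain lists of floats; `drift_series` and `inflection_points` are hypothetical names, and the 2× median rule is the one stated in this plan.

```python
import math
from statistics import median


def cosine_distance(a, b):
    """1 - cosine similarity, clamped so the result stays within [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    cos = max(-1.0, min(1.0, dot / norm))
    return 1.0 - cos


def drift_series(centroids):
    """Cosine distance between each pair of consecutive window centroids."""
    return [cosine_distance(centroids[i], centroids[i + 1])
            for i in range(len(centroids) - 1)]


def inflection_points(drift, factor=2.0):
    """Indices into drift where the value exceeds factor x the median drift."""
    if len(drift) < 2:
        return []  # a single transition has no baseline to compare against
    threshold = factor * median(drift)
    return [i for i, value in enumerate(drift) if value > threshold]
```

Index `i` in the result refers to the transition from window `i` to window `i + 1`, which is where the before/after example motions would be pulled from.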
+
+**Patterns to follow:**
+- `analysis/trajectory.py` — `compute_trajectories()` for cross-window drift computation
+- `scripts/svd_diagnostics.py` — markdown report generation
+
+**Test scenarios:**
+- Happy path: `compute_semantic_drift` returns drift series for each stable axis
+- Happy path: drift values are in [0, 2] (cosine distance range)
+- Happy path: inflection points detected when synthetic data has abrupt change
+- Edge case: axis with only 2 windows returns drift but no inflection points
+- Edge case: axis with steady, constant-rate drift returns no inflection points (no value exceeds 2× median)
+- Integration: run against real data, verify drift timelines are plausible
+
+**Verification:**
+- Drift timeline PNG generated per stable axis
+- Inflection points (if any) are marked on timeline with motion examples in report
+
+- [ ] **Unit 4: Party voting analysis**
+
+**Goal:** Compute party voting centroids and detect cross-ideological voting patterns.
+
+**Requirements:** R9, R10, R11, R12
+
+**Dependencies:** Unit 2 (needs stable axis list)
+
+**Files:**
+- Modify: `analysis/motion_drift.py` (add party analysis functions)
+- Modify: `scripts/motion_drift.py` (call party analysis)
+- Test: `tests/test_motion_drift.py`
+
+**Approach:**
+- Add `compute_party_voting(db_path, stable_axes, windows)` function
+- For each party:
+  - Query `mp_votes` for motions party voted "voor" on per window, using date-aware party name normalization (reuse `load_mp_vectors_by_party_for_window()` pattern from `analysis/explorer_data.py`)
+  - For each motion, get its SVD scores from `svd_vectors`
+  - Compute unweighted mean score along each stable axis (voting centroid)
+- Track party trajectories: plot party centroid position per window along each axis
+- Detect cross-ideological voting:
+  - For each window, independently determine axis polarity by checking where canonical right-wing parties (CANONICAL_RIGHT) score on each axis
+  - Identify "right-wing" motions (high positive loading on axis where PVV/FVD/JA21/SGP score high after polarity check)
+  - Find left-wing parties (SP, PvdA, GL, etc.) voting "voor" on right-wing motions
+  - Compute cross-voting rate per party per window
+  - Detect trends: is cross-voting increasing or decreasing over time?
+- Generate party trajectory plots and cross-voting summary table
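+
The centroid and cross-voting steps above can be sketched in a few lines. This is a minimal illustration, not the planned implementation: the function names are hypothetical, returning `None` for an empty window is one way to express "skipped", and the rate definition (share of a party's "voor" votes that went to right-wing motions) is an assumption, since the plan does not fix one.

```python
from statistics import fmean


def voting_centroid(motion_scores):
    """Unweighted mean score per axis over the motions a party voted 'voor' on.

    motion_scores: list of equal-length score lists, one per motion.
    Returns None when the party has no 'voor' votes in the window,
    so the caller can skip that party/window pair.
    """
    if not motion_scores:
        return None
    # zip(*...) regroups the per-motion rows into per-axis columns.
    return [fmean(axis_values) for axis_values in zip(*motion_scores)]


def cross_voting_rate(voor_motion_ids, right_wing_motion_ids):
    """Share of a party's 'voor' votes that went to right-wing motions.

    Assumed definition; returns NaN when the party cast no 'voor' votes.
    """
    voor = set(voor_motion_ids)
    if not voor:
        return float("nan")
    return len(voor & set(right_wing_motion_ids)) / len(voor)
```

Tracking `cross_voting_rate` per party per window then gives the series needed for the trend question (increasing or decreasing over time).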
+
+**Patterns to follow:**
+- `analysis/config.py` — `CANONICAL_RIGHT`/`CANONICAL_LEFT` for party classification
+- `analysis/explorer_data.py` — `mp_votes` query patterns, `load_mp_vectors_by_party_for_window()` for party normalization
+
+**Test scenarios:**
+- Happy path: `compute_party_voting` returns voting centroids for parties with sufficient data
+- Happy path: cross-ideological voting detected when synthetic data has left party voting on right motions
+- Happy path: party name normalization maps historical names (GL, PvdA → GroenLinks-PvdA) correctly
+- Edge case: party with no "voor" votes in a window handled gracefully (centroid = NaN, skipped)
+- Edge case: window with no voting data handled gracefully
+- Integration: run against real data, verify party trajectories are plausible
+
+**Verification:**
+- Party trajectory PNG generated showing party movement across windows
+- Cross-voting summary table in report with at least one example
+
+- [ ] **Unit 5: Report generation**
+
+**Goal:** Assemble all analysis outputs into a markdown report with embedded charts.
+
+**Requirements:** R13, R14, R15
+
+**Dependencies:** Units 2, 3, 4
+
+**Files:**
+- Modify: `scripts/motion_drift.py` (orchestrate report generation)
+- Test: `tests/test_motion_drift.py`
+
+**Approach:**
+- Add `_generate_report(output_dir, stability_result, drift_result, party_result)` function
+- Generate markdown with sections:
+  - Summary (key findings, number of stable axes, inflection points, cross-voting trends)
+  - Axis Stability (heatmap + interpretation)
+  - Semantic Drift (timeline per axis + inflection point analysis with motion examples)
+  - Party Voting Analysis (trajectory plots + cross-voting summary + examples)
+  - Methodology (brief description of approach, parameters used)
+- Save all matplotlib figures as PNGs in output directory
+- Embed PNGs in markdown using relative paths
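+
A minimal sketch of the assembly step, assuming a simplified input shape. `write_report` and its `(title, body, png_filenames)` tuples are hypothetical stand-ins for the planned `_generate_report(output_dir, stability_result, drift_result, party_result)`; the report title is also illustrative.

```python
from pathlib import Path


def write_report(output_dir, sections):
    """Write report.md with one heading per section and relative PNG links.

    sections: list of (title, body_text, png_filenames) tuples (assumed shape).
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    lines = ["# Motion Drift Report", ""]
    for title, body, pngs in sections:
        lines += [f"## {title}", "", body, ""]
        for png in pngs:
            # File names only: links stay valid when the whole folder is moved.
            lines += [f"![{title}]({png})", ""]
    report_path = out / "report.md"
    report_path.write_text("\n".join(lines), encoding="utf-8")
    return report_path
```

Because the PNGs are saved next to `report.md` and referenced by file name, the output directory can be shared as a unit without rerunning the script.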
+
+**Patterns to follow:**
+- `scripts/svd_diagnostics.py` — markdown report structure
+- `scripts/generate_svd_json.py` — `_generate_markdown_report()` function
+
+**Test scenarios:**
+- Happy path: report generated with all sections and embedded images
+- Happy path: all PNG files exist in output directory
+- Edge case: no stable axes → report notes this and skips drift/party sections
+- Edge case: output directory creation when it doesn't exist
+
+**Verification:**
+- `output/report.md` exists and contains all expected sections
+- All referenced PNG files exist in output directory
+- Report is readable in a markdown viewer
+
+## System-Wide Impact
+
+- **Interaction graph:** New script reads from existing DuckDB tables; no writes to production data. Pipeline change needed to store V^T matrix (optional, for future windows).
+- **Unchanged invariants:** SVD computation unchanged. Explorer unchanged. Existing analysis modules unchanged.
+- **New dependency:** `matplotlib` added to `pyproject.toml`. First use of matplotlib in codebase.
+
+## Risks & Dependencies
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| matplotlib introduces new dependency burden | Low | Low | matplotlib is a widely used, well-maintained library. Alternative: use Plotly static export if the team prefers a single viz stack. |
+| V^T matrix not available for historical windows | High | Medium | Fallback to Procrustes-aligned motion ranking correlation (works with existing data). Store V^T going forward. |
+| Sparse data in early windows (2016-2018: 124-162 motions) | Medium | Medium | Script warns about low-coverage windows; analysis focuses on 2019+ where data is richer. |
+| Cross-ideological voting detection threshold too sensitive/insensitive | Medium | Low | Threshold is parameterized; will calibrate during execution against baseline drift rates. |
+| Script exceeds 2-minute runtime on full dataset | Low | Low | JSON parsing of fused embeddings is the bottleneck. Will batch-load and cache if needed. |
+
+## Documentation / Operational Notes
+
+- New script: `scripts/motion_drift.py` — usage documented in module docstring
+- New analysis module: `analysis/motion_drift.py` — functions documented with docstrings
+- Report output: markdown with embedded PNGs, shareable without running the script
+- Future: integrate analysis into Streamlit explorer tab (separate plan)
+
+## Sources & References
+
+- **Origin document:** [docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md](docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md)
+- Related code: `scripts/generate_svd_json.py`, `scripts/svd_diagnostics.py`, `analysis/trajectory.py`, `analysis/explorer_data.py`
+- Party sets: `analysis/config.py` (CANONICAL_RIGHT, CANONICAL_LEFT)
--- a/docs/plans/2026-04-05-005-refactor-axis-stability-regression-plan.md
+++ b/docs/plans/2026-04-05-005-refactor-axis-stability-regression-plan.md
@ -0,0 +1,241 @@
+---
+title: "Refine axis stability with regression weights and overtone shift"
+type: refactor
+status: active
+date: 2026-04-05
+origin: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md
+---
+
+# Refine Axis Stability with Regression Weights and Overtone Shift
+
+## Overview
+
+Replace the current axis stability computation (party-based sign consistency) with a regression-based approach that measures whether the *semantic features* defining each SVD axis remain stable across windows. Add overtone shift analysis to detect when motion content changes even if party ordering stays the same.
+
+## Problem Frame
+
+The current stability metric only checks whether left/right parties score on the expected side of each axis. This misses two important questions:
+1. **Axis stability**: Does axis 1 capture the same underlying theme in 2019 and 2024? (e.g., "social vs individual" should be stable even if specific motions change)
+2. **Overtone shift**: Are motions on axis 1 becoming more about migration and less about economics over time, even if PVV still scores higher than SP?
+
+The current approach found zero stable axes because it measured party sign consistency, not semantic stability.
+
+## Requirements Trace
+
+- R1. Compute semantic stability via Ridge regression weights across windows (replaces party sign consistency)
+- R2. Generate stability heatmap showing which axes are semantically comparable across time
+- R3. Detect axis reordering — cases where axis N in window A ≈ axis M in window B
+- R4. Flag unstable axes where semantic signature changes significantly
+- R5. For each stable axis, compute semantic gravity (weighted mean fused embedding) per window
+- R6. Track overtone shift: how semantic gravity moves across windows
+- R7. Identify inflection points where overtone shift accelerated
+- R8. Show example motions and top shifting dimensions at inflection points
+- R9-R12. Party voting analysis (unchanged from existing implementation)
+- R13-R15. Output and parameterization (unchanged)
+
+## Scope Boundaries
+
+- Refine existing `scripts/motion_drift.py` — no new script
+- Keep party voting analysis and report generation (already working)
+- Annual windows only; quarterly too sparse
+- Ridge regression with scikit-learn (already in dependencies)
+
+## Context & Research
+
+### Existing Code
+
+- `scripts/motion_drift.py` — current implementation with party-based fallback stability
+- `analysis/clustering.py` — UMAP + KMeans infrastructure (not directly used but shows pattern)
+- `scikit-learn>=1.8.0` — already in `pyproject.toml`, provides `Ridge`
+
+### Key Technical Decisions
+
+- **Ridge regression per axis per window**: Fit `SVD_score ~ fused_embedding` for each axis. The weight vector (2610 dims) is the semantic signature. Compare via cosine similarity across windows.
+- **Semantic gravity for overtone shift**: Weighted mean fused embedding of all motions, weighted by absolute SVD score on the axis. Track how gravity moves across windows.
+- **Top-K dimensions for interpretation**: Extract top-50 dimensions by absolute regression weight. Project gravity onto these to identify which semantic features are shifting.
+- **Party-based fallback kept**: For windows with too few motions for regression (< 50), fall back to party sign consistency.
+
+## Open Questions
+
+### Resolved During Planning
+
+- **Regression type**: Ridge (L2 regularization) — handles 2610-dim vectors without overfitting, already available via scikit-learn.
+- **Alpha (regularization strength)**: Default 1.0, parameterized via `--regression-alpha`. Will test 0.1, 1.0, 10.0 during execution.
+- **Top-K dimensions for interpretation**: K=50 — enough to capture semantic signal without noise.
+- **Overtone shift metric**: Cosine distance between semantic gravity points across consecutive windows. Threshold for inflection: 2× median shift rate.
+
+### Deferred to Implementation
+
+- Optimal alpha for Ridge regression — will test against real data and pick value that gives most interpretable weight vectors
+- Whether to normalize fused embeddings before regression (likely yes, since SVD dims are ~1-100 scale and text dims are ~0-1)
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│              Refined Axis Stability + Overtone Shift             │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                  │
+│  1. Per-Axis Ridge Regression (per window)                       │
+│     ├── For each SVD axis k:                                     │
+│     │   X = fused_embeddings (n_motions × 2610)                 │
+│     │   y = SVD scores on axis k (n_motions)                    │
+│     │   w_k = Ridge.fit(X, y).coef_  (2610-dim weight vector)   │
+│     └── Output: weight_vectors[window][axis]                     │
+│                                                                  │
+│  2. Stability Matrix                                              │
+│     ├── For each axis k, compute cosine similarity of w_k        │
+│     │   across all window pairs                                  │
+│     └── Output: stability_matrix[window][window][axis]           │
+│                                                                  │
+│  3. Overtone Shift                                                │
+│     ├── For each axis k and window:                              │
+│     │   gravity_k = weighted_mean(fused_embeddings,              │
+│     │              weights=abs(SVD_scores_k))                    │
+│     │   shift_k = cosine_distance(gravity_k[t], gravity_k[t+1])  │
+│     └── Output: shift_series[axis] = [shift values per window]   │
+│                                                                  │
+│  4. Interpretation                                                │
+│     ├── Top-50 dimensions per axis (by |weight|)                 │
+│     ├── Project gravity onto top dimensions to see shifts        │
+│     └── Report: "Axis 1 stable (0.82), overtone shift (0.45)    │
+│              — migration framing gained +0.31, economic -0.22"   │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Implementation Units
+
+- [ ] **Unit 1: Add Ridge regression-based stability computation**
+
+**Goal:** Replace `compute_axis_stability()` with regression-based version.
+
+**Requirements:** R1, R2, R3, R4
+
+**Dependencies:** None (replaces existing function)
+
+**Files:**
+- Modify: `scripts/motion_drift.py` (replace `compute_axis_stability`)
+- Modify: `tests/test_motion_drift.py` (update stability tests)
+
+**Approach:**
+- New `compute_axis_stability()` function:
+  - For each window, load motion scores + fused embeddings
+  - For each axis k (1-10), fit Ridge regression: `score_k ~ fused_embedding`
+  - Normalize features before fitting (StandardScaler on fused embeddings)
+  - Extract weight vector w_k (2610 dims)
+  - Compute pairwise cosine similarity of w_k across windows
+  - Return stability matrix, stable/reordered/unstable axes
+- Keep `_compute_stability_fallback()` for windows with < 50 motions
+- Add `--regression-alpha` CLI argument (default 1.0)
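+
The comparison step can be sketched independently of the Ridge fit. The example below assumes the weight vectors have already been extracted and are held in a plain dict; the function names and the dict layout are hypothetical. Taking the absolute similarity when looking for reordered axes is also an assumption, made because the sign of an SVD axis is arbitrary.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vector: treat as no similarity


def axis_stability(weights, axis):
    """Pairwise similarity of one axis's weight vector across all windows.

    weights: dict mapping window -> list of weight vectors (one per axis).
    Returns a dict keyed by (window_a, window_b).
    """
    windows = sorted(weights)
    return {
        (a, b): cosine_similarity(weights[a][axis], weights[b][axis])
        for a in windows
        for b in windows
    }


def best_matching_axis(weights, window_a, axis, window_b):
    """Find the axis in window_b most similar to `axis` in window_a (reordering check)."""
    sims = [abs(cosine_similarity(weights[window_a][axis], w)) for w in weights[window_b]]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]
```

If `best_matching_axis` returns a different index than the one passed in, with high similarity, that is a candidate for the "axis N in window A ≈ axis M in window B" case in R3.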
+
+**Patterns to follow:**
+- `sklearn.linear_model.Ridge` — standard usage: `Ridge(alpha=alpha).fit(X, y)`
+- `sklearn.preprocessing.StandardScaler` — normalize features before regression
+
+**Test scenarios:**
+- Happy path: regression produces weight vectors with cosine similarity in [-1, 1]
+- Happy path: synthetic data with known semantic signatures recovers stable axes
+- Edge case: window with < 50 motions falls back to party-based method
+- Edge case: all motions have same score on axis (degenerate case)
+- Integration: run against real data, verify stability values are non-zero
+
+**Verification:**
+- Stability matrix has correct shape (n_windows × n_windows × n_components)
+- At least some axes show stability > 0.5 on real data
+- Fallback triggers correctly for sparse windows
+
+- [ ] **Unit 2: Add overtone shift analysis**
+
+**Goal:** Compute semantic gravity trajectories and detect overtone shifts.
+
+**Requirements:** R5, R6, R7, R8
+
+**Dependencies:** Unit 1 (needs regression weight vectors for top-K dimension interpretation; shift computation itself is independent)
+
+**Files:**
+- Create: `compute_overtone_shift()` function in `scripts/motion_drift.py`
+- Modify: `scripts/motion_drift.py` (call overtone shift in main)
+- Modify: `tests/test_motion_drift.py` (add overtone shift tests)
+
+**Approach:**
+- New `compute_overtone_shift(db_path, stable_axes, windows, top_k=50)` function:
+  - For each stable axis and window:
+    - Load motion scores and fused embeddings
+    - Compute semantic gravity: weighted mean of fused embeddings, weights = abs(SVD scores)
+    - Extract top-K dimensions by absolute regression weight
+    - Project gravity onto top-K dimensions
+  - Compute cosine distance between consecutive window gravity points
+  - Detect inflection points: shift > 2× median shift rate
+  - For each inflection, identify top shifting dimensions and example motions
+- Return shift series, inflection points, dimension-level analysis
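+
A minimal pure-Python sketch of the three numeric steps above: semantic gravity, shift between consecutive windows, and inflection detection. Function names are hypothetical, data loading and the top-K projection are left out, and raising on all-zero scores is an assumption about how to handle that degenerate case.

```python
import math
from statistics import median


def semantic_gravity(embeddings, scores):
    """Mean embedding weighted by |SVD score| on the axis."""
    weights = [abs(s) for s in scores]
    total = sum(weights)
    if total == 0:
        raise ValueError("all scores are zero; gravity is undefined")
    dims = len(embeddings[0])
    return [
        sum(w * e[d] for w, e in zip(weights, embeddings)) / total
        for d in range(dims)
    ]


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def shift_series(gravities):
    """Cosine distance between each pair of consecutive window gravity points."""
    return [cosine_distance(a, b) for a, b in zip(gravities, gravities[1:])]


def inflection_points(shifts, factor=2.0):
    """Indices of shifts that exceed factor x the median shift."""
    threshold = factor * median(shifts)
    return [i for i, s in enumerate(shifts) if s > threshold]
```

With only two windows there is a single shift value, which equals its own median and so can never exceed twice that median; this matches the "2 windows, no inflection points" edge case in the test scenarios.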
+
+**Test scenarios:**
+- Happy path: overtone shift returns shift series for each stable axis
+- Happy path: synthetic data with known shift detects inflection point
+- Edge case: axis with only 2 windows returns shift but no inflection points
+- Edge case: steady shift at a constant rate returns no inflection points (nothing exceeds 2× the median)
+- Integration: run against real data, verify shift values are plausible
+
+**Verification:**
+- Shift series has correct length (n_windows - 1 per axis)
+- Inflection points (if any) include dimension-level analysis
+- Top shifting dimensions are reported with direction and magnitude
+
+- [ ] **Unit 3: Update report generation with new metrics**
+
+**Goal:** Update report to show both stability and overtone shift per axis.
+
+**Requirements:** R13, R14
+
+**Dependencies:** Units 1, 2
+
+**Files:**
+- Modify: `scripts/motion_drift.py` (`_generate_report` function)
+- Modify: `tests/test_motion_drift.py` (update report tests)
+
+**Approach:**
+- Update `_generate_report()` to include:
+  - Stability heatmap (regression weight similarity)
+  - Overtone shift timeline per axis (line chart with inflection markers)
+  - For each stable axis: stability score + overtone shift magnitude
+  - Top shifting dimensions table: dimension index, direction, magnitude
+  - Example motions at inflection points
+- Keep existing party voting analysis section unchanged
+
+**Test scenarios:**
+- Happy path: report includes both stability and overtone shift sections
+- Happy path: all charts generated and embedded
+- Edge case: no stable axes → report notes this, skips overtone shift
+
+**Verification:**
+- Report contains stability heatmap, shift timelines, and dimension analysis
+- All PNG files exist in output directory
+
+## System-Wide Impact
+
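+- **Interaction graph:** Replaces `compute_axis_stability()` in place. Its API is unchanged, so the caller (the main function) needs no changes.
+- **Unchanged invariants:** Party voting analysis, report structure, existing CLI arguments (the new `--regression-alpha` flag is additive)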
+- **New dependency:** None — scikit-learn already in dependencies
+
+## Risks & Dependencies
+
+| Risk | Likelihood | Impact | Mitigation |
+|------|-----------|--------|------------|
+| Ridge regression overfits with 2610 features | Medium | Medium | L2 regularization is built into Ridge; in addition, test multiple alpha values and validate with cross-validation |
+| Fused embeddings have different dimensions across windows | Low | Low | Already handled — truncate to min dimension |
+| Regression takes too long on full dataset | Medium | Low | 9 windows × 10 axes = 90 Ridge fits. Each fit on a ~3000×2610 matrix takes roughly 0.1s with sklearn, so about 9s in total. Acceptable. |
+| Weight vectors are hard to interpret | Medium | Low | Focus on top-50 dimensions, report direction and magnitude clearly |
+
+## Documentation / Operational Notes
+
+- Updated script: `scripts/motion_drift.py` — new stability metric, new overtone shift analysis
+- Report output: markdown with stability heatmap, shift timelines, dimension analysis
+- Existing report sections (party voting) unchanged
+
+## Sources & References
+
+- **Origin document:** [docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md](docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md)
+- Related code: `scripts/motion_drift.py` (existing implementation), `analysis/clustering.py` (UMAP/KMeans patterns)
+- Ridge regression: `sklearn.linear_model.Ridge`
--- a/docs/plans/2026-04-24-001-fix-ci-test-workflow-plan.md
+++ b/docs/plans/2026-04-24-001-fix-ci-test-workflow-plan.md
@ -0,0 +1,127 @@
+---
+title: "fix: CI test workflow references missing requirements.txt"
+type: fix
+status: active
+date: 2026-04-24
+---
+
+# Fix: CI Test Workflow
+
+## Overview
+
+The scheduled CI workflow `.github/workflows/mindmodel-schedule.yml` attempts `pip install -r requirements.txt`, but this file does not exist. The project uses `uv` with `pyproject.toml` and `uv.lock`. This workflow fails silently (`|| true`) and never actually runs tests meaningfully.
+
+## Problem Frame
+
+- Python 3.11 is hardcoded in the workflow; the project requires >=3.13
+- `requirements.txt` is missing; dependencies are in `pyproject.toml`
+- No pytest gate on push/PR — regressions are only caught locally
+- The mindmodel validator runs regardless, masking the test failure
+
+## Requirements Trace
+
+- R1. CI must install dependencies correctly using the project's package manager
+- R2. CI must run pytest on push and PR to main
+- R3. CI must use Python >=3.13 matching pyproject.toml
+- R4. CI must fail visibly when tests fail (no `|| true` masking)
+
+## Scope Boundaries
+
+**Included:**
+- Fix existing mindmodel-schedule.yml
+- Add new pytest workflow for push/PR
+
+**Excluded:**
+- Changing test code or test dependencies
+- Adding new tests
+- Changing the mindmodel validator logic
+
+## Key Technical Decisions
+
+- **Use `uv` in CI** — matches local development and pyproject.toml. Use `astral-sh/setup-uv` action.
+- **Separate workflows** — keep mindmodel schedule weekly, add pytest on push/PR
+- **Fail fast** — remove `|| true` from pytest step
+
+## Implementation Units
+
+- [ ] U1. **Fix mindmodel-schedule.yml to use uv**
+
+**Goal:** Make the scheduled workflow install deps and run tests correctly.
+
+**Requirements:** R1, R3, R4
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `.github/workflows/mindmodel-schedule.yml`
+
+**Approach:**
+- Replace `actions/setup-python@v4` + `pip install` with `astral-sh/setup-uv@v5`
+- Use `uv sync` to install from `pyproject.toml`/`uv.lock`
+- Change Python version to 3.13
+- Remove `|| true` from pytest step
+- Keep mindmodel validator as-is
+
+**Execution note:** Test-first — write a workflow validation test that checks the YAML parses correctly and references valid files.
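+
One possible shape for such a validation check, sketched with plain string matching only. The function name and the sample workflow snippets in the usage note are hypothetical, and this does not validate GitHub Actions syntax; a YAML parser would still be needed for that.

```python
def find_workflow_problems(workflow_text):
    """Return a list of human-readable problems found in a workflow file's text."""
    problems = []
    if "requirements.txt" in workflow_text:
        problems.append("references requirements.txt, which does not exist")
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        # A pytest command followed by '|| true' can never fail the job.
        if "pytest" in line and "|| true" in line:
            problems.append(f"line {number}: pytest failure is masked by '|| true'")
    return problems
```

A test could read `.github/workflows/mindmodel-schedule.yml` and assert that `find_workflow_problems(text)` returns an empty list; against a line such as `run: pytest tests/ || true` it would report one problem.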
+
+**Test scenarios:**
+- Happy path: Workflow YAML is valid GitHub Actions syntax
+- Error path: pytest step fails if tests fail (no `|| true`)
+- Integration: `uv sync` installs from the same lockfile as local dev
+
+**Verification:**
+- `python -c "import yaml; yaml.safe_load(open('.github/workflows/mindmodel-schedule.yml'))"` passes
+- Workflow runs successfully on next schedule trigger
+
+---
+
+- [ ] U2. **Add pytest workflow for push/PR**
+
+**Goal:** Run tests on every push and PR to main.
+
+**Requirements:** R2, R3, R4
+
+**Dependencies:** None
+
+**Files:**
+- Create: `.github/workflows/pytest.yml`
+
+**Approach:**
+- Trigger on `push` to `main` and `pull_request` to `main`
+- Use `astral-sh/setup-uv@v5` with Python 3.13
+- Run `uv run pytest tests/ -q`
+- Cache uv dependencies between runs
+
+**Execution note:** Test-first — write a test that verifies the new workflow file exists and has required fields.
+
+**Test scenarios:**
+- Happy path: Workflow triggers on push to main
+- Happy path: Workflow triggers on PR to main
+- Error path: pytest fails → workflow fails
+- Edge case: Caching speeds up repeated runs
+
+**Verification:**
+- New workflow appears in repo Actions tab
+- Pushing this plan branch triggers the workflow
+- All tests pass in CI
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| uv action not available or fails | Pin to known good version; test on fork first |
+| Tests fail in CI but pass locally | Likely env difference; debug in CI logs |
+| Gitea runner differences | Use standard ubuntu-latest; no Gitea-specific actions |
+
+## Documentation / Operational Notes
+
+- Update ARCHITECTURE.md CI section if it mentions the old workflow
+- Note in AGENTS.md that CI runs on GitHub Actions (not Gitea CI)
+
+## Sources & References
+
+- Existing workflow: `.github/workflows/mindmodel-schedule.yml`
+- Package manager: `pyproject.toml`, `uv.lock`
+- uv GitHub Action: https://github.com/astral-sh/setup-uv
--- a/docs/plans/2026-04-24-002-fix-docker-compose-scheduler-plan.md
+++ b/docs/plans/2026-04-24-002-fix-docker-compose-scheduler-plan.md
@ -0,0 +1,92 @@
+---
+title: "fix: docker-compose.yml references missing scheduler.py"
+type: fix
+status: active
+date: 2026-04-24
+---
+
+# Fix: docker-compose.yml Missing scheduler.py
+
+## Overview
+
+`docker-compose.yml` defines a `scheduler` service that runs `python scheduler.py`, but no `scheduler.py` exists in the repo. This causes `docker-compose up` to fail for the scheduler service.
+
+## Problem Frame
+
+- The scheduler service is a deployment bug
+- Either the file was never created, or it was removed and the compose file was not updated
+- The `schedule` dependency in pyproject.toml suggests automated scheduling was intended
+
+## Requirements Trace
+
+- R1. docker-compose.yml must reference only existing files
+- R2. If scheduling is desired, create scheduler.py; if not, remove the service
+
+## Scope Boundaries
+
+**Included:**
+- Fix docker-compose.yml
+
+**Excluded:**
+- Implementing actual scheduling logic (unless user confirms)
+
+## Implementation Units
+
+- [ ] U1. **Determine scheduler intent**
+
+**Goal:** Decide whether to create scheduler.py or remove the service.
+
+**Requirements:** R1, R2
+
+**Dependencies:** None
+
+**Files:**
+- Read: `docker-compose.yml`, `pyproject.toml`
+
+**Approach:**
+- Check if `schedule` dependency is used anywhere
+- Check if there's a scheduling script under another name
+- Decision: If scheduling is not implemented, remove the service for now and create a plan for it later
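+
The first check (is the `schedule` dependency used anywhere?) could be scripted roughly as follows. This is a sketch with a hypothetical helper name; it parses imports with `ast` rather than grepping, so a module like `scheduler_utils` is not mistaken for `schedule`.

```python
import ast
from pathlib import Path


def files_importing(root, module):
    """Return paths of .py files under root that import `module` or a submodule of it."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            else:
                continue
            if any(n == module or n.startswith(module + ".") for n in names):
                hits.append(path)
                break  # one hit per file is enough
    return hits
```

Point `root` at the source directories rather than the whole checkout, since `rglob` would otherwise also walk any virtual environment stored inside the repo. An empty result for `files_importing(".", "schedule")` would support removing the service for now.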
+
+**Test expectation:** none — decision unit.
+
+**Verification:**
+- Clear decision documented in this plan
+
+---
+
+- [ ] U2. **Apply fix**
+
+**Goal:** Make docker-compose.yml valid.
+
+**Requirements:** R1
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `docker-compose.yml`
+
+**Approach:**
+- Option A: Remove scheduler service (if scheduling is deferred)
+- Option B: Create minimal scheduler.py (if scheduling is desired now)
+
+**Test scenarios:**
+- Integration: `docker-compose config` validates without error
+- Happy path: `docker-compose up motief` works (existing service unchanged)
+
+**Verification:**
+- `docker-compose config` passes
+- No reference to missing scheduler.py
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Removing service breaks someone's workflow | Only remove if confirmed unused; otherwise create stub |
+
+## Sources & References
+
+- `docker-compose.yml`
+- `pyproject.toml` (schedule dependency)
--- a/docs/plans/2026-04-24-003-consolidate-config-sources-plan.md
+++ b/docs/plans/2026-04-24-003-consolidate-config-sources-plan.md
@ -0,0 +1,136 @@
+---
+title: "refactor: Consolidate duplicate config sources"
+type: refactor
+status: active
+date: 2026-04-24
+---
+
+# Consolidate Duplicate Config Sources
+
+## Overview
+
+There are two config files: `config.py` (51 lines at repo root) and `analysis/config.py` (13K). The root config defines the base `Config` dataclass and its env var handling; `analysis/config.py` contains SVD themes, party lists, colors, and explorer constants. This split is confusing and risks the two files drifting out of sync.
+
+## Problem Frame
+
+- Two sources of truth for configuration
+- `config.py` is small and may be overlooked
+- `analysis/config.py` is large and contains both constants and dynamic config
+- Risk of updating one but not the other
+
+## Requirements Trace
+
+- R1. Single canonical config module
+- R2. All existing imports continue to work (backward compatibility)
+- R3. No behavior changes
+- R4. Tests pass after consolidation
+
+## Scope Boundaries
+
+**Included:**
+- Audit both config files
+- Decide on canonical location
+- Migrate root config into analysis/config.py or re-export
+- Update imports
+
+**Excluded:**
+- Changing config values
+- Adding new config options
+- Refactoring analysis/config.py beyond import consolidation
+
+## Key Technical Decisions
+
+- **Canonical location: analysis/config.py** — it already contains most config and is imported by many modules
+- **Backward compatibility:** Root `config.py` becomes a thin re-export shim: `from analysis.config import Config`
+
+## Implementation Units
+
+- [ ] U1. **Audit config usage**
+
+**Goal:** Map which modules import from which config file.
+
+**Requirements:** R1
+
+**Dependencies:** None
+
+**Files:**
+- Read: `config.py`, `analysis/config.py`
+
+**Approach:**
+- `grep -rn "from config import\|import config" --include="*.py"`
+- `grep -rn "from analysis.config import\|import analysis.config" --include="*.py"`
+- Document findings
+
+**Test expectation:** none — research unit.
+
+**Verification:**
+- Complete list of import sites
+
+---
+
+- [ ] U2. **Migrate root config into analysis/config.py**
+
+**Goal:** Move Config dataclass and env var logic to analysis/config.py.
+
+**Requirements:** R1, R2, R3
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `analysis/config.py`
+- Modify: `config.py` (re-export shim)
+
+**Approach:**
+- Move `Config` dataclass to analysis/config.py
+- Keep root `config.py` as: `from analysis.config import Config`
+- Ensure no circular imports
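+
To illustrate why a one-line shim satisfies the "same object" requirement, the sketch below builds a throwaway directory that mimics the planned layout and imports `Config` through both paths. The helper name, the temporary layout, and the `db_path` field are all illustrative assumptions, not the real contents of either file.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def shim_exposes_same_class(project_root):
    """Import Config through both paths and report whether they are one object."""
    sys.path.insert(0, str(project_root))
    try:
        for name in ("config", "analysis", "analysis.config"):
            sys.modules.pop(name, None)  # force a fresh import from project_root
        importlib.invalidate_caches()
        root_config = importlib.import_module("config")
        canonical = importlib.import_module("analysis.config")
        return root_config.Config is canonical.Config
    finally:
        sys.path.remove(str(project_root))


# Throwaway layout standing in for the real repository (contents are illustrative).
root = Path(tempfile.mkdtemp())
(root / "analysis").mkdir()
(root / "analysis" / "__init__.py").write_text("", encoding="utf-8")
(root / "analysis" / "config.py").write_text(
    "from dataclasses import dataclass\n\n"
    "@dataclass\nclass Config:\n    db_path: str = 'example.duckdb'\n",
    encoding="utf-8",
)
# The shim: a single re-export, so both import paths resolve to the same class.
(root / "config.py").write_text("from analysis.config import Config\n", encoding="utf-8")

print(shim_exposes_same_class(root))
```

The shim imports from `analysis.config` and never the other way round, which is what keeps the arrangement free of circular imports.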
+
+**Execution note:** Test-first — write a test that imports both `config` and `analysis.config` and verifies they expose the same `Config` class.
+
+**Test scenarios:**
+- Happy path: `from config import Config` still works
+- Happy path: `from analysis.config import Config` works
+- Integration: Both paths return the same object
+
+**Verification:**
+- `uv run python -c "from config import Config; from analysis.config import Config as AC; assert Config is AC"`
+- All tests pass
+
+---
+
+- [ ] U3. **Update import sites**
+
+**Goal:** Standardize imports to use analysis/config.py directly.
+
+**Requirements:** R1
+
+**Dependencies:** U2
+
+**Files:**
+- Modify: Files that import from root config.py
+
+**Approach:**
+- Replace `from config import Config` with `from analysis.config import Config`
+- Mechanical change, one file at a time
+
+**Test scenarios:**
+- Integration: All modified files import successfully
+- Regression: All tests pass
+
+**Verification:**
+- `grep -rn "from config import" --include="*.py"` returns nothing (except shim)
+- Full test suite passes
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Circular imports | analysis/config.py must not import from modules that import it |
+| Hidden dynamic imports | Search thoroughly; test all import paths |
+
+## Sources & References
+
+- `config.py`
+- `analysis/config.py`
--- a/docs/plans/2026-04-24-004-rewrite-readme-plan.md
+++ b/docs/plans/2026-04-24-004-rewrite-readme-plan.md
@ -0,0 +1,97 @@
+---
+title: "docs: Rewrite README.md with quickstart and project overview"
+type: feat
+status: active
+date: 2026-04-24
+---
+
+# Rewrite README.md
+
+## Overview
+
+The current README.md is 22 lines and only covers embeddings and Ansible deployment. It does not explain what Stemwijzer is, how to run it locally, or how to run the pipeline. New contributors must discover ARCHITECTURE.md to get oriented.
+
+## Problem Frame
+
+- README is the first file a visitor sees
+- No quickstart instructions
+- No mention of Streamlit, the voting UI, or the explorer
+- No screenshot or demo link
+- Missing prerequisites (Python 3.13, uv, DuckDB)
+
+## Requirements Trace
+
+- R1. Explain what the project does in 2 sentences
+- R2. Show a screenshot or demo link
+- R3. List prerequisites and installation steps
+- R4. Provide quickstart commands (run app, run pipeline, run tests)
+- R5. Link to ARCHITECTURE.md for deep dive
+- R6. Link to docs/ for additional documentation
+
+## Scope Boundaries
+
+**Included:**
+- Rewrite README.md with new structure
+
+**Excluded:**
+- Changing ARCHITECTURE.md (only link to it)
+- Adding screenshots (placeholder path accepted)
+- Creating a demo deployment
+
+## Key Technical Decisions
+
+- **Keep it concise** — README should be scannable in 2 minutes. Deep content lives in ARCHITECTURE.md.
+- **Use the same commands as ARCHITECTURE.md** — single source of truth for commands
+- **Match the project's language** — Dutch UI, English docs
+
+## Implementation Units
+
+- [ ] U1. **Draft and review README structure**
+
+**Goal:** Create a README that a new contributor can follow to get the app running in <10 minutes.
+
+**Requirements:** R1–R6
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `README.md`
+
+**Approach:**
+Structure:
+1. Title + one-line description
+2. Screenshot placeholder
+3. What is Stemwijzer? (2 sentences)
+4. Features bullet list (voting compass, explorer, analytics)
+5. Prerequisites (Python 3.13, uv)
+6. Quickstart (clone, uv sync, run Streamlit, run pipeline)
+7. Testing (uv run pytest)
+8. Project structure (brief, link to ARCHITECTURE.md)
+9. Documentation links (ARCHITECTURE.md, docs/)
+10. License
+
+**Test expectation:** none — documentation-only change. Verify by reading the rendered markdown.
+
+**Verification:**
+- A new contributor can follow the quickstart without reading other files
+- All commands in README match ARCHITECTURE.md
+- No broken internal links
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| README grows too long | Cap at ~80 lines; defer deep content to ARCHITECTURE.md |
+| Commands become outdated | Cross-check against ARCHITECTURE.md before finalizing |
+
+## Documentation / Operational Notes
+
+- This plan is itself the documentation change; no other docs need updating.
+
+## Sources & References
+
+- Existing README: `README.md`
+- Deep docs: `ARCHITECTURE.md`
+- Code style: `CODE_STYLE.md`
--- a/docs/plans/2026-04-24-005-add-pyright-ci-plan.md
+++ b/docs/plans/2026-04-24-005-add-pyright-ci-plan.md
@ -0,0 +1,100 @@
+---
+title: "feat: Add pyright type-checking to CI"
+type: feat
+status: active
+date: 2026-04-24
+---
+
+# Add pyright Type-Checking to CI
+
+## Overview
+
+`pyright` is in dev dependencies but never runs in CI. Adding it to the pytest workflow (or as a separate job) would catch type errors before merge.
+
+## Problem Frame
+
+- Type errors are only caught locally (if the developer runs pyright)
+- No enforcement of type annotations in PRs
+- CODE_STYLE.md encourages typing but CI doesn't verify
+
+## Requirements Trace
+
+- R1. pyright runs on every push/PR
+- R2. CI runs the same pyright version that is declared in the pyproject.toml dev dependencies
+- R3. CI fails on type errors
+- R4. Initial run establishes baseline (no new errors introduced)
+
+## Scope Boundaries
+
+**Included:**
+- Add pyright step to CI workflow
+- Fix or suppress any existing type errors that block CI
+
+**Excluded:**
+- Adding type annotations to untyped code (do that incrementally)
+- Changing pyright configuration beyond CI setup
+
+## Implementation Units
+
+- [ ] U1. **Add pyright CI job**
+
+**Goal:** Run pyright in GitHub Actions.
+
+**Requirements:** R1, R2, R3
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `.github/workflows/pytest.yml`
+
+**Approach:**
+- Add a `pyright` job parallel to pytest
+- Use `uv run pyright` (same version as local)
+
+**Test scenarios:**
+- Happy path: Typed code passes pyright
+- Error path: Type error fails the CI job
+- Integration: pyright version matches pyproject.toml
+
+**Verification:**
+- CI runs pyright successfully
+
+---
+
+- [ ] U2. **Establish baseline**
+
+**Goal:** Ensure CI passes on current code.
+
+**Requirements:** R4
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: Files with fixable type errors
+- Modify: `pyproject.toml` (add suppressions for unfixable legacy issues)
+
+**Approach:**
+- Run `uv run pyright` locally
+- Fix trivial errors; suppress complex legacy ones with `# type: ignore` comments or `[tool.pyright]` settings in `pyproject.toml`
+- Document suppressions
+
+**Test scenarios:**
+- Happy path: `uv run pyright` exits 0
+
+**Verification:**
+- `uv run pyright` passes locally
+- CI pyright job passes
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Many existing type errors | Fix batch-by-batch; don't block this PR on full cleanup |
+| pyright is slow in CI | Run in parallel with pytest; cache node_modules |
+
+## Sources & References
+
+- `pyproject.toml` dev dependencies
+- `.github/workflows/pytest.yml` (from P1-001)
--- a/docs/plans/2026-04-24-006-activate-pre-commit-hooks-plan.md
+++ b/docs/plans/2026-04-24-006-activate-pre-commit-hooks-plan.md
@ -0,0 +1,146 @@
+---
+title: "feat: Activate pre-commit hooks (black, ruff, isort)"
+type: feat
+status: active
+date: 2026-04-24
+---
+
+# Activate Pre-commit Hooks
+
+## Overview
+
+`.pre-commit-config.yaml` exists but is explicitly disabled ("intentionally minimal and does not enable hooks by installing them"). Activating black, ruff, and isort would enforce CODE_STYLE.md conventions automatically and eliminate style-only review comments.
+
+## Problem Frame
+
+- Code style is documented in CODE_STYLE.md but not enforced automatically
+- Contributors may submit PRs with inconsistent formatting
+- Review time is spent on style nits instead of logic
+- No CI check for formatting violations
+
+## Requirements Trace
+
+- R1. Pre-commit hooks run black, ruff, and isort
+- R2. Hooks are enforced in CI (fail build on violations)
+- R3. Hooks use the same versions as pyproject.toml dev dependencies
+- R4. Initial run reformats existing code without breaking tests
+
+## Scope Boundaries
+
+**Included:**
+- Update `.pre-commit-config.yaml`
+- Add CI workflow step for pre-commit
+- Run initial format across codebase
+
+**Excluded:**
+- Adding new linters or rules
+- Changing CODE_STYLE.md conventions
+- Fixing logic bugs found by ruff (separate PR)
+
+## Key Technical Decisions
+
+- **Use pre-commit.ci or GitHub Action** — pre-commit.ci is zero-config but may not work on Gitea. Use a GitHub Actions step as fallback.
+- **Single large format commit** — Run once, commit formatting changes separately from config changes so reviewers can see the diff.
+- **Skip tests during format** — Formatting should not change behavior, but run tests after to verify.
+
+## Implementation Units
+
+- [ ] U1. **Update .pre-commit-config.yaml**
+
+**Goal:** Enable black, ruff, and isort with versions matching pyproject.toml.
+
+**Requirements:** R1, R3
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `.pre-commit-config.yaml`
+
+**Approach:**
+- Remove the "does not enable hooks" comment
+- Add repos for black, ruff, isort with pinned versions
+- Set ruff to match CODE_STYLE.md rules
+- Configure isort profile (black-compatible)
+
+**Test scenarios:**
+- Happy path: `pre-commit run --all-files` completes successfully
+- Error path: A file with style violations fails the hook
+- Integration: Versions match pyproject.toml dev deps
+
+**Verification:**
+- `pre-commit run --all-files` runs without config errors
+
+---
+
+- [ ] U2. **Add pre-commit CI step**
+
+**Goal:** Block PRs that violate formatting rules.
+
+**Requirements:** R2
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `.github/workflows/pytest.yml` (or create separate lint.yml)
+
+**Approach:**
+- Add a job that runs `pre-commit run --all-files`
+- Use the same uv setup as the pytest workflow
+- Install pre-commit via uv
+
+**Test scenarios:**
+- Happy path: Clean code passes pre-commit CI
+- Error path: Violations fail the CI job
+
+**Verification:**
+- Pushing a formatting violation fails the check
+- Pushing clean code passes
+
+---
+
+- [ ] U3. **Run initial format across codebase**
+
+**Goal:** Bring all existing code into compliance so future PRs only touch their own changes.
+
+**Requirements:** R4
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: All Python files (mechanical reformatting)
+
+**Approach:**
+- Run `pre-commit run --all-files`
+- Commit formatting changes separately
+- Run full test suite: `uv run pytest tests/ -q`
+
+**Execution note:** This is a mechanical change. Characterization tests should pass unchanged. If tests fail, the formatter broke something — investigate before committing.
+
+**Test scenarios:**
+- Integration: All existing tests pass after formatting
+- Edge case: No logic changes introduced by formatting
+
+**Verification:**
+- `uv run pytest tests/ -q` passes
+- `git diff` shows only formatting changes (whitespace, quoting, line wrapping, import order)
+
+---
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Massive format commit obscures git blame | Use `.git-blame-ignore-revs` to ignore the format commit |
+| Ruff finds existing logic issues | Fix or suppress in separate PR; don't mix with activation |
+| Contributors without pre-commit installed | CI catches it; add setup note to README |
+
+## Documentation / Operational Notes
+
+- Add pre-commit setup to README quickstart
+- Document `.git-blame-ignore-revs` usage
+
+## Sources & References
+
+- Config: `.pre-commit-config.yaml`
+- Style guide: `CODE_STYLE.md`
+- Dependencies: `pyproject.toml`
--- a/docs/plans/2026-04-24-007-replace-print-with-logging-plan.md
+++ b/docs/plans/2026-04-24-007-replace-print-with-logging-plan.md
@ -0,0 +1,256 @@
+---
+title: "refactor: Replace print() calls with structured logging"
+type: refactor
+status: active
+date: 2026-04-24
+---
+
+# Replace print() with Structured Logging
+
+## Overview
+
+There are approximately 225 `print()` calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.
+
+## Problem Frame
+
+- `print()` output is either missing from production logs or interleaved with Streamlit UI output
+- No log levels (INFO, WARNING, ERROR) to filter noise
+- No module names to identify which component logged what
+- Ingestion and API errors are silently swallowed by broad except blocks
+- Scripts produce unstructured output that is hard to parse or aggregate
+
+## Requirements Trace
+
+- R1. Replace all `print()` calls with appropriate `logging` levels
+- R2. Configure a project-wide logger with module-level naming
+- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
+- R4. Update CODE_STYLE.md to mandate logging over print
+- R5. All tests pass after migration
+
+## Scope Boundaries
+
+**Included:**
+- database.py, api_client.py, summarizer.py, ai_provider.py
+- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
+- scripts/ (batch migration, one script at a time)
+
+**Excluded:**
+- explorer.py Streamlit UI prints (these may be intentional UI feedback)
+- app.py user-facing prints
+- Third-party code
+
+## Key Technical Decisions
+
+- **Use standard library `logging`** — no external dependency needed. If structlog is desired later, it wraps logging.
+- **Module-level loggers** — `logger = logging.getLogger(__name__)` pattern
+- **Root config in config.py** — basicConfig or dictConfig at app startup
+- **Streamlit compatibility** — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `database.py` — prints in insert/update paths, ~50+ prints
+- `api_client.py` — prints in fetch/pagination logic
+- `scripts/` — 22 scripts, many with progress prints
+- `CODE_STYLE.md` — already recommends structured logging
+
+### Institutional Learnings
+
+- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md` — mechanical changes should be verified with full test suite
+
+## Implementation Units
+
+- [ ] U1. **Set up logging configuration and test harness**
+
+**Goal:** Create the logging infrastructure and tests before touching any print statements.
+
+**Requirements:** R2
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `config.py`
+- Create: `tests/test_logging_config.py`
+
+**Approach:**
+- Add `configure_logging(level=logging.INFO)` to config.py
+- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
+- Create test that verifies logger hierarchy and formatting
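A minimal sketch of what `configure_logging` could look like, assuming the standard-library `logging` module and the format string given above. The `HANDLER_NAME` marker is a hypothetical detail introduced here to make repeated calls idempotent; the real `config.py` may solve that differently.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
HANDLER_NAME = "stemwijzer-root"  # hypothetical marker used to detect repeat calls


def configure_logging(level: int = logging.INFO) -> None:
    """Attach one formatted stream handler to the root logger."""
    root = logging.getLogger()
    root.setLevel(level)
    if any(h.get_name() == HANDLER_NAME for h in root.handlers):
        return  # already configured; only the level was updated
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.set_name(HANDLER_NAME)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    root.addHandler(handler)
```

Module loggers such as `logging.getLogger("database")` have no level of their own, so they inherit the root level set here, which is what the second happy-path scenario checks.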
+
+**Execution note:** Test-first — write `test_logging_config.py` before any implementation.
+
+**Test scenarios:**
+- Happy path: `configure_logging()` sets up root logger with correct format
+- Happy path: Module logger `logging.getLogger("database")` inherits level
+- Edge case: Calling configure_logging twice is idempotent
+
+**Verification:**
+- `uv run pytest tests/test_logging_config.py -v` passes
+
+---
+
+- [ ] U2. **Migrate database.py prints to logging**
+
+**Goal:** Replace all print() calls in database.py with logger calls.
+
+**Requirements:** R1, R5
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `database.py`
+- Modify: `tests/test_database_audit.py` (if it checks output)
+
+**Approach:**
+- Add `logger = logging.getLogger(__name__)` at module level
+- Replace progress prints with `logger.info()`
+- Replace error/warning prints with `logger.warning()` / `logger.error()`
+- Keep behavior identical (same messages)
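The mapping above might look roughly like this in practice. `insert_motions` is a hypothetical stand-in, not the real `database.py` helper; the point is only the shape of the change: same messages, different channel, with lazy `%s` formatting instead of f-strings.

```python
import logging

logger = logging.getLogger(__name__)


def insert_motions(rows: list[dict]) -> int:
    """Hypothetical insert helper illustrating the print-to-logger mapping."""
    inserted = 0
    for row in rows:
        if "id" not in row:
            # was: print(f"Skipping motion without id: {row}")
            logger.warning("Skipping motion without id: %s", row)
            continue
        inserted += 1  # the real helper would write to the database here
    # was: print(f"Inserted {inserted} motions")
    logger.info("Inserted %d motions", inserted)
    return inserted
```

In a pytest test, the `caplog` fixture can assert on these records, and `capsys` can confirm that nothing reaches stdout any more.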
+
+**Execution note:** Test-first — write a test that asserts `caplog` captures a database log message before changing any code.
+
+**Test scenarios:**
+- Happy path: `caplog` captures `logger.info` during motion insert
+- Error path: `caplog` captures `logger.error` on DB failure
+- Edge case: No prints leak to stdout (use capsys to verify)
+
+**Verification:**
+- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
+- `uv run pytest tests/test_database_audit.py -v` passes
+
+---
+
+- [ ] U3. **Migrate api_client.py prints to logging**
+
+**Goal:** Replace all print() calls in api_client.py with logger calls.
+
+**Requirements:** R1, R5
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `api_client.py`
+- Modify: `tests/test_api_client.py` (create if missing)
+
+**Approach:**
+- Same pattern as U2: module logger, map prints to levels
+- API pagination progress → `logger.info`
+- Rate limit / retry messages → `logger.warning`
+
+**Execution note:** Test-first — characterize current behavior with a capsys test, then migrate.
+
+**Test scenarios:**
+- Happy path: API fetch logs pagination progress at INFO level
+- Error path: Failed request logs at ERROR level
+- Integration: Log output includes module name (`api_client`)
+
+**Verification:**
+- `grep -n "print(" api_client.py` returns nothing
+- Existing API tests pass
+
+---
+
+- [ ] U4. **Migrate pipeline modules**
+
+**Goal:** Replace prints in pipeline/ with logging.
+
+**Requirements:** R1, R5
+
+**Dependencies:** U1, U2 (for database.py patterns to follow)
+
+**Files:**
+- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`
+
+**Approach:**
+- Batch migration of 4 files
+- Progress bars / step completion → `logger.info`
+- Warnings about missing data → `logger.warning`
+
+**Test scenarios:**
+- Happy path: Pipeline run emits structured logs for each stage
+- Error path: Missing embeddings logged at WARNING, not silently skipped
+
+**Verification:**
+- `grep -rn "print(" pipeline/` returns nothing
+- Pipeline tests pass
+
+---
+
+- [ ] U5. **Migrate scripts/ batch**
+
+**Goal:** Replace prints in scripts/ with logging.
+
+**Requirements:** R1, R5
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `scripts/*.py` (batch, mechanical)
+
+**Approach:**
+- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
+- CLI progress prints → `logger.info`
+- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)
+
+**Execution note:** Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep `print(json.dumps(result))` style outputs.
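One way to keep the two kinds of output apart is to send diagnostics through `logging` (stderr) and leave the machine-readable result on stdout. The sketch below assumes a script whose product is a JSON document; the logger name and the result payload are placeholders, not taken from an existing script.

```python
import json
import logging
import sys

logger = logging.getLogger("scripts.example")  # hypothetical script name


def main() -> int:
    # Diagnostics go to stderr so they never mix with the result on stdout.
    logging.basicConfig(stream=sys.stderr, level=logging.INFO)
    logger.info("Starting analysis")
    result = {"motions": 3, "status": "ok"}  # placeholder result
    logger.info("Analysis finished")
    print(json.dumps(result))  # the actual CLI output stays a print
    return 0
```

With this split, `python script.py | jq .` keeps working after the migration, because only the JSON line is written to stdout.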
+
+**Test scenarios:**
+- Happy path: Script progress is logged, result output is preserved
+- Edge case: Scripts that parse their own output still work
+
+**Verification:**
+- Scripts that produce machine-readable output still do so
+- `uv run pytest tests/scripts/ -q` passes
+
+---
+
+- [ ] U6. **Update CODE_STYLE.md and add lint rule**
+
+**Goal:** Prevent new print() calls from being introduced.
+
+**Requirements:** R4
+
+**Dependencies:** U1–U5
+
+**Files:**
+- Modify: `CODE_STYLE.md`
+- Modify: `.pre-commit-config.yaml` (add ruff rule for print)
+
+**Approach:**
+- Add "Use logging, not print" section to CODE_STYLE.md
+- Add ruff rule: `T201` (print found) to enforce
+
+**Test expectation:** none — documentation and config change.
+
+**Verification:**
+- `ruff check .` fails if any new print() is added
+
+---
+
+## System-Wide Impact
+
+- **Interaction graph:** All modules that previously printed to stdout now use logging handlers
+- **Error propagation:** Logging does not change exception flow, but error messages are now timestamped and leveled
+- **State lifecycle risks:** None — logging is side-effect-only
+- **Unchanged invariants:** All existing behavior preserved; only output channel changes
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Missing a print() call | Use `grep -rn "print(" --include="*.py"` as final check |
+| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to `st.info` first |
+| Tests that assert on stdout break | Update to use `caplog` fixture |
+| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |
+
+## Documentation / Operational Notes
+
+- Update CODE_STYLE.md logging section
+- Consider adding a logging configuration section to ARCHITECTURE.md
+
+## Sources & References
+
+- CODE_STYLE.md logging guidance
+- Python logging docs: https://docs.python.org/3/library/logging.html
+- Existing prints: `grep -rn "print(" --include="*.py" .`
--- a/docs/plans/2026-04-24-009-pipeline-health-checks-plan.md
+++ b/docs/plans/2026-04-24-009-pipeline-health-checks-plan.md
@ -0,0 +1,264 @@
+---
+title: "feat: Pipeline health checks and observability"
+type: feat
+status: active
+date: 2026-04-24
+---
+
+# Pipeline Health Checks and Observability
+
+## Overview
+
+There is no automated way to verify pipeline health. A broken API client, stale embeddings, or an SVD axis flip could go unnoticed until a user reports it. A health check script plus a lightweight dashboard would surface problems proactively.
+
+## Problem Frame
+
+- No visibility into whether the last pipeline run succeeded
+- No alerting when motion count drops unexpectedly
+- No detection when SVD components flip or drift
+- No visibility into embedding coverage (% of motions with embeddings)
+- LLM enrichment failures are silent (motions just lack layman_explanation)
+
+## Requirements Trace
+
+- R1. Health check script verifies: API reachable, DB has recent motions, embeddings cover >X% of motions
+- R2. Health check detects SVD stability (no sudden axis flips)
+- R3. Health check reports missing layman_explanations
+- R4. Optional: Streamlit page or API endpoint showing health metrics
+- R5. All health checks are testable and tested
+
+## Scope Boundaries
+
+**Included:**
+- Health check module with individual check functions
+- CLI runner for health checks
+- Tests for each check
+- Optional Streamlit health page
+
+**Excluded:**
+- Real alerting (PagerDuty, Slack) — just script exit codes for now
+- Long-term metrics storage (Prometheus, etc.)
+- Fixing the issues the health check finds
+
+## Key Technical Decisions
+
+- **Pure functions for checks** — Each check is a function that takes DB/config and returns (status, message, details). This makes them testable without side effects.
+- **Composable runner** — A runner executes all checks and aggregates results into a report.
+- **Exit codes** — 0 = all healthy, 1 = any warning, 2 = any critical. Suitable for cron/CI.
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `pipeline/run_pipeline.py` — orchestrates all pipeline stages
+- `database.py` — DB queries for motion counts, embeddings, vote counts
+- `analysis/svd_labels.py` — SVD component stability logic
+- `scripts/` — existing diagnostic scripts (drift analysis, etc.)
+
+### Institutional Learnings
+
+- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md` — diagnostic scripts can produce false alarms if they don't verify against canonical DB state
+- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md` — metrics must be derived from canonical sources, not hardcoded
+
+## Implementation Units
+
+- [ ] U1. **Create health check core module**
+
+**Goal:** Define the check interface and runner.
+
+**Requirements:** R1–R3 foundation
+
+**Dependencies:** None
+
+**Files:**
+- Create: `health/__init__.py`
+- Create: `health/core.py`
+- Create: `health/checks.py`
+- Create: `tests/test_health_core.py`
+
+**Approach:**
+- `HealthStatus` enum: OK, WARNING, CRITICAL
+- `HealthCheck` dataclass: name, status, message, details
+- `run_checks(checks)` → `HealthReport` with aggregate status
+- `check_*` functions are pure: accept data, return HealthCheck
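A sketch of the core interface under two assumptions that the plan leaves open: `run_checks` receives zero-argument callables (for example `functools.partial` objects with their data already bound), and a check that raises is recorded as CRITICAL instead of propagating. Using `IntEnum` is also a choice made here so that the aggregate status is simply the maximum and doubles as the exit code.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field
from enum import IntEnum


class HealthStatus(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class HealthCheck:
    name: str
    status: HealthStatus
    message: str
    details: dict = field(default_factory=dict)


@dataclass(frozen=True)
class HealthReport:
    checks: tuple[HealthCheck, ...]

    @property
    def status(self) -> HealthStatus:
        # Worst status wins; an empty report counts as OK.
        return max((c.status for c in self.checks), default=HealthStatus.OK)


def run_checks(checks: Iterable[Callable[[], HealthCheck]]) -> HealthReport:
    results = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:  # capture failures in the report
            name = getattr(check, "__name__", "unknown")
            results.append(
                HealthCheck(name, HealthStatus.CRITICAL, f"check raised: {exc}")
            )
    return HealthReport(tuple(results))
```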
+
+**Execution note:** Test-first — write `test_health_core.py` with failing tests for the interface before implementing.
+
+**Test scenarios:**
+- Happy path: All OK checks → report status OK
+- Error path: One CRITICAL check → report status CRITICAL
+- Edge case: Empty check list → report status OK
+- Integration: Check function signature is pure (no DB access in core)
+
+**Verification:**
+- `uv run pytest tests/test_health_core.py -v` passes
+
+---
+
+- [ ] U2. **Implement data freshness checks**
+
+**Goal:** Verify the DB has recent motions and votes.
+
+**Requirements:** R1
+
+**Dependencies:** U1
+
+**Files:**
+- Modify: `health/checks.py`
+- Create: `tests/test_health_checks.py`
+
+**Approach:**
+- `check_motion_freshness(db, max_age_days=7)` — count motions newer than threshold
+- `check_vote_coverage(db)` — % of motions with votes
+- `check_embedding_coverage(db, min_coverage=0.95)` — % of motions with fused embeddings
+
+**Execution note:** Test-first — use mocked DB or test fixtures with known data.
+
+**Test scenarios:**
+- Happy path: Recent motions exist, coverage > 95% → OK
+- Warning path: Motions are 10 days old → WARNING
+- Critical path: No motions in last 30 days → CRITICAL
+- Edge case: Empty database → CRITICAL with clear message
+
+**Verification:**
+- Tests pass with mocked database
+- Manual run against real DB produces accurate report
+
+---
+
+- [ ] U3. **Implement SVD stability check**
+
+**Goal:** Detect if SVD components have flipped or drifted significantly.
+
+**Requirements:** R2
+
+**Dependencies:** U1, U2
+
+**Files:**
+- Modify: `health/checks.py`
+- Modify: `tests/test_health_checks.py`
+
+**Approach:**
+- `check_svd_stability(db, reference_themes)` — compare current SVD_THEMES to canonical config
+- `check_axis_flip(db)` — verify right-wing parties are on the right side (reuse existing validation logic)
+- Use `analysis/config.py` SVD_THEMES as canonical reference
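As an illustration only, flip detection can be reduced to comparing mean scores on one component. The sketch assumes that positive scores mean "right" and that the caller supplies per-party scores and the set of right-wing parties; it does not reproduce the existing validation logic, and the party keys in any test data are placeholders.

```python
from statistics import mean


def check_axis_flip(
    scores: dict[str, float], right_wing: set[str]
) -> tuple[str, str, dict]:
    """Flag a flip when right-wing parties do not score higher than the rest."""
    right = [s for party, s in scores.items() if party in right_wing]
    rest = [s for party, s in scores.items() if party not in right_wing]
    if not right or not rest:
        return ("CRITICAL", "No SVD data to compare", {})
    details = {"right_mean": mean(right), "other_mean": mean(rest)}
    if details["right_mean"] <= details["other_mean"]:
        return ("CRITICAL", "Axis flip detected: right-wing parties on the left", details)
    return ("OK", "Right-wing parties are on the right side", details)
```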
+
+**Execution note:** Test-first — mock the DB to return known SVD components and test flip detection.
+
+**Test scenarios:**
+- Happy path: SVD components match canonical themes → OK
+- Warning path: Minor label drift → WARNING
+- Critical path: Axis flip detected (right-wing parties on left) → CRITICAL
+- Edge case: No SVD data in DB → CRITICAL
+
+**Verification:**
+- Tests pass
+- Manual verification against real DB confirms no false alarms
+
+---
+
+- [ ] U4. **Implement LLM enrichment check**
+
+**Goal:** Surface motions missing layman explanations.
+
+**Requirements:** R3
+
+**Dependencies:** U1, U2
+
+**Files:**
+- Modify: `health/checks.py`
+- Modify: `tests/test_health_checks.py`
+
+**Approach:**
+- `check_llm_coverage(db, max_missing=100)` — count motions without layman_explanation
+- `check_llm_quality(db)` — spot-check a sample of explanations for non-empty, reasonable length
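The test scenarios for this unit are phrased as percentages, so the sketch below classifies a missing/total ratio. The two threshold parameters are assumptions derived from those scenarios and are not the `max_missing` count named in the signature above; treating an empty database as CRITICAL is likewise a choice made here.

```python
def check_llm_coverage(
    missing: int,
    total: int,
    warn_fraction: float = 0.05,  # assumed from the 5% scenario
    critical_fraction: float = 0.15,  # assumed from the 15% scenario
) -> tuple[str, str, dict]:
    """Classify the share of motions without a layman_explanation."""
    if total == 0:
        return ("CRITICAL", "No motions in database", {})
    fraction = missing / total
    details = {"missing": missing, "total": total, "fraction": fraction}
    message = f"{missing} of {total} motions lack an explanation"
    if fraction > critical_fraction:
        return ("CRITICAL", message, details)
    if fraction >= warn_fraction:
        return ("WARNING", message, details)
    return ("OK", message, details)
```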
+
+**Test scenarios:**
+- Happy path: <5% missing explanations → OK
+- Warning path: 5–15% missing → WARNING
+- Critical path: >15% missing → CRITICAL
+- Edge case: All explanations are empty strings → WARNING
+
+**Verification:**
+- Tests pass with mocked data
+
+---
+
+- [ ] U5. **Create CLI runner**
+
+**Goal:** Run all checks from command line with appropriate exit codes.
+
+**Requirements:** R1–R4
+
+**Dependencies:** U1–U4
+
+**Files:**
+- Create: `scripts/health_check.py`
+- Create: `tests/scripts/test_health_check.py`
+
+**Approach:**
+- `python scripts/health_check.py` → prints report, exits 0/1/2
+- Optional flags: `--check motion-freshness`, `--format json`, `--threshold-days 7`
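A possible shape for the runner, using `argparse` with the three flags listed above. The `run` parameter and `_placeholder_run` are hypothetical seams added so the sketch runs without a database; in the real script they would be replaced by the checks from `health/checks.py`.

```python
import argparse
import json

EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def _placeholder_run(threshold_days: int) -> list[dict]:
    """Stand-in for the real checks; returns no results."""
    return []


def render(results: list[dict], fmt: str) -> str:
    if fmt == "json":
        return json.dumps(results)
    return "\n".join(f"[{r['status']}] {r['name']}: {r['message']}" for r in results)


def main(argv=None, run=_placeholder_run) -> int:
    parser = argparse.ArgumentParser(description="Run pipeline health checks")
    parser.add_argument("--check", action="append", help="run only this check (repeatable)")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--threshold-days", type=int, default=7)
    args = parser.parse_args(argv)

    results = run(args.threshold_days)
    if args.check:
        results = [r for r in results if r["name"] in args.check]
    print(render(results, args.format))
    # Worst status decides the exit code: 0 OK, 1 warning, 2 critical.
    return max((EXIT_CODES[r["status"]] for r in results), default=0)
```

The script entry point would then be `sys.exit(main())`, which lets cron or CI react to the 0/1/2 exit code.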
+
+**Test scenarios:**
+- Happy path: All OK → exit 0, human-readable output
+- Error path: One warning → exit 1
+- Critical path: One critical → exit 2
+- Edge case: JSON format outputs valid JSON
+
+**Verification:**
+- `uv run python scripts/health_check.py` runs without error
+- Exit codes match expectations
+
+---
+
+- [ ] U6. **Add Streamlit health page (optional)**
+
+**Goal:** Visual health dashboard in the app.
+
+**Requirements:** R4
+
+**Dependencies:** U1–U5
+
+**Files:**
+- Create: `pages/3_Health.py`
+
+**Approach:**
+- Run all checks on page load
+- Display: overall status, motion count, embedding coverage, SVD status, LLM coverage
+- Use `st.metric` for key numbers
+- Color-code: green/yellow/red
+
+**Test expectation:** none — Streamlit page, tested manually.
+
+**Verification:**
+- Page loads without error
+- Metrics update when DB changes
+
+---
+
+## System-Wide Impact
+
+- **Interaction graph:** Health checks read from DB but do not write. Safe to run concurrently with pipeline.
+- **Error propagation:** Check failures are captured in report, not raised as exceptions.
+- **Unchanged invariants:** No changes to pipeline, DB schema, or UI behavior.
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| False alarms (like the trajectories diagnostic) | Verify against canonical DB state, not intermediate artifacts |
+| Slow checks on large DB | Add query timeouts; cache results |
+| Check drift from codebase changes | Health checks are tested; tests fail if logic breaks |
+
+## Documentation / Operational Notes
+
+- Add health check to deployment runbook (run before/after pipeline)
+- Consider scheduling in CI or cron
+
+## Sources & References
+
+- `docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md`
+- `analysis/config.py` — canonical SVD themes
+- `database.py` — DB schema and queries
+- `docs/solutions/best-practices/blog-numbers-from-pipeline-outputs-2026-04-16.md`
--- a/docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md
+++ b/docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md
@ -0,0 +1,131 @@
+# Stemwijzer Improvement Roadmap
+
+**Created:** 2026-04-24
+**Status:** Active
+
+This roadmap captures 18 improvement opportunities identified during a codebase review, organized into 5 phases by dependency and risk. Each item links to a detailed implementation plan (created separately) when available.
+
+---
+
+## Phase 1: Foundation (CI/CD, Config, Documentation)
+
+*Prerequisite for everything else. Low risk, high developer-experience impact.*
+
+| # | Improvement | Priority | Effort | Plan | TDD |
+|---|------------|----------|--------|------|-----|
+| 1 | Fix broken CI test workflow (mindmodel-schedule.yml references missing requirements.txt) | High | Small | P1-001 | Yes |
+| 2 | Fix docker-compose.yml (missing scheduler.py) | High | Small | P1-002 | No (config) |
+| 3 | Consolidate duplicate config sources (config.py vs analysis/config.py) | Medium | Small | P1-003 | Yes |
+| 4 | Rewrite README.md (22 lines → proper quickstart) | High | Small | P1-004 | No (docs) |
+| 5 | Add pyright type-checking to CI | Medium | Small | P1-005 | Yes |
+| 6 | Activate pre-commit hooks (black, ruff, isort) | Medium | Small | P1-006 | Yes |
+
+**Phase goal:** Reliable CI, clean config, and onboarding docs that don't require discovering ARCHITECTURE.md.
+
+---
+
+## Phase 2: Code Quality (Logging, Error Handling, Import Safety)
+
+*Builds on Phase 1 CI. Makes the codebase maintainable and production-ready.*
+
+| # | Improvement | Priority | Effort | Plan | TDD |
+|---|------------|----------|--------|------|-----|
+| 7 | Replace ~225 print() calls with structured logging | Medium | Medium | P2-001 | Yes |
+| 8 | Fix broad `except Exception:` blocks in database.py and api_client.py | Medium | Medium | P2-002 | Yes |
+| 9 | Complete import-safe module guards (extend existing work) | Medium | Medium | — | Yes |
+
+**Phase goal:** Observable, debuggable production behavior with clear error propagation.
+
+---
+
+## Phase 3: Architecture (Decompose explorer.py)
+
+*Already partially completed. Remaining work is decoupling Streamlit from tab logic.*
+
+| # | Improvement | Priority | Effort | Plan | TDD |
+|---|------------|----------|--------|------|-----|
+| 10 | Complete explorer.py decomposition (extract tab logic from Streamlit) | Medium | Large | P3-001 | Yes |
+
+**Status:** Constants extracted to analysis/config.py, placeholder tab modules created. Remaining: move build_*_tab functions out of explorer.py while preserving @st.cache_data decorators.
+
+**Phase goal:** explorer.py under 1500 lines, tab modules independently testable.
+
+---
+
+## Phase 4: New Features
+
+*User-facing value. Depends on Phase 2 for observability and Phase 3 for clean architecture.*
+
+| # | Improvement | Priority | Effort | Plan | TDD |
+|---|------------|----------|--------|------|-----|
+| 11 | REST API layer (read-only, FastAPI) | Low | Large | P4-001 | Yes |
+| 12 | Automated pipeline scheduling (real scheduler.py) | Medium | Medium | P4-002 | Yes |
+| 13 | Motion recommendation engine | Low | Medium | P4-003 | Yes |
+| 14 | Export user voting profile (JSON/CSV/shareable image) | Low | Small | P4-004 | Yes |
+| 15 | Data quality dashboard (Streamlit page or API) | Medium | Medium | P4-005 | Yes |
+
+**Phase goal:** External API consumers, automated data freshness, and user engagement features.
+
+---
+
+## Phase 5: Observability & Robustness
+
+*Production confidence. Can run in parallel with Phase 4.*
+
+| # | Improvement | Priority | Effort | Plan | TDD |
+|---|------------|----------|--------|------|-----|
+| 16 | Add Sentry or error tracking | Low | Small | P5-001 | No (config) |
+| 17 | Pipeline health checks / alerting script | Medium | Medium | P5-002 | Yes |
+| 18 | Benchmark suite (pytest-benchmark for SVD/fusion) | Low | Small | P5-003 | Yes |
+
+**Phase goal:** Know when things break before users do; detect performance regressions.
+
+---
+
+## Dependency Graph
+
+```
+Phase 1 (Foundation)
+  ├─→ Phase 2 (Code Quality) ─┬─→ Phase 3 (Architecture)
+  │                           │        └─→ Phase 4 (Features)
+  │                           └──────────────→ Phase 4 (Features)
+  └─→ Phase 5 (Observability)   (can run in parallel with Phase 4)
+```
+
+Phase 1 must come first. Phase 2 makes Phase 3/4 safer. Phase 3 unlocks some Phase 4 items. Phase 5 is largely independent.
+
+---
+
+## Recommended Execution Order
+
+- **Sprint 1:** Items 1, 2, 4, 6 (CI + docs + pre-commit)
+- **Sprint 2:** Items 5, 7, 8 (type checking + logging + errors)
+- **Sprint 3:** Item 10 (explorer decomposition)
+- **Sprint 4:** Items 12, 15, 17 (pipeline automation + health checks)
+- **Sprint 5+:** Items 11, 13, 14, 16, 18 (API + features + observability)
+
+---
+
+## Plan Document Inventory
+
+| Plan ID | File | Status |
+|---------|------|--------|
+| P1-001 | docs/plans/2026-04-24-001-fix-ci-test-workflow-plan.md | Planned |
+| P1-002 | docs/plans/2026-04-24-002-fix-docker-compose-scheduler-plan.md | Planned |
+| P1-003 | docs/plans/2026-04-24-003-consolidate-config-sources-plan.md | Planned |
+| P1-004 | docs/plans/2026-04-24-004-rewrite-readme-plan.md | Planned |
+| P1-005 | docs/plans/2026-04-24-005-add-pyright-ci-plan.md | Planned |
+| P1-006 | docs/plans/2026-04-24-006-activate-pre-commit-hooks-plan.md | Planned |
+| P2-001 | docs/plans/2026-04-24-007-replace-print-with-logging-plan.md | Planned |
+| P2-002 | docs/plans/2026-04-24-008-fix-broad-exception-handling-plan.md | Planned |
+| P3-001 | docs/plans/2026-04-04-003-refactor-complete-explorer-decomposition-plan.md | In progress |
+| P4-005 | docs/plans/2026-04-24-009-pipeline-health-checks-plan.md | Planned |
+| P5-002 | docs/plans/2026-04-24-010-pipeline-health-checks-plan.md | Planned |
+
+---
+
+## Notes
+
+- All implementation plans use TDD (test-first) for code-bearing units.
+- Config-only and docs-only units (docker-compose fix, README rewrite) skip TDD but include verification checklists.
+- Existing plans (e.g., explorer decomposition) are referenced rather than duplicated.
--- a/docs/research/bipartisan_anchor_extremity.png
+++ b/docs/research/bipartisan_anchor_extremity.png
--- a/docs/research/cross_temporal_drift.png
+++ b/docs/research/cross_temporal_drift.png
--- a/docs/research/cross_temporal_policy_extremity.png
+++ b/docs/research/cross_temporal_policy_extremity.png
--- a/docs/research/llm-motion-classification.md
+++ b/docs/research/llm-motion-classification.md
@ -0,0 +1,181 @@
+# Motion Extremity Classification with LLMs
+
+## Implementation Status
+
+**Script**: `scripts/classify_motions.py` - Ready to run
+
+**Requirements**:
+- Valid OpenRouter API key in `.env` (current key returns "User not found")
+- ~28,000 motions to classify
+
+**Usage**:
+```bash
+# Classify all motions (will take hours)
+.venv/bin/python scripts/classify_motions.py --delay 0.5
+
+# Test with small sample first
+.venv/bin/python scripts/classify_motions.py --limit 10 --delay 2
+
+# Analyze existing classifications
+.venv/bin/python scripts/classify_motions.py --analyze-only
+```
+
+## Why LLMs?
+
+Rule-based keyword matching is too crude:
+- Flags only 3-4% of motions as "high extremity"
+- Can't understand nuance ("verbod" appears in mundane contexts)
+- Can't assess policy impact magnitude
+
+LLMs can:
+- Understand policy context and implications
+- Assess deviation from consensus/norms
+- Interpret Dutch political terminology
+
+## Proposed LLM Classification Schema
+
+### Output Format
+```json
+{
+  "extremity_score": 1-5,
+  "policy_domain": "migration|identity|economy|social|climate|foreign_policy|justice|education|health|other",
+  "policy_direction": "restrictive|permissive|neutral",
+  "deviation_type": "procedural|semantic|structural",
+  "consensus_level": "broad|partial|narrow|opposition",
+  "rationale": "1-2 sentence explanation"
+}
+```
+
+### Extremity Scale (1-5)
+
+| Score | Label | Description | Examples |
+|-------|-------|-------------|----------|
+| 1 | Mainstream | Standard governance, routine | Budget adjustments, procedural changes |
+| 2 | Minor deviation | Small policy tweaks within consensus | Minor fee changes, small program adjustments |
+| 3 | Moderate deviation | Meaningful but within coalition consensus | Immigration processing changes, targeted regulations |
+| 4 | Major deviation | Challenges status quo meaningfully | Tighter migration rules, significant policy reversals |
+| 5 | Extreme | Fundamental/populist, outside consensus | Complete bans, anti-democratic motions |
+
+### Policy Direction
+
+- **restrictive**: Limits freedoms, tightens rules, reduces access
+- **permissive**: Expands freedoms, loosens rules, increases access  
+- **neutral**: Procedural, administrative, technical
+
+### Consensus Level
+
+- **broad**: Passed with 80%+ parties voting same way
+- **partial**: Passed with 60-80% agreement
+- **narrow**: Passed with 50-60% (close vote)
+- **opposition**: Coalition parties voted against
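+
+The bands above can be expressed as a small helper. This is a minimal sketch, not the logic in `scripts/classify_motions.py`: the function name, the `agreement` input (share of parties voting the same way, from 0.0 to 1.0), the precedence given to `opposition`, and the handling of the exact 60% and 80% boundaries are all assumptions.
+
+```python
+def consensus_level(agreement: float, coalition_against: bool = False) -> str:
+    """Map a party-agreement share to one of the consensus labels.
+
+    Assumption: `opposition` wins over the percentage bands whenever
+    coalition parties voted against the motion.
+    """
+    if not 0.0 <= agreement <= 1.0:
+        raise ValueError("agreement must be between 0.0 and 1.0")
+    if coalition_against:
+        return "opposition"
+    if agreement >= 0.80:
+        return "broad"
+    if agreement >= 0.60:
+        return "partial"
+    # Assumption: everything below 60% is treated as a close vote.
+    return "narrow"
+```
+
+Deriving the label from vote data rather than asking the model for it would keep this field deterministic, if the vote counts are available.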
+
+## LLM Prompt
+
+```
+SYSTEM:
+You are an expert on Dutch parliamentary politics. Classify parliamentary motions 
+on policy extremity using the provided schema.
+
+CLASSIFICATION_RUBRIC:
+- Score 1 (Mainstream): Routine governance, budget adjustments, procedural changes
+- Score 2 (Minor): Small policy tweaks within consensus
+- Score 3 (Moderate): Meaningful changes but within coalition consensus
+- Score 4 (Major): Challenges status quo, significant policy shifts
+- Score 5 (Extreme): Fundamental changes, populist, outside consensus
+
+Consider:
+- Policy impact magnitude
+- Deviation from current norms/policies
+- Coalition/opposition dynamics
+- Dutch political context
+
+USER:
+Classify this motion:
+
+Title: {title}
+Description: {description}
+Voting result: {passed/rejected}, {party_coalition} parties voted for
+
+Respond in JSON format.
+```
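+
+The USER part of the prompt could be filled in along the following lines. This is a sketch under assumptions: the motion dictionary keys (`title`, `description`, `passed`, `parties_for`) are hypothetical, and the placeholders `{passed/rejected}` and `{party_coalition}` from the template are renamed so that `str.format` accepts them.
+
+```python
+USER_TEMPLATE = (
+    "Classify this motion:\n\n"
+    "Title: {title}\n"
+    "Description: {description}\n"
+    "Voting result: {outcome}, {parties_for} parties voted for\n\n"
+    "Respond in JSON format."
+)
+
+
+def build_prompt(motion: dict) -> str:
+    """Render the user message for one motion (keys are assumed, see above)."""
+    outcome = "passed" if motion["passed"] else "rejected"
+    # Fall back to "no" so the sentence still reads correctly without supporters.
+    parties_for = ", ".join(motion.get("parties_for", [])) or "no"
+    return USER_TEMPLATE.format(
+        title=motion["title"],
+        description=motion.get("description") or "(no description)",
+        outcome=outcome,
+        parties_for=parties_for,
+    )
+```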
+
+## Batch Processing Strategy
+
+```python
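+import asyncio
+import json
+
+from openai import AsyncOpenAI
+
+# SYSTEM_PROMPT, build_prompt(), load_motions() and save_to_database() are
+# assumed to be defined elsewhere in the project; they are not shown here.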
+
+async def classify_motion_batch(motions: list[dict], model: str = "gpt-4o") -> list[dict]:
+    """Process motions in parallel batches."""
+    
+    client = AsyncOpenAI()
+    
+    async def classify_one(motion: dict) -> dict:
+        prompt = build_prompt(motion)
+        
+        response = await client.chat.completions.create(
+            model=model,
+            messages=[{"role": "system", "content": SYSTEM_PROMPT},
+                    {"role": "user", "content": prompt}],
+            response_format={"type": "json_object"}
+        )
+        
+        result = json.loads(response.choices[0].message.content)
+        result["motion_id"] = motion["id"]
+        return result
+    
+    # Process 50 in parallel
+    results = []
+    for i in range(0, len(motions), 50):
+        batch = motions[i:i+50]
+        batch_results = await asyncio.gather(*[classify_one(m) for m in batch])
+        results.extend(batch_results)
+    
+    return results
+
+async def main():
+    motions = load_motions()  # Load from database
+    classifications = await classify_motion_batch(motions)
+    save_to_database(classifications)
+
+asyncio.run(main())
+```
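+
+In the batch function above, `asyncio.gather` raises as soon as one request fails, and each batch sends 50 requests at once. A possible refinement is to cap concurrency with a semaphore and record failures per item. The helper below is a sketch only: `gather_bounded` and the shape of the error record are hypothetical names, and the right limit depends on the provider's rate limits.
+
+```python
+import asyncio
+
+
+async def gather_bounded(items, worker, limit: int = 10):
+    """Run worker(item) for every item with at most `limit` calls in flight.
+
+    Results come back in input order; a failed item yields an error record
+    instead of aborting the whole run (hypothetical record shape).
+    """
+    semaphore = asyncio.Semaphore(limit)
+
+    async def run_one(item):
+        async with semaphore:
+            try:
+                return await worker(item)
+            except Exception as exc:
+                return {"error": repr(exc), "item": item}
+
+    return await asyncio.gather(*(run_one(item) for item in items))
+```
+
+With this helper, `classify_one` would be passed as `worker`, and items that returned an error record could be retried in a later pass.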
+
+## Cost Estimate
+
+| Dataset Size | Model | Est. Cost | Est. Time |
+|-------------|-------|-----------|-----------|
+| 35,000 motions | gpt-4o-mini | ~$5-10 | 30-60 min |
+| 35,000 motions | gpt-4o | ~$50-100 | 2-4 hours |
+
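+`gpt-4o-mini` is likely sufficient for this classification task. Note that the batch example above defaults to `gpt-4o`, so pass `model="gpt-4o-mini"` explicitly to get the lower cost.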
+
+## Analysis After Classification
+
+Once classified, we can analyze:
+
+```python
+# Extremity by period
+df.groupby(['period', 'extremity_score']).size().unstack(fill_value=0)
+
+# Domain-Extremity heatmap
+pivot = df.pivot_table(values='motion_id', 
+                        index='policy_domain', 
+                        columns='extremity_score', 
+                        aggfunc='count')
+
+# Passed vs rejected extremity
+df.groupby('passed')['extremity_score'].mean()
+
+# Coalition shift analysis
+df[df['policy_domain'] == 'migration'].groupby(['period', 'policy_direction']).size()
+```
+
+## Expected Insights
+
+1. **Extremity distribution over time** - Has 4-5 score increased?
+2. **Domain-extremity correlation** - Which domains produce extreme policies?
+3. **Direction-extremity** - Restrictive vs permissive extremity by period
+4. **Consensus-extremity** - Are extreme policies passing with broad or narrow consensus?
+5. **Coalition voting** - Which parties support extreme policies?
--- a/docs/research/mainstream_shift.png
+++ b/docs/research/mainstream_shift.png
--- a/docs/research/motion-classification-prompt-v2.md
+++ b/docs/research/motion-classification-prompt-v2.md
@ -0,0 +1,136 @@
+# Motion Classification Prompt - v2
+
+## Design Principles
+
+1. **Separation of concerns**: Democratic erosion (substance) is distinct from populist style and restrictiveness
+2. **Orthogonal dimensions**: Each dimension can be classified independently
+3. **Clear boundaries**: Defined transitions between levels
+4. **Dutch political context**: Accounts for EU, referenda, institutional attacks
+
+## Refined Prompt
+
+```python
+SYSTEM_PROMPT = """Je bent een expert in Nederlandse parlementaire politiek en democratische normen.
+
+Classificeer Kamermoties op vier onafhankelijke dimensies:
+
+---
+
+### 1. DEMOCRATIC_EROSION (0-4) — SUBSTANTIEEL
+Meet of deze motie de democratische instituties, rechtsstaat, of burgerrechten bedreigt.
+
+| Score | Label | Beschrijving | Voorbeelden |
+|-------|-------|-------------|-------------|
+| 0 | None | Geen impact op democratische normen | Begroting, procedureel, technische wijzigingen |
+| 1 | Minor | Kleine afwijking van gebruikelijke processen | Kleine uitzonderingen op transparantie-eisen |
+| 2 | Moderate | Betekenisvolle beleidswijziging, maar binnen constitutioneel kader | Verandering asielprocedures, strengere veiligheidsmaatregelen |
+| 3 | Significant | Vraagt om fundamentele verandering in checks & balances | Beperking rechterlijke toetsing, afschaffen referendum |
+| 4 | Critical | Ondermijnt openbaar bestuur, rechtsstaat, of universele rechten | Afschaffing persvrijheid, discriminatie bij wet, anti-EU obstructionisme |
+
+**Beslisregels:**
+- Score 4 ALLEEN bij: (a) directe aanval op persvrijheid/rechterlijke macht, OF (b) systematische discriminatie in wetgeving, OF (c) oproep tot schending internationale verdragen
+- Score 3 bij: (a) referendum afschaffen/herroepen, OF (b) EU-samenwerking fundamenteel ter discussie stellen, OF (c) bevoegdheden uitvoerende macht significant uitbreiden zonder tegenwicht
+- Score 2 is default voor significante beleidswijzigingen die niet bovenstaande raken
+
+---
+
+### 2. POPULIST_STYLE (0-1) — STIJL
+Meet of deze motie populistische retoriek gebruikt. Dit is onafhankelijk van de democratische impact.
+
+| Score | Label | Beschrijving |
+|-------|-------|-------------|
+| 0 | Normal | Zakelijke, institutionele toon |
+| 1 | Populist | Gebruikt anti-establishment framing |
+
+**Indicatoren voor score 1:**
+- "Het volk" vs "de elite"/"Den Haag"/"de politiek"
+- "Wij vs zij" framing ("burgers vs bestuurders")
+- Suggestie dat "gewone mensen" anders behandeld moeten worden
+- Vragen om "direct door het volk" zonder institutionele checks
+- Emotioneel geladen taalgebruik over "de problemen van gewone mensen"
+
+**Let op:** Partijpolitieke kritiek is normaal. Alleen extreem anti-institutionele framing telt.
+
+---
+
+### 3. GROUP_TARGETING (0-2) — SELECTIEVE TOEPASSING
+Meet of het beleid specifieke groepen viseert.
+
+| Score | Label | Beschrijving |
+|-------|-------|-------------|
+| 0 | Universal | Algemeen beleid, geen specifieke groep |
+| 1 | Indirect | Algemeen beleid dat onevenredig groepen raakt |
+| 2 | Direct | Expliciet gericht op specifieke bevolkingsgroep |
+
+**Score 2 voorbeelden:**
+- "Asielzoekers" / "illegalen" specifiek viseren
+- "Moslims" / specifieke religieuze groepen
+- "Linkse" of "rechtse" politieke tegenstanders bij naam
+- "Etnische minderheden" als doelwit
+
+**Score 1 voorbeelden:**
+- Algemeen immigratiebeleid dat effectief migranten raakt
+- Veiligheidsmaatregelen die gemarginaliseerde groepen disproportioneel raken
+
+---
+
+### 4. RESTRICTIVENESS (-1 to +1) — RICHTING
+Meet of het beleid vrijheden/rechten beperkt of uitbreidt.
+
+| Score | Label | Beschrijving |
+|-------|-------|-------------|
+| -1 | Expansive | Breidt vrijheden of toegang uit |
+| 0 | Neutral | Geen directe impact op vrijheden |
+| +1 | Restrictive | Beperkt vrijheden, toegang, of rechten |
+
+**Let op:** Budgettaire of procedurele zaken zijn meestal 0.
+
+---
+
+## OUTPUT FORMAT
+
+Respond in JSON:
+{
+  "democratic_erosion": 0-4,
+  "populist_style": 0-1,
+  "group_targeting": 0-2,
+  "restrictiveness": -1 to 1,
+  "domain": "migration|economy|climate|social|justice|foreign|education|health|other",
+  "rationale": "1-2 zinnen uitleg"
+}
+
+---
+
+## BELANGRIJKE BESLISREGELS
+
+1. **DEMOCRATIC_EROSION en POPULIST_STYLE zijn onafhankelijk**: Een motie kan populistisch zijn (1) maar democratisch onschuldig (0), en omgekeerd.
+
+2. **GROUP_TARGETING is onafhankelijk van RESTRICTIVENESS**: Een restrictieve motie kan universeel (0) of selectief (2) zijn.
+
+3. **EU-afwijkingen gradueren**: 
+   - "Nederlandse invulling van EU-beleid" = score 0-1 erosion
+   - "Nexit/EU verlaten" = score 3-4 erosion
+   - "EU-regels overtreden" = score 2-3 erosion
+
+4. **Referendum-context**: Afschaffen referendum = score 3. Bestaand referendum gebruiken = score 0.
+
+5. **Voorbehoud bij onduidelijkheid**: Als de motietekst ambigu is, kies de lagere score en noteer de twijfel in de rationale."""
+```
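+
+Because the prompt fixes a numeric range for every dimension, model replies can be checked before they are stored. The sketch below assumes the reply is a single JSON object with the keys from the OUTPUT FORMAT section; the names `V2_RANGES`, `V2_DOMAINS` and `parse_v2_response` are illustrative and not taken from `scripts/classify_motions.py`.
+
+```python
+import json
+
+# Allowed integer range per dimension, copied from the prompt above.
+V2_RANGES = {
+    "democratic_erosion": (0, 4),
+    "populist_style": (0, 1),
+    "group_targeting": (0, 2),
+    "restrictiveness": (-1, 1),
+}
+V2_DOMAINS = {
+    "migration", "economy", "climate", "social", "justice",
+    "foreign", "education", "health", "other",
+}
+
+
+def parse_v2_response(raw: str) -> dict:
+    """Parse a model reply; raise ValueError if it violates the v2 schema."""
+    result = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
+    if not isinstance(result, dict):
+        raise ValueError("expected a JSON object")
+    for field, (low, high) in V2_RANGES.items():
+        value = result.get(field)
+        # bool is a subclass of int, so reject it explicitly.
+        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
+            raise ValueError(f"{field} must be an integer in [{low}, {high}], got {value!r}")
+    domain = result.get("domain")
+    if not isinstance(domain, str) or domain not in V2_DOMAINS:
+        raise ValueError(f"unknown domain: {domain!r}")
+    return result
+```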
+
+## Summary of Changes
+
+| Old | New |
+|-----|-----|
+| Single EXTREMITY_SCORE (1-5) conflating substance+style | Four orthogonal dimensions |
+| "Populistische retoriek" as score 5 criterion | POPULIST_STYLE (0-1), independent of erosion |
+| Vague score boundaries | Defined decision rules with examples |
+| TARGETED_GROUP redundant with score | GROUP_TARGETING (0-2), orthogonal to restrictiveness |
+| EU deviation = score 5 | Graduated EU scores (0-4) with specific examples |
+| Missing referendum/Nexit | Explicit scoring for these patterns |
+
+## What This Enables
+
+1. **Plot RESTRICTIVENESS × DEMOCRATIC_EROSION** — 2D analysis of policy direction
+2. **Track POPULIST_STYLE over time** — Is rhetoric getting more populist?
+3. **Analyze GROUP_TARGETING** — Is group-specific targeting increasing?
+4. **Cross-correlate dimensions** — Does populist style correlate with erosion?
--- a/docs/research/normalized_extremity_trend.png
+++ b/docs/research/normalized_extremity_trend.png
--- a/docs/research/polarization_comprehensive.png
+++ b/docs/research/polarization_comprehensive.png
--- a/docs/research/polarization_summary.png
+++ b/docs/research/polarization_summary.png
--- a/docs/research/policy_extremity_2018_anchor.png
+++ b/docs/research/policy_extremity_2018_anchor.png
--- a/docs/research/policy_extremity_analysis.png
+++ b/docs/research/policy_extremity_analysis.png
--- a/docs/research/voting_vs_policy_extremity.png
+++ b/docs/research/voting_vs_policy_extremity.png
--- a/docs/solutions/best-practices/mindmodel-generation-anti-patterns.md
+++ b/docs/solutions/best-practices/mindmodel-generation-anti-patterns.md
@ -0,0 +1,168 @@
+---
+title: Critical Anti-Patterns Discovered During Mindmodel Generation
+date: 2026-04-12
+category: docs/solutions/best-practices
+module: stemwijzer
+problem_type: best_practice
+component: documentation
+severity: critical
+applies_when:
+  - When adding logging to any module
+  - When working with Streamlit test isolation
+  - When generating or updating .mindmodel/ for this project
+tags: [anti-patterns, logging, streamlit, mindmodel, code-quality]
+---
+
+# Critical Anti-Patterns Discovered During Mindmodel Generation
+
+## Context
+
+During a comprehensive mindmodel generation session (Phase 1: 7 parallel analysis agents, Phase 2: constraint-writer assembly), several critical anti-patterns were discovered and documented in `.mindmodel/anti-patterns/anti-patterns.yaml`. This document captures the key findings for future reference.
+
+## Guidance
+
+### 1. Use Logging, Not Print Statements
+
+**CRITICAL**: `api_client.py` uses `print()` instead of logging throughout (11 instances).
+
+**Broken pattern:**
+```python
+# api_client.py - BAD
+print(f"Fetched {len(voting_records)} voting records from API")
+print(f"Error fetching motions from API: {e}")  # No traceback
+```
+
+**Correct pattern:**
+```python
+# GOOD - use logging throughout
+import logging
+
+_logger = logging.getLogger(__name__)
+
+def get_motions(self, ...):
+    try:
+        _logger.info("Fetched %d voting records from API", len(voting_records))
+    except Exception as e:
+        _logger.exception("Error fetching motions from API: %s", e)
+        return []
+```
+
+### 2. Streamlit Global State Replacement
+
+**CRITICAL**: `explorer.py` has module-level `st = _DummySt()` which shadows Streamlit globally.
+
+**Broken pattern:**
+```python
+# explorer.py - BAD
+try:
+    import plotly.express as px
+except Exception:
+    class _DummySt:
+        figure = _DummyFigure
+        # ...
+    st = _DummySt()  # Global replacement - affects all imports!
+```
+
+**Correct pattern:**
+```python
+# GOOD - use conditional flags
+try:
+    import plotly.express as px
+    import plotly.graph_objects as go
+    HAS_PLOTLY = True
+except ImportError:
+    HAS_PLOTLY = False
+    px = None
+    go = None
+
+def render_chart(data):
+    if not HAS_PLOTLY:
+        _logger.warning("Plotly not available")
+        return
+    # ... rest of chart logic
+```
+
+### 3. Logger Naming Inconsistency
+
+**WARNING**: 33 files are split between `logger = logging.getLogger(__name__)` and `_logger = logging.getLogger(__name__)`.
+
+- Files with `logger` (16): api_client.py, ai_provider.py, pipeline files, analysis files
+- Files with `_logger` (17): database.py, explorer.py, explorer_helpers.py
+
+**Recommendation**: Standardize on `_logger` for module-level loggers. Update CODE_STYLE.md to explicitly state the convention.
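+
+Until a lint rule exists, the split can be re-counted with a short script. This is a rough sketch: the function names are hypothetical, the regular expression only matches assignments that start at the beginning of a line, and `rglob` will also walk directories such as `.venv` unless the root is narrowed.
+
+```python
+import re
+from collections import Counter
+from pathlib import Path
+
+# Matches `logger = logging.getLogger(` or `_logger = logging.getLogger(`
+# at the start of a line and captures the variable name.
+LOGGER_ASSIGNMENT = re.compile(r"^(_?logger)\s*=\s*logging\.getLogger\(", re.MULTILINE)
+
+
+def logger_names_in(source: str) -> list[str]:
+    """Return the module-level logger variable names found in one source file."""
+    return LOGGER_ASSIGNMENT.findall(source)
+
+
+def count_conventions(root: str = ".") -> Counter:
+    """Count logger naming conventions across all .py files under root."""
+    counts = Counter()
+    for path in Path(root).rglob("*.py"):
+        text = path.read_text(encoding="utf-8", errors="ignore")
+        counts.update(logger_names_in(text))
+    return counts
+```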
+
+### 4. Bare Except with Pass
+
+**CRITICAL**: `database.py` line 47 has bare `except: pass` that catches KeyboardInterrupt, SystemExit, MemoryError.
+
+**Broken pattern:**
+```python
+# database.py line 47 - BAD
+try:
+    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
+except:  # Catches EVERYTHING
+    pass
+```
+
+**Correct pattern:**
+```python
+# GOOD
+try:
+    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
+except Exception as exc:
+    _logger.debug("Sequence creation skipped: %s", exc)
+```
+
+## Why This Matters
+
+1. **Logging over Print**: Logging supports aggregation and filtering by level, and `_logger.exception()` records the stack trace. Print statements bypass log handlers, so their output is easy to lose in production and carries no context during failures.
+
+2. **Global State**: Replacing a third-party module such as Streamlit at module level causes subtle bugs, because every importer sees the dummy object and behavior can depend on import order.
+
+3. **Consistency**: Mixed logger naming makes code harder to search and to refactor in bulk. Pick one convention and enforce it via linting.
+
+4. **Bare Except**: Catching all exceptions including `KeyboardInterrupt` and `SystemExit` can prevent graceful shutdown and mask serious issues.
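+
+The difference between a bare `except:` and `except Exception:` follows from Python's exception hierarchy, which the short check below illustrates. The helper name `caught_by_except_exception` is made up for this example. Note that `MemoryError` is still an `Exception` subclass, so narrowing the handler does not exclude it.
+
+```python
+def caught_by_except_exception(exc_type) -> bool:
+    """Return True if `except Exception` would catch an instance of exc_type."""
+    try:
+        raise exc_type()
+    except Exception:
+        return True
+    except BaseException:
+        # Only reached for exceptions outside the Exception subtree.
+        return False
+
+
+print(caught_by_except_exception(ValueError))         # True
+print(caught_by_except_exception(MemoryError))        # True
+print(caught_by_except_exception(KeyboardInterrupt))  # False
+print(caught_by_except_exception(SystemExit))         # False
+```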
+
+## When to Apply
+
+- Before committing any logging changes: ensure the code uses `_logger`, not `print()`
+- When adding optional dependency handling: use flags, not global replacements
+- When updating CODE_STYLE.md: add explicit logger naming convention
+- When updating .mindmodel/: verify anti-patterns section is current
+
+## Examples
+
+### Fixing api_client.py Logging
+
+```python
+# Before (broken)
+print(f"Processed {count} motions")
+
+# After (correct)
+_logger.info("Processed %d motions", count)
+```
+
+### Fixing Exception Handling
+
+```python
+# Before (broken)
+try:
+    risky_operation()
+except:
+    pass
+
+# After (correct)
+try:
+    risky_operation()
+except Exception as exc:
+    _logger.warning("Operation failed: %s", exc)
+    return safe_fallback
+```
+
+## Related
+
+- `.mindmodel/anti-patterns/anti-patterns.yaml` - Full anti-pattern documentation
+- `.mindmodel/constraints/logging.yaml` - Logging conventions
+- `.mindmodel/constraints/error-handling.yaml` - Error handling patterns
+- `CODE_STYLE.md` - Code style guide (needs update for logger naming)
+- `AGENTS.md` - Project conventions (RIGHT-wing parties on RIGHT, SVD labels = voting patterns)
--- a/docs/solutions/build-errors/uv-lock-pytest-missing-source-2026-04-05.md
+++ b/docs/solutions/build-errors/uv-lock-pytest-missing-source-2026-04-05.md
@ -0,0 +1,72 @@
+---
+title: "uv.lock parse error due to pytest entry missing source"
+module: tooling
+component: tooling
+problem_type: build_error
+severity: medium
+date: 2026-04-05
+tags: [uv, lockfile, pytest, packaging, streamlit]
+---
+
+Problem
+-------
+
+Running `uv` commands failed with a parse error in `uv.lock` caused by an ambiguous/malformed `pytest` entry that lacked a proper `source` field and conflicted with another package entry.
+
+Symptoms
+--------
+
+- `uv run streamlit run Home.py` failed with: "Dependency `pytest` has missing `source` field but has more than one matching package".
+- `uv lock` and `uv add` also failed because `uv.lock` could not be parsed.
+
+What didn't work
+----------------
+
+- Attempting `uv add "pytest>=9.0.2" --dev` failed because the lockfile parser errored before the command could modify anything.
+- Manual edits to `uv.lock` were used as a temporary stop-gap (allowed `uv` to run) but are not a durable solution because `uv.lock` is generated.
+
+Solution
+--------
+
+1. Regenerate the lockfile from `pyproject.toml` so `uv.lock` and project metadata are consistent:
+
+   - Run: `uv lock`
+   - Inspect the resulting `uv.lock` to ensure `pytest` appears as a single `[[package]]` entry with a `source` field and expected hashes.
+
+2. Commit the regenerated lock locally (do not push without review):
+
+   - `git add uv.lock`
+   - `git commit -m "chore: regenerate uv.lock (resolve pytest source ambiguity)"`
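+
+The manual inspection in step 1 could be scripted. The sketch below assumes that `uv.lock` is valid TOML with one `[[package]]` table per package carrying `name`, `version` and `source` keys; the function name is hypothetical and the check is far less thorough than uv's own validation.
+
+```python
+import tomllib  # standard library from Python 3.11
+
+
+def lock_entry_problems(lock_text: str, name: str = "pytest") -> list[str]:
+    """Describe anything wrong with the lockfile entries for one package."""
+    packages = tomllib.loads(lock_text).get("package", [])
+    entries = [pkg for pkg in packages if pkg.get("name") == name]
+    problems = []
+    if len(entries) != 1:
+        problems.append(f"expected exactly one entry for {name}, found {len(entries)}")
+    for entry in entries:
+        if "source" not in entry:
+            problems.append(f"{name} {entry.get('version', '?')} has no source field")
+    return problems
+```
+
+It can be called as `lock_entry_problems(Path("uv.lock").read_text())` after importing `Path` from `pathlib`; an empty list means the entry looks sane.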
+
+Why this works
+--------------
+
+- `uv.lock` is the canonical, generated lockfile. The parser expects each package entry to have an unambiguous `source` so `uv` can resolve hashes and reproducible installs. Regenerating produces a consistent lockfile derived from `pyproject.toml` and resolves duplicated/malformed entries.
+- Manual edits fix symptoms but can be overwritten or lead to inconsistent state; regenerating ensures upstream metadata and lockfile match the resolver's expectations.
+
+Prevention
+----------
+
+- Avoid hand-editing `uv.lock`. When a lockfile parse error appears, prefer regenerating with `uv lock`.
+- Add a lightweight CI check to ensure `uv lock --check` (or `uv lock` with no changes) passes before merging changes that touch dependencies or the lockfile.
+- Make `pytest` (and other dev tools) authoritative in `pyproject.toml` under `dependency-groups.dev` so the resolver has a single source of truth.
+
+Verification
+------------
+
+- After regeneration, verify `uv` commands work and tests run:
+
+  - `uv run streamlit run Home.py` → Streamlit should start and print Local/Network URL
+  - `.venv/bin/python -m pytest tests/ -q` → Confirm tests run (example in this run: `171 passed, 2 skipped`).
+
+Related files
+-------------
+
+- `uv.lock`
+- `pyproject.toml`
--- a/docs/solutions/insights/llm-motion-classification-prompt-design.md
+++ b/docs/solutions/insights/llm-motion-classification-prompt-design.md
@ -0,0 +1,125 @@
+---
+module: llm-classification
+tags: [polarization, nlp, prompt-design, democratic-norms]
+problem_type: classification-schema-design
+date: 2026-04-05
+reviewed_by:
+  - correctness-reviewer
+  - domain-expert (Dutch politics)
+  - clarity-reviewer
+---
+
+# LLM Motion Classification: Prompt Design Lessons
+
+## Problem
+
+Wanted to classify 28,000 Dutch parliamentary motions by "extremity" to measure polarization over time.
+
+Initial prompt conflated multiple concepts:
+- Democratic norm erosion
+- Populist rhetoric style  
+- Group targeting
+- Restrictiveness vs permissiveness
+
+## Initial v1 Design (Flawed)
+
+```
+EXTREMITY_SCORE (1-5):
+  - 1: Mainstream
+  - 5: "Undermines checks & balances, threatens rule of law, 
+        discriminates groups, populist rhetoric"
+```
+
+**Problems identified:**
+1. Populist rhetoric is style, not substance — shouldn't be in same score as democratic erosion
+2. "Extreme" undefined — compared to what baseline?
+3. Score 4/5 boundary unclear
+4. TARGETED_GROUP redundant with EXTREMITY_SCORE
+5. EU deviation always = score 5 (too broad)
+6. Missing Dutch-specific patterns (Nexit, referendum abolition)
+
+## Refined v2 Design (Four Orthogonal Dimensions)
+
+### 1. DEMOCRATIC_EROSION (0-4) — Substance only
+| Score | Label | Criteria |
+|-------|-------|----------|
+| 0 | None | No impact on democratic norms |
+| 1 | Minor | Small procedural deviations |
+| 2 | Moderate | Significant policy change, within constitutional framework |
+| 3 | Significant | Fundamental change to checks & balances |
+| 4 | Critical | Undermines rule of law, press freedom, systematic discrimination |
+
+**Decision rules:**
+- Score 4 ONLY if: (a) direct attack on judiciary/press, OR (b) systematic discrimination in law, OR (c) call to violate international treaties
+- Score 3 if: (a) abolish referendum, OR (b) fundamentally question EU cooperation, OR (c) significantly expand executive powers
+
+### 2. POPULIST_STYLE (0-1) — Style only
+Independent of democratic impact. A motion can be populist (1) yet cause no democratic erosion (0).
+
+**Indicators:**
+- "Het volk" vs "de elite/den Haag"
+- "Wij vs zij" framing
+- Call for "direct democracy" without checks
+- Emotionally charged language
+
+### 3. GROUP_TARGETING (0-2) — Targeting only
+| Score | Label |
+|-------|-------|
+| 0 | Universal — general policy |
+| 1 | Indirect — general policy that disproportionately affects groups |
+| 2 | Direct — explicitly targets specific population group |
+
+### 4. RESTRICTIVENESS (-1 to +1) — Direction only
+| Score | Label |
+|-------|-------|
+| -1 | Expansive |
+| 0 | Neutral |
+| +1 | Restrictive |
+
+## Key Lessons Learned
+
+### 1. Separate Style from Substance
+Populist rhetoric ≠ democratic erosion. A mainstream party using strong language isn't anti-democratic. Conflating them causes false positives.
+
+### 2. Make Dimensions Orthogonal
+- DEMOCRATIC_EROSION × RESTRICTIVENESS: A policy can be erosive AND restrictive, or erosive AND permissive
+- POPULIST_STYLE × DEMOCRATIC_EROSION: A motion can be populist (1) with no erosion (0), and vice versa
+- GROUP_TARGETING × RESTRICTIVENESS: Restrictive ≠ targeted (and vice versa)
+
+### 3. Add Decision Rules for Boundaries
+Vague transitions ("significant" → "critical") cause inconsistency. Define specific triggers:
+```
+Score 4 ONLY when: (a) OR (b) OR (c)
+Score 3 when: (a) OR (b) OR (c)
+```
+
+### 4. Gradate EU Deviation
+Not all EU deviation is equal:
+- Dutch implementation of EU policy → erosion 0-1
+- Nexit / leave EU → erosion 3-4
+- Violate EU rules → erosion 2-3
+
+### 5. Include Domain-Specific Patterns
+Dutch context matters:
+- Referendum abolition = score 3
+- "Den Haag" / "establishment" attacks = check for populist style
+- Nexit = score 3-4 depending on framing
+
+### 6. Define Reference Baselines
+"Abnormal" compared to what?
+- 2016 consensus
+- EU norms
+- Historical Dutch practice
+- International standards
+
+## Testing Recommendations
+
+1. **Calibration set**: 50 motions with expert annotations before production
+2. **Boundary cases**: Test score 3/4 transitions explicitly
+3. **Cross-rater reliability**: Multiple classifiers on same motions
+4. **Domain-specific test cases**: Migration, EU, constitutional reform
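+
+For the cross-rater reliability check, agreement between two classifiers on the same motions could be summarized with Cohen's kappa. The function below is a minimal, unweighted sketch written for this note. Since DEMOCRATIC_EROSION is ordinal, a weighted variant may suit it better, which is left as an open choice.
+
+```python
+from collections import Counter
+
+
+def cohens_kappa(labels_a, labels_b) -> float:
+    """Unweighted Cohen's kappa for two raters labelling the same items."""
+    if len(labels_a) != len(labels_b) or not labels_a:
+        raise ValueError("need two non-empty label lists of equal length")
+    n = len(labels_a)
+    # Share of items on which both raters gave the same label.
+    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
+    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
+    # Agreement expected by chance from each rater's label frequencies.
+    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
+    if expected == 1.0:
+        return 1.0  # both raters used one identical label throughout
+    return (observed - expected) / (1 - expected)
+```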
+
+## Files
+
+- `scripts/classify_motions.py` — Implementation with v2 prompt
+- `docs/research/motion-classification-prompt-v2.md` — Full prompt documentation
--- a/docs/solutions/insights/policy-extremity-vs-voting-extremity.md
+++ b/docs/solutions/insights/policy-extremity-vs-voting-extremity.md
@ -0,0 +1,80 @@
+---
+title: "Policy Extremity vs Voting Extremity: Independent Phenomena"
+date: 2026-04-05
+module: analysis
+problem_type: research
+component: motion-analysis
+tags: [polarization, policy-extremity, voting-extremity, svd, embedding-norm]
+---
+
+# Policy Extremity vs Voting Extremity: Independent Phenomena
+
+## Key Finding
+
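+**Voting extremity** (how lopsided votes are) and **policy extremity** (how far motions are from the political center) are **independent phenomena**. Both metrics fell between 2016 and 2026, but the readings differ: a lower voting-extremity value means a more divided parliament, while a lower policy-extremity value means less extreme motions: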
+
+| Measure | 2016 | 2026 | Trend |
+|---------|------|------|-------|
+| **Voting Extremity** | 0.70 | 0.46 | More divided |
+| **Policy Extremity** | 9.0 | 4.2 | Less extreme |
+
+**Correlation: r = -0.011** (essentially zero)
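+
+A correlation like the one above can be computed from two per-motion series. The function below is a plain Pearson correlation written as a sketch; whether the original analysis used Pearson or another coefficient is not stated, so treat that as an assumption.
+
+```python
+import math
+
+
+def pearson_r(xs, ys) -> float:
+    """Pearson correlation between two equal-length numeric sequences."""
+    if len(xs) != len(ys) or len(xs) < 2:
+        raise ValueError("need two equal-length sequences with at least 2 points")
+    mean_x = sum(xs) / len(xs)
+    mean_y = sum(ys) / len(ys)
+    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
+    var_x = sum((x - mean_x) ** 2 for x in xs)
+    var_y = sum((y - mean_y) ** 2 for y in ys)
+    if var_x == 0 or var_y == 0:
+        raise ValueError("correlation is undefined for a constant series")
+    return cov / math.sqrt(var_x * var_y)
+```
+
+A value near zero only rules out a linear relationship, so a scatter plot of the two series is a useful companion check.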
+
+## Definitions
+
+### Voting Extremity
+- **Formula**: margin / total votes
+- **Interpretation**: How lopsided a vote is; lower values mean a more divided parliament
+  - 1.0 = unanimous (all votes same direction)
+  - 0.0 = perfectly split (50-50)
+- **Trend**: Decreased from 0.70 to 0.46 (more close votes in recent years)
+
+### Policy Extremity
+- **Formula**: L2 norm of SVD embedding vector
+- **Interpretation**: How "far out" a motion is in political semantic space
+- **Trend**: Decreased (motions closer to political center)
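+
+A worked example makes the direction of each scale concrete. The functions below follow the two formulas in this section; the function names and the sample numbers are illustrative only and do not come from the dataset.
+
+```python
+import math
+
+
+def voting_extremity(votes_for: int, votes_against: int) -> float:
+    """Margin divided by total votes: 1.0 is unanimous, 0.0 is a 50-50 split."""
+    total = votes_for + votes_against
+    if total == 0:
+        raise ValueError("no votes cast")
+    return abs(votes_for - votes_against) / total
+
+
+def policy_extremity(embedding) -> float:
+    """L2 norm of an embedding vector."""
+    return math.sqrt(sum(x * x for x in embedding))
+
+
+print(voting_extremity(150, 0))      # 1.0 (unanimous)
+print(voting_extremity(75, 75))      # 0.0 (perfectly split)
+print(policy_extremity([3.0, 4.0]))  # 5.0
+```
+
+Because a closely contested vote scores near 0.0, a falling voting-extremity value corresponds to a more divided parliament.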
+
+## Analysis
+
+### Why Are They Independent?
+
+1. **Voting extremity** captures **how parties divide** on issues
+2. **Policy extremity** captures **where motions sit** in policy space
+
+A motion can be:
+- Near the center (low policy extremity) but divide parties 50-50 (voting extremity near 0.0)
+- Far from center (high policy extremity) but pass unanimously (voting extremity of 1.0)
+
+### Historical Pattern
+
+- **2016**: Coalition passed "extreme" motions (legislative proposals) with consensus
+- **2026**: More divided votes on "moderate" motions (procedural/administrative)
+
+### Interpretation
+
+The parliament has become **more divided in how it votes**, but the **policies being passed are actually less extreme** in semantic space.
+
+This suggests:
+- The polarization is about **different issues** dividing parties
+- The "extremes" that pass are now closer to mainstream positions
+- What changed is **what divides parties**, not **how radical the policies are**
+
+## Visualization
+
+See `docs/research/voting_vs_policy_extremity.png`
+
+## Methodology
+
+```python
+# Voting extremity = margin / total
+voting_extremity = abs(votes_for - votes_against) / total_votes
+
+# Policy extremity = L2 norm of SVD embedding
+policy_extremity = np.linalg.norm(embedding_vector)
+```
+
+## Conclusion
+
+These findings confirm that **voting extremity ≠ policy extremity**. They capture different aspects of parliamentary behavior and should be analyzed separately.
+
+The fall in the voting-extremity metric (narrower margins) reflects genuine polarization in parliamentary divisions. But the decrease in policy extremity suggests that the motions being passed are not more radical; they are just more contested.
--- a/docs/solutions/insights/quantifying-political-extremity.md
+++ b/docs/solutions/insights/quantifying-political-extremity.md
@ -0,0 +1,146 @@
+---
+title: "Quantifying Political Extremity: Voting vs Policy"
+date: 2026-04-05
+module: analysis
+problem_type: research
+component: motion-analysis
+tags: [polarization, voting-extremity, policy-extremity, embedding-analysis, parliamentary-motion]
+---
+
+# Quantifying Political Extremity: Voting vs Policy
+
+## Context
+
+Initial analysis of parliamentary motions sought to measure polarization by examining how "extreme" policies have become. The hypothesis was that extremes on both sides became more extreme. The analysis revealed this hypothesis was incorrect — and surfaced two independent phenomena.
+
+## Guidance
+
+### Key Finding: Two Independent Measures of Extremity
+
+**Voting Extremity** and **Policy Extremity** are independent phenomena with different trends:
+
+| Measure | 2016 | 2026 | Trend |
+|---------|------|------|-------|
+| **Voting Extremity** (margin/total) | 0.70 | 0.46 | Parliament votes more closely |
+| **Policy Extremity** (embedding distance from mainstream) | 5.65 | 4.17 | Policies are less extreme |
+
+**Correlation: r ≈ 0**: the two measures are essentially uncorrelated.
+
+### What Each Measures
+
+**Voting Extremity** = `abs(votes_for - votes_against) / total_votes`
+- 1.0 = unanimous (all votes same direction)
+- 0.0 = perfectly split (50-50)
+- Captures how divided parliament is when voting
+
+**Policy Extremity** = `||embedding - mainstream_centroid||`
+- Euclidean distance in text embedding space (2560 dims)
+- Captures how far a motion is from the political center
+
+### How to Measure Each
+
+```python
+import numpy as np
+
+# Voting Extremity
+margin = abs(votes_for - votes_against)
+total = votes_for + votes_against
+voting_extremity = margin / total
+
+# Policy Extremity (using text embeddings, not SVD).
+# motion_embedding is read from the embeddings table (model: qwen/qwen3-embedding-4b).
+policy_extremity = np.linalg.norm(motion_embedding - mainstream_centroid)
+```
+
+### Why Use Text Embeddings (Not SVD)
+
+SVD embeddings are fitted on **voting patterns**, capturing how parties vote together. They measure **voting extremity**, not **policy extremity**.
+
+For policy content, use **raw text embeddings** (`embeddings` table, 2560 dimensions) which are computed from motion text only.
+
+### Bipartisan Anchor Approach
+
+Define the "mainstream" as the centroid of bipartisan motions (80%+ parties vote the same way):
+
+```python
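+import numpy as np
+
+# Find bipartisan motions (majority_vote_pct is assumed to be a per-motion attribute)
+bipartisan = [m for m in motions if m.majority_vote_pct >= 0.80]
+
+# Compute mainstream centroid (mean over motions, one value per dimension)
+mainstream_centroid = np.mean([m.embedding for m in bipartisan], axis=0)
+
+# Measure policy extremity as Euclidean distance from the centroid
+policy_extremity = np.linalg.norm(motion.embedding - mainstream_centroid)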
+```
+
+## Why This Matters
+
+The hypothesis "extremes became more extreme" was wrong because:
+
+1. **Voting margins narrowed** (voting extremity fell from 0.70 to 0.46): parliament is more divided now
+2. **Policy extremity decreased**: even extreme motions are closer to center
+
+This means: what divides parties changed, not how radical the policies are.
+
+## Quantifying Mainstream Shift
+
+Using 2018 as baseline ("last normal year"):
+
+| Period | Distance from 2018 | Interpretation |
+|--------|-------------------|----------------|
+| 2016-2018 | ~0.22 | Similar mainstream |
+| **2019** | **0.46** | Shift begins |
+| 2020-2026 | **~0.71** | New stable mainstream |
+
+The mainstream shifted **0.71 units** after 2018 and has remained stable.
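+
+One way to compute such a table is to take the centroid of each year's bipartisan motions and measure its distance from the 2018 centroid. The sketch below assumes the input is already filtered to bipartisan motions and supplied as `(year, embedding)` pairs. The function names are hypothetical, and the grouping into multi-year periods used in the table is not reproduced.
+
+```python
+import math
+from collections import defaultdict
+
+
+def centroid(vectors):
+    """Component-wise mean of equal-length vectors."""
+    return [sum(column) / len(vectors) for column in zip(*vectors)]
+
+
+def shift_from_baseline(motions, baseline_year: int = 2018) -> dict:
+    """Map each year to the distance between its centroid and the baseline's.
+
+    `motions` is an iterable of (year, embedding) pairs (assumed shape).
+    """
+    by_year = defaultdict(list)
+    for year, embedding in motions:
+        by_year[year].append(embedding)
+    if baseline_year not in by_year:
+        raise ValueError(f"no motions for baseline year {baseline_year}")
+    baseline = centroid(by_year[baseline_year])
+    return {
+        year: math.dist(centroid(vectors), baseline)
+        for year, vectors in sorted(by_year.items())
+    }
+```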
+
+### Coalition Shift on Migration Policy
+
+Parties that once opposed strict migration measures now vote for them:
+
+| Party | 2016-2018 | 2025-2026 | Change |
+|-------|------------|------------|--------|
+| VVD | 100% voor | 78% voor | ↓ |
+| CDA | 100% voor | 81% voor | ↓ |
+| D66 | 100% voor | 60% voor | ↓↓ |
+| PVV | 20% voor | 56% voor | ↑↑ |
+| NSC | n/a (new party) | 56% voor | new |
+| BBB | n/a (new party) | 79% voor | new |
+
+## When to Apply
+
+- When analyzing parliamentary polarization trends
+- When comparing policy extremity across time periods
+- When studying coalition formation and party positioning
+- When testing hypotheses about political extremism
+
+## Examples
+
+### Correct Analysis
+```python
+# Compare voting extremity and policy extremity separately
+voting_ext = compute_voting_margin(motion)
+policy_ext = compute_embedding_distance(motion, mainstream_centroid)
+
+# Plot both trends independently
+plot_trend(years, voting_ext, label="Voting Extremity")
+plot_trend(years, policy_ext, label="Policy Extremity")
+```
+
+### Incorrect Analysis
+```python
+# DON'T use SVD scores to measure policy extremity
+svd_score = motion.svd_vector[0]  # This measures voting pattern, not content!
+
+# DO use text embeddings for policy content
+text_embedding = embeddings_table[motion.id]
+```
+
+## Related Findings
+
+- `svd-stability-vs-overtone-shift.md` — SVD axes measure voting structure, not semantics
+- `policy-extremity-vs-voting-extremity.md` — Initial documentation of the distinction
+
+## Visualizations
+
+- `docs/research/polarization_comprehensive.png` — Combined view of all metrics
+- `docs/research/mainstream_shift.png` — Mainstream shift over time
+- `docs/research/voting_vs_policy_extremity.png` — Independent trends
--- a/docs/solutions/logic-errors/svd-active-mp-filter-missing-2026-04-16.md
+++ b/docs/solutions/logic-errors/svd-active-mp-filter-missing-2026-04-16.md
@ -0,0 +1,90 @@
+---
+module: svd
+date: 2026-04-16
+category: docs/solutions/logic-errors
+problem_type: logic_error
+component: rails_view
+severity: medium
+symptoms:
+  - "SVD tab displayed VVD component 1 score of 0.108"
+  - "Compass displayed VVD component 1 score of 0.335"
+  - "Significant numerical discrepancy between two views showing same data"
+root_cause: scope_issue
+resolution_type: code_fix
+tags:
+  - svd
+  - voting-analysis
+  - filter-scope
+  - party-scores
+---
+
+# SVD Tab Party Scores Don't Match Compass
+
+## Problem
+
+The SVD tab displayed VVD component 1 score as 0.108 while the compass visualization showed 0.335, a roughly 3x discrepancy caused by including inactive/historical MPs when averaging MP scores into party means.
+
+## Symptoms
+
+- **SVD tab**: VVD comp1 score = 0.108 (incorrect)
+- **Compass**: VVD comp1 score = 0.335 (correct, verified against compass reference)
+- **Discrepancy**: ~3x difference between views
+- **Scope**: Affects all parties but most visible for VVD (82 historical MPs vs ~50 active)
+
+## What Didn't Work
+
+N/A — straightforward fix once root cause identified through comparison with compass implementation at `explorer.py:1473`.
+
+## Solution
+
+The `_get_aligned_party_scores()` function in `views/svd.py` was missing an `active_MP` filter when calculating party means for the current parliament window.
+
+**Before (buggy code):**
+
+```python
+def _get_aligned_party_scores(party_id: str, dimension: str, ...) -> list:
+    raw_scores = execute_query(score_query, ...)
+    # Missing: no active_MP filter
+    return raw_scores
+```
+
+**After (fixed code):**
+
+```python
+def get_aligned_party_scores(party_id: str, dimension: str, ...) -> list:
+    raw_scores = execute_query(score_query, ...)
+
+    # Filter to only active MPs (matches compass behavior at explorer.py:1473)
+    active_mps = {m[0] for m in active_mp_query if m[0] is not None}
+    scores = [s for s in raw_scores if s[0] in active_mps]
+
+    return scores
+```
+
+Key changes:
+1. Extracted function to module-level for testability
+2. Added active MP filtering using the same query pattern as compass (`explorer.py:1473`)
+3. Filter ensures only MPs in current parliament window are included
+
+**Verification:**
+- Without filter: VVD comp1 = 0.1083
+- With filter: VVD comp1 = 0.3366 (within 0.002 of the compass reference of 0.3350)
+- Test suite: 169/169 tests passing
+
+## Why This Works
+
+The root cause was including all 82 historical VVD MPs instead of only the active ones. The database (`data/motions.db`) contains MPs from multiple parliaments, and the `_get_aligned_party_scores()` function wasn't filtering by `active_MP`. The compass correctly applied this filter, explaining the discrepancy.
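
The effect can be illustrated with a toy calculation. The MP names and scores below are made up for illustration; only the mechanism (historical MPs pulling the party mean away from the active-only mean) reflects the bug described above.

```python
import numpy as np

# Hypothetical component-1 scores per MP (illustrative numbers only)
scores = {"mp_a": 0.35, "mp_b": 0.31, "mp_c": -0.10, "mp_d": -0.05}
active_mps = {"mp_a", "mp_b"}  # still seated; mp_c and mp_d are historical

# Party mean over every MP in the database (the buggy behaviour)
unfiltered_mean = float(np.mean(list(scores.values())))

# Party mean over active MPs only (the compass behaviour)
filtered_mean = float(np.mean([s for mp, s in scores.items() if mp in active_mps]))
```
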
+
+## Prevention
+
+1. **Test suite**: Comprehensive tests in `tests/svd_test.py` covering alignment calculations with active MP filtering
+2. **Cross-view validation**: Compare SVD and compass scores for each party — assert values match within tolerance
+3. **Query pattern documentation**: All score queries must include `active_MP` filter when calculating party means
+4. **Code review checklist**: Require active_MP filter for any new score calculation queries
+5. **Automated regression**: Add CI check that runs comparison between SVD tab and compass for all parties
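
The cross-view validation in item 2 could look roughly like the sketch below. The helper name `find_score_mismatches`, the dict-of-floats input shape, and the default tolerance are assumptions; the VVD values are the ones reported above, and the D66 values are hypothetical.

```python
import math


def find_score_mismatches(svd_scores, compass_scores, tol=1e-3):
    """Return {party: (svd, compass)} for parties whose scores differ by more than tol.

    Parties present in only one view are reported with None for the missing side.
    """
    mismatches = {}
    for party in sorted(set(svd_scores) | set(compass_scores)):
        svd = svd_scores.get(party)
        compass = compass_scores.get(party)
        if (
            svd is None
            or compass is None
            or not math.isclose(svd, compass, rel_tol=0.0, abs_tol=tol)
        ):
            mismatches[party] = (svd, compass)
    return mismatches


mismatches = find_score_mismatches(
    {"VVD": 0.108, "D66": -0.2100},
    {"VVD": 0.335, "D66": -0.2102},
)
```

A CI check could then fail whenever the returned dict is non-empty.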
+
+## Related Issues
+
+- `docs/solutions/logic-errors/svd-theme-divergence-from-party-positions.md` — Related domain issue: SVD scores not matching actual party positions
+- `docs/solutions/logic-errors/svd-component-labels-mismatch.md` — Related theme: Labels/data alignment mismatches
+- `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md` — Core principle: SVD captures voting patterns, verify against actual voting data
--- a/docs/solutions/ui-bugs/svd-axis-pole-labels-incorrect-after-flip.md
+++ b/docs/solutions/ui-bugs/svd-axis-pole-labels-incorrect-after-flip.md
@ -0,0 +1,121 @@
+---
+title: "SVD axis labels: derive left/right from runtime flip, not static fields"
+date: 2026-04-12
+module: analysis
+problem_type: ui_bug
+component: analysis
+symptoms:
+  - "SVD axis labels showed wrong orientation for components where runtime flip differed from static flip value"
+  - "Right-wing parties (PVV, FVD) appeared on the LEFT side of axes despite being canonical right parties"
+  - "Components 3-10 in tijdtraject view showed scores incomparable with single-window view"
+root_cause: logic_error
+resolution_type: code_fix
+severity: high
+tags:
+  - svd
+  - axis-labels
+  - pole-labels
+  - parliamentary-explorer
+  - left-right-axis
+  - procrustes
+---
+
+# SVD Axis Labels: Derive Left/Right from Runtime Flip, Not Static Fields
+
+## Problem
+SVD axis pole labels showed wrong orientation after the runtime flip mechanism was applied. Right-wing parties appeared on the LEFT side of axes despite being canonical right parties. Additionally, components 3-10 in the tijdtraject (time trajectory) view showed party scores that were incomparable with the single-window view.
+
+## Symptoms
+- Axis labels like "← PVV en FVD — soevereiniteit en anti-establishment" appeared on the left side when they should be on the right
+- The flip mechanism (`compute_flip_direction`) correctly negated party scores, but labels were tied to static pre-computed fields
+- Components 3-10 in `build_svd_components_tab` used Procrustes-aligned scores that were rotated by the component 1-2 alignment, making them meaningless
+
+## What Didn't Work
+The 2026-04-05 fix added static `left_pole`/`right_pole` fields to `SVD_THEMES`, pre-computed based on the static `flip` value in config. This failed because:
+
+1. `compute_flip_direction()` determines flip at **runtime** by comparing mean scores of canonical right vs left parties against actual voting data
+2. The static `flip` value in config could differ from the runtime result when voting patterns shift
+3. When runtime flip differed from the static config, the pre-computed `left_pole`/`right_pole` pointed to the wrong side
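
A minimal sketch of the runtime comparison described in item 1, not the project's `compute_flip_direction()`. The function name, the left-party list, and the `party -> list of component scores` layout are assumptions made for illustration; PVV and FVD are taken from the symptoms above.

```python
import numpy as np

# Assumed party groupings for illustration; the project's canonical lists may differ.
CANONICAL_RIGHT = ("PVV", "FVD")
CANONICAL_LEFT = ("SP", "GL")


def sketch_flip_direction(comp, party_scores, right=CANONICAL_RIGHT, left=CANONICAL_LEFT):
    """Return True when canonical right parties score lower than left on component `comp`.

    comp is 1-indexed; party_scores maps party name -> sequence of component scores.
    """
    idx = comp - 1
    right_mean = np.mean([party_scores[p][idx] for p in right if p in party_scores])
    left_mean = np.mean([party_scores[p][idx] for p in left if p in party_scores])
    return bool(right_mean < left_mean)


# Hypothetical scores: right parties positive on component 1, negative on component 2
party_scores = {
    "PVV": [0.4, -0.3],
    "FVD": [0.5, -0.2],
    "SP": [-0.4, 0.1],
    "GL": [-0.3, 0.2],
}
flip_comp1 = sketch_flip_direction(1, party_scores)  # right already on the positive side
flip_comp2 = sketch_flip_direction(2, party_scores)  # right on the negative side
```

Because the result depends on the scores passed in, a value stored in config can go stale as soon as the underlying voting data changes.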
+
+### Root Cause Detail: Dynamic Flip Override
+
+The bug was compounded by `explorer.py` lines 2636-2649, where `compute_flip_direction()` dynamically overwrites `SVD_THEMES[comp]["flip"]` for **all** components (1-10) at runtime:
+
+```python
+# explorer.py: runtime flip override for all components
+for comp in range(1, 11):
+    flip = compute_flip_direction(comp, party_scores)
+    if comp in SVD_THEMES:
+        SVD_THEMES[comp]["flip"] = flip
+```
+
+When PVV/FVD had negative scores on component 2:
+1. `compute_flip_direction(2, party_scores)` returned `True` (right parties have lower mean)
+2. `SVD_THEMES[2]["flip"]` was overwritten from `False` to `True`
+3. With `flip=True`, scores were negated (PVV/FVD became positive → appeared on RIGHT)
+4. But the **label derivation logic** (`explorer.py` lines 954-957, 1073-1077) was backwards:
+   ```python
+   left_label = theme.get("left_pole", pos_pole if flip else neg_pole)
+   right_label = theme.get("right_pole", neg_pole if flip else pos_pole)
+   ```
+   When `flip=True`, `left_label` was set to `pos_pole` (which described PVV/FVD), but PVV/FVD were now on the **RIGHT** side after negation.
+
+This meant labels were misaligned with the actual data whenever the runtime flip differed from the static config flip.
+
+## Solution
+
+### Bug 1: Label derivation
+
+Removed static `left_pole`/`right_pole` from all 10 `SVD_THEMES` entries in `analysis/config.py`. Labels are now always derived at render time from `positive_pole`/`negative_pole` and the runtime flip direction:
+
+```python
+# analysis/svd_labels.py — derive left/right from runtime flip
+if flip:
+    left_pole, right_pole = pos_pole, neg_pole  # flip=True: positive on left
+else:
+    left_pole, right_pole = neg_pole, pos_pole  # flip=False: negative on left
+```
+
+The key insight: **`negative_pole` always describes what's on the LEFT, `positive_pole` always describes what's on the RIGHT** — regardless of flip. The flip only affects which raw SVD direction maps to left vs right.
+
+### Bug 2: Score mismatch in tijdtraject view
+
+Changed components 3-10 in `build_svd_components_tab` from `load_party_scores_all_windows_aligned()` to `load_party_scores_all_windows()`:
+
+```python
+# explorer.py — components 3-10 use per-window scores (not Procrustes-aligned)
+party_scores_by_window = load_party_scores_all_windows(db_path, all_windows)
+```
+
+**Why:** Procrustes alignment rotates the full 50-dim vector space to align components 1-2 across windows, but this also transforms components 3-10, making their scores incomparable with the single-window view. Per-window flip computation already handles orientation alignment for components 3-10.
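
The side effect on higher components can be reproduced with a generic orthogonal Procrustes solve on synthetic data. This is a sketch under assumed shapes (20 entities, 5 components) and is not the project's alignment routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical windows: 20 entities x 5 components
window_a = rng.normal(size=(20, 5))
rotation_true, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
window_b = window_a @ rotation_true

# Orthogonal Procrustes: find orthogonal R minimising ||window_a @ R - window_b||_F
u, _, vt = np.linalg.svd(window_a.T @ window_b)
rotation = u @ vt
aligned_a = window_a @ rotation

# The rotation acts on every column, so component 3 no longer equals its per-window value
comp3_changed = not np.allclose(aligned_a[:, 2], window_a[:, 2])
```

Since the rotation matrix is full-dimensional, every aligned column is a mixture of the original columns, which is why aligned scores for components 3-10 stop matching the single-window view.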
+
+### Bug 3: Config as canonical SVD_THEMES source
+
+Updated `analysis/svd_labels.py` to prefer `analysis.config` as the canonical source for `SVD_THEMES`, falling back to `explorer` only when config is unavailable. Config is intentionally lightweight and free of heavy runtime dependencies (duckdb, plotly).
+
+### Prevention: Tests added
+
+Added `tests/test_svd_axis_alignment.py` with 3 tests:
+- `test_right_wing_on_right_all_components`: Verifies canonical right parties appear on right for all 10 components
+- `test_label_derivation_matches_fallback`: Verifies label derivation logic
+- `test_config_no_deprecated_fields`: Asserts no `left_pole`/`right_pole` in config
+
+Run with: `.venv/bin/python -m pytest tests/test_svd_axis_alignment.py -v`
+
+## Why This Works
+The flip direction is determined by comparing canonical right vs left party average scores against actual voting data. The label derivation follows a simple rule: `negative_pole` = left, `positive_pole` = right. Since the flip operation moves the canonical right parties to the positive side, the labels always match.
+
+For components 3-10, per-window scores are computed independently with per-window flip, so they remain comparable with single-window views. Procrustes only needs to align components 1-2 (the political compass axes).
+
+## Prevention
+- Never add static `left_pole`/`right_pole` fields to `SVD_THEMES` — derive them at render time
+- Run `tests/test_svd_axis_alignment.py` after any SVD recomputation
+- Components 3-10 in tijdtraject view must use `load_party_scores_all_windows()`, not the aligned variant
+- The key invariant: `negative_pole` = LEFT, `positive_pole` = RIGHT — flip only determines which raw direction maps to which side
+
+## Related Files
+- `analysis/config.py` — SVD_THEMES (no `left_pole`/`right_pole`)
+- `analysis/svd_labels.py` — `_get_svd_themes()` preferring config source
+- `explorer.py` — label derivation in trajectory rendering, component 3-10 scoring fix
+- `tests/test_svd_axis_alignment.py` — new tests validating alignment
+- `scripts/validate_svd_themes.py` — validation hook (updated to not expect `left_pole`/`right_pole`)
--- a/docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md
+++ b/docs/solutions/workflow-issues/trajectories-diagnostic-false-alarm-2026-03-31.md
@ -0,0 +1,79 @@
+---
+title: "Trajectories diagnostic script produced false alarm due to mocked empty data"
+date: 2026-03-31
+module: explorer
+problem_type: workflow_issue
+component: explorer
+severity: medium
+symptoms:
+  - "Diagnostic JSON showed party_map_count: 0 for all scenarios"
+  - "Trajectories appeared broken based on diagnostic output"
+  - "Mindmodel anti-pattern flagged compute_party_coords party_map mismatch"
+root_cause: logic_error
+resolution_type: process_fix
+tags:
+  - trajectories
+  - diagnostics
+  - false-alarm
+  - mocking
+  - testing
+---
+
+# Trajectories Diagnostic Script Produced False Alarm
+
+## Problem
+
+The `scripts/diagnose_trajectories_cli.py` diagnostic script reported `party_map_count: 0` across all scenarios, suggesting that party trajectories in the Explorer were broken. This triggered an investigation, a mindmodel anti-pattern flag, and multiple design docs for fixing "missing trajectories."
+
+## Symptoms
+
+- `thoughts/shared/diagnostics/2026-03-31-trajectories-diagnostics.json` showed `party_map_count: 0` in every scenario
+- The mindmodel generation process flagged `explorer_helpers.py:compute_party_coords` as a top anti-pattern (party_map key/value mismatch hypothesis)
+- Multiple implementation plans were drafted to add fallback rendering, MP-level trajectories, and debug instrumentation
+
+## Root Cause
+
+The diagnostic script itself was the bug. It artificially passed `load_party_map_ret={}` (empty dict) in **all** scenarios, creating a false alarm that had no relation to production behavior.
+
+When tested with real data:
+- `party_map` has **1,036 entries** (not 0)
+- `select_trajectory_plot_data` returns `trace_count=6` with real data
+- Annual view shows CDA, D66, VVD traces; quarterly view shows 6 party traces
+- Trajectories work correctly — no production bug exists
+
+The anti-pattern detected (`compute_party_coords` party_map mismatch) was also a false alarm: `svd_vectors` entity_ids are ALL MP names, never party names (no `entity_type='party'` rows exist in the database).
+
+## What Didn't Work
+
+- Trusting the diagnostic script output without validating against real data
+- Investigating based on the JSON artifact alone
+- Treating `2026-03-31-trajectories-diagnostics.json` as evidence, even though it was created by a script that artificially passes `load_party_map_ret={}`
+
+## Solution
+
+**No production code changes needed** — trajectories work correctly. The fix is to the diagnostic script and the process:
+
+1. **Fix `scripts/diagnose_trajectories_cli.py`** to use real data paths (`data/motions.db`) and real `load_party_map` / `load_positions` calls instead of mocking everything to empty
+2. **Re-run the fixed diagnostic script** to produce a correct `trajectories-diagnostics.json` artifact
+3. **Remove the incorrect anti-pattern** from the mindmodel manifest (the party_map mismatch hypothesis does not apply since no party-level entity_ids exist in `svd_vectors`)
+
+## Prevention
+
+1. **Never trust a diagnostic that mocks data to empty without a real-data validation step**
+2. **Always compare diagnostic output against a known-good baseline** — run the same checks with real data before concluding there's a bug
+3. **Diagnostic scripts should use real data paths by default** — if mocking is needed for unit tests, keep it in tests, not in diagnostic CLI scripts
+4. **Verify DB state directly** before investigating based on intermediary artifacts:
+   ```python
+   # Quick sanity check
+   from explorer import load_positions, load_party_map
+   positions_by_window, _ = load_positions("data/motions.db", "annual")
+   party_map = load_party_map("data/motions.db")
+   assert len(party_map) > 0, "party_map is empty — investigate data pipeline"
+   ```
+5. **Add an integration test** that calls `select_trajectory_plot_data` with real DB data and asserts `trace_count > 0`
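
One way to apply items 1 and 2 is a small guard that compares the diagnostic output with the real database state before anyone acts on it. The sketch below is hypothetical: the function names and the assumed scenario layout (a list of dicts with a `party_map_count` key) are not taken from the diagnostic script.

```python
def all_scenarios_empty(scenarios, key="party_map_count"):
    """Return True when every scenario reports zero for `key` (or there are no scenarios)."""
    return all(scenario.get(key, 0) == 0 for scenario in scenarios)


def check_diagnostic(scenarios, real_party_map_size):
    """Classify a diagnostic run given the party_map size loaded from the real database."""
    if all_scenarios_empty(scenarios) and real_party_map_size > 0:
        return "diagnostic inputs look mocked: real party_map is non-empty"
    if real_party_map_size == 0:
        return "party_map is empty in the database: investigate data pipeline"
    return "ok"


# Scenario layout is an assumption; 1036 is the real party_map size reported above.
verdict = check_diagnostic(
    [{"party_map_count": 0}, {"party_map_count": 0}],
    real_party_map_size=1036,
)
```
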
+
+## Related
+
+- `docs/solutions/best-practices/refactoring-streamlit-data-loading.md` — Data loading patterns for explorer
+- `thoughts/shared/plans/2026-03-31-debug-trajectories-not-showing.md` — Original debugging plan (based on false alarm)
+- `thoughts/shared/designs/2026-03-31-diagnose-no-plot-trajectories-design.md` — Original design doc (based on false alarm)
--- a/tests/test_svd_comp1_matches_compass.py
+++ b/tests/test_svd_comp1_matches_compass.py
@ -0,0 +1,111 @@
+"""Test that SVD component 1 scores match compass x positions for current_parliament.
+
+Bug: The compass filters current_parliament to active MPs only (still seated) at
+explorer.py line 1473. The SVD components tab previously did NOT do this filter,
+causing party means to differ (e.g. VVD: ~0.108 vs ~0.335).
+
+Fix: get_aligned_party_scores() now accepts an active_mps parameter and filters
+current_parliament when provided.
+"""
+
+import numpy as np
+
+
+def test_svd_comp1_matches_compass_for_current_parliament_with_active_filter():
+    """VVD comp1 from get_aligned_party_scores should match compass when active_mps provided.
+
+    REGRESSION TEST: Without the active_mps filter, VVD comp1 ≈ 0.108 (wrong).
+    With the active_mps filter, VVD comp1 ≈ 0.335 (matching compass).
+    """
+    from explorer import get_aligned_party_scores, load_active_mps, load_party_map
+    from analysis.political_axis import compute_2d_axes
+    from analysis.explorer_data import get_uniform_dim_windows
+
+    db_path = "data/motions.db"
+    windows = get_uniform_dim_windows(db_path)
+    active_mps = load_active_mps(db_path)
+    party_map = load_party_map(db_path)
+
+    # --- Compass reference: VVD mean x position ---
+    # Compass uses compute_2d_axes + filters to active_mps for current_parliament
+    pos2d, _ = compute_2d_axes(db_path, window_ids=windows)
+    cp_pos2d_active = {
+        mp: xy
+        for mp, xy in pos2d.get("current_parliament", {}).items()
+        if mp in active_mps
+    }
+    vvd_mps_2d = [mp for mp in cp_pos2d_active if party_map.get(mp) == "VVD"]
+    compass_vvd_mean = float(np.mean([cp_pos2d_active[mp][0] for mp in vvd_mps_2d]))
+
+    # --- SVD tab: WITH active_mps filter ---
+    svd_with_filter = get_aligned_party_scores(
+        db_path, "current_parliament", active_mps
+    )
+    vvd_comp1_with = float(svd_with_filter["VVD"][0])
+
+    # These MUST match (within tolerance)
+    diff = abs(compass_vvd_mean - vvd_comp1_with)
+    assert diff < 0.001, (
+        f"VVD comp1 mismatch: compass={compass_vvd_mean:.6f}, "
+        f"SVD with filter={vvd_comp1_with:.6f}, diff={diff:.6f}"
+    )
+
+
+def test_without_active_filter_gives_wrong_mean():
+    """Without active_mps filter, get_aligned_party_scores gives wrong VVD mean.
+
+    This documents the original bug: without filtering, VVD comp1 ≈ 0.108
+    (average of all historical VVD MPs). With filter, VVD comp1 ≈ 0.335
+    (only currently-seated VVD MPs, matching compass).
+    """
+    from explorer import get_aligned_party_scores, load_active_mps
+
+    db_path = "data/motions.db"
+    active_mps = load_active_mps(db_path)
+
+    # Without filter (BUGGY)
+    svd_no_filter = get_aligned_party_scores(
+        db_path, "current_parliament", active_mps=None
+    )
+    vvd_no_filter = float(svd_no_filter["VVD"][0])
+
+    # With filter (CORRECT)
+    svd_with_filter = get_aligned_party_scores(
+        db_path, "current_parliament", active_mps=active_mps
+    )
+    vvd_with_filter = float(svd_with_filter["VVD"][0])
+
+    # The buggy value should be significantly lower than the correct one
+    # (historical MPs have lower scores, dragging the mean down)
+    diff = abs(vvd_no_filter - vvd_with_filter)
+    assert diff > 0.1, (
+        f"Expected large diff between unfiltered ({vvd_no_filter:.4f}) and "
+        f"filtered ({vvd_with_filter:.4f}), got diff={diff:.4f}"
+    )
+
+    # The correct value should be ~0.33 (matching compass)
+    assert 0.30 < vvd_with_filter < 0.40, (
+        f"Active-filtered VVD comp1 ({vvd_with_filter:.4f}) should be ~0.335"
+    )
+
+
+def test_historical_window_unchanged():
+    """Historical windows (e.g. '2025') should NOT be affected by active_mps filter."""
+    from explorer import get_aligned_party_scores
+
+    db_path = "data/motions.db"
+
+    svd_no_filter = get_aligned_party_scores(db_path, "2025", active_mps=None)
+    svd_with_filter = get_aligned_party_scores(db_path, "2025", active_mps={"fake_mp"})
+
+    # For historical windows, the output should be identical regardless of active_mps
+    # (the filter is only applied for current_parliament)
+    assert set(svd_no_filter) == set(svd_with_filter), (
+        "Historical window lost or gained parties when active_mps was provided"
+    )
+    for party in svd_no_filter:
+        diff = abs(svd_no_filter[party][0] - svd_with_filter[party][0])
+        assert diff < 1e-6, (
+            f"Historical window changed with active_mps filter for {party}"
+        )