- Add compute_overtone_shift(): tracks semantic gravity movement across windows
even when party ordering stays the same
- Update _generate_report() with overtone shift section including dimension-level
analysis and inflection point detection
- Update methodology section to reflect new metrics
- All 12 tests pass
Key finding: no axis exceeds the 0.7 stability threshold; the semantic features
defining each SVD axis shift significantly across windows (stability 0.06-0.51)
- Replace Procrustes-based stability with Ridge regression on fused embeddings
- For each SVD axis, fit Ridge: SVD_score ~ fused_embedding per window
- Compare weight vectors via max(cosine similarity, Jaccard top-100)
- Add --regression-alpha CLI argument (default 1.0)
- Keep party-based fallback for windows with < 50 motions
- Update tests for new regression-based approach
Key finding: regression-weight stability is low to moderate (0.06-0.51)
and no axis exceeds the 0.7 threshold; the semantic features defining each
axis shift significantly across windows
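
Sketch of the weight-vector comparison described above. This is a minimal illustration and not the code in analysis/motion_drift.py: it uses closed-form ridge regression in NumPy without an intercept, the function names (fit_ridge_weights, jaccard_top_k, axis_stability) are hypothetical, and ranking features by absolute weight for the Jaccard term is an assumption.

```python
import numpy as np


def fit_ridge_weights(X, y, alpha=1.0):
    """Closed-form ridge without intercept: w = (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


def cosine_similarity(a, b):
    """Cosine of the angle between two weight vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def jaccard_top_k(a, b, k=100):
    """Jaccard overlap of the k features with the largest absolute weight."""
    k = min(k, a.size)
    top_a = set(np.argsort(-np.abs(a))[:k].tolist())
    top_b = set(np.argsort(-np.abs(b))[:k].tolist())
    return len(top_a & top_b) / len(top_a | top_b)


def axis_stability(w_a, w_b, k=100):
    """Stability of one axis between two windows: max(cosine, Jaccard top-k)."""
    return max(cosine_similarity(w_a, w_b), jaccard_top_k(w_a, w_b, k))
```

Under these assumptions the Jaccard term ignores sign, so an axis whose weights are simply negated between windows still scores 1.0 even though its cosine similarity is -1.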
- Add scripts/motion_drift.py: analyzes SVD axis stability, semantic drift,
and cross-ideological voting patterns across annual windows
- Add analysis/motion_drift.py: core analysis functions with Procrustes
alignment fallback using party-based sign consistency
- Add matplotlib dependency for static chart generation
- Add tests/test_motion_drift.py: 12 tests covering all analysis functions
- Report output: markdown with embedded PNG charts
Key findings from real data:
- No axes are fully stable (>0.7) across 2019-2026
- All axes show moderate consistency (0.40-0.47): they are stable within
cabinet periods but flip between them (2019/2022/2026 vs 2023/2024/2025)
- Party voting analysis detects cross-ideological voting patterns
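
Sketch of what a party-based sign-consistency check could look like. The commit does not spell out the rule, so everything here is an assumption: the per-party mean scores as dicts, the 0.5 majority cutoff, and the names sign_consistency and align_sign are all hypothetical.

```python
def sign_consistency(prev_scores, curr_scores):
    """Fraction of shared parties whose score on an axis keeps the same sign."""
    shared = [p for p in prev_scores if p in curr_scores]
    if not shared:
        return 0.0
    same = sum(1 for p in shared if prev_scores[p] * curr_scores[p] > 0)
    return same / len(shared)


def align_sign(prev_scores, curr_scores):
    """Flip the current window's axis if most shared parties changed sign."""
    if sign_consistency(prev_scores, curr_scores) < 0.5:
        return {party: -score for party, score in curr_scores.items()}
    return dict(curr_scores)
```

SVD axes are only defined up to sign, so a check of this kind separates a real reordering of parties from an arbitrary flip of the axis.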
Bug: report_per_component used scored[:args.report_top_n], which takes the
top N by score. That slice could consist entirely of positive-score motions,
leaving the negative pole out of the report. The JSON output already
separated positive and negative poles correctly.
Fix: Use the same positive/negative separation logic for the report as for the JSON.
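
Sketch of the pole separation the fix refers to. The shape of scored as a list of (motion_id, score) pairs and the helper name split_poles are assumptions for illustration, not the actual code.

```python
def split_poles(scored, top_n):
    """Return (positive, negative): the top_n strongest motions at each pole.

    scored is assumed to be a list of (motion_id, score) pairs.
    """
    positive = sorted(
        (item for item in scored if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,  # largest positive score first
    )
    negative = sorted(
        (item for item in scored if item[1] < 0),
        key=lambda item: item[1],  # most negative score first
    )
    return positive[:top_n], negative[:top_n]
```

By contrast, sorting by score and slicing the first N returns only positive entries whenever at least N motions score above zero.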
- Each motion now assigned to exactly one component (highest absolute score)
- Added --exclusive flag (default: True) for backward compatibility
- Added markdown report generation with motion details for label review
- Added --report-top-n for report size (default: 20 per component)
- Updated JSON output with 'exclusive' flag for transparency
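
Sketch of exclusive assignment by highest absolute score. The (n_motions, n_components) score matrix layout and the function names are assumptions for illustration.

```python
import numpy as np


def assign_exclusive(scores):
    """Index of the component with the largest absolute score for each motion.

    scores is assumed to have shape (n_motions, n_components).
    """
    return np.argmax(np.abs(scores), axis=1)


def group_by_component(scores):
    """Map each component index to the motion indices assigned to it."""
    assignment = assign_exclusive(scores)
    groups = {c: [] for c in range(scores.shape[1])}
    for motion_idx, component in enumerate(assignment.tolist()):
        groups[component].append(motion_idx)
    return groups
```

Using the absolute value means a motion with a strongly negative score on one component is assigned there rather than to a component with a weaker positive score.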
Use one DuckDB write connection for the entire update loop instead of
opening/closing per row, wrapped in try/finally for proper cleanup.
Move 'import duckdb' to module level with other imports.
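
Sketch of the single-connection pattern. To keep it runnable on the standard library alone, sqlite3 stands in for duckdb here (both expose connect, execute, and close on the connection). The table name motions, the id column, and the function name are assumptions.

```python
import sqlite3  # stand-in for duckdb in this sketch


def update_body_texts(db_path, updates):
    """Apply (motion_id, body_text) updates over one connection.

    Returns the number of rows that were actually updated.
    """
    conn = sqlite3.connect(db_path)  # opened once, outside the loop
    try:
        updated = 0
        for motion_id, body_text in updates:
            cursor = conn.execute(
                "UPDATE motions SET body_text = ? WHERE id = ?",
                (body_text, motion_id),
            )
            updated += cursor.rowcount
        conn.commit()
        return updated
    finally:
        conn.close()  # runs even if one of the updates raises
```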
Enable backfilling body_text for existing motions that lack it (2016-2018 data).
New extract_besluit_id() and update_existing_motions() helpers support the
--update-existing mode, while --no-skip-details enables detail fetching during
normal downloads. Includes 7 tests covering URL parsing, DB update flow, and
argparse wiring.
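
Sketch of how the two flags could be wired with argparse. Only the flag names come from the commit; the destination name skip_details, the store_false wiring, and the help texts are assumptions.

```python
import argparse


def build_parser():
    """Parser exposing the two flags named in this commit (wiring is assumed)."""
    parser = argparse.ArgumentParser(description="Download motions (sketch)")
    parser.add_argument(
        "--update-existing",
        action="store_true",
        help="backfill body_text for motions already in the database",
    )
    parser.add_argument(
        "--no-skip-details",
        dest="skip_details",
        action="store_false",  # skip_details defaults to True
        help="fetch motion details during a normal download",
    )
    return parser
```

With store_false, skip_details stays True unless --no-skip-details is passed, so existing invocations keep their current behavior.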
Removes the raw_title[:80] cap on expander labels so full titles show.
Adds scripts/generate_svd_json.py to regenerate top_svd_top_motions.json
from any SVD window after a recompute.
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline,
svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md
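
Sketch illustrating the Procrustes scale fix, assuming the commit refers to scipy.linalg.orthogonal_procrustes, which returns (R, scale) with scale equal to the sum of the singular values of A.T @ B. The NumPy version below mirrors that computation so it runs without scipy; the align helper is hypothetical.

```python
import numpy as np


def orthogonal_procrustes(A, B):
    """Return (R, scale) for the orthogonal Procrustes problem.

    R is the orthogonal matrix minimizing ||A @ R - B||_F.
    scale is the sum of singular values of A.T @ B, a norm-like quantity.
    """
    U, singular_values, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt, float(singular_values.sum())


def align(A, B):
    """Rotate A onto B. The returned scale is deliberately not applied."""
    R, _scale = orthogonal_procrustes(A, B)
    return A @ R
```

When B is an exact rotation of A, scale equals the squared Frobenius norm of A, so multiplying the aligned matrix by it would inflate every coordinate instead of aligning it.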