Root causes:
- Seed selection sorted by controversy_score across all 28k motions, but
only 282 have individual MP vote records. The most controversial motions
have only party-level votes, so match_mps_for_votes always returned empty.
- global_db singleton was used for match/discriminate instead of the db_path
passed to the tab builder.
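
A minimal sketch of the first root cause, assuming a simplified in-memory
model. The Motion class, its has_individual_votes flag, and both function
names are hypothetical illustrations, not the project's actual code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Motion:
    # Hypothetical stand-in for a row in the motions table.
    motion_id: int
    controversy_score: float
    has_individual_votes: bool  # assumed flag: per-MP votes were recorded


def pick_seeds_rank_all(motions: List[Motion], k: int) -> List[Motion]:
    """Rank every motion by controversy, then take the top k.

    When the top-ranked motions carry only party-level votes, none of the
    returned seeds can be matched against individual MPs.
    """
    ranked = sorted(motions, key=lambda m: m.controversy_score, reverse=True)
    return ranked[:k]


def pick_seeds_filter_first(motions: List[Motion], k: int) -> List[Motion]:
    """Restrict to motions with individual votes before ranking."""
    eligible = [m for m in motions if m.has_individual_votes]
    ranked = sorted(eligible, key=lambda m: m.controversy_score, reverse=True)
    return ranked[:k]
```

Filtering before ranking means a less controversial motion can be chosen
as a seed, but every seed is guaranteed to have votes to match against.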
Fixes:
- Add MotionDatabase.get_motions_with_individual_votes(k) which queries
motions with comma-formatted mp_name votes, ordered by controversy_score
- Replace broken seed logic in build_mp_quiz_tab with this new method
- Replace global_db usages with a local MotionDatabase(db_path) instance
- Guard against motion IDs present in votes but absent from the motions DataFrame
- fetch_mp_metadata: use real OData URL with pagination (1200 records, 5 pages),
match abbreviations on Fractie.Afkorting instead of NaamNL, and skip
Verwijderd=true records
- upsert_mp_metadata: keep most recent membership (prefer active over ended,
then later Van date) so current party affiliations are not overwritten by
historical ones
- compute_anchor_axis: anchor directly on party-level SVD entities
(GroenLinks-PvdA, etc.) before falling back to individual MP lookup in
mp_metadata
- test_fetch_mp_metadata: fix mock for timeout kwarg + pagination + Afkorting field
- Generate anchor axis HTML for 2025-Q2 through 2026-Q1 in outputs/
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline,
svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
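
The membership preference described for upsert_mp_metadata (active over
ended, then later Van date) could be sketched as below. This is an
illustration under assumptions: the Membership tuple, its field names, and
the use of end=None to mean "still active" are hypothetical and do not
reflect the real schema or the function's actual signature.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Membership(NamedTuple):
    # Hypothetical record shape; the real table columns may differ.
    mp_name: str
    party: str
    van: date             # membership start date ("Van")
    end: Optional[date]   # assumed: None means the membership is active


def preference_key(m: Membership) -> Tuple[bool, date]:
    """Sort key: active memberships beat ended ones, then later start wins."""
    return (m.end is None, m.van)


def pick_current(memberships: Iterable[Membership]) -> Dict[str, Membership]:
    """Keep one membership per MP according to preference_key."""
    best: Dict[str, Membership] = {}
    for m in memberships:
        current = best.get(m.mp_name)
        if current is None or preference_key(m) > preference_key(current):
            best[m.mp_name] = m
    return best
```

Comparing on a tuple keeps the rule independent of input order, so a
historical record arriving after the current one does not replace it.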
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md