- analysis/explorer_data.py: add AND window_id NOT LIKE '%-Q%' to
_UNIFORM_DIM_SQL so quarterly windows are filtered at the source
- explorer.py: remove stale comment justifying quarterly inclusion;
remove redundant '-Q' guard in SVD tab trajectory view
- scripts/recompute_svd.py: replace quarter_bounds() with year_bounds()
that handles annual window IDs like '2024'; filter window list to
annual-only before recomputing SVD
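
A minimal sketch of the annual-only filtering described above, in plain Python. The helper `is_annual_window` is hypothetical and only mirrors the `NOT LIKE '%-Q%'` condition; the `year_bounds` body shown here is an assumption (inclusive first and last day of the year), not the actual implementation in scripts/recompute_svd.py.

```python
from datetime import date


def is_annual_window(window_id: str) -> bool:
    """Hypothetical helper: same effect as SQL `window_id NOT LIKE '%-Q%'`."""
    return "-Q" not in window_id


def year_bounds(window_id: str) -> tuple[date, date]:
    """Assumed behaviour: first and last day of an annual window ID like '2024'."""
    year = int(window_id)
    return date(year, 1, 1), date(year, 12, 31)


# Illustrative window IDs; quarterly ones are dropped before recomputing.
windows = ["2023", "2023-Q4", "2024", "2024-Q1"]
annual = [w for w in windows if is_annual_window(w)]
bounds = {w: year_bounds(w) for w in annual}
```

Filtering the window list first means `year_bounds` never sees a quarterly ID, so `int(window_id)` cannot fail on a value such as '2024-Q1'.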
Revise SVD_THEMES labels based on TF-IDF analysis of top 50 motions
per component (pool size: current_parliament). Manual review of motion
titles ensures labels reflect actual parliamentary content rather than
party position semantics.
Key corrections:
- Axis 1: fiscal/economic policy vs social welfare + international rights
- Axis 4: active international engagement vs restraint
- Axis 5: pragmatic financial support vs progressive individual rights
- Axis 6: fossil fuels/financial incentives vs climate/intl rights
- Axis 7: practical-administrative vs idealistic-procedural (kept)
- Axis 8: European defense cooperation vs domestic socioeconomic policy
- Axis 9: concrete-administrative vs systemic reform
- Axis 10: citizen protection vs government regulation
Subagent analysis caught that axes 5 and 6 are NOT the same
(Nationale soevereiniteit) — manual motion review confirms distinct
content for each. Axes 1, 5, 6 had completely wrong labels.
Refs: thoughts/explorer/svd_label_review.md
See also: docs/brainstorms/2026-04-13-topic-derived-svd-labels-requirements.md
- Added --pool-size argument (default 50) to control pool size
- Pool mode is now default; use --no-exclusive for old behavior
- Algorithm: for each component, claim top 5 positive + 5 negative from pool
- All 10 SVD components now have exactly 10 representative motions
Also removes tests that require missing dependencies (sklearn, plotly) or
missing files (.mindmodel/manifest.yaml):
- tests/mindmodel/ (2 files)
- tests/test_diagnose_no_plot_trajectories.py
- tests/test_explorer_chart.py
- tests/test_motion_drift.py
- tests/test_trajectories_pipeline_integration.py
- tests/test_trajectory_*.py (4 files)
Refs: thoughts/shared/plans/2026-04-12-svd-axis-label-alignment.md
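
The pool-mode selection in this change (top 5 positive + 5 negative per component, drawn from a pool) could look roughly like the sketch below. How the pool is ranked and whether claimed motions are removed for later components are assumptions made for illustration; the function name `claim_from_pool` and the data layout are hypothetical.

```python
def claim_from_pool(scores, pool_size=50, per_pole=5):
    """scores maps motion_id -> list of per-component scores.

    Assumption: each component ranks the still-unclaimed motions by
    absolute score, keeps the top `pool_size` as its pool, and claims
    the strongest `per_pole` motions on each side.
    """
    n_components = len(next(iter(scores.values())))
    claimed = set()
    result = {}
    for c in range(n_components):
        available = [m for m in scores if m not in claimed]
        pool = sorted(available, key=lambda m: abs(scores[m][c]), reverse=True)[:pool_size]
        positive = sorted(
            (m for m in pool if scores[m][c] > 0),
            key=lambda m: scores[m][c],
            reverse=True,
        )[:per_pole]
        negative = sorted(
            (m for m in pool if scores[m][c] < 0),
            key=lambda m: scores[m][c],
        )[:per_pole]
        claimed.update(positive, negative)
        result[c] = {"positive": positive, "negative": negative}
    return result


# Tiny illustrative input: 4 motions, 2 components, 1 motion per pole.
demo_scores = {
    "a": [0.9, 0.1],
    "b": [-0.8, 0.7],
    "c": [0.2, -0.6],
    "d": [0.1, 0.5],
}
demo_claims = claim_from_pool(demo_scores, pool_size=4, per_pole=1)
```

In the demo, motion "b" has the highest score on the second component but was already claimed by the first, so "d" takes its place.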
Previously, components 1-2 in the SVD tab used Procrustes-aligned PCA
coordinates (from load_positions), which meant the SVD tab showed PCA
dimensions of the 50D aligned space rather than the actual raw SVD
components. This was a fundamental inconsistency — the SVD tab's component 2
showed completely different party ordering than the raw SVD component 2.
Changes:
- explorer.py: Unified all components 1-10 to use raw SVD values via
load_party_axis_scores_for_window(). Removed the separate
load_positions() path for components 1-2. Now all components use the
same data source (50D vectors from svd_vectors table).
- explorer.py: Updated flip computation to cover ALL components 1-10
(previously range(3, 11), i.e. components 3-10 only). The compute_flip_direction
function correctly determines sign for each component.
- explorer.py: Unified rendering to always use _render_party_axis_chart_1d
(was _render_party_axis_chart for components 1-2 using 2D coords).
- explorer.py: Unified trajectory to always use load_party_scores_all_windows.
- analysis/config.py: Updated component 1 label (simplified explanation,
removed coalition-specific policy references).
- analysis/config.py: Updated component 2 label to "Nationalistisch versus
kosmopolitisch" matching raw SVD data (PVV/FVD at positive extreme,
Volt/DENK/GL-PvdA at negative extreme).
- tests: Updated test assertions to match new labels.
- scripts/validate_svd_themes.py: Verified all components pass right-wing
alignment check, config flip consistency, and theme pole consistency.
Fixes the core inconsistency: SVD tab component 2 now uses the same raw
SVD data as components 3-10, with consistent party ordering and labels.
The compass remains a separate PCA-based visualization.
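
As an illustration of applying one flip rule to every component, here is a minimal sketch. It does not reproduce compute_flip_direction; the rule used (flip when the reference right-wing parties average lower than the rest), the function names, and the reference party set are all assumptions.

```python
from statistics import mean

# Assumed reference set, for illustration only.
RIGHT_WING = {"PVV", "FVD"}


def flip_sign(party_scores, right_wing=RIGHT_WING):
    """Return -1 if the axis should be mirrored so right-wing parties land
    on the positive side, else 1. Hypothetical rule, not the project's own."""
    right = [s for p, s in party_scores.items() if p in right_wing]
    rest = [s for p, s in party_scores.items() if p not in right_wing]
    if not right or not rest:
        return 1
    return -1 if mean(right) < mean(rest) else 1


def orient_all(component_scores):
    """Apply the same flip rule to every component, with no special cases."""
    oriented = {}
    for component, party_scores in component_scores.items():
        sign = flip_sign(party_scores)
        oriented[component] = {p: s * sign for p, s in party_scores.items()}
    return oriented


raw = {
    1: {"PVV": -0.8, "FVD": -0.6, "Volt": 0.7, "DENK": 0.5},
    2: {"PVV": 0.9, "FVD": 0.7, "Volt": -0.6, "DENK": -0.4},
}
oriented = orient_all(raw)
```

Component 1 is mirrored in the demo, while component 2 already has the reference parties on the positive side and is left unchanged.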
Two related bugs fixed:
1. Label alignment: Removed static left_pole/right_pole from SVD_THEMES
entries. These labels assumed a fixed flip direction but could mismatch
with runtime flip computation, causing right-wing parties to appear on
the wrong side. Labels are now always derived from positive_pole,
negative_pole, and the runtime flip direction.
2. Score mismatch: Changed the tijdtraject (time trajectory) view for components 3-10 from
load_party_scores_all_windows_aligned() to load_party_scores_all_windows().
Procrustes alignment rotates the full 50-dim vector space to align
components 1-2, but this also transforms components 3-10, making their
scores incomparable with the single-window view. Per-window flip
computation already handles orientation alignment for these components.
Also updated svd_labels.py to prefer analysis.config as the canonical
source for SVD_THEMES, falling back to explorer only when config is
unavailable.
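
Deriving the displayed labels from positive_pole, negative_pole, and the runtime flip can be sketched as follows. The dictionary layout and the `axis_labels` helper are assumptions for illustration; only the key names positive_pole and negative_pole come from this change.

```python
def axis_labels(theme, flipped):
    """Return (left_label, right_label) for one component.

    Unflipped, negative scores sit on the left and positive on the right;
    when the axis is flipped at runtime the two poles swap sides.
    """
    if flipped:
        return theme["positive_pole"], theme["negative_pole"]
    return theme["negative_pole"], theme["positive_pole"]


# Illustrative entry; the real SVD_THEMES structure may differ.
theme = {"positive_pole": "Nationalistisch", "negative_pole": "Kosmopolitisch"}

labels_unflipped = axis_labels(theme, flipped=False)
labels_flipped = axis_labels(theme, flipped=True)
```

Because the labels are computed from the same flip value used for the scores, the two cannot disagree the way a static left_pole/right_pole pair could.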
- Add script to find motions closest to semantic gravity per axis/window
- Document Axis 1 semantic shift: from administrative law (2016)
to migration/asylum policy (2026)
- Show that the 'coalition' axis covers votes on different topics over time
- Implement SVD axis stability using Lasso regression on fused embeddings
- Add overtone shift analysis to detect semantic content changes
- Implement semantic drift tracking for motion content over time
- Add party voting analysis with cross-ideological voting patterns
- Generate markdown report with visualizations
- Add comprehensive test suite with 12 passing tests
See reports/drift/report.md for analysis results.
- Add compute_overtone_shift(): tracks semantic gravity movement across windows
even when party ordering stays the same
- Update _generate_report() with overtone shift section including dimension-level
analysis and inflection point detection
- Update methodology section to reflect new metrics
- All 12 tests pass
Key finding: no axes exceed 0.7 stability threshold — semantic features
defining each SVD axis shift significantly across windows (0.06-0.51 range)
- Replace Procrustes-based stability with Ridge regression on fused embeddings
- For each SVD axis, fit Ridge: SVD_score ~ fused_embedding per window
- Compare weight vectors via max(cosine similarity, Jaccard top-100)
- Add --regression-alpha CLI argument (default 1.0)
- Keep party-based fallback for windows with < 50 motions
- Update tests for new regression-based approach
Key finding: regression weights show moderate stability (0.06-0.51)
but no axes exceed 0.7 threshold — semantic features defining each
axis shift significantly across windows
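
The comparison step, max(cosine similarity, Jaccard top-100), can be sketched in plain Python as below. It assumes the "top-100" set is the 100 embedding dimensions with the largest absolute weight; that reading, and all function names, are assumptions rather than the code in analysis/motion_drift.py. The Ridge fit that produces the weight vectors is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_indices(w, k):
    """Indices of the k largest weights by absolute value (assumed definition)."""
    return set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k])


def jaccard_top_k(u, v, k=100):
    """Overlap of the two top-k index sets: |A & B| / |A | B|."""
    a, b = top_k_indices(u, k), top_k_indices(v, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_stability(u, v, k=100):
    """Stability between two windows' regression weights for one axis."""
    return max(cosine_similarity(u, v), jaccard_top_k(u, v, k))


# Same dominant feature but opposite sign: cosine is negative, Jaccard is 1.
same_feature = weight_stability([3.0, 0.1, 0.0, 0.0], [-3.0, 0.1, 0.0, 0.0], k=1)
# Different dominant features and orthogonal vectors: both measures are 0.
different = weight_stability([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], k=1)
```

Taking the maximum means an axis still counts as stable when the same features dominate in both windows even if the weight vectors point in different directions, as the first demo case shows.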
- Add scripts/motion_drift.py: analyzes SVD axis stability, semantic drift,
and cross-ideological voting patterns across annual windows
- Add analysis/motion_drift.py: core analysis functions with Procrustes
alignment fallback using party-based sign consistency
- Add matplotlib dependency for static chart generation
- Add tests/test_motion_drift.py: 12 tests covering all analysis functions
- Report output: markdown with embedded PNG charts
Key findings from real data:
- No axes are fully stable (>0.7) across 2019-2026
- All axes show moderate consistency (0.40-0.47) — stable within periods
but flip between cabinet periods (2019/2022/2026 vs 2023/2024/2025)
- Party voting analysis detects cross-ideological voting patterns
Bug: report_per_component used scored[:args.report_top_n] which took
top N by score (all positive for components with only positive scores).
JSON correctly separated positive and negative poles.
Fix: Use same positive/negative separation logic for report as JSON.
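
A minimal sketch of the difference between the sliced list and the pole-separated selection. The helper name `top_by_pole` and the (motion_id, score) tuple layout are assumptions for illustration, not the script's actual code.

```python
def top_by_pole(scored, top_n):
    """Split (motion_id, score) pairs into the top_n strongest positive
    and top_n strongest negative motions."""
    positive = sorted(
        (item for item in scored if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    negative = sorted(
        (item for item in scored if item[1] < 0),
        key=lambda item: item[1],
    )[:top_n]
    return positive, negative


scored = [("m1", 0.9), ("m2", 0.5), ("m3", -0.7), ("m4", -0.2), ("m5", 0.1)]

# Old behaviour: slicing a list sorted by score yields only the positive pole.
naive = sorted(scored, key=lambda item: item[1], reverse=True)[:2]

# Fixed behaviour: both poles are represented.
positive, negative = top_by_pole(scored, top_n=2)
```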
- Each motion now assigned to exactly one component (highest absolute score)
- Added --exclusive flag (default: True) for backward compatibility
- Added markdown report generation with motion details for label review
- Added --report-top-n for report size (default: 20 per component)
- Updated JSON output with 'exclusive' flag for transparency
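
The exclusive assignment rule (each motion goes to the component where its absolute score is highest) can be sketched as follows. The function name and the input layout are hypothetical.

```python
def assign_exclusive(scores):
    """scores maps motion_id -> list of per-component scores.

    Returns motion_id -> 1-based index of the component with the
    highest absolute score for that motion.
    """
    return {
        motion: max(range(len(values)), key=lambda i: abs(values[i])) + 1
        for motion, values in scores.items()
    }


assignment = assign_exclusive({
    "a": [0.2, -0.9, 0.1],  # strongest on component 2 (negative side)
    "b": [0.5, 0.4, -0.3],  # strongest on component 1
})
```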
Use one DuckDB write connection for the entire update loop instead of
opening/closing per row, wrapped in try/finally for proper cleanup.
Move 'import duckdb' to module level with other imports.
Enable backfilling body_text for existing motions that lack it (2016-2018 data).
New extract_besluit_id() and update_existing_motions() helpers support the
--update-existing mode, while --no-skip-details enables detail fetching during
normal downloads. Includes 7 tests covering URL parsing, DB update flow, and
argparse wiring.
Removes the raw_title[:80] cap on expander labels so full titles show.
Adds scripts/generate_svd_json.py to regenerate top_svd_top_motions.json
from any SVD window after a recompute.
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline,
svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md