Previous query used DISTINCT ON without ordering by dim, so it picked an
arbitrary (often non-50) dim per window. Rewritten to find the dominant dim
per window (highest count) and include only windows where the dominant dim
is 50 with >= 10 entities. This surfaces annual windows 2016, 2018, 2019 and
2022-2026 that were previously excluded due to mixed-dim rows from multiple
pipeline runs.
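
A minimal sketch of the dominant-dim selection described above, written in plain Python rather than SQL. The row shape (window_id, entity_id, dim) and the function name eligible_windows are assumptions for illustration; the real query may count entities differently (here the >= 10 threshold is applied to rows carrying the dominant dim).

```python
from collections import Counter, defaultdict


def eligible_windows(rows, target_dim=50, min_entities=10):
    """Return window_ids whose dominant dim equals target_dim.

    rows: iterable of (window_id, entity_id, dim) tuples (assumed shape,
    one row per entity per pipeline run).
    """
    dims_per_window = defaultdict(Counter)
    for window_id, _entity_id, dim in rows:
        dims_per_window[window_id][dim] += 1

    selected = []
    for window_id, counts in dims_per_window.items():
        # Dominant dim = the dim with the highest row count in this window.
        # Ties are broken towards the larger dim (an arbitrary choice).
        dominant_dim, n = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        if dominant_dim == target_dim and n >= min_entities:
            selected.append(window_id)
    return sorted(selected)
```

Unlike an unordered DISTINCT ON, this never depends on row order: a window with a handful of stale rows from an older run is still selected as long as dim 50 dominates.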
- load_positions annual mode now selects actual annual window_ids
('2022', '2023', etc.) instead of Q4 quarterly approximations,
with current_parliament appended as the most-recent anchor
- Sidebar radio defaults to 'Per jaar' (annual) instead of quarterly
- Dutch labels for window size radio: 'Per jaar' / 'Per kwartaal'
- Add import re (needed for strip_paren deduplication)
- Deduplicate MPs with parenthetical first-name variants (e.g. 'Dijk, J.P.'
and 'Dijk, J.P. (Jimmy)') by stripping parens and averaging positions;
fixes 22 duplicate groups across all windows
- Replace select_slider with selectbox for time window control
- Remove non-functional 'Toon namen' checkbox and its usage
- Remove 'Min. MPs per partij' slider and party-size filter
- Add 'Weergave' radio toggle: Kamerleden (individual MPs) vs Partijen
(party centroids computed as mean x/y per party)
- Fix axis ranges: x [-1, 1], y [-0.6, 0.6]
- Party view shows party abbreviation as dot label and hover shows MP count
- Use all individual MPs (not party aggregates) for L2-norm computation;
party-aggregated vectors have near-zero values on some dims due to
Procrustes alignment, producing spurious zeros
- Sort importances descending so scree plot is properly monotonic
- Relabel x-axis as 'Rang' since dim ordering after Procrustes alignment
no longer matches original singular value order
- Add Scatter line trace connecting bar tops for elbow visibility
- Add load_scree_data() cached loader computing L2-norm of party scores
per SVD dimension as a proxy for component importance
- Add _render_scree_plot() rendering a bar chart of the first 15 components
- Insert scree plot + Dutch explanation at the top of build_svd_components_tab
- Clean up _render_party_axis_chart: remove tick numbers, axis line, grid,
and zero-line from the x-axis (pole labels remain as chart title)
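
The scree computation described in the list above (L2-norm per SVD dimension over individual MPs, sorted descending, plotted against rank) could look roughly like the sketch below. The function name scree_importances and the list-of-lists input are assumptions; the actual load_scree_data() reads from the database and is cached.

```python
import math


def scree_importances(mp_vectors, top_n=15):
    """Return [(rank, l2_norm), ...] for the top_n strongest dimensions.

    mp_vectors: list of equal-length score vectors, one per individual MP
    (assumed input shape).
    """
    if not mp_vectors:
        return []
    n_dims = len(mp_vectors[0])
    # L2-norm of each dimension's column across all MPs.
    norms = [
        math.sqrt(sum(vec[d] ** 2 for vec in mp_vectors))
        for d in range(n_dims)
    ]
    # Sort descending so the plot is monotonic; the x-axis is then a rank
    # ('Rang'), not the original dimension index.
    ranked = sorted(norms, reverse=True)[:top_n]
    return list(enumerate(ranked, start=1))
```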
Cleanup performed by assistant: removed generated caches and stale files: __pycache__, *.pyc, .pytest_cache, .ruff_cache, dummy/, test.py, read.py, reset.py, fix_database.py, thoughts/thoughts/, .github/workflows/mindmodel-validate.yml. No push performed.
Adds new SVD window 'current_parliament' covering 8732 motions and 451 MPs
(vs 7424 motions in old '2025' window, adding ~1300 motions from 2025-Q4+).
Updates explorer.py to query the new window. Regenerates top_svd_top_motions.json.
Also clarifies the axis 3 explanation, noting FVD's anti-American positioning.
Removes the raw_title[:80] cap on expander labels so full titles show.
Adds scripts/generate_svd_json.py to regenerate top_svd_top_motions.json
from any SVD window after a recompute.
Axes 4 and 5 had inverted sign conventions relative to actual party votes.
Diagnostic confirmed SP/PvdD scored negative on axis 4 (free trade motions)
and FVD scored negative on axis 5 (secular motions), opposite to their
voting behaviour. Fix: swap positive_pole/negative_pole for both axes and
set correct flip direction so progressive parties appear on the left.
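
For reference, a hypothetical check for the flip direction, not the code in this commit (which sets the flip per axis by hand). It assumes a dict of party scores for one axis and a caller-supplied set of progressive parties.

```python
def orient_axis(scores, progressive_parties):
    """Return (flip, oriented_scores) so progressive parties land on the left.

    scores: dict mapping party name -> score on one axis (assumed shape).
    progressive_parties: set of party names expected on the negative side.
    """
    prog = [s for party, s in scores.items() if party in progressive_parties]
    if not prog:
        # Nothing to anchor on; leave the axis untouched.
        return False, dict(scores)
    # Flip when the progressive parties currently average to the right.
    flip = sum(prog) / len(prog) > 0
    sign = -1.0 if flip else 1.0
    return flip, {party: sign * s for party, s in scores.items()}
```

When flip is True, the positive_pole and negative_pole labels have to be swapped as well, otherwise the labels end up on the wrong side again.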
- Add ChristenUnie colour alias and CURRENT_PARLIAMENT_PARTIES frozenset (15 parties)
- Add load_party_axis_scores() — queries party SVD vectors from window=2025, cached
- Add _render_party_axis_chart() — 1D Plotly scatter of party positions per axis
- Restructure build_svd_components_tab: replace the session-state
button/detail-pane with an inline st.expander per motion, split into pos/neg
pole columns; one batched DB query fetches all 10 motions including
voting_results, which are rendered via _render_voting_results
Smoke-tested: 15 parties loaded, all 10 axis-1 motions returned with voting data.
Replace draft SVD_THEMES with themes produced by per-axis analysis of each
axis's 10 unique top motions (zero cross-axis overlap, window=2025). Each axis
now has a detailed Dutch-language explanation plus positive_pole and
negative_pole labels, which are displayed as colour-coded columns in the UI.
Deduplication:
- Identified 18 motion pairs with identical body_text and externe_identifier
- Kept the lower ID (first inserted) from each pair
- Cascaded deletes: 18 motions, 18 embeddings, 28 svd_vectors, 23 fused_embeddings
- motions table: 28172 → 28154, zero body_text duplicate groups remaining
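
A sketch of the deduplication steps above, assuming a SQLite connection, an integer id column on motions, and a motion_id foreign key on the three dependent tables. The table names come from this message; the column names and the database engine are assumptions.

```python
import sqlite3


def dedupe_motions(conn):
    """Delete duplicate motions, keeping the lowest id in each group.

    A group is a set of rows sharing body_text and externe_identifier.
    Returns the list of deleted motion ids.
    """
    dup_ids = [row[0] for row in conn.execute(
        """
        SELECT id FROM motions
        WHERE id NOT IN (
            SELECT MIN(id) FROM motions
            GROUP BY body_text, externe_identifier
        )
        ORDER BY id
        """
    )]
    if not dup_ids:
        return []
    placeholders = ",".join("?" * len(dup_ids))
    with conn:  # one transaction: commit on success, roll back on error
        # Remove dependent rows first, then the motions themselves.
        for table in ("embeddings", "svd_vectors", "fused_embeddings"):
            conn.execute(
                f"DELETE FROM {table} WHERE motion_id IN ({placeholders})",
                dup_ids,
            )
        conn.execute(
            f"DELETE FROM motions WHERE id IN ({placeholders})", dup_ids
        )
    return dup_ids
```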
SVD analysis:
- Regenerated top_svd_top_motions.json for window=2025 with clean data
(7424 vectors, down from 7430)
- 100 unique motions across 10 axes, no title or ID duplicates
- The De Vos 'huiseigenaren' (homeowners) motion no longer appears twice in axis 3
- Regenerated top_svd_top_motions.json for window=2025 with strict
cross-axis deduplication: 100 unique motions across 10 axes (10 per
axis, zero overlap), sorted by absolute SVD score
- Added SVD_THEMES dict to build_svd_components_tab with Dutch-language
theme label and political-polarisation explanation for each of the 10
axes (e.g. 'Confessioneel-conservatief vs. seculier-progressief')
- Selectbox now shows 'As N — <theme>' instead of bare component number
- Each selected axis shows an info banner with the full explanation
- Motion list buttons show ▲/▼ to indicate positive/negative SVD loading
- Translated UI strings to Dutch for consistency
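
One way to get the strict cross-axis deduplication mentioned above is a greedy pass in axis order, sketched below. Whether scripts/generate_svd_json.py resolves conflicts in axis order is an assumption; the function name and input shape are illustrative only.

```python
def top_motions_per_axis(scores, n_axes=10, per_axis=10):
    """Pick per_axis motions for each axis with zero cross-axis overlap.

    scores: dict mapping motion_id -> list of SVD scores, one per axis
    (assumed shape). Returns {axis_number: [(motion_id, score), ...]},
    each list sorted by absolute score, strongest first.
    """
    used = set()
    result = {}
    for axis in range(n_axes):
        # Rank the motions not yet claimed by an earlier axis.
        ranked = sorted(
            (mid for mid in scores if mid not in used),
            key=lambda mid: abs(scores[mid][axis]),
            reverse=True,
        )
        picked = ranked[:per_axis]
        used.update(picked)
        result[axis + 1] = [(mid, scores[mid][axis]) for mid in picked]
    return result
```

The sign of each returned score is what drives the ▲/▼ marker: positive loadings go to the positive pole, negative loadings to the negative pole.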
Root causes:
- Seed selection sorted by controversy_score across all 28k motions, but
only 282 of them have individual MP vote records. The most controversial
motions have party-level votes only, so match_mps_for_votes always returned
an empty result.
- global_db singleton was used for match/discriminate instead of the db_path
passed to the tab builder.
Fixes:
- Add MotionDatabase.get_motions_with_individual_votes(k) which queries
motions with comma-formatted mp_name votes, ordered by controversy_score
- Replace broken seed logic in build_mp_quiz_tab with this new method
- Replace global_db usages with a local MotionDatabase(db_path) instance
- Guard against motion IDs present in votes but absent from motions DataFrame
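
A rough illustration of the new seed query, written as a free function against SQLite rather than as the actual MotionDatabase method. It assumes a votes table with motion_id and mp_name columns, and it reads "comma-formatted" as names of the form 'Surname, Initials' (party-level votes carry a party name without a comma). Both are assumptions, as is the function name.

```python
import sqlite3


def motions_with_individual_votes(conn, k):
    """Return up to k motion ids that have at least one per-MP vote.

    Ordered by controversy_score, most controversial first.
    """
    rows = conn.execute(
        """
        SELECT m.id
        FROM motions AS m
        WHERE EXISTS (
            SELECT 1 FROM votes AS v
            WHERE v.motion_id = m.id
              AND v.mp_name LIKE '%,%'  -- 'Surname, Initials' => individual MP
        )
        ORDER BY m.controversy_score DESC
        LIMIT ?
        """,
        (k,),
    )
    return [row[0] for row in rows]
```

Filtering before ordering is the point of the fix: seeds are drawn only from motions that can actually be matched against MPs.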