| date | topic | status |
|---|---|---|
| 2026-03-29 | Motion-Driven Axis Labeling for Political Compass | validated |
# Motion-Driven Axis Labeling
## Problem Statement

The current axis labeling in `analysis/axis_classifier.py` correlates per-party PCA
positions against static scores from `data/party_ideologies.csv`. This has three
failure modes:
- Mislabeling: When the dominant PCA axis is coalition/opposition rather than left-right, it gets labeled "Links-Rechts" anyway, making the compass look "rotated 90 degrees".
- Static reference: A fixed ideology CSV cannot reflect year-specific political dynamics (e.g., asylum being the main left-right issue in 2015 vs. housing in 2023).
- No explainability: Users cannot see why an axis got a particular label.
The fix is to derive labels from the actual motions that most strongly split parliament on each PCA axis in a given year, and to expose those motions to users.
## Constraints

- Must not break the 8 existing passing tests.
- Must remain DuckDB-only for data access (no new external files for the primary path).
- `party_ideologies.csv` and `coalition_membership.csv` remain as fallbacks; they are not removed.
- The labeling approximation (projecting motion vectors without full Procrustes alignment) is acceptable for v1. Proper alignment can be added later.
- Labels must still be deterministic given the same DB state.
## Approach

Primary: For each window, load motion SVD vectors from the DB, project them onto the PCA axes, rank motions by projection score, apply a Dutch keyword classifier to the top motion titles, and derive a categorical label.

Fallback chain (steps 2–4 are the existing logic, unchanged):

1. Keyword classifier on top motions → categorical label
2. Coalition correlation (existing `_pearsonr` against coalition dummy)
3. Ideology CSV correlation (existing Pearson-r against `party_ideologies.csv`)
4. "Stempatroon As N" (generic fallback)
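
The chain reads as "the first classifier that returns a label wins". A minimal sketch, assuming each step is wrapped as a zero-argument callable that returns a label or `None`; `resolve_axis_label` and the step wrappers are hypothetical names, not existing functions in `axis_classifier.py`.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: one fallback step, returning a label or None (abstain).
LabelStep = Callable[[], Optional[str]]


def resolve_axis_label(steps: Sequence[LabelStep], axis_number: int) -> str:
    """Return the first non-None label from the ordered fallback steps.

    Steps are called lazily, so later classifiers only run when every
    earlier one abstained.
    """
    for step in steps:
        label = step()
        if label is not None:
            return label
    return f"Stempatroon As {axis_number}"  # generic last resort
```

Wrapping steps as callables keeps evaluation order explicit, which also helps the determinism constraint: the same DB state always walks the same steps in the same order.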
Axis swap: After classification, if the Y-axis is labeled "Links–Rechts" and the X-axis is not, swap them (both positions and all axis metadata) so that left-right is conventionally on the horizontal axis when present.
## Architecture

### Changes by file

#### `analysis/political_axis.py` (minimal)

- Add `axes["global_mean"] = M.mean(axis=0)` before returning from `compute_2d_axes`. This lets `classify_axes` center motion vectors before projection without needing to re-access the stacked matrix.
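
Writing the centering out shows why `global_mean` has to travel with `axes`. In the notation below (introduced here for illustration), $M_{i,:}$ is row $i$ of the stacked matrix, $\mathbf{v}_j$ is the SVD vector of motion $j$, and $\mathbf{a}_x$ is the x-axis direction:

$$
\bar{\mathbf{m}} = \frac{1}{N}\sum_{i=1}^{N} M_{i,:}, \qquad
x_j = (\mathbf{v}_j - \bar{\mathbf{m}})^\top \mathbf{a}_x
    = \mathbf{v}_j^\top \mathbf{a}_x - \bar{\mathbf{m}}^\top \mathbf{a}_x
$$

Skipping the subtraction shifts every motion's score on an axis by the same constant $\bar{\mathbf{m}}^\top \mathbf{a}_x$. The ranking is unchanged, but if the `+` and `-` poles are split by sign, the zero point moves and motions can land on the wrong pole.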
#### `analysis/axis_classifier.py` (major)

New private helpers:

- `_load_motion_vectors(db_path, window_id)` → `dict[int, np.ndarray]`
  - `SELECT entity_id, vector FROM svd_vectors WHERE entity_type='motion' AND window_id=?`
  - Returns `{motion_id: vector}`. Returns `{}` on any DB error.
- `_project_motions(motion_vecs, x_axis, y_axis, global_mean)` → `dict[int, tuple[float, float]]`
  - For each motion: `x = dot(vec - global_mean, x_axis)`, `y = dot(vec - global_mean, y_axis)`
  - Returns `{motion_id: (x_score, y_score)}`
- `_top_motion_ids(projections, axis, n=5)` → `{'+': [ids], '-': [ids]}`
  - Sorts by axis score; returns the top n positive and top n negative motion IDs
- `_fetch_motion_titles(db_path, motion_ids)` → `dict[int, tuple[str, str]]`
  - `SELECT id, title, date FROM motions WHERE id IN (...)`
  - Returns `{id: (title, date_str)}`
- `_classify_from_titles(titles)` → `str | None`
  - Applies the keyword dictionary to each of the top motion titles
  - Returns the category string, or `None` if confidence is below the threshold (0.4)
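
The two pure helpers can be sketched with NumPy. This is a minimal sketch under assumptions: vectors are equal-length 1-D arrays, `axis` is an index (0 for x, 1 for y), and "top n positive / n negative" is read as filtering by sign. `project_motions` and `top_motion_ids` are illustrative stand-ins for the planned private helpers.

```python
from typing import Dict, List, Tuple

import numpy as np


def project_motions(
    motion_vecs: Dict[int, np.ndarray],
    x_axis: np.ndarray,
    y_axis: np.ndarray,
    global_mean: np.ndarray,
) -> Dict[int, Tuple[float, float]]:
    """Score every motion on both axes after centering it with global_mean."""
    projections: Dict[int, Tuple[float, float]] = {}
    for motion_id, vec in motion_vecs.items():
        centered = np.asarray(vec, dtype=float) - global_mean
        projections[motion_id] = (
            float(np.dot(centered, x_axis)),
            float(np.dot(centered, y_axis)),
        )
    return projections


def top_motion_ids(
    projections: Dict[int, Tuple[float, float]], axis: int, n: int = 5
) -> Dict[str, List[int]]:
    """Return up to n motion IDs per pole of one axis (axis=0 for x, 1 for y).

    '+' holds the largest positive scores, '-' the most negative ones.
    Sorting on (score, motion_id) resolves ties by ascending ID, so the
    result is deterministic for a given DB state.
    """
    scored = [(score[axis], motion_id) for motion_id, score in projections.items()]
    positive = sorted((-s, m) for s, m in scored if s > 0)  # largest score first
    negative = sorted((s, m) for s, m in scored if s < 0)   # most negative first
    return {
        "+": [m for _, m in positive[:n]],
        "-": [m for _, m in negative[:n]],
    }
```

Filtering by sign means the two poles never share a motion, even when a window has fewer than 2n motions; motions scoring exactly zero are left out of both.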
New module-level constant:

- `_KEYWORDS: dict[str, list[str]]`: maps each category to its Dutch keywords (see Keyword Dictionary below)
Modified `classify_axes`:

- Check whether `axes` contains `global_mean`; if not, skip motion classification.
- For each window W:
  1. Load motion vectors.
  2. Project onto `x_axis` and `y_axis` using `global_mean`.
  3. Find the top 5+5 motions per axis.
  4. Fetch titles from the `motions` table.
  5. Apply the keyword classifier → label candidate.
  6. If `None`: fall through to the existing Pearson-r approaches.
- Store `x_top_motions` and `y_top_motions` per window in the enriched dict.
- Store `x_label_confidence` and `y_label_confidence` per window.
#### `explorer.py` (two changes)

- Axis swap in `load_positions`, after `classify_axes` returns:

  ```python
  if axis_def.get("y_label") == "Links–Rechts" and axis_def.get("x_label") != "Links–Rechts":
      positions_by_window, axis_def = _swap_axes(positions_by_window, axis_def)
  ```

  `_swap_axes` transposes (x, y) in every entity position and swaps all `x_`/`y_` keys in `axis_def`.
- Motion expander in `build_compass_tab`, below `st.plotly_chart`:

  ```python
  with st.expander("🔍 Wat bepaalt deze assen?"):
      # show top 3 +/- motions for x and y, with date
      # show confidence and explained variance for this window
      ...
  ```
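
A sketch of the swap, assuming `positions_by_window` maps window → entity → `(x, y)` pair and that every axis-specific key in `axis_def` starts with `x_` or `y_`. `swap_axes` and `should_swap` are illustrative names, not the planned `_swap_axes` itself.

```python
from typing import Any, Dict, Hashable, Tuple

LEFT_RIGHT = "Links–Rechts"  # must match the classifier's label exactly

Positions = Dict[Hashable, Dict[Hashable, Tuple[float, float]]]


def should_swap(axis_def: Dict[str, Any]) -> bool:
    """Swap only when left-right sits on Y and is not already on X."""
    return axis_def.get("y_label") == LEFT_RIGHT and axis_def.get("x_label") != LEFT_RIGHT


def swap_axes(
    positions_by_window: Positions, axis_def: Dict[str, Any]
) -> Tuple[Positions, Dict[str, Any]]:
    """Transpose every (x, y) position and exchange all x_/y_ keys.

    Builds new dicts instead of mutating the inputs.
    """
    swapped_positions = {
        window: {entity: (y, x) for entity, (x, y) in entities.items()}
        for window, entities in positions_by_window.items()
    }
    swapped_def: Dict[str, Any] = {}
    for key, value in axis_def.items():
        if key.startswith("x_"):
            swapped_def["y_" + key[2:]] = value
        elif key.startswith("y_"):
            swapped_def["x_" + key[2:]] = value
        else:
            swapped_def[key] = value  # keys without an axis prefix are shared
    return swapped_positions, swapped_def
```

The check is exact string equality, so the classifier has to emit the same `Links–Rechts` spelling (with an en dash) that the swap condition compares against; a hyphenated `Links-Rechts` would silently skip the swap.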
## Data Flow

```text
compute_2d_axes(db_path, windows)
  → (positions_by_window, axes)      # axes now contains global_mean

classify_axes(positions_by_window, axes, db_path)
  → axis_def                         # now contains x/y_top_motions, confidence

load_positions (in explorer.py)
  → swap axes if y_label == "Links–Rechts" and x_label is not
  → return (positions_by_window, axis_def)

build_compass_tab
  → scatter chart (uses x_label, y_label; already wired)
  → expander (uses x_top_motions, y_top_motions)
```
## Keyword Dictionary

Categories and representative terms (non-exhaustive; full dict in implementation):

### Links–Rechts

- Economic: `belasting`, `uitkering`, `bijstand`, `minimumloon`, `cao`, `vakbond`, `bezuiniging`, `privatisering`, `subsidie`, `zorg`, `pensioen`, `AOW`
- Immigration: `asiel`, `asielaanvraag`, `migratie`, `vreemdeling`, `vluchtelingen`, `terugkeer`, `grenzen`, `opvang`, `statushouder`

### Progressief–Conservatief

- Environment: `klimaat`, `stikstof`, `duurzaam`, `duurzaamheid`, `co2`, `energietransitie`, `biodiversiteit`
- Social: `euthanasie`, `abortus`, `lgbtq`, `transgender`, `diversiteit`, `traditi`, `gezin`, `religie`, `geloof`

### Coalitie–Oppositie

Detected via coalition correlation, not keywords; keyword detection for this category is unreliable.

### Nationaal–Internationaal (optional, lower priority)

`navo`, `nato`, `europees`, `europese`, `eu`, `verdrag`, `vn`, `internationaal`

Matching: case-insensitive substring match; both the title and the keywords are lowercased, so `AOW` matches `aow`. Score = fraction of the top-10 motions (5 per pole) containing at least one keyword from the winning category. Threshold for acceptance = 0.4 (i.e., at least 4 of the 10 top motions match).
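
The matching rule can be sketched as two small functions. This is a minimal sketch under assumptions: `KEYWORDS` is a short excerpt of the category → keywords mapping (not the full `_KEYWORDS`), the category strings use the en-dash spelling the tests expect, and `keyword_scores` / `classify_from_titles` are illustrative names. The denominator is the number of titles actually passed in, which equals 10 when both poles return 5 motions.

```python
from typing import Dict, List, Optional, Sequence

# Excerpt of the category -> keywords mapping; the full list is larger.
KEYWORDS: Dict[str, List[str]] = {
    "Links–Rechts": ["belasting", "uitkering", "bijstand", "minimumloon", "AOW", "asiel", "migratie"],
    "Progressief–Conservatief": ["klimaat", "stikstof", "duurzaam", "euthanasie", "abortus"],
}
THRESHOLD = 0.4


def keyword_scores(
    titles: Sequence[str], keywords: Dict[str, List[str]] = KEYWORDS
) -> Dict[str, float]:
    """Per category: fraction of titles containing at least one keyword."""
    lowered = [title.lower() for title in titles]
    scores: Dict[str, float] = {}
    for category, terms in keywords.items():
        terms_lc = [term.lower() for term in terms]  # so "AOW" also matches
        hits = sum(1 for title in lowered if any(term in title for term in terms_lc))
        scores[category] = hits / len(lowered) if lowered else 0.0
    return scores


def classify_from_titles(
    titles: Sequence[str],
    keywords: Dict[str, List[str]] = KEYWORDS,
    threshold: float = THRESHOLD,
) -> Optional[str]:
    """Winning category if its score reaches the threshold, else None."""
    scores = keyword_scores(titles, keywords)
    if not scores:
        return None
    # max() returns the first of equal maxima, so ties follow dict order.
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] >= threshold else None
```

Keeping `keyword_scores` separate lets a caller store the winning fraction as `x_label_confidence` / `y_label_confidence` while `classify_from_titles` keeps the `str | None` contract. Because this is plain substring matching, very short keywords also hit inside unrelated words: `eu` occurs in both `euthanasie` and `nieuwe`.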
## New `axis_def` Fields

```text
x_top_motions:      {window_id: {'+': [(title, date), ...], '-': [(title, date), ...]}}
y_top_motions:      same structure
x_label_confidence: {window_id: float}   # 0.0–1.0
y_label_confidence: {window_id: float}
global_mean:        np.ndarray           # stored in axes dict, not surfaced to UI
```

Existing fields (`x_label`, `y_label`, `x_quality`, `y_quality`, `x_interpretation`,
`y_interpretation`) are preserved.
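
The same structure can be written as a `TypedDict` so editors and type checkers catch key typos. A sketch only: `MotionAxisFields`, `TitleDate` and `PoleMotions` are hypothetical names, `window_id` is assumed to be a `str`, and `global_mean` is left out because it lives in `axes`, not `axis_def`.

```python
from typing import Dict, List, Tuple, TypedDict

TitleDate = Tuple[str, str]               # (motion title, date string)
PoleMotions = Dict[str, List[TitleDate]]  # keys "+" and "-"


class MotionAxisFields(TypedDict, total=False):
    """The new axis_def keys only; existing keys are not repeated here."""

    x_top_motions: Dict[str, PoleMotions]  # window_id -> poles
    y_top_motions: Dict[str, PoleMotions]
    x_label_confidence: Dict[str, float]   # window_id -> 0.0–1.0
    y_label_confidence: Dict[str, float]
```

`total=False` matches the fallback behaviour: windows where motion classification is skipped simply lack these keys.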
## UI Display (Option C)

Axis titles: unchanged; they already use `axis_def.get("x_label")`.

New expander (collapsed by default) below the compass scatter:

```text
🔍 Wat bepaalt deze assen?
Horizontale as: Links–Rechts (vertrouwen: 70%)
  Rechtspool: Motie over asielbeleid (2023-11-14) · Motie over belastingverlaging (2023-10-05) ...
  Linkspool: Motie over uitkeringen (2023-11-20) · Motie over minimumloon (2023-09-12) ...
Verticale as: Progressief–Conservatief (vertrouwen: 55%)
  Progressief: Motie over klimaatdoelen (2023-12-01) ...
  Conservatief: Motie over tradities (2023-10-18) ...
As 1 verklaart 11% van de variantie in stemgedrag.
```
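
The expander lines can be produced by small pure formatters, which keeps the Streamlit code thin and the text testable. A sketch under assumptions: the function names are hypothetical, confidence and explained variance are fractions in 0–1, and each returned string would be passed to something like `st.markdown` inside the `st.expander` block.

```python
from typing import List, Tuple


def format_axis_header(orientation: str, label: str, confidence: float) -> str:
    """E.g. 'Horizontale as: Links–Rechts (vertrouwen: 70%)'."""
    return f"{orientation}: {label} (vertrouwen: {confidence:.0%})"


def format_pole(pole_name: str, motions: List[Tuple[str, str]], limit: int = 3) -> str:
    """List up to `limit` (title, date) pairs, adding '...' when more exist."""
    shown = " · ".join(f"{title} ({date})" for title, date in motions[:limit])
    suffix = " ..." if len(motions) > limit else ""
    return f"{pole_name}: {shown}{suffix}"


def format_variance(axis_number: int, explained: float) -> str:
    """Explained-variance footer for one axis."""
    return f"As {axis_number} verklaart {explained:.0%} van de variantie in stemgedrag."
```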
## Error Handling

| Situation | Behavior |
|---|---|
| No motion vectors for window | Skip motion classification; fall through to coalition correlation, then ideology CSV |
| Motion title fetch fails | Use motion IDs as placeholder; label falls back |
| Keyword confidence below threshold | Fall through to coalition correlation |
| Motion, coalition, and CSV classification all fail | "Stempatroon As N" (existing) |
| `global_mean` missing from `axes` | Skip motion projection entirely |
## Testing Strategy

New unit tests (in `tests/test_political_compass.py`):

- `test_classify_from_titles_left_right`: mock titles with `asiel`/`belasting` → expect "Links–Rechts"
- `test_classify_from_titles_progressive`: mock titles with `klimaat`/`stikstof` → expect "Progressief–Conservatief"
- `test_classify_from_titles_low_confidence`: mixed keywords with no category reaching 0.4 → expect `None` (fallback triggered)
- `test_axis_swap_when_y_is_left_right`: positions (x, y) → (y, x), labels swapped
- `test_axis_swap_not_applied_when_x_is_left_right`: no swap when already correct

All 8 existing tests must continue to pass.
## Out of Scope

Explained variance drop (18% → 11%): Observed but not addressed here. Likely reflects genuine fragmentation of the Schoof parliament (4 smaller coalition parties). Warrants a separate diagnostic session. The expander now surfaces the explained variance, making this visible to users.

Proper Procrustes alignment of motion vectors: The projection approximation (ignoring per-window rotation) is acceptable for v1. If label instability is observed across windows, add rotation application as a follow-up.

Removing `party_ideologies.csv`: Kept as fallback. Can be removed once motion classification has proven reliable over several parliament periods.