date	topic	status
2026-03-29	Motion-Driven Axis Labeling for Political Compass	validated

Motion-Driven Axis Labeling

Problem Statement

The current axis labeling in analysis/axis_classifier.py correlates per-party PCA positions against static scores from data/party_ideologies.csv. This has three failure modes:

Mislabeling: When the dominant PCA axis is coalition/opposition rather than left-right, it gets labeled "Links–Rechts" anyway, making the compass look "rotated 90 degrees".
Static reference: A fixed ideology CSV cannot reflect year-specific political dynamics (e.g., asylum being the main left-right issue in 2015 vs. housing in 2023).
No explainability: Users cannot see why an axis got a particular label.

The fix is to derive labels from the actual motions that most strongly split parliament on each PCA axis in a given year, and to expose those motions to users.

Constraints

Must not break the 8 existing passing tests.
Must remain DuckDB-only for data access (no new external files for primary path).
party_ideologies.csv and coalition_membership.csv remain as fallbacks — not removed.
The labeling approximation (projecting motion vectors without full Procrustes alignment) is acceptable for v1. Proper alignment can be added later.
Labels must still be deterministic given the same DB state.

Approach

Primary: For each window, load motion SVD vectors from the DB, project them onto the PCA axes, rank motions by projection score, apply a Dutch keyword classifier to the top motion titles, and derive a categorical label.

Label resolution chain, tried in order (step 1 is new; steps 2 to 4 are unchanged from today):

1. Keyword classifier on top motions → categorical label
2. Coalition correlation (existing _pearsonr against coalition dummy)
3. Ideology CSV correlation (existing Pearson-r against party_ideologies.csv)
4. "Stempatroon As N" (generic fallback)

Axis swap: After classification, if the Y-axis is labeled "Links–Rechts" and the X-axis is not, swap them (both the positions and all axis metadata), so that left-right sits on the horizontal axis whenever it is present.
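
The fallback chain listed above amounts to "first classifier that returns a label wins". A minimal sketch of that ordering follows; `resolve_label` and the zero-argument classifier callables are hypothetical names used for illustration only, not functions in `axis_classifier.py`, and the label strings returned by the stand-ins are placeholders.

```python
from typing import Callable, Optional

# Hypothetical shape: each step returns a label, or None to defer to the next.
Classifier = Callable[[], Optional[str]]


def resolve_label(classifiers: list[Classifier], axis_number: int) -> str:
    """Return the first non-None label, else the generic fallback label."""
    for classify in classifiers:
        label = classify()
        if label is not None:
            return label
    return f"Stempatroon As {axis_number}"


# Stand-ins for the real steps; only the ordering is taken from the design.
label = resolve_label(
    [
        lambda: None,                  # keyword classifier: below threshold
        lambda: "Coalitie–Oppositie",  # coalition correlation: succeeds
        lambda: "Links–Rechts",        # ideology CSV: never reached
    ],
    axis_number=1,
)
```

Because the steps are tried in a fixed order and each is a pure function of the DB state, the resulting label stays deterministic, as the constraints require.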

Architecture

Changes by file

`analysis/political_axis.py` (minimal)

Add axes["global_mean"] = M.mean(axis=0) before returning from compute_2d_axes. This lets classify_axes center motion vectors before projection without needing to re-access the stacked matrix.

`analysis/axis_classifier.py` (major)

New private helpers:

_load_motion_vectors(db_path, window_id) → dict[int, np.ndarray]
- SELECT entity_id, vector FROM svd_vectors WHERE entity_type='motion' AND window_id=?
- Returns {motion_id: vector}. Returns {} on any DB error.
_project_motions(motion_vecs, x_axis, y_axis, global_mean) → dict[int, tuple[float, float]]
- For each motion: x = dot(vec - global_mean, x_axis), y = dot(vec - global_mean, y_axis)
- Returns {motion_id: (x_score, y_score)}
_top_motion_ids(projections, axis, n=5) → {'+': [ids], '-': [ids]}
- Sorts by axis score, returns top n positive and n negative motion IDs
_fetch_motion_titles(db_path, motion_ids) → dict[int, tuple[str, str]]
- SELECT id, title, date FROM motions WHERE id IN (...)
- Returns {id: (title, date_str)}
_classify_from_titles(titles) → str | None
- Applies keyword dict against concatenated titles of top motions
- Returns category string or None if confidence below threshold (0.4)
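
The projection and ranking helpers can be sketched as follows. This is an illustration under assumptions, not the implementation: the functions are written without the leading underscore, `axis` is assumed to be `0` for x and `1` for y, only motions with a strictly positive (or negative) score are assumed to count toward the `'+'` (or `'-'`) list, and ties are broken by motion ID to keep the output deterministic.

```python
import numpy as np


def project_motions(motion_vecs, x_axis, y_axis, global_mean):
    """Center each motion vector, then take its dot product with each axis."""
    projections = {}
    for motion_id, vec in motion_vecs.items():
        centered = np.asarray(vec, dtype=float) - global_mean
        projections[motion_id] = (float(centered @ x_axis), float(centered @ y_axis))
    return projections


def top_motion_ids(projections, axis, n=5):
    """Return the n most positive and n most negative motion IDs on one axis."""
    # Sorting on (score, id) makes the order independent of dict ordering.
    ranked = sorted(projections, key=lambda mid: (projections[mid][axis], mid))
    positive = [mid for mid in reversed(ranked) if projections[mid][axis] > 0][:n]
    negative = [mid for mid in ranked if projections[mid][axis] < 0][:n]
    return {"+": positive, "-": negative}


# Toy data: three 3-dimensional motion vectors and axis-aligned PCA axes.
global_mean = np.array([1.0, 1.0, 1.0])
x_axis = np.array([1.0, 0.0, 0.0])
y_axis = np.array([0.0, 1.0, 0.0])
motion_vecs = {
    101: np.array([3.0, 1.0, 1.0]),
    102: np.array([0.0, 2.0, 1.0]),
    103: np.array([1.5, 0.0, 1.0]),
}

projections = project_motions(motion_vecs, x_axis, y_axis, global_mean)
top_x = top_motion_ids(projections, axis=0, n=2)
```

With the toy data, motion 101 projects to `(2.0, 0.0)`, so it leads the positive end of the x-axis, and motion 102 is the only one on the negative end.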

New module-level constant:

_KEYWORDS: dict[str, list[str]] — Dutch keyword → category mapping (see below)

Modified classify_axes:

Check if axes contains global_mean; if not, skip motion classification.
For each window W:
  a. Load motion vectors
  b. Project onto x_axis, y_axis using global_mean
  c. Find top 5+5 motions per axis
  d. Fetch titles from motions table
  e. Apply keyword classifier → label candidate
  f. If None: fall through to existing Pearson-r approaches
Store x_top_motions and y_top_motions per window in enriched dict
Store x_label_confidence and y_label_confidence per window

`explorer.py` (two changes)

Axis swap in load_positions, after classify_axes returns:

if axis_def.get("y_label") == "Links–Rechts" and axis_def.get("x_label") != "Links–Rechts":
    positions_by_window, axis_def = _swap_axes(positions_by_window, axis_def)

_swap_axes transposes (x, y) in every entity position and swaps all x_/y_ keys in axis_def.
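
A minimal sketch of the swap, assuming `positions_by_window` has the shape `{window_id: {entity: (x, y)}}` (the actual structure is not specified here) and that every axis-specific key in `axis_def` starts with `x_` or `y_`. The function is written without the leading underscore and returns new objects rather than mutating its inputs; the quality values in the example are placeholders.

```python
def swap_axes(positions_by_window, axis_def):
    """Return copies with x and y exchanged in positions and axis metadata."""
    swapped_positions = {
        window_id: {entity: (y, x) for entity, (x, y) in positions.items()}
        for window_id, positions in positions_by_window.items()
    }
    swapped_def = {}
    for key, value in axis_def.items():
        if key.startswith("x_"):
            swapped_def["y_" + key[2:]] = value
        elif key.startswith("y_"):
            swapped_def["x_" + key[2:]] = value
        else:
            swapped_def[key] = value  # keys without an axis prefix pass through
    return swapped_positions, swapped_def


positions = {"2023": {"VVD": (0.1, 0.8), "SP": (-0.2, -0.7)}}
axis_def = {
    "x_label": "Progressief–Conservatief",
    "y_label": "Links–Rechts",
    "x_quality": 0.5,
    "y_quality": 0.7,
}
new_positions, new_def = swap_axes(positions, axis_def)
```

Swapping by key prefix means fields added later, such as `x_top_motions` and `y_label_confidence`, are carried along without further changes to the function.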

Motion expander in build_compass_tab, below st.plotly_chart:

with st.expander("🔍 Wat bepaalt deze assen?"):
    # show top 3 +/- motions for x and y, with date
    # show confidence and explained variance for this window

Data Flow

compute_2d_axes(db_path, windows)
  → (positions_by_window, axes)        # axes now contains global_mean

classify_axes(positions_by_window, axes, db_path)
  → axis_def                            # now contains x/y_top_motions, confidence

load_positions (in explorer.py)
  → swap axes if y_label == "Links–Rechts"
  → return (positions_by_window, axis_def)

build_compass_tab
  → scatter chart (uses x_label, y_label — already wired)
  → expander (uses x_top_motions, y_top_motions)

Keyword Dictionary

Categories and representative terms (non-exhaustive; full dict in implementation):

Links–Rechts

Economic: belasting, uitkering, bijstand, minimumloon, cao, vakbond, bezuiniging, privatisering, subsidie, zorg, pensioen, AOW
Immigration: asiel, asielaanvraag, migratie, vreemdeling, vluchtelingen, terugkeer, grenzen, opvang, statushouder

Progressief–Conservatief

Environment: klimaat, stikstof, duurzaam, duurzaamheid, co2, energietransitie, biodiversiteit
Social: euthanasie, abortus, lgbtq, transgender, diversiteit, traditi, gezin, religie, geloof

Coalitie–Oppositie: detected via coalition correlation, not keywords, because keyword detection for this category is unreliable.

Nationaal–Internationaal (optional, lower priority)

navo, nato, europees, europese, eu, verdrag, vn, internationaal

Matching: case-insensitive substring match (titles are lowercased before matching). Score = fraction of the axis's top-10 motions (5 positive + 5 negative) whose title contains at least one keyword from the winning category. Threshold for acceptance = 0.4 (i.e., at least 4 out of 10 top motions match).
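
The matching rule can be sketched as below. Assumptions: the keyword dict is a shortened stand-in for the full one, the function returns the confidence alongside the label (the helper described earlier returns only `str | None`), the denominator is the number of titles passed in, and ties between categories are broken alphabetically so the result is deterministic.

```python
KEYWORDS = {
    "Links–Rechts": ["belasting", "uitkering", "minimumloon", "asiel", "migratie"],
    "Progressief–Conservatief": ["klimaat", "stikstof", "duurzaam", "euthanasie", "abortus"],
}
THRESHOLD = 0.4


def classify_from_titles(titles, keywords=KEYWORDS, threshold=THRESHOLD):
    """Return (label, confidence); label is None when confidence < threshold."""
    if not titles:
        return None, 0.0
    lowered = [title.lower() for title in titles]
    # Per category: fraction of titles containing at least one of its keywords.
    scores = {
        category: sum(any(term in title for term in terms) for title in lowered) / len(lowered)
        for category, terms in keywords.items()
    }
    # Highest score wins; equal scores resolve by category name.
    best, confidence = min(scores.items(), key=lambda item: (-item[1], item[0]))
    return (best if confidence >= threshold else None), confidence


titles = [
    "Motie over asielbeleid",
    "Motie over belastingverlaging",
    "Motie over uitkeringen",
    "Motie over minimumloon",
    "Motie over klimaatdoelen",
    "Motie over de begroting",
    "Motie over het spoor",
    "Motie over Stikstofbeleid",
    "Motie over onderwijs",
    "Motie over de politie",
]
label, confidence = classify_from_titles(titles)
```

Here four of the ten titles match a "Links–Rechts" keyword, so the label is accepted at exactly the threshold. One caveat of plain substring matching: very short keywords such as `eu` also match inside longer words like "euthanasie" or "nieuwe", so the full dictionary may need word-boundary matching for those terms.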

New `axis_def` Fields

x_top_motions:    {window_id: {'+': [(title, date), ...], '-': [(title, date), ...]}}
y_top_motions:    same structure
x_label_confidence: {window_id: float}   # 0.0–1.0
y_label_confidence: {window_id: float}
global_mean:      np.ndarray             # stored in axes dict, not surfaced to UI

Existing fields (x_label, y_label, x_quality, y_quality, x_interpretation, y_interpretation) are preserved.
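
To illustrate how one `x_top_motions` entry could be assembled from the outputs of `_top_motion_ids` and `_fetch_motion_titles`, here is a small sketch. `build_top_motions` is a hypothetical helper, the `"#<id>"` placeholder format for motions without a fetched title is an assumption, and so is the use of a string such as `"2023"` as window ID.

```python
def build_top_motions(top_ids, titles):
    """Map {'+': [ids], '-': [ids]} to {'+': [(title, date), ...], '-': [...]}.

    Motions missing from `titles` get their ID as a placeholder title and an
    empty date string.
    """
    return {
        pole: [titles.get(mid, (f"#{mid}", "")) for mid in ids]
        for pole, ids in top_ids.items()
    }


top_ids = {"+": [101, 103], "-": [102]}
titles = {
    101: ("Motie over asielbeleid", "2023-11-14"),
    102: ("Motie over uitkeringen", "2023-11-20"),
}
x_top_motions = {"2023": build_top_motions(top_ids, titles)}
```

Motion 103 has no fetched title in this example, so it appears as `("#103", "")` while the rest of the entry is unaffected.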

UI Display (Option C)

Axis titles: unchanged; they already use axis_def.get("x_label") and axis_def.get("y_label").

New expander (collapsed by default) below compass scatter:

🔍 Wat bepaalt deze assen?

Horizontale as: Links–Rechts  (vertrouwen: 70%)
  Rechtspool: Motie over asielbeleid (2023-11-14) · Motie over belastingverlaging (2023-10-05) ...
  Linkspool:  Motie over uitkeringen (2023-11-20) · Motie over minimumloon (2023-09-12) ...

Verticale as: Progressief–Conservatief  (vertrouwen: 55%)
  Progressief: Motie over klimaatdoelen (2023-12-01) ...
  Conservatief: Motie over tradities (2023-10-18) ...

As 1 verklaart 11% van de variantie in stemgedrag.

Error Handling

Situation	Behavior
No motion vectors for window	Skip motion classification; fall through to coalition correlation, then ideology CSV
Motion title fetch fails	Use motion IDs as placeholder; label falls through to coalition correlation
Keyword confidence below threshold	Fall through to coalition correlation
Motion, coalition, and CSV classification all fail	"Stempatroon As N" (existing)
`global_mean` missing from axes	Skip motion projection entirely

Testing Strategy

New unit tests (in tests/test_political_compass.py):

test_classify_from_titles_left_right — mock titles with asiel/belasting → expect "Links–Rechts"
test_classify_from_titles_progressive — mock titles with klimaat/stikstof → expect "Progressief–Conservatief"
test_classify_from_titles_low_confidence — mixed keywords → expect None (fallback triggered)
test_axis_swap_when_y_is_left_right — positions (x,y) → (y,x), labels swapped
test_axis_swap_not_applied_when_x_is_left_right — no swap when already correct

All 8 existing tests must continue to pass.

Out of Scope

Explained variance drop (18% → 11%): Observed but not addressed here. Likely reflects genuine fragmentation of the Schoof parliament (4 smaller coalition parties). Warrants a separate diagnostic session. The expander now surfaces the explained variance, making this visible to users.

Proper Procrustes alignment of motion vectors: The projection approximation (ignoring per-window rotation) is acceptable for v1. If label instability is observed across windows, add rotation application as a follow-up.

Removing party_ideologies.csv: Kept as fallback. Can be removed once motion classification has proven reliable over several parliament periods.
