
date: 2026-03-29
topic: Motion-Driven Axis Labeling for Political Compass
status: validated

Motion-Driven Axis Labeling

Problem Statement

The current axis labeling in analysis/axis_classifier.py correlates per-party PCA positions against static scores from data/party_ideologies.csv. This has three failure modes:

  1. Mislabeling: When the dominant PCA axis is coalition/opposition rather than left-right, it gets labeled "Links-Rechts" anyway, making the compass look "rotated 90 degrees".
  2. Static reference: A fixed ideology CSV cannot reflect year-specific political dynamics (e.g., asylum being the main left-right issue in 2015 vs. housing in 2023).
  3. No explainability: Users cannot see why an axis got a particular label.

The fix is to derive labels from the actual motions that most strongly split parliament on each PCA axis in a given year, and to expose those motions to users.

Constraints

  • Must not break the 8 existing passing tests.
  • Must remain DuckDB-only for data access (no new external files on the primary path).
  • party_ideologies.csv and coalition_membership.csv remain as fallbacks — not removed.
  • The labeling approximation (projecting motion vectors without full Procrustes alignment) is acceptable for v1. Proper alignment can be added later.
  • Labels must still be deterministic given the same DB state.

Approach

Primary: For each window, load motion SVD vectors from the DB, project them onto the PCA axes, rank motions by projection score, apply a Dutch keyword classifier to the top motion titles, and derive a categorical label.

Fallback chain (step 1 is new; steps 2–4 are the existing logic, unchanged):

  1. Keyword classifier on top motions → categorical label
  2. Coalition correlation (existing _pearsonr against coalition dummy)
  3. Ideology CSV correlation (existing Pearson-r against party_ideologies.csv)
  4. "Stempatroon As N" (generic fallback)
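The fallback order above can be sketched as a loop over label sources that stops at the first one producing a label. This is a minimal sketch, not the planned implementation: `resolve_axis_label` and the `(source, callable)` step list are hypothetical names. Each step is assumed to return a label string or None on its own, in the same way `_load_motion_vectors` already returns `{}` on DB errors.

```python
from typing import Callable, Optional, Sequence, Tuple

# A label source returns a label string, or None when it cannot decide.
LabelStep = Callable[[], Optional[str]]


def resolve_axis_label(
    steps: Sequence[Tuple[str, LabelStep]], axis_number: int
) -> Tuple[str, str]:
    """Return (label, source) from the first step that yields a label.

    Steps run lazily and in order, so the ideology CSV is only consulted
    when the keyword and coalition steps both return None.
    """
    for source, step in steps:
        label = step()
        if label:
            return label, source
    # Step 4: generic fallback, numbered by axis.
    return f"Stempatroon As {axis_number}", "generic"


# Hypothetical wiring: keywords fail, coalition is inconclusive, CSV decides.
steps = [
    ("keywords", lambda: None),
    ("coalition", lambda: None),
    ("ideology_csv", lambda: "Links–Rechts"),
]
label, source = resolve_axis_label(steps, axis_number=1)
```

Returning the source alongside the label is an optional extra that the spec does not require. It would let the expander say where a label came from, which supports the explainability goal.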

Axis swap: After classification, if the Y-axis is labeled "Links–Rechts" and the X-axis is not, swap them (both positions and all axis metadata) so that left-right is conventionally on the horizontal axis when present.

Architecture

Changes by file

analysis/political_axis.py (minimal)

  • Add axes["global_mean"] = M.mean(axis=0) before returning from compute_2d_axes. This lets classify_axes center motion vectors before projection without needing to re-access the stacked matrix.
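Centering matters because projecting an uncentered vector adds the constant `dot(global_mean, axis)` to every score. That leaves the ranking unchanged but moves the zero point, so positive and negative scores would no longer mean "above" or "below" the average pattern. The following toy numpy check assumes that motion vectors live in the same SVD space as the rows of the stacked matrix `M`. The data and axis are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(loc=3.0, size=(20, 4))      # toy stand-in for the stacked SVD matrix
global_mean = M.mean(axis=0)               # what axes["global_mean"] would hold
x_axis = np.array([0.6, 0.8, 0.0, 0.0])    # toy unit-length PCA axis

vec = M[0]
raw_score = float(vec @ x_axis)                       # projection without centering
centered_score = float((vec - global_mean) @ x_axis)  # projection as specified
offset = float(global_mean @ x_axis)                  # identical for every vector

# The uncentered score is the centered score plus a shared constant offset.
assert np.isclose(raw_score, centered_score + offset)

# After centering, the rows of M average to zero on the axis, so the sign of a
# score says which side of the average voting pattern an entity falls on.
scores = (M - global_mean) @ x_axis
assert abs(float(scores.mean())) < 1e-9
```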

analysis/axis_classifier.py (major)

New private helpers:

  • _load_motion_vectors(db_path, window_id) → dict[int, np.ndarray]
    • SELECT entity_id, vector FROM svd_vectors WHERE entity_type='motion' AND window_id=?
    • Returns {motion_id: vector}. Returns {} on any DB error.
  • _project_motions(motion_vecs, x_axis, y_axis, global_mean) → dict[int, tuple[float, float]]
    • For each motion: x = dot(vec - global_mean, x_axis), y = dot(vec - global_mean, y_axis)
    • Returns {motion_id: (x_score, y_score)}
  • _top_motion_ids(projections, axis, n=5) → {'+': [ids], '-': [ids]}
    • Sorts by axis score; returns the top n motion IDs at the positive end and the top n at the negative end
  • _fetch_motion_titles(db_path, motion_ids) → dict[int, tuple[str, str]]
    • SELECT id, title, date FROM motions WHERE id IN (...)
    • Returns {id: (title, date_str)}
  • _classify_from_titles(titles) → str | None
    • Matches the keyword dict against each top motion's title individually (per-motion scoring; see Matching below)
    • Returns the category string, or None if confidence is below the threshold (0.4)

New module-level constant:

  • _KEYWORDS: dict[str, list[str]], mapping each category to its Dutch keywords (see below)

Modified classify_axes:

  1. Check if axes contains global_mean; if not, skip motion classification.
  2. For each window W:
     a. Load motion vectors
     b. Project onto x_axis, y_axis using global_mean
     c. Find the top 5+5 motions per axis
     d. Fetch titles from the motions table
     e. Apply the keyword classifier → label candidate
     f. If None: fall through to the existing Pearson-r approaches
  3. Store x_top_motions and y_top_motions per window in enriched dict
  4. Store x_label_confidence and y_label_confidence per window

explorer.py (two changes)

  1. Axis swap in load_positions, after classify_axes returns:

    if axis_def.get("y_label") == "Links–Rechts" and axis_def.get("x_label") != "Links–Rechts":
        positions_by_window, axis_def = _swap_axes(positions_by_window, axis_def)
    

    _swap_axes transposes (x, y) in every entity position and swaps all x_/y_ keys in axis_def.

  2. Motion expander in build_compass_tab, below st.plotly_chart:

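    with st.expander("🔍 Wat bepaalt deze assen?", expanded=False):  # collapsed by default
        # show top 3 +/- motions for x and y, with date
        # show confidence and explained variance for this window
        ...  # placeholder body so the block parses until rendering is filled in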
    

Data Flow

compute_2d_axes(db_path, windows)
  → (positions_by_window, axes)        # axes now contains global_mean

classify_axes(positions_by_window, axes, db_path)
  → axis_def                            # now contains x/y_top_motions, confidence

load_positions (in explorer.py)
  → swap axes if y_label == "Links–Rechts" and x_label != "Links–Rechts"
  → return (positions_by_window, axis_def)

build_compass_tab
  → scatter chart (uses x_label, y_label — already wired)
  → expander (uses x_top_motions, y_top_motions)

Keyword Dictionary

Categories and representative terms (non-exhaustive; full dict in implementation):

Links–Rechts

  • Economic: belasting, uitkering, bijstand, minimumloon, cao, vakbond, bezuiniging, privatisering, subsidie, zorg, pensioen, AOW
  • Immigration: asiel, asielaanvraag, migratie, vreemdeling, vluchtelingen, terugkeer, grenzen, opvang, statushouder

Progressief–Conservatief

  • Environment: klimaat, stikstof, duurzaam, duurzaamheid, co2, energietransitie, biodiversiteit
  • Social: euthanasie, abortus, lgbtq, transgender, diversiteit, traditi, gezin, religie, geloof

Coalitie–Oppositie (detected via coalition correlation, not keywords; keyword detection for this category is unreliable)

Nationaal–Internationaal (optional, lower priority)

  • navo, nato, europees, europese, eu, verdrag, vn, internationaal

Matching: case-insensitive substring match on the lowercased title. Score = fraction of the axis's top-10 motions (5 per pole) whose title contains at least one keyword from the winning (highest-scoring) category. Threshold for acceptance = 0.4 (i.e., at least 4 of the 10 top motions must match).

New axis_def Fields

x_top_motions:    {window_id: {'+': [(title, date), ...], '-': [(title, date), ...]}}
y_top_motions:    same structure
x_label_confidence: {window_id: float}   # 0.0–1.0
y_label_confidence: {window_id: float}
global_mean:      np.ndarray             # stored in axes dict, not surfaced to UI
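The nesting of the new fields can be pinned down with a TypedDict, which documents the shape without changing runtime behavior. This sketch assumes window IDs are strings, which the spec does not state. `MotionAxisFields` is a hypothetical name. `global_mean` is left out because it lives in the axes dict.

```python
from typing import Dict, List, Tuple, TypedDict

TitleDate = Tuple[str, str]          # (title, date_str)
Poles = Dict[str, List[TitleDate]]   # {'+': [...], '-': [...]}
WindowId = str                       # assumption: window IDs are strings


class MotionAxisFields(TypedDict):
    x_top_motions: Dict[WindowId, Poles]
    y_top_motions: Dict[WindowId, Poles]
    x_label_confidence: Dict[WindowId, float]  # 0.0 to 1.0
    y_label_confidence: Dict[WindowId, float]


# Example values mirror the UI mockup; the window ID "2023" is illustrative.
example: MotionAxisFields = {
    "x_top_motions": {"2023": {"+": [("Motie over asielbeleid", "2023-11-14")],
                               "-": [("Motie over uitkeringen", "2023-11-20")]}},
    "y_top_motions": {"2023": {"+": [], "-": []}},
    "x_label_confidence": {"2023": 0.7},
    "y_label_confidence": {"2023": 0.55},
}
```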

Existing fields (x_label, y_label, x_quality, y_quality, x_interpretation, y_interpretation) are preserved.

UI Display (Option C)

Axis titles: unchanged — already uses axis_def.get("x_label").

New expander (collapsed by default) below compass scatter:

🔍 Wat bepaalt deze assen?

Horizontale as: Links–Rechts  (vertrouwen: 70%)
  Rechtspool: Motie over asielbeleid (2023-11-14) · Motie over belastingverlaging (2023-10-05) ...
  Linkspool:  Motie over uitkeringen (2023-11-20) · Motie over minimumloon (2023-09-12) ...

Verticale as: Progressief–Conservatief  (vertrouwen: 55%)
  Progressief: Motie over klimaatdoelen (2023-12-01) ...
  Conservatief: Motie over tradities (2023-10-18) ...

As 1 verklaart 11% van de variantie in stemgedrag.
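To keep the expander testable without Streamlit, the mockup text could be produced by a pure function and then passed to a display call such as `st.text`. This is a sketch under assumptions: `format_axis_block` is hypothetical. The PCA sign is arbitrary per window, so which sign maps to which pole name (for example '+' to Rechtspool) cannot be fixed in advance, and the caller supplies the mapping.

```python
from typing import Dict, List, Sequence, Tuple

TitleDate = Tuple[str, str]  # (title, date_str)


def format_axis_block(
    orientation: str,
    label: str,
    confidence: float,
    poles: Dict[str, Sequence[TitleDate]],
    pole_names: Dict[str, str],
    k: int = 3,
) -> List[str]:
    """Render one axis section of the expander as plain-text lines."""
    lines = [f"{orientation} as: {label}  (vertrouwen: {confidence:.0%})"]
    for sign in ("+", "-"):
        shown = list(poles.get(sign, []))[:k]  # top 3 per pole by default
        items = " · ".join(f"{title} ({date})" for title, date in shown)
        lines.append(f"  {pole_names[sign]}: {items}")
    return lines


lines = format_axis_block(
    "Horizontale", "Links–Rechts", 0.7,
    {"+": [("Motie over asielbeleid", "2023-11-14")],
     "-": [("Motie over uitkeringen", "2023-11-20")]},
    {"+": "Rechtspool", "-": "Linkspool"},
)
```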

Error Handling

Situation → behavior:

  • No motion vectors for window → skip motion classification; fall through to coalition correlation, then the ideology CSV
  • Motion title fetch fails → use motion IDs as placeholders; the label falls back
  • Keyword confidence below threshold → fall through to coalition correlation
  • Both motion and CSV classification fail → "Stempatroon As N" (existing)
  • global_mean missing from axes → skip motion projection entirely
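The "title fetch fails" case can be handled by a single wrapper. This sketch assumes the fetch is passed in as a callable; in the spec, that callable would be `_fetch_motion_titles` bound to `db_path`. `titles_or_placeholders` is a hypothetical name, and using an empty date string for placeholders is an assumption. A bare ID contains no keyword, so placeholder titles never count toward a category. The keyword score therefore drops and the label falls back, as described above.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Titles = Dict[int, Tuple[str, str]]  # {motion_id: (title, date_str)}


def titles_or_placeholders(
    fetch: Callable[[List[int]], Titles], motion_ids: Iterable[int]
) -> Titles:
    """Fetch titles, substituting the motion ID for any title that is unavailable."""
    ids = list(motion_ids)
    try:
        found = fetch(ids)
    except Exception:  # any fetch failure degrades to placeholders
        found = {}
    return {mid: found.get(mid, (str(mid), "")) for mid in ids}
```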

Testing Strategy

New unit tests (in tests/test_political_compass.py):

  • test_classify_from_titles_left_right — mock titles with asiel/belasting → expect "Links–Rechts"
  • test_classify_from_titles_progressive — mock titles with klimaat/stikstof → expect "Progressief–Conservatief"
  • test_classify_from_titles_low_confidence — mixed keywords → expect None (fallback triggered)
  • test_axis_swap_when_y_is_left_right — positions (x,y) → (y,x), labels swapped
  • test_axis_swap_not_applied_when_x_is_left_right — no swap when already correct

All 8 existing tests must continue to pass.

Out of Scope

Explained variance drop (18% → 11%): Observed but not addressed here. Likely reflects genuine fragmentation of the Schoof parliament (4 smaller coalition parties). Warrants a separate diagnostic session. The expander now surfaces the explained variance, making this visible to users.

Proper Procrustes alignment of motion vectors: The projection approximation (ignoring per-window rotation) is acceptable for v1. If label instability is observed across windows, apply the per-window rotation to the motion vectors as a follow-up.

Removing party_ideologies.csv: Kept as fallback. Can be removed once motion classification has proven reliable over several parliament periods.