From 9dcf6201bbfbbb35bee13204140ef1c6a732f1c4 Mon Sep 17 00:00:00 2001
From: Sven Geboers
Date: Sun, 29 Mar 2026 14:19:54 +0200
Subject: [PATCH] Add design spec for motion-driven axis labeling

Replaces static ideology CSV as primary axis classification signal with
per-year motion projection + Dutch keyword classifier. Adds axis-swap logic
so left-right is conventionally on X when present. Adds Option C UI expander
showing top motions per axis pole.
---
 ...3-29-motion-driven-axis-labeling-design.md | 219 ++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md

diff --git a/docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md b/docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md
new file mode 100644
index 0000000..aea0a9a
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md
@@ -0,0 +1,219 @@
+---
+date: 2026-03-29
+topic: "Motion-Driven Axis Labeling for Political Compass"
+status: validated
+---
+
+# Motion-Driven Axis Labeling
+
+## Problem Statement
+
+The current axis labeling in `analysis/axis_classifier.py` correlates per-party PCA
+positions against static scores from `data/party_ideologies.csv`. This has three
+failure modes:
+
+1. **Mislabeling**: When the dominant PCA axis is coalition/opposition rather than
+   left-right, it gets labeled "Links–Rechts" anyway, making the compass look "rotated
+   90 degrees".
+2. **Static reference**: A fixed ideology CSV cannot reflect year-specific political
+   dynamics (e.g., asylum being the main left-right issue in 2015 vs. housing in 2023).
+3. **No explainability**: Users cannot see *why* an axis got a particular label.
+
+The fix is to derive labels from the **actual motions** that most strongly split
+parliament on each PCA axis in a given year, and to expose those motions to users.
+
+## Constraints
+
+- Must not break the 8 existing passing tests.
+- Must remain DuckDB-only for data access (no new external files for primary path).
+- `party_ideologies.csv` and `coalition_membership.csv` remain as fallbacks and are not
+  removed.
+- The labeling approximation (projecting motion vectors without full Procrustes
+  alignment) is acceptable for v1. Proper alignment can be added later.
+- Labels must still be deterministic given the same DB state.
+
+## Approach
+
+**Primary**: For each window, load motion SVD vectors from the DB, project them onto
+the PCA axes, rank motions by projection score, apply a Dutch keyword classifier to the
+top motion titles, and derive a categorical label.
+
+**Fallback chain** (steps 2-4 unchanged from today):
+1. Keyword classifier on top motions → categorical label
+2. Coalition correlation (existing `_pearsonr` against coalition dummy)
+3. Ideology CSV correlation (existing Pearson-r against `party_ideologies.csv`)
+4. "Stempatroon As N" (generic fallback)
+
+**Axis swap**: After classification, if Y-axis is "Links–Rechts" and X-axis is not,
+swap them (both positions and all axis metadata), so that left-right is conventionally
+on the horizontal axis when present.
+
+## Architecture
+
+### Changes by file
+
+#### `analysis/political_axis.py` (minimal)
+- Add `axes["global_mean"] = M.mean(axis=0)` before returning from `compute_2d_axes`.
+  This lets `classify_axes` center motion vectors before projection without needing to
+  re-access the stacked matrix.
+
+#### `analysis/axis_classifier.py` (major)
+
+New private helpers:
+- `_load_motion_vectors(db_path, window_id)` → `dict[int, np.ndarray]`
+  - `SELECT entity_id, vector FROM svd_vectors WHERE entity_type='motion' AND window_id=?`
+  - Returns `{motion_id: vector}`. Returns `{}` on any DB error.
+- `_project_motions(motion_vecs, x_axis, y_axis, global_mean)` → `dict[int, tuple[float, float]]`
+  - For each motion: `x = dot(vec - global_mean, x_axis)`, `y = dot(vec - global_mean, y_axis)`
+  - Returns `{motion_id: (x_score, y_score)}`
+- `_top_motion_ids(projections, axis, n=5)` → `{'+': [ids], '-': [ids]}`
+  - Sorts by axis score, returns top n positive and n negative motion IDs
+- `_fetch_motion_titles(db_path, motion_ids)` → `dict[int, tuple[str, str]]`
+  - `SELECT id, title, date FROM motions WHERE id IN (...)`
+  - Returns `{id: (title, date_str)}`
+- `_classify_from_titles(titles)` → `str | None`
+  - Applies keyword dict against concatenated titles of top motions
+  - Returns category string or None if confidence below threshold (0.4)
+
+New module-level constant:
+- `_KEYWORDS: dict[str, list[str]]`: category → Dutch keyword list mapping (see below)
+
+Modified `classify_axes`:
+1. Check if `axes` contains `global_mean`; if not, skip motion classification.
+2. For each window W:
+   a. Load motion vectors
+   b. Project onto x_axis, y_axis using global_mean
+   c. Find top 5+5 motions per axis
+   d. Fetch titles from motions table
+   e. Apply keyword classifier → label candidate
+   f. If None: fall through to existing Pearson-r approaches
+3. Store `x_top_motions` and `y_top_motions` per window in enriched dict
+4. Store `x_label_confidence` and `y_label_confidence` per window
+
+#### `explorer.py` (two changes)
+
+1. **Axis swap** in `load_positions`, after `classify_axes` returns:
+   ```
+   if axis_def.get("y_label") == "Links–Rechts" and axis_def.get("x_label") != "Links–Rechts":
+       positions_by_window, axis_def = _swap_axes(positions_by_window, axis_def)
+   ```
+   `_swap_axes` transposes (x, y) in every entity position and swaps all `x_*`/`y_*`
+   keys in `axis_def`.
+
+2. **Motion expander** in `build_compass_tab`, below `st.plotly_chart`:
+   ```
+   with st.expander("🔍 Wat bepaalt deze assen?"):
+       # show top 3 +/- motions for x and y, with date
+       # show confidence and explained variance for this window
+   ```
+
+## Data Flow
+
+```
+compute_2d_axes(db_path, windows)
+  → (positions_by_window, axes)          # axes now contains global_mean
+
+classify_axes(positions_by_window, axes, db_path)
+  → axis_def                             # now contains x/y_top_motions, confidence
+
+load_positions (in explorer.py)
+  → swap axes if y_label == "Links–Rechts"
+  → return (positions_by_window, axis_def)
+
+build_compass_tab
+  → scatter chart (uses x_label, y_label; already wired)
+  → expander (uses x_top_motions, y_top_motions)
+```
+
+## Keyword Dictionary
+
+Categories and representative terms (non-exhaustive; full dict in implementation):
+
+**Links–Rechts**
+- Economic: `belasting`, `uitkering`, `bijstand`, `minimumloon`, `cao`, `vakbond`,
+  `bezuiniging`, `privatisering`, `subsidie`, `zorg`, `pensioen`, `AOW`
+- Immigration: `asiel`, `asielaanvraag`, `migratie`, `vreemdeling`, `vluchtelingen`,
+  `terugkeer`, `grenzen`, `opvang`, `statushouder`
+
+**Progressief–Conservatief**
+- Environment: `klimaat`, `stikstof`, `duurzaam`, `duurzaamheid`, `co2`,
+  `energietransitie`, `biodiversiteit`
+- Social: `euthanasie`, `abortus`, `lgbtq`, `transgender`, `diversiteit`, `traditi`,
+  `gezin`, `religie`, `geloof`
+
+**Coalitie–Oppositie** (detected via coalition correlation, not keywords; keyword
+detection for this category is unreliable)
+
+**Nationaal–Internationaal** (optional, lower priority)
+- `navo`, `nato`, `europees`, `europese`, `eu`, `verdrag`, `vn`, `internationaal`
+
+Matching: case-insensitive substring match on lowercased title. Score = fraction of
+top-10 motions containing at least one keyword from the winning category. Threshold
+for acceptance = 0.4 (i.e., at least 4 out of 10 top motions match).
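
The matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation: the keyword dict is an abbreviated subset of the terms listed above, the function name and `_THRESHOLD` are hypothetical, and returning the confidence alongside the label is an assumption (the spec's `_classify_from_titles` returns only `str | None`).

```python
from __future__ import annotations

# Abbreviated, illustrative subset of the keyword dict (assumption: the full
# dict lives in the implementation and is larger than this).
_KEYWORDS: dict[str, list[str]] = {
    "Links–Rechts": ["belasting", "uitkering", "minimumloon", "asiel", "migratie", "opvang"],
    "Progressief–Conservatief": ["klimaat", "stikstof", "duurzaam", "euthanasie", "abortus"],
}
_THRESHOLD = 0.4  # acceptance threshold stated in the spec


def classify_from_titles(titles: list[str]) -> tuple[str | None, float]:
    """Return (category, confidence); category is None below the threshold."""
    if not titles:
        return None, 0.0
    lowered = [title.lower() for title in titles]
    best_category, best_score = None, 0.0
    for category, keywords in _KEYWORDS.items():
        # A title counts once per category, however many keywords it contains.
        hits = sum(1 for title in lowered if any(kw in title for kw in keywords))
        score = hits / len(lowered)
        # Strict '>' keeps the first category on ties, so the result depends
        # only on dict order and stays deterministic.
        if score > best_score:
            best_category, best_score = category, score
    if best_score < _THRESHOLD:
        return None, best_score
    return best_category, best_score
```

Because matching is by substring, a short keyword such as `eu` would also match inside unrelated words (for example `nieuwe`); the full dict may need word-boundary handling for such terms.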
+
+## New `axis_def` Fields
+
+```
+x_top_motions: {window_id: {'+': [(title, date), ...], '-': [(title, date), ...]}}
+y_top_motions: same structure
+x_label_confidence: {window_id: float}   # 0.0–1.0
+y_label_confidence: {window_id: float}
+global_mean: np.ndarray                  # stored in axes dict, not surfaced to UI
+```
+
+Existing fields (`x_label`, `y_label`, `x_quality`, `y_quality`, `x_interpretation`,
+`y_interpretation`) are preserved.
+
+## UI Display (Option C)
+
+**Axis titles**: unchanged; they already use `axis_def.get("x_label")`.
+
+**New expander** (collapsed by default) below compass scatter:
+```
+🔍 Wat bepaalt deze assen?
+
+Horizontale as: Links–Rechts (vertrouwen: 70%)
+  Rechtspool: Motie over asielbeleid (2023-11-14) · Motie over belastingverlaging (2023-10-05) ...
+  Linkspool:  Motie over uitkeringen (2023-11-20) · Motie over minimumloon (2023-09-12) ...
+
+Verticale as: Progressief–Conservatief (vertrouwen: 55%)
+  Progressief:  Motie over klimaatdoelen (2023-12-01) ...
+  Conservatief: Motie over tradities (2023-10-18) ...
+
+As 1 verklaart 11% van de variantie in stemgedrag.
+```
+
+## Error Handling
+
+| Situation | Behavior |
+|---|---|
+| No motion vectors for window | Skip motion classification; fall through to coalition correlation, then ideology CSV |
+| Motion title fetch fails | Use motion IDs as placeholder; label falls back |
+| Keyword confidence below threshold | Fall through to coalition correlation |
+| Motion, coalition, and CSV classification all fail | "Stempatroon As N" (existing) |
+| `global_mean` missing from axes | Skip motion projection entirely |
+
+## Testing Strategy
+
+New unit tests (in `tests/test_political_compass.py`):
+- `test_classify_from_titles_left_right`: mock titles with `asiel`/`belasting` → expect "Links–Rechts"
+- `test_classify_from_titles_progressive`: mock titles with `klimaat`/`stikstof` → expect "Progressief–Conservatief"
+- `test_classify_from_titles_low_confidence`: mixed keywords → expect None (fallback triggered)
+- `test_axis_swap_when_y_is_left_right`: positions (x,y) → (y,x), labels swapped
+- `test_axis_swap_not_applied_when_x_is_left_right`: no swap when already correct
+
+All 8 existing tests must continue to pass.
+
+## Out of Scope
+
+**Explained variance drop (18% → 11%)**: Observed but not addressed here. Likely
+reflects genuine fragmentation of the Schoof parliament (4 smaller coalition parties).
+Warrants a separate diagnostic session. The expander now surfaces the explained
+variance, making this visible to users.
+
+**Proper Procrustes alignment of motion vectors**: The projection approximation
+(ignoring per-window rotation) is acceptable for v1. If label instability is observed
+across windows, add rotation application as a follow-up.
+
+**Removing `party_ideologies.csv`**: Kept as fallback. Can be removed once motion
+classification has proven reliable over several parliament periods.
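
For reference, the projection, ranking, and axis-swap steps specified under Architecture can be sketched as below. This is a rough sketch under stated assumptions rather than the planned code: it uses plain Python lists instead of `np.ndarray` so it runs without dependencies, the function names drop the leading underscore to mark them as illustrative, the `axis` argument is assumed to be the string `"x"` or `"y"`, only strictly positive and strictly negative scores are assumed to count toward each pole, and `positions_by_window` is assumed to have the shape `{window_id: {entity: (x, y)}}`.

```python
from __future__ import annotations


def project_motions(motion_vecs, x_axis, y_axis, global_mean):
    """Center each motion vector on global_mean and project it onto both axes."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    projections = {}
    for motion_id, vec in motion_vecs.items():
        centered = [v - m for v, m in zip(vec, global_mean)]
        projections[motion_id] = (dot(centered, x_axis), dot(centered, y_axis))
    return projections


def top_motion_ids(projections, axis, n=5):
    """Return up to n motion IDs per pole, strongest scores first."""
    idx = 0 if axis == "x" else 1  # assumption: axis is "x" or "y"
    # Sorting on (score, id) breaks ties identically on every run, which
    # keeps labels deterministic for a given DB state.
    ranked = sorted(projections, key=lambda mid: (projections[mid][idx], mid))
    positive = [m for m in reversed(ranked) if projections[m][idx] > 0][:n]
    negative = [m for m in ranked if projections[m][idx] < 0][:n]
    return {"+": positive, "-": negative}


def swap_axes(positions_by_window, axis_def):
    """Transpose every (x, y) position and exchange all x_*/y_* metadata keys."""
    swapped_positions = {
        window: {entity: (y, x) for entity, (x, y) in positions.items()}
        for window, positions in positions_by_window.items()
    }
    swapped_def = {}
    for key, value in axis_def.items():
        if key.startswith("x_"):
            swapped_def["y_" + key[2:]] = value
        elif key.startswith("y_"):
            swapped_def["x_" + key[2:]] = value
        else:
            swapped_def[key] = value  # keys without an axis prefix are untouched
    return swapped_positions, swapped_def
```

Building new dicts instead of mutating the inputs keeps the swap safe to call on cached results, which matters if `load_positions` is memoized.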