Add design spec for motion-driven axis labeling

Replaces static ideology CSV as primary axis classification signal with per-year motion projection + Dutch keyword classifier. Adds axis-swap logic so left-right is conventionally on X when present. Adds Option C UI expander showing top motions per axis pole.
4 months ago · 9dcf6201bb
parent 392fd3afce
commit 9dcf6201bb
1 changed files with 219 additions and 0 deletions
--- a/docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md
+++ b/docs/superpowers/specs/2026-03-29-motion-driven-axis-labeling-design.md
@@ -0,0 +1,219 @@
+---
+date: 2026-03-29
+topic: "Motion-Driven Axis Labeling for Political Compass"
+status: validated
+---
+
+# Motion-Driven Axis Labeling
+
+## Problem Statement
+
+The current axis labeling in `analysis/axis_classifier.py` correlates per-party PCA
+positions against static scores from `data/party_ideologies.csv`. This has three
+failure modes:
+
+1. **Mislabeling**: When the dominant PCA axis is coalition/opposition rather than
+   left-right, it is labeled "Links–Rechts" anyway, making the compass look "rotated
+   90 degrees".
+2. **Static reference**: A fixed ideology CSV cannot reflect year-specific political
+   dynamics (e.g., asylum being the main left-right issue in 2015 vs. housing in 2023).
+3. **No explainability**: Users cannot see *why* an axis got a particular label.
+
+The fix is to derive labels from the **actual motions** that most strongly split
+parliament on each PCA axis in a given year, and to expose those motions to users.
+
+## Constraints
+
+- Must not break the 8 existing passing tests.
+- Must remain DuckDB-only for data access (no new external files for primary path).
+- `party_ideologies.csv` and `coalition_membership.csv` remain as fallbacks — not
+  removed.
+- The labeling approximation (projecting motion vectors without full Procrustes
+  alignment) is acceptable for v1. Proper alignment can be added later.
+- Labels must still be deterministic given the same DB state.
+
+## Approach
+
+**Primary**: For each window, load motion SVD vectors from the DB, project them onto
+the PCA axes, rank motions by projection score, apply a Dutch keyword classifier to the
+top motion titles, and derive a categorical label.
+
+**Fallback chain** (step 1 is new; steps 2–4 are unchanged from today):
+1. Keyword classifier on top motions → categorical label
+2. Coalition correlation (existing `_pearsonr` against coalition dummy)
+3. Ideology CSV correlation (existing Pearson-r against `party_ideologies.csv`)
+4. "Stempatroon As N" (generic fallback)
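
The chain above amounts to "take the first classifier that returns a label". A minimal sketch, assuming each step can be wrapped as a zero-argument callable that returns a label or `None`; `first_label` and the three stand-in steps are hypothetical names, not functions from the codebase:

```python
from typing import Callable, Optional

Classifier = Callable[[], Optional[str]]


def first_label(classifiers: list[Classifier], axis_number: int) -> str:
    """Return the first non-None label, else the generic fallback label."""
    for classify in classifiers:
        label = classify()
        if label is not None:
            return label
    return f"Stempatroon As {axis_number}"


# Hypothetical stand-ins for the keyword, coalition, and ideology steps.
def keyword_step() -> Optional[str]:
    return None  # e.g. keyword confidence below threshold


def coalition_step() -> Optional[str]:
    return "Coalitie–Oppositie"


def ideology_step() -> Optional[str]:
    return "Links–Rechts"


label = first_label([keyword_step, coalition_step, ideology_step], axis_number=1)
generic = first_label([keyword_step], axis_number=2)
```

Because the order of the list is fixed and each step is a pure function of the DB state, the result stays deterministic, as the constraints require.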
+
+**Axis swap**: After classification, if the Y-axis is "Links–Rechts" and the X-axis is
+not, swap them (both positions and all axis metadata), so that left-right is
+conventionally on the horizontal axis when present.
+
+## Architecture
+
+### Changes by file
+
+#### `analysis/political_axis.py` (minimal)
+- Add `axes["global_mean"] = M.mean(axis=0)` before returning from `compute_2d_axes`.
+  This lets `classify_axes` center motion vectors before projection without needing to
+  re-access the stacked matrix.
+
+#### `analysis/axis_classifier.py` (major)
+
+New private helpers:
+- `_load_motion_vectors(db_path, window_id)` → `dict[int, np.ndarray]`
+  - SELECT entity_id, vector FROM svd_vectors WHERE entity_type='motion' AND window_id=?
+  - Returns {motion_id: vector}. Returns {} on any DB error.
+- `_project_motions(motion_vecs, x_axis, y_axis, global_mean)` → `dict[int, tuple[float, float]]`
+  - For each motion: `x = dot(vec - global_mean, x_axis)`, `y = dot(vec - global_mean, y_axis)`
+  - Returns {motion_id: (x_score, y_score)}
+- `_top_motion_ids(projections, axis, n=5)` → `{'+': [ids], '-': [ids]}`
+  - Sorts by axis score, returns top n positive and n negative motion IDs
+- `_fetch_motion_titles(db_path, motion_ids)` → `dict[int, tuple[str, str]]`
+  - SELECT id, title, date FROM motions WHERE id IN (...)
+  - Returns {id: (title, date_str)}
+- `_classify_from_titles(titles)` → `str | None`
+  - Applies keyword dict to the titles of the top motions (scored per title, see below)
+  - Returns category string or None if confidence below threshold (0.4)
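
The projection and ranking helpers listed above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: it uses plain Python sequences where the spec uses `np.ndarray`, it treats `axis` as the string `"x"` or `"y"` (the spec does not say what type it is), and it breaks score ties by motion ID to keep the output deterministic.

```python
def centered_dot(vec, mean, axis_vec):
    """dot(vec - mean, axis_vec) for plain sequences of floats."""
    return sum((v - m) * a for v, m, a in zip(vec, mean, axis_vec))


def project_motions(motion_vecs, x_axis, y_axis, global_mean):
    """Center each motion vector and project it onto both axes."""
    return {
        motion_id: (
            centered_dot(vec, global_mean, x_axis),
            centered_dot(vec, global_mean, y_axis),
        )
        for motion_id, vec in motion_vecs.items()
    }


def top_motion_ids(projections, axis, n=5):
    """Return the n highest- and n lowest-scoring motion IDs on one axis."""
    idx = 0 if axis == "x" else 1
    # Sort by (score, id) so equal scores always come out in the same order.
    ranked = sorted(projections, key=lambda mid: (projections[mid][idx], mid))
    return {"+": ranked[::-1][:n], "-": ranked[:n]}


# Toy 2-D vectors, invented for illustration only.
vecs = {1: [3.0, 1.0], 2: [-1.0, 1.0], 3: [1.0, 4.0]}
global_mean = [1.0, 2.0]  # component-wise mean of the three toy vectors
x_axis = [1.0, 0.0]
y_axis = [0.0, 1.0]

proj = project_motions(vecs, x_axis, y_axis, global_mean)
top_x = top_motion_ids(proj, "x", n=1)
top_y = top_motion_ids(proj, "y", n=1)
```

Note that this sketch ranks by score only; it does not check that the "+" entries are actually positive, so with very few motions in a window the two lists can overlap.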
+
+New module-level constant:
+- `_KEYWORDS: dict[str, list[str]]`: maps each category to its list of Dutch keywords (see below)
+
+Modified `classify_axes`:
+1. Check if `axes` contains `global_mean`; if not, skip motion classification.
+2. For each window W:
+   a. Load motion vectors
+   b. Project onto x_axis, y_axis using global_mean
+   c. Find top 5+5 motions per axis
+   d. Fetch titles from motions table
+   e. Apply keyword classifier → label candidate
+   f. If None: fall through to existing Pearson-r approaches
+3. Store `x_top_motions` and `y_top_motions` per window in enriched dict
+4. Store `x_label_confidence` and `y_label_confidence` per window
+
+#### `explorer.py` (two changes)
+
+1. **Axis swap** in `load_positions`, after `classify_axes` returns:
+   ```
+   if axis_def.get("y_label") == "Links–Rechts" and axis_def.get("x_label") != "Links–Rechts":
+       positions_by_window, axis_def = _swap_axes(positions_by_window, axis_def)
+   ```
+   `_swap_axes` transposes (x, y) in every entity position and swaps all x_*/y_*
+   keys in axis_def.
+
+2. **Motion expander** in `build_compass_tab`, below `st.plotly_chart`:
+   ```
+   with st.expander("🔍 Wat bepaalt deze assen?"):
+       # show top 3 +/- motions for x and y, with date
+       # show confidence and explained variance for this window
+   ```
+
+## Data Flow
+
+```
+compute_2d_axes(db_path, windows)
+  → (positions_by_window, axes)        # axes now contains global_mean
+
+classify_axes(positions_by_window, axes, db_path)
+  → axis_def                            # now contains x/y_top_motions, confidence
+
+load_positions (in explorer.py)
+  → swap axes if y_label == "Links–Rechts" and x_label != "Links–Rechts"
+  → return (positions_by_window, axis_def)
+
+build_compass_tab
+  → scatter chart (uses x_label, y_label — already wired)
+  → expander (uses x_top_motions, y_top_motions)
+```
+
+## Keyword Dictionary
+
+Categories and representative terms (non-exhaustive; full dict in implementation):
+
+**Links–Rechts**
+- Economic: `belasting`, `uitkering`, `bijstand`, `minimumloon`, `cao`, `vakbond`,
+  `bezuiniging`, `privatisering`, `subsidie`, `zorg`, `pensioen`, `AOW`
+- Immigration: `asiel`, `asielaanvraag`, `migratie`, `vreemdeling`, `vluchtelingen`,
+  `terugkeer`, `grenzen`, `opvang`, `statushouder`
+
+**Progressief–Conservatief**
+- Environment: `klimaat`, `stikstof`, `duurzaam`, `duurzaamheid`, `co2`,
+  `energietransitie`, `biodiversiteit`
+- Social: `euthanasie`, `abortus`, `lgbtq`, `transgender`, `diversiteit`, `traditi`,
+  `gezin`, `religie`, `geloof`
+
+**Coalitie–Oppositie** (detected via coalition correlation, not keywords; keyword
+detection for this category is unreliable)
+
+**Nationaal–Internationaal** (optional, lower priority)
+- `navo`, `nato`, `europees`, `europese`, `eu`, `verdrag`, `vn`, `internationaal`
+
+Matching: case-insensitive substring match on lowercased title. Score = fraction of
+top-10 motions containing at least one keyword from the winning category. Threshold
+for acceptance = 0.4 (i.e., at least 4 out of 10 top motions match).
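
The matching rule can be sketched as below. Assumptions: the keyword lists are a small subset of the dictionary above, and the function returns a `(category, score)` pair so the score can feed `x_label_confidence`, whereas the spec's `_classify_from_titles` returns only `str | None`.

```python
KEYWORDS = {
    "Links–Rechts": ["belasting", "uitkering", "minimumloon", "asiel", "migratie"],
    "Progressief–Conservatief": ["klimaat", "stikstof", "euthanasie", "abortus"],
}
THRESHOLD = 0.4


def classify_from_titles(titles):
    """Return (category, score); category is None when score < THRESHOLD."""
    if not titles:
        return None, 0.0
    lowered = [title.lower() for title in titles]
    # Score per category = fraction of titles containing at least one keyword.
    scores = {
        category: sum(any(kw in title for kw in kws) for title in lowered) / len(lowered)
        for category, kws in KEYWORDS.items()
    }
    # Iterating categories in sorted order makes tie-breaking deterministic.
    best = max(sorted(scores), key=scores.get)
    score = scores[best]
    return (best if score >= THRESHOLD else None), score


top_titles = [
    "Motie over asielbeleid",
    "Motie over belastingverlaging",
    "Motie over uitkeringen",
    "Motie over minimumloon",
    "Motie over klimaatdoelen",
    "Motie over tradities",
    "Motie over de begroting",
    "Motie over sport",
    "Motie over onderwijs",
    "Motie over defensie",
]
result = classify_from_titles(top_titles)
mixed = classify_from_titles(["Motie over sport", "Motie over klimaat", "Motie over asiel"])
```

Here four of the ten titles match a Links–Rechts keyword, which lands exactly on the threshold and is accepted. One thing to watch with plain substring matching: very short keywords such as `eu` also match inside longer words (for example `euthanasie`), so they may need stricter matching than the longer terms.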
+
+## New `axis_def` Fields
+
+```
+x_top_motions:    {window_id: {'+': [(title, date), ...], '-': [(title, date), ...]}}
+y_top_motions:    same structure
+x_label_confidence: {window_id: float}   # 0.0–1.0
+y_label_confidence: {window_id: float}
+global_mean:      np.ndarray             # stored in axes dict, not surfaced to UI
+```
+
+Existing fields (`x_label`, `y_label`, `x_quality`, `y_quality`, `x_interpretation`,
+`y_interpretation`) are preserved.
+
+## UI Display (Option C)
+
+**Axis titles**: unchanged; they already use `axis_def.get("x_label")`.
+
+**New expander** (collapsed by default) below compass scatter:
+```
+🔍 Wat bepaalt deze assen?
+
+Horizontale as: Links–Rechts  (vertrouwen: 70%)
+  Rechtspool: Motie over asielbeleid (2023-11-14) · Motie over belastingverlaging (2023-10-05) ...
+  Linkspool:  Motie over uitkeringen (2023-11-20) · Motie over minimumloon (2023-09-12) ...
+
+Verticale as: Progressief–Conservatief  (vertrouwen: 55%)
+  Progressief: Motie over klimaatdoelen (2023-12-01) ...
+  Conservatief: Motie over tradities (2023-10-18) ...
+
+As 1 verklaart 11% van de variantie in stemgedrag.
+```
+
+## Error Handling
+
+| Situation | Behavior |
+|---|---|
+| No motion vectors for window | Skip motion classification; fall through to coalition correlation, then ideology CSV |
+| Motion title fetch fails | Use motion IDs as placeholder; label falls back |
+| Keyword confidence below threshold | Fall through to coalition correlation |
+| Motion, coalition, and CSV classification all fail | "Stempatroon As N" (existing) |
+| `global_mean` missing from axes | Skip motion projection entirely |
+
+## Testing Strategy
+
+New unit tests (in `tests/test_political_compass.py`):
+- `test_classify_from_titles_left_right` — mock titles with `asiel`/`belasting` → expect "Links–Rechts"
+- `test_classify_from_titles_progressive` — mock titles with `klimaat`/`stikstof` → expect "Progressief–Conservatief"
+- `test_classify_from_titles_low_confidence` — mixed keywords → expect None (fallback triggered)
+- `test_axis_swap_when_y_is_left_right` — positions (x,y) → (y,x), labels swapped
+- `test_axis_swap_not_applied_when_x_is_left_right` — no swap when already correct
+
+All 8 existing tests must continue to pass.
+
+## Out of Scope
+
+**Explained variance drop (18% → 11%)**: Observed but not addressed here. Likely
+reflects genuine fragmentation of the Schoof parliament (4 smaller coalition parties).
+Warrants a separate diagnostic session. The expander now surfaces the explained
+variance, making this visible to users.
+
+**Proper Procrustes alignment of motion vectors**: The projection approximation
+(ignoring per-window rotation) is acceptable for v1. If label instability is observed
+across windows, add rotation application as a follow-up.
+
+**Removing `party_ideologies.csv`**: Kept as fallback. Can be removed once motion
+classification has proven reliable over several parliament periods.