From bed911b92cdf7ba94c75bbdfc1beaf1f6b078882 Mon Sep 17 00:00:00 2001
From: Sven Geboers
Date: Sun, 29 Mar 2026 00:51:34 +0100
Subject: [PATCH] docs: add axis classification design spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add design for honest PCA axis labeling: validate each compass axis
against a party ideology reference CSV and label it dynamically
(Links–Rechts, Coalitie–Oppositie, or fallback) instead of always
hardcoding Left–Right.
---
 .../2026-03-29-axis-classification-design.md  | 279 ++++++++++++++++++
 1 file changed, 279 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-29-axis-classification-design.md

diff --git a/docs/superpowers/specs/2026-03-29-axis-classification-design.md b/docs/superpowers/specs/2026-03-29-axis-classification-design.md
new file mode 100644
index 0000000..67d93bf
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-29-axis-classification-design.md
@@ -0,0 +1,279 @@
+---
+date: 2026-03-29
+topic: "Honest PCA Axis Classification"
+status: validated
+---
+
+# Axis Classification Design
+
+## Problem Statement
+
+The political compass always labels its X-axis "Links–Rechts" and Y-axis "Progressief–Conservatief"
+regardless of what the PCA actually found. In coalition years, the first principal component captures
+**coalition membership**, not ideology. The dominant axis of voting variation in Rutte II (VVD+PvdA)
+and Rutte III/IV (VVD+CDA+D66+CU) is "are you in the governing coalition?" Under Rutte III/IV, PvdA and PVV end up at the
+same position because both were in opposition. That is technically correct voting similarity, but the label
+"Links–Rechts" is a lie.
+
+The fix: after each PCA, validate what the axes actually capture by correlating party positions against
+a small reference dataset of known ideological scores. Assign labels honestly.
+
+## Constraints
+
+- No changes to the PCA computation itself (`compute_2d_axes` is unchanged)
+- No new runtime dependencies (scipy is already optional; pandas is already present)
+- `party_ideologies.csv` and `coalition_membership.csv` are static data files, not derived from the DB
+- Backward-compatible: the compass still renders even when reference files are missing (falls back to
+  the current hardcoded labels without raising)
+
+## Approach
+
+Reference-validated PCA with dynamic labeling. For each time window, correlate the per-party PCA
+positions against known ideological scores. Assign the label of the first reference dimension, in a fixed priority order, whose |r| reaches 0.65.
+Surface the finding as a one-line caption in the UI when the axis diverges from "Links–Rechts".
+
+Rejected alternatives:
+- **Fixed anchor compass**: replaces honest complexity with comfortable fiction; loses behavioral
+  information entirely
+- **Dual view (behavioral + ideological)**: too much UI complexity for V1; can be done later
+
+## Architecture Overview
+
+A thin axis classification layer sits between `compute_2d_axes` (unchanged) and the compass UI.
+
+```
+compute_2d_axes()
+        ↓
+  positions_by_window + axes dict
+        ↓
+classify_axes(positions_by_window, axes, db_path)
+        ↓
+  axes dict enriched with:
+    - x_label, y_label (global, most-common label across annual windows)
+    - x_quality (dict: window_id → float, max |r|)
+    - y_quality (dict: window_id → float, max |r|)
+    - x_interpretation (dict: window_id → Dutch str)
+    - y_interpretation (dict: window_id → Dutch str)
+        ↓
+  compass renderer uses labels + per-year quality captions
+```
+
+## Components
+
+### 1. Reference data files
+
+**`data/party_ideologies.csv`**
+
+One row per party. Party names must match entity IDs in the `svd_vectors` table exactly.
+
+```
+party,left_right,progressive
+VVD,0.65,0.10
+PvdA,-0.70,0.75
+SP,-0.90,0.50
+CDA,0.25,-0.45
+D66,-0.10,0.85
+GroenLinks,-0.70,0.90
+GL,-0.70,0.90
+GroenLinks-PvdA,-0.70,0.82
+ChristenUnie,0.10,-0.55
+SGP,0.35,-0.95
+PVV,0.90,-0.50
+DENK,-0.40,0.55
+50Plus,-0.05,-0.10
+FVD,0.90,-0.75
+PvdD,-0.60,0.85
+Volt,-0.20,0.80
+JA21,0.70,-0.30
+BBB,0.50,-0.35
+NSC,0.20,-0.20
+Nieuw Sociaal Contract,0.20,-0.20
+BVNL,0.85,-0.55
+Bij1,-0.90,0.90
+```
+
+Scores: `left_right` runs from −1 (far left) to +1 (far right); `progressive` runs from −1 (conservative) to +1 (progressive).
+These are expert judgments based on party programs and voting records, not derived algorithmically.
+
+**`data/coalition_membership.csv`**
+
+One row per (window_id, party) where that party held a government seat. Annual windows only; quarterly
+windows inherit from their year.
+
+```
+window_id,party
+2012,VVD
+2012,PvdA
+2013,VVD
+2013,PvdA
+2014,VVD
+2014,PvdA
+2015,VVD
+2015,PvdA
+2016,VVD
+2016,PvdA
+2017,VVD
+2017,CDA
+2017,D66
+2017,ChristenUnie
+2018,VVD
+2018,CDA
+2018,D66
+2018,ChristenUnie
+2019,VVD
+2019,CDA
+2019,D66
+2019,ChristenUnie
+2020,VVD
+2020,CDA
+2020,D66
+2020,ChristenUnie
+2021,VVD
+2021,CDA
+2021,D66
+2021,ChristenUnie
+2022,VVD
+2022,D66
+2022,CDA
+2022,ChristenUnie
+2023,VVD
+2023,D66
+2023,CDA
+2023,ChristenUnie
+2024,PVV
+2024,VVD
+2024,NSC
+2024,BBB
+2025,PVV
+2025,VVD
+2025,NSC
+2025,BBB
+2026,PVV
+2026,VVD
+2026,NSC
+2026,BBB
+```
+
+### 2. `analysis/axis_classifier.py` (new module)
+
+Single public function: `classify_axes(positions_by_window, axes, db_path)`.
+
+The function has no side effects other than reading two CSV files (cached at module level after first load) and adding keys to `axes`.
+
+**Algorithm per window:**
+
+1. Collect parties that appear in both `positions_by_window[window_id]` and `party_ideologies.csv`.
+   Skip windows with fewer than 5 overlapping parties.
+2. Build vectors:
+   - `party_x`: per-party X positions from this window
+   - `party_y`: per-party Y positions from this window
+   - `ref_lr`: left_right scores from CSV
+   - `ref_pc`: progressive scores from CSV
+   - `coalition_dummy`: +1 if party is in government for this window's year, −1 otherwise
+     (quarterly windows: strip suffix to get year, e.g., `2016-Q3` → `2016`)
+3. Compute Pearson r for X against each reference dimension:
+   - `r_lr_x = pearsonr(party_x, ref_lr)[0]`
+   - `r_pc_x = pearsonr(party_x, ref_pc)[0]`
+   - `r_co_x = pearsonr(party_x, coalition_dummy)[0]`
+4. Assign label and interpretation using priority order (first threshold that fires wins):
+   - `|r_lr_x| ≥ 0.65` → label = `"Links–Rechts"`, flip sign if r < 0
+   - `|r_co_x| ≥ 0.65` → label = `"Coalitie–Oppositie"`
+   - `|r_pc_x| ≥ 0.65` → label = `"Progressief–Conservatief"`, flip sign if r < 0
+   - fallback → label = `"Stempatroon As 1"`
+5. Quality score for this window's X-axis: `max(|r_lr_x|, |r_pc_x|, |r_co_x|)`
+6. Repeat steps 3–5 for Y-axis using `party_y`.
+7. After processing all windows, pick the global X label (and likewise the global Y label) as the modal label across annual windows only
+   (quarterly windows participate in quality tracking but not in the modal vote, so that each
+   year counts once).
+
+**Interpretation strings (Dutch):**
+
+| label | interpretation |
+|---|---|
+| Links–Rechts | "De horizontale as weerspiegelt de klassieke links-rechts tegenstelling." |
+| Coalitie–Oppositie | "De horizontale as weerspiegelt stemgedrag van coalitie- versus oppositiepartijen (r={r:.2f}). Links-rechts is minder dominant dit jaar." |
+| Progressief–Conservatief | "De horizontale as weerspiegelt de progressief-conservatieve tegenstelling." |
+| Stempatroon As 1 | "De horizontale as weerspiegelt een empirisch stempatroon zonder duidelijke ideologische richting." |
+
+Y-axis interpretations follow the same template with "verticale" instead of "horizontale".
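
The per-window steps above (overlap check, three correlations, priority-ordered labeling, quality score) can be sketched for a single axis roughly as follows. This is a minimal illustration, not the module's actual code: `classify_axis` and `pearson_r` are hypothetical names, Pearson's r is hand-rolled so the sketch stays stdlib-only (the spec itself calls `pearsonr`), returning a quality of 0.0 for skipped windows is an assumption, and the example positions are made up. Only the reference scores are copied from the CSV shown earlier.

```python
import math

THRESHOLD = 0.65  # labeling threshold from step 4


def pearson_r(xs, ys):
    """Plain-Python Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def classify_axis(positions, ref_lr, ref_pc, coalition, fallback="Stempatroon As 1"):
    """Return (label, signed r or None, quality) for one axis of one window.

    positions, ref_lr, ref_pc: dicts mapping party -> float.
    coalition: set of parties in government for the window's year.
    """
    parties = sorted(set(positions) & set(ref_lr) & set(ref_pc))
    if len(parties) < 5:
        # Step 1: too few overlapping parties, so skip classification.
        # Reporting quality 0.0 here is an assumption of this sketch.
        return fallback, None, 0.0

    axis = [positions[p] for p in parties]
    r_lr = pearson_r(axis, [ref_lr[p] for p in parties])
    r_pc = pearson_r(axis, [ref_pc[p] for p in parties])
    r_co = pearson_r(axis, [1.0 if p in coalition else -1.0 for p in parties])
    quality = max(abs(r_lr), abs(r_pc), abs(r_co))  # step 5

    # Step 4: the first threshold that fires wins, in this priority order.
    # The signed r is returned so the caller can flip the axis when r < 0.
    candidates = (
        ("Links–Rechts", r_lr),
        ("Coalitie–Oppositie", r_co),
        ("Progressief–Conservatief", r_pc),
    )
    for label, r in candidates:
        if abs(r) >= THRESHOLD:
            return label, r, quality
    return fallback, None, quality


# Illustrative 2016-style window: invented X positions, reference scores from the CSV.
ref_lr = {"VVD": 0.65, "PvdA": -0.70, "PVV": 0.90, "SP": -0.90, "CDA": 0.25}
ref_pc = {"VVD": 0.10, "PvdA": 0.75, "PVV": -0.50, "SP": 0.50, "CDA": -0.45}
positions_x = {"VVD": 1.0, "PvdA": 0.9, "PVV": -1.0, "SP": -0.9, "CDA": -0.5}

label, r, quality = classify_axis(positions_x, ref_lr, ref_pc, coalition={"VVD", "PvdA"})
print(label, r, quality)
```

In this example the coalition parties sit together on the positive side regardless of ideology, so the left-right correlation stays weak and the coalition dummy is the first dimension to clear the threshold.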
+
+**Return value:** the input `axes` dict with six new keys added:
+`x_label`, `y_label`, `x_quality` (dict), `y_quality` (dict), `x_interpretation` (dict),
+`y_interpretation` (dict).
+
+### 3. `explorer.py` changes
+
+**`load_positions()`**: after calling `compute_2d_axes`, call `classify_axes` and store the enriched
+axes dict. If `classify_axes` raises for any reason, catch and log; use the original axes dict.
+
+**Compass renderer**: two changes only.
+1. Replace hardcoded `"Links–Rechts"` / `"Progressief–Conservatief"` axis title strings with
+   `axes.get("x_label", "Links–Rechts")` and `axes.get("y_label", "Progressief–Conservatief")`.
+2. Add a caption below the compass for the selected year. Show it when either axis's quality is below 0.65:
+   > *"In 2016 weerspiegelt de horizontale as coalitie–oppositie stemgedrag (r=0.71)."*
+
+   Source: `axes["x_interpretation"].get(selected_window_id, "")`.
+
+No other UI changes. The compass layout is untouched.
+
+## Data Flow
+
+```
+load_positions(db_path, window_size)
+  → compute_2d_axes(...)              [unchanged; returns positions_by_window, axes]
+  → classify_axes(                    [new]
+        positions_by_window,
+        axes,
+        db_path=db_path
+    )
+      reads:  data/party_ideologies.csv      (module-level cache)
+      reads:  data/coalition_membership.csv  (module-level cache)
+      uses:   positions_by_window already in memory
+      writes: new keys into axes dict (no mutation of positions)
+  → return positions_by_window, axes_enriched
+
+compass render (existing function)
+  → axes["x_label"]                       [was hardcoded "Links–Rechts"]
+  → axes["y_label"]                       [was hardcoded "Progressief–Conservatief"]
+  → axes["x_interpretation"][window_id]   [new caption]
+```
+
+No DB writes. No new DB queries. Pure in-memory correlation over data that's already loaded.
+The CSV files are small, so reads are cheap, and they are cached after the first call.
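
The guarded call in `load_positions` and the `.get()` fallbacks in the renderer could look roughly like this. It is a sketch under stated assumptions: `enrich_axes` and `axis_titles` are hypothetical helper names, the classifier is passed in as a parameter only to keep the example self-contained, and handing the classifier a shallow copy of `axes` is a choice of this sketch (so that a failure halfway through cannot leave partial keys behind), not something the spec prescribes.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_X_LABEL = "Links–Rechts"
DEFAULT_Y_LABEL = "Progressief–Conservatief"


def enrich_axes(positions_by_window, axes, classify, db_path=None):
    """Run the classifier; on any failure, log and return the original axes dict."""
    try:
        # The classifier works on a shallow copy, so the caller's dict is only
        # replaced when classification succeeds as a whole.
        return classify(positions_by_window, dict(axes), db_path)
    except Exception:
        logger.exception("classify_axes failed; keeping hardcoded axis labels")
        return axes


def axis_titles(axes):
    """Renderer-side lookup: fall back to the old hardcoded titles if keys are missing."""
    return axes.get("x_label", DEFAULT_X_LABEL), axes.get("y_label", DEFAULT_Y_LABEL)


def broken_classifier(positions_by_window, axes, db_path):
    """Stand-in for a classifier that fails after writing one key."""
    axes["x_label"] = "Coalitie–Oppositie"
    raise FileNotFoundError("data/party_ideologies.csv")


original_axes = {"x_axis": [1.0, 0.0], "y_axis": [0.0, 1.0]}
safe_axes = enrich_axes({}, original_axes, broken_classifier)
print(axis_titles(safe_axes))
```

With the failing stand-in, the renderer still receives the untouched dict and shows the previous hardcoded titles, which is the backward-compatible behaviour listed under Constraints.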
+
+## Error Handling
+
+| Failure | Behaviour |
+|---|---|
+| `data/party_ideologies.csv` missing | Log WARNING, return `axes` unchanged (current labels preserved) |
+| `data/coalition_membership.csv` missing | Log WARNING, coalition dimension skipped; other correlations still computed |
+| Party in positions but not in CSV | Skip the party; log once at DEBUG per session |
+| Window has fewer than 5 overlapping parties | Skip classification for that window; use fallback label |
+| All \|r\| < 0.65 | Fallback label is always safe; no crash |
+| Any unexpected exception in `classify_axes` | Caller (`load_positions`) catches, logs, returns original `axes` dict |
+
+## Testing Strategy
+
+Three new tests are added to `tests/test_political_compass.py`:
+
+**`test_axis_label_left_right`**
+Construct synthetic per-party positions where X values correlate strongly (r > 0.8) with the left_right
+column of a minimal inline CSV. Assert that `classify_axes` returns `x_label == "Links–Rechts"` and
+`x_quality[window] > 0.65`.
+
+**`test_axis_label_coalition_dominant`**
+Construct synthetic positions where X values match the coalition membership pattern but NOT left-right.
+(E.g., coalition parties [VVD, PvdA] cluster at x=+1, opposition [PVV, SP] at x=−1, which is
+historically coherent for 2016.) Include at least five parties so the window is not skipped. Assert `x_label == "Coalitie–Oppositie"` and that the interpretation
+string contains "coalitie".
+
+**`test_axis_classifier_missing_csv`**
+Call `classify_axes` with a db_path pointing to a nonexistent directory so CSV loading fails. Assert
+that the function returns the axes dict unchanged and does not raise.
+
+All three tests use monkeypatching to inject CSV content as in-memory StringIO, following the existing
+pattern in `tests/test_political_compass.py` of patching module-level imports.
+
+## Open Questions
+
+None.