---
date: 2026-03-29
topic: "Honest PCA Axis Classification"
status: validated
---

# Axis Classification Design

## Problem Statement

The political compass always labels its X-axis "Links–Rechts" and its Y-axis "Progressief–Conservatief",
regardless of what the PCA actually found. In coalition years, the first principal component captures
**coalition membership**, not ideology. The dominant axis of voting variation in Rutte II (VVD+PvdA)
and Rutte III/IV (VVD+CDA+D66+CU) is "are you in the governing coalition?" PvdA and PVV end up at the
same position because both were in opposition. That is a technically correct reading of voting
similarity, but the label "Links–Rechts" is a lie.

The fix: after each PCA, validate what the axes actually capture by correlating party positions against
a small reference dataset of known ideological scores. Assign labels honestly.

## Constraints

- No changes to the PCA computation itself (`compute_2d_axes` is unchanged)
- No new runtime dependencies (scipy is already optional; pandas is already present)
- `party_ideologies.csv` and `coalition_membership.csv` are static data files, not derived from the DB
- Backward-compatible: the compass still renders even when reference files are missing (it falls back to
  the current hardcoded labels without showing an error to the user)

## Approach

Reference-validated PCA with dynamic labeling. For each time window, correlate the per-party PCA
positions against known ideological scores and coalition membership. Assign a label based on which
correlation is strongest. Surface the finding as a one-line caption in the UI when the axis diverges
from "Links–Rechts".

Rejected alternatives:
- **Fixed anchor compass**: replaces honest complexity with comfortable fiction; loses behavioral
  information entirely
- **Dual view (behavioral + ideological)**: too much UI complexity for V1; can be done later

## Architecture Overview

A thin axis classification layer sits between `compute_2d_axes` (unchanged) and the compass UI.

```
compute_2d_axes()
    ↓
positions_by_window + axes dict
    ↓
classify_axes(positions_by_window, axes, db_path)
    ↓
axes dict enriched with:
  - x_label, y_label   (global, most-common label across annual windows)
  - x_quality          (dict: window_id → float, max |r|)
  - y_quality          (dict: window_id → float, max |r|)
  - x_interpretation   (dict: window_id → Dutch str)
  - y_interpretation   (dict: window_id → Dutch str)
    ↓
compass renderer uses labels + per-year quality captions
```

## Components

### 1. Reference data files

**`data/party_ideologies.csv`**

One row per party. Party names must match entity IDs in the `svd_vectors` table exactly.

```
party,left_right,progressive
VVD,0.65,0.10
PvdA,-0.70,0.75
SP,-0.90,0.50
CDA,0.25,-0.45
D66,-0.10,0.85
GroenLinks,-0.70,0.90
GL,-0.70,0.90
GroenLinks-PvdA,-0.70,0.82
ChristenUnie,0.10,-0.55
SGP,0.35,-0.95
PVV,0.90,-0.50
DENK,-0.40,0.55
50Plus,-0.05,-0.10
FVD,0.90,-0.75
PvdD,-0.60,0.85
Volt,-0.20,0.80
JA21,0.70,-0.30
BBB,0.50,-0.35
NSC,0.20,-0.20
Nieuw Sociaal Contract,0.20,-0.20
BVNL,0.85,-0.55
Bij1,-0.90,0.90
```

Scores: `left_right` runs from −1 (far left) to +1 (far right); `progressive` runs from −1 (conservative)
to +1 (progressive). These are expert judgments based on party programs and voting records, not derived
algorithmically.

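As a rough illustration of how this file could be read, the sketch below parses the CSV into a dictionary keyed by party name. The function name `load_party_ideologies` is hypothetical, and the standard-library `csv` module is used here only to keep the example self-contained; the real module may well use pandas instead.

```python
import csv
import io


def load_party_ideologies(source):
    """Parse party_ideologies.csv into {party: (left_right, progressive)}.

    `source` is any open text file or file-like object (hypothetical helper).
    """
    scores = {}
    for row in csv.DictReader(source):
        party = row["party"].strip()
        scores[party] = (float(row["left_right"]), float(row["progressive"]))
    return scores


# A three-row excerpt of the file above, injected as in-memory text.
sample = io.StringIO(
    "party,left_right,progressive\n"
    "VVD,0.65,0.10\n"
    "PvdA,-0.70,0.75\n"
    "Nieuw Sociaal Contract,0.20,-0.20\n"
)
ideologies = load_party_ideologies(sample)
print(ideologies["PvdA"])
```

Keying on the raw party string keeps the exact-match requirement against `svd_vectors` entity IDs explicit: a name that differs by even one character simply does not overlap.
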
**`data/coalition_membership.csv`**

One row per (window_id, party) where that party held a government seat. Annual windows only; quarterly
windows inherit from their year.

```
window_id,party
2012,VVD
2012,PvdA
2013,VVD
2013,PvdA
2014,VVD
2014,PvdA
2015,VVD
2015,PvdA
2016,VVD
2016,PvdA
2017,VVD
2017,CDA
2017,D66
2017,ChristenUnie
2018,VVD
2018,CDA
2018,D66
2018,ChristenUnie
2019,VVD
2019,CDA
2019,D66
2019,ChristenUnie
2020,VVD
2020,CDA
2020,D66
2020,ChristenUnie
2021,VVD
2021,CDA
2021,D66
2021,ChristenUnie
2022,VVD
2022,D66
2022,CDA
2022,ChristenUnie
2023,VVD
2023,D66
2023,CDA
2023,ChristenUnie
2024,PVV
2024,VVD
2024,NSC
2024,BBB
2025,PVV
2025,VVD
2025,NSC
2025,BBB
2026,PVV
2026,VVD
2026,NSC
2026,BBB
```

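The "quarterly windows inherit from their year" rule can be sketched as a small lookup. The helper names (`load_coalitions`, `coalition_year`, `in_government`) are hypothetical, and the sketch assumes that annual window IDs are bare four-digit years and that quarterly IDs look like `2016-Q3`.

```python
import csv
import io
from collections import defaultdict


def load_coalitions(source):
    """Parse coalition_membership.csv into {year: {party, ...}}."""
    by_year = defaultdict(set)
    for row in csv.DictReader(source):
        by_year[row["window_id"].strip()].add(row["party"].strip())
    return dict(by_year)


def coalition_year(window_id):
    """Map a window ID to its year key: '2016-Q3' -> '2016', '2016' -> '2016'.

    Returns None for IDs without a four-digit year prefix,
    such as 'current_parliament'.
    """
    year = window_id.split("-", 1)[0]
    return year if len(year) == 4 and year.isdigit() else None


def in_government(coalitions, window_id, party):
    """True if the party held a government seat in the window's year."""
    year = coalition_year(window_id)
    return year is not None and party in coalitions.get(year, set())


# Excerpt of the file above.
sample = io.StringIO(
    "window_id,party\n2016,VVD\n2016,PvdA\n2017,VVD\n2017,CDA\n"
)
coalitions = load_coalitions(sample)
print(in_government(coalitions, "2016-Q3", "PvdA"))
```
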
### 2. `analysis/axis_classifier.py` (new module)

Single public function: `classify_axes(positions_by_window, axes, db_path)`.

The function is pure except for reading two CSV files, which are cached at module level after the first
load.

CSV paths are derived from `db_path`: `Path(db_path).parent / "party_ideologies.csv"` and
`Path(db_path).parent / "coalition_membership.csv"`. Both files live in the same `data/` directory
as the database.

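A minimal sketch of the path derivation and the module-level cache follows. The names `reference_paths`, `read_cached` and `_CSV_CACHE` are hypothetical, as is the database filename `data/compass.db`; only the two CSV filenames and the `Path(db_path).parent` rule come from this design.

```python
from pathlib import Path

# Hypothetical module-level cache: resolved path string -> file text.
_CSV_CACHE = {}


def reference_paths(db_path):
    """Both reference files sit next to the database file."""
    data_dir = Path(db_path).parent
    return data_dir / "party_ideologies.csv", data_dir / "coalition_membership.csv"


def read_cached(path):
    """Return the file's text, touching the disk only on the first call.

    Returns None when the file is missing. Misses are not cached, so a
    file added later is still picked up.
    """
    key = str(path)
    if key not in _CSV_CACHE:
        try:
            _CSV_CACHE[key] = Path(path).read_text(encoding="utf-8")
        except FileNotFoundError:
            return None
    return _CSV_CACHE[key]


ideologies_csv, coalition_csv = reference_paths("data/compass.db")
print(ideologies_csv.as_posix())
```

Returning `None` for a missing file, rather than raising, lets the caller decide between "skip everything" and "skip only the coalition dimension".
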
**Algorithm per window:**

1. Collect parties that appear in both `positions_by_window[window_id]` and `party_ideologies.csv`.
   Skip windows with fewer than 5 overlapping parties.
2. Build vectors:
   - `party_x`: per-party X positions from this window
   - `party_y`: per-party Y positions from this window
   - `ref_lr`: left_right scores from CSV
   - `ref_pc`: progressive scores from CSV
   - `coalition_dummy`: +1 if party is in government for this window's year, −1 otherwise
     (quarterly windows: strip suffix to get year, e.g., `2016-Q3` → `2016`)
3. Compute Pearson r for X against each reference dimension:
   - `r_lr_x = pearsonr(party_x, ref_lr)[0]`
   - `r_pc_x = pearsonr(party_x, ref_pc)[0]`
   - `r_co_x = pearsonr(party_x, coalition_dummy)[0]`
4. Assign label and interpretation using priority order (first threshold that fires wins):
   - `|r_lr_x| ≥ 0.65` → label = `"Links–Rechts"`, flip sign if r < 0
   - `|r_co_x| ≥ 0.65` → label = `"Coalitie–Oppositie"`
   - `|r_pc_x| ≥ 0.65` → label = `"Progressief–Conservatief"`, flip sign if r < 0
   - fallback → label = `"Stempatroon As 1"`
5. Quality score for this window's X-axis: `max(|r_lr_x|, |r_pc_x|, |r_co_x|)`
6. Repeat steps 3–5 for Y-axis using `party_y`.
7. After processing all windows, pick global X label = modal label across annual windows only
   (quarterly windows participate in quality tracking but not in the modal vote, to avoid
   over-weighting). The `current_parliament` window is excluded from modal voting entirely and
   from the coalition dimension (no year to look up); it still gets x_quality and x_interpretation
   based on the left_right and progressive correlations.

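Steps 3 to 5 and step 7 can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: a hand-written Pearson correlation stands in for `pearsonr` so the example has no dependencies, the names `pearson_r`, `classify_axis`, `modal_label` and `THRESHOLD` are hypothetical, annual window IDs are assumed to be bare years, and tie-breaking in the modal vote (not specified above) falls to whichever label was seen first.

```python
import math
from collections import Counter

THRESHOLD = 0.65


def pearson_r(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify_axis(positions, ref_lr, ref_pc, coalition_dummy=None,
                  fallback="Stempatroon As 1"):
    """Steps 3-5 for one axis in one window.

    Returns (label, quality, flip); flip means the axis sign should be reversed.
    Pass coalition_dummy=None to skip the coalition dimension.
    """
    r_lr = pearson_r(positions, ref_lr)
    r_pc = pearson_r(positions, ref_pc)
    r_co = pearson_r(positions, coalition_dummy) if coalition_dummy else 0.0
    quality = max(abs(r_lr), abs(r_pc), abs(r_co))
    # Priority order: the first threshold that fires wins.
    if abs(r_lr) >= THRESHOLD:
        return "Links–Rechts", quality, r_lr < 0
    if abs(r_co) >= THRESHOLD:
        return "Coalitie–Oppositie", quality, False
    if abs(r_pc) >= THRESHOLD:
        return "Progressief–Conservatief", quality, r_pc < 0
    return fallback, quality, False


def modal_label(labels_by_window):
    """Step 7: most common label across annual windows (bare-year IDs) only."""
    annual = [label for wid, label in labels_by_window.items() if wid.isdigit()]
    return Counter(annual).most_common(1)[0][0] if annual else None


# 2016-style example: VVD, PvdA in government; PVV, SP, CDA in opposition.
ref_lr = [0.65, -0.70, 0.90, -0.90, 0.25]   # left_right scores from the CSV
ref_pc = [0.10, 0.75, -0.50, 0.50, -0.45]   # progressive scores from the CSV
dummy = [1, 1, -1, -1, -1]
party_x = [1.0, 1.0, -1.0, -1.0, -1.0]      # synthetic X positions
label, quality, flip = classify_axis(party_x, ref_lr, ref_pc, dummy)
print(label, round(quality, 2), flip)
```

In this synthetic window the X positions track coalition membership exactly, while the correlations with both ideological scores stay below the threshold, so the coalition rule is the first one to fire.
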
**Interpretation strings (Dutch):**

| label | interpretation |
|---|---|
| Links–Rechts | "De horizontale as weerspiegelt de klassieke links-rechts tegenstelling." |
| Coalitie–Oppositie | "De horizontale as weerspiegelt stemgedrag van coalitie- versus oppositiepartijen (r={r:.2f}). Links-rechts is minder dominant dit jaar." |
| Progressief–Conservatief | "De horizontale as weerspiegelt de progressief-conservatieve tegenstelling." |
| Stempatroon As 1 | "De horizontale as weerspiegelt een empirisch stempatroon zonder duidelijke ideologische richting." |

Y-axis interpretations follow the same template with "verticale" instead of "horizontale".

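One way to hold these strings is a single template table with a placeholder for the axis word, so the X and Y variants cannot drift apart. The `TEMPLATES` mapping, the `{axis}` placeholder and the `interpretation` helper are assumptions made for this sketch; only the four labels listed in the table are covered, since the Y-axis fallback label is not spelled out there.

```python
# Hypothetical template table; the Dutch strings are copied from the table above
# with "horizontale" replaced by an {axis} placeholder.
TEMPLATES = {
    "Links–Rechts": (
        "De {axis} as weerspiegelt de klassieke links-rechts tegenstelling."
    ),
    "Coalitie–Oppositie": (
        "De {axis} as weerspiegelt stemgedrag van coalitie- versus "
        "oppositiepartijen (r={r:.2f}). Links-rechts is minder dominant dit jaar."
    ),
    "Progressief–Conservatief": (
        "De {axis} as weerspiegelt de progressief-conservatieve tegenstelling."
    ),
    "Stempatroon As 1": (
        "De {axis} as weerspiegelt een empirisch stempatroon zonder "
        "duidelijke ideologische richting."
    ),
}


def interpretation(label, axis, r):
    """axis is 'x' or 'y'; r is the correlation that selected the label."""
    axis_word = "horizontale" if axis == "x" else "verticale"
    # str.format ignores the unused r argument for templates without {r}.
    return TEMPLATES[label].format(axis=axis_word, r=r)


print(interpretation("Coalitie–Oppositie", "x", 0.714))
```
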
**Return value:** the input `axes` dict with six new keys added:
`x_label`, `y_label`, `x_quality` (dict), `y_quality` (dict), `x_interpretation` (dict),
`y_interpretation` (dict).

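The shape of the added keys could be written down as a `TypedDict`, purely as documentation for readers and type checkers. The class name `AxisClassification` is hypothetical, and apart from the 2016 X-axis figure quoted in the caption example in this document, the values in `example` are invented for illustration.

```python
from typing import Dict, TypedDict


class AxisClassification(TypedDict, total=False):
    """Keys that classify_axes adds to the axes dict (all optional on failure)."""
    x_label: str
    y_label: str
    x_quality: Dict[str, float]          # window_id -> max |r|
    y_quality: Dict[str, float]
    x_interpretation: Dict[str, str]     # window_id -> Dutch caption
    y_interpretation: Dict[str, str]


# Illustrative values only; not real classifier output.
example: AxisClassification = {
    "x_label": "Coalitie–Oppositie",
    "y_label": "Progressief–Conservatief",
    "x_quality": {"2016": 0.71},
    "y_quality": {"2016": 0.80},
    "x_interpretation": {"2016": "..."},
    "y_interpretation": {"2016": "..."},
}
print(sorted(example))
```

Declaring the type with `total=False` matches the fallback behaviour: when classification is skipped, the dict simply lacks these keys and readers use `axes.get(...)`.
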
### 3. `explorer.py` changes

**`load_positions()`**: after calling `compute_2d_axes`, call `classify_axes` and store the enriched
axes dict. If `classify_axes` raises for any reason, catch and log; use the original axes dict.

**Compass renderer**: two changes only:
1. Replace hardcoded `"Links–Rechts"` / `"Progressief–Conservatief"` axis title strings with
   `axes.get("x_label", "Links–Rechts")` and `axes.get("y_label", "Progressief–Conservatief")`.
2. Add a caption below the compass for the selected year. Show when either axis quality < 0.65:
   > *"In 2016 weerspiegelt de horizontale as coalitie–oppositie stemgedrag (r=0.71)."*

   Source: `axes["x_interpretation"].get(selected_window_id, "")`.

No other UI changes. The compass layout is untouched.

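The catch-and-fall-back behaviour in `load_positions()` and the `.get()` defaults in the renderer might look roughly like this. `classify_safely`, `axis_titles` and `broken_classifier` are hypothetical helpers written for this sketch, and the classifier is passed in as an argument only so the example runs without the real module.

```python
import logging

logger = logging.getLogger(__name__)


def classify_safely(classify_axes, positions_by_window, axes, db_path):
    """Call the classifier; on any failure, log it and keep the original axes."""
    try:
        return classify_axes(positions_by_window, axes, db_path)
    except Exception:
        logger.exception("classify_axes failed; keeping hardcoded axis labels")
        return axes


def axis_titles(axes):
    """Renderer side: fall back to the old hardcoded titles when keys are absent."""
    return (
        axes.get("x_label", "Links–Rechts"),
        axes.get("y_label", "Progressief–Conservatief"),
    )


def broken_classifier(positions_by_window, axes, db_path):
    """Stand-in that always fails, to exercise the fallback path."""
    raise RuntimeError("simulated failure")


# The axes content and the database path are placeholders.
axes = classify_safely(broken_classifier, {}, {"placeholder": True}, "data/compass.db")
print(axis_titles(axes))
```
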
## Data Flow

```
load_positions(db_path, window_size)
  → compute_2d_axes(...)          [unchanged; returns positions_by_window, axes]
  → classify_axes(                [new]
      positions_by_window,
      axes,
      db_path=db_path
    )
      reads:  data/party_ideologies.csv      (module-level cache)
      reads:  data/coalition_membership.csv  (module-level cache)
      uses:   positions_by_window already in memory
      writes: new keys into axes dict (no mutation of positions)
  → return positions_by_window, axes_enriched

compass render (existing function)
  → axes["x_label"]                        [was hardcoded "Links–Rechts"]
  → axes["y_label"]                        [was hardcoded "Progressief–Conservatief"]
  → axes["x_interpretation"][window_id]    [new caption]
```

No DB writes. No new DB queries. Pure in-memory correlation over data that is already loaded.
The two CSV files are small, so reading them costs next to nothing, and they are cached after the first call.

## Error Handling

| Failure | Behaviour |
|---|---|
| `data/party_ideologies.csv` missing | Log WARNING, return `axes` unchanged (current labels preserved) |
| `data/coalition_membership.csv` missing | Log WARNING, coalition dimension skipped; other correlations still computed |
| Party in positions but not in CSV | Skip silently; log once at DEBUG per session |
| Window has fewer than 5 overlapping parties | Skip classification for that window; use fallback label |
| All absolute correlations below 0.65 | Fallback label is always safe; no crash |
| Any unexpected exception in `classify_axes` | Caller (`load_positions`) catches, logs, returns original `axes` dict |

## Testing Strategy

Three new tests added to `tests/test_political_compass.py`:

**`test_axis_label_left_right`**
Construct synthetic per-party positions where X values correlate strongly (r > 0.8) with the left_right
column of a minimal inline CSV. Assert that `classify_axes` returns `x_label == "Links–Rechts"` and
`x_quality[window] > 0.65`.

**`test_axis_label_coalition_dominant`**
Construct synthetic positions where X values match the coalition membership pattern but NOT left-right.
(E.g., coalition parties [VVD, PvdA] cluster at x=+1, opposition [PVV, SP] at x=−1, which is
historically coherent for 2016. The synthetic window needs at least five reference parties in total,
otherwise the overlap check in step 1 skips it.) Assert `x_label == "Coalitie–Oppositie"` and that the
interpretation string contains "coalitie".

**`test_axis_classifier_missing_csv`**
Call `classify_axes` with a db_path pointing to a nonexistent directory so CSV loading fails. Assert
that the function returns the axes dict unchanged and does not raise.

All three tests use monkeypatching to inject CSV content as in-memory StringIO, following the existing
pattern in `tests/test_political_compass.py` of patching module-level imports.

## Open Questions

None.