# Axis Classification Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add `analysis/axis_classifier.py` that dynamically labels the political compass axes by correlating per-party PCA positions against a party ideology reference CSV, replacing hardcoded "Links–Rechts" / "Progressief–Conservatief" labels.

**Architecture:** A new module, `analysis/axis_classifier.py`, exposes `classify_axes()`, which reads two static CSVs (`data/party_ideologies.csv`, `data/coalition_membership.csv`) and enriches the `axes` dict returned by `compute_2d_axes`. `load_positions()` in `explorer.py` calls it after PCA; the compass and trajectories renderers use the resulting `x_label`/`y_label` keys instead of hardcoded strings. CSVs are committed to git and baked into the Docker image.

**Tech Stack:** Python stdlib (`pathlib`; the CSVs are parsed manually rather than with the `csv` module), NumPy (already present), Streamlit (already present). No new runtime dependencies.

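As a rough sketch of the data flow described above, the snippet below shows what an enriched `axes` dict might look like and how a renderer could read its labels with a fallback. The key names come from this plan; the concrete values and the helper `axis_titles` are hypothetical and exist only for illustration.

```python
from typing import Tuple

# Hypothetical example of the dict classify_axes() is planned to return.
# Key names follow this plan; the values are made up for illustration.
enriched_axes = {
    "x_axis": None,
    "y_axis": None,
    "method": "pca",
    "x_label": "Coalitie\u2013Oppositie",
    "y_label": "Links\u2013Rechts",
    "x_quality": {"2016": 0.97},
    "y_quality": {"2016": 0.81},
    "x_interpretation": {"2016": "De horizontale as weerspiegelt stemgedrag van coalitie- versus oppositiepartijen."},
    "y_interpretation": {"2016": "De verticale as weerspiegelt de klassieke links-rechts tegenstelling."},
}


def axis_titles(axis_def: dict) -> Tuple[str, str]:
    """Return (x, y) chart titles, falling back to the old hardcoded labels."""
    return (
        axis_def.get("x_label", "Links\u2013Rechts"),
        axis_def.get("y_label", "Progressief\u2013Conservatief"),
    )
```

With an un-enriched dict such as `{"method": "pca"}`, the helper returns the previous hardcoded labels, which is the backward-compatible behaviour the plan aims for when the CSVs are missing.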

---

## File Map

| Action | Path | Responsibility |
|---|---|---|
| Create | `data/party_ideologies.csv` | Party left_right + progressive reference scores |
| Create | `data/coalition_membership.csv` | Per-year coalition party membership |
| Create | `analysis/axis_classifier.py` | `classify_axes()` — correlate positions against reference |
| Modify | `tests/test_political_compass.py` | Add 3 tests for classifier behaviour |
| Modify | `explorer.py:194-209` | Call `classify_axes` inside `load_positions` |
| Modify | `explorer.py:927-928` | Dynamic labels in party-level scatter |
| Modify | `explorer.py:946` | Dynamic labels in MP-level scatter |
| Modify | `explorer.py:1050` | Accept axis_def from `load_positions` in trajectories tab |
| Modify | `explorer.py:1120-1121` | Dynamic titles in trajectories chart |

---

### Task 1: Write the three failing tests

**Files:**
- Modify: `tests/test_political_compass.py`

- [ ] **Step 1: Open `tests/test_political_compass.py` and append the three test functions below**

Add this block at the end of the file:

```python
# ---------------------------------------------------------------------------
# Tests for analysis.axis_classifier
# ---------------------------------------------------------------------------


def _fresh_classifier(monkeypatch):
    """Import axis_classifier with cleared module-level caches."""
    import analysis.axis_classifier as _cls
    monkeypatch.setattr(_cls, "_ideology_cache", None)
    monkeypatch.setattr(_cls, "_coalition_cache", None)
    return _cls


def test_axis_label_left_right(tmp_path, monkeypatch):
    """Positions that closely correlate with left_right scores → label 'Links–Rechts'."""
    _cls = _fresh_classifier(monkeypatch)

    (tmp_path / "party_ideologies.csv").write_text(
        "party,left_right,progressive\n"
        "VVD,0.65,0.10\n"
        "PvdA,-0.70,0.75\n"
        "SP,-0.90,0.50\n"
        "PVV,0.90,-0.50\n"
        "D66,-0.10,0.85\n"
        "CDA,0.25,-0.45\n"
    )
    (tmp_path / "coalition_membership.csv").write_text("window_id,party\n")

    # X values are the party's left_right scores — perfect correlation
    positions_by_window = {
        "2022": {
            "VVD": (0.65, 0.10),
            "PvdA": (-0.70, 0.20),
            "SP": (-0.90, 0.30),
            "PVV": (0.90, -0.10),
            "D66": (-0.10, 0.40),
            "CDA": (0.25, -0.20),
        }
    }
    axes = {"x_axis": None, "y_axis": None, "method": "pca"}

    result = _cls.classify_axes(
        positions_by_window, axes, str(tmp_path / "motions.db")
    )

    assert result["x_label"] == "Links\u2013Rechts"
    assert result["x_quality"]["2022"] >= 0.65


def test_axis_label_coalition_dominant(tmp_path, monkeypatch):
    """Positions that match coalition pattern but NOT left-right → 'Coalitie–Oppositie'."""
    _cls = _fresh_classifier(monkeypatch)

    (tmp_path / "party_ideologies.csv").write_text(
        "party,left_right,progressive\n"
        "VVD,0.65,0.10\n"
        "PvdA,-0.70,0.75\n"
        "SP,-0.90,0.50\n"
        "PVV,0.90,-0.50\n"
        "D66,-0.10,0.85\n"
        "CDA,0.25,-0.45\n"
    )
    # 2016: Rutte II coalition = VVD + PvdA
    (tmp_path / "coalition_membership.csv").write_text(
        "window_id,party\n"
        "2016,VVD\n"
        "2016,PvdA\n"
    )

    # Coalition parties (VVD + PvdA) at x ≈ +1, opposition at x ≈ -1.
    # VVD (right) and PvdA (left) are both near +1 → low left_right correlation
    # but high coalition correlation.
    positions_by_window = {
        "2016": {
            "VVD": (0.95, 0.10),
            "PvdA": (0.90, 0.20),
            "SP": (-0.85, 0.30),
            "PVV": (-0.95, -0.10),
            "D66": (-0.80, 0.40),
            "CDA": (-0.75, -0.20),
        }
    }
    axes = {"x_axis": None, "y_axis": None, "method": "pca"}

    result = _cls.classify_axes(
        positions_by_window, axes, str(tmp_path / "motions.db")
    )

    assert result["x_label"] == "Coalitie\u2013Oppositie"
    assert "coalitie" in result["x_interpretation"]["2016"].lower()


def test_axis_classifier_missing_csv(tmp_path, monkeypatch):
    """Missing party_ideologies.csv → returns axes dict unchanged, no exception."""
    _cls = _fresh_classifier(monkeypatch)

    # No CSVs written — directory exists but files do not
    positions_by_window = {"2022": {"VVD": (1.0, 0.5), "PvdA": (-1.0, 0.3)}}
    axes = {"x_axis": None, "y_axis": None, "method": "pca"}

    result = _cls.classify_axes(
        positions_by_window, axes, str(tmp_path / "motions.db")
    )

    # Must not crash and must return the original axes dict unchanged
    assert result is axes
    assert "x_label" not in result
```

- [ ] **Step 2: Run the tests to confirm they fail (module doesn't exist yet)**

```bash
uv run pytest tests/test_political_compass.py::test_axis_label_left_right tests/test_political_compass.py::test_axis_label_coalition_dominant tests/test_political_compass.py::test_axis_classifier_missing_csv -v
```

Expected: 3 failures like `ModuleNotFoundError: No module named 'analysis.axis_classifier'`

---

### Task 2: Create the reference data files

**Files:**
- Create: `data/party_ideologies.csv`
- Create: `data/coalition_membership.csv`

- [ ] **Step 1: Create `data/party_ideologies.csv`**

```
party,left_right,progressive
VVD,0.65,0.10
PvdA,-0.70,0.75
SP,-0.90,0.50
CDA,0.25,-0.45
D66,-0.10,0.85
GroenLinks,-0.70,0.90
GL,-0.70,0.90
GroenLinks-PvdA,-0.70,0.82
ChristenUnie,0.10,-0.55
SGP,0.35,-0.95
PVV,0.90,-0.50
DENK,-0.40,0.55
50Plus,-0.05,-0.10
FVD,0.90,-0.75
PvdD,-0.60,0.85
Volt,-0.20,0.80
JA21,0.70,-0.30
BBB,0.50,-0.35
NSC,0.20,-0.20
Nieuw Sociaal Contract,0.20,-0.20
BVNL,0.85,-0.55
Bij1,-0.90,0.90
```

- [ ] **Step 2: Create `data/coalition_membership.csv`**

```
window_id,party
2012,VVD
2012,PvdA
2013,VVD
2013,PvdA
2014,VVD
2014,PvdA
2015,VVD
2015,PvdA
2016,VVD
2016,PvdA
2017,VVD
2017,CDA
2017,D66
2017,ChristenUnie
2018,VVD
2018,CDA
2018,D66
2018,ChristenUnie
2019,VVD
2019,CDA
2019,D66
2019,ChristenUnie
2020,VVD
2020,CDA
2020,D66
2020,ChristenUnie
2021,VVD
2021,CDA
2021,D66
2021,ChristenUnie
2022,VVD
2022,D66
2022,CDA
2022,ChristenUnie
2023,VVD
2023,D66
2023,CDA
2023,ChristenUnie
2024,PVV
2024,VVD
2024,NSC
2024,BBB
2025,PVV
2025,VVD
2025,NSC
2025,BBB
2026,PVV
2026,VVD
2026,NSC
2026,BBB
```

- [ ] **Step 3: Verify the files are NOT excluded by .gitignore**

```bash
git check-ignore -v data/party_ideologies.csv data/coalition_membership.csv
```

Expected: no output (files are not ignored — `.gitignore` only excludes `data/*.db`, `data/*.bak`, `data/*.json`)

---

### Task 3: Implement `analysis/axis_classifier.py`

**Files:**
- Create: `analysis/axis_classifier.py`

- [ ] **Step 1: Create the file with this full implementation**

```python
"""Axis classifier: correlate per-party PCA positions against ideology reference data
to assign honest, dynamic labels to political compass axes.

Public API:
    classify_axes(positions_by_window, axes, db_path) -> dict
"""

import logging
from collections import Counter
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import numpy as np

_logger = logging.getLogger(__name__)

# Module-level caches — loaded once per process lifetime.
_ideology_cache: Optional[Dict[str, Dict[str, float]]] = None
_coalition_cache: Optional[Dict[str, set]] = None

# Correlation threshold above which we consider an axis "explained" by a dimension.
_THRESHOLD = 0.65

_LABELS = {
    "lr": "Links\u2013Rechts",
    "co": "Coalitie\u2013Oppositie",
    "pc": "Progressief\u2013Conservatief",
    "fallback_x": "Stempatroon As 1",
    "fallback_y": "Stempatroon As 2",
}

_INTERPRETATION_TEMPLATES = {
    "lr": "De {orientation} as weerspiegelt de klassieke links-rechts tegenstelling.",
    "co": (
        "De {orientation} as weerspiegelt stemgedrag van coalitie- versus "
        "oppositiepartijen (r={r:.2f}). Links-rechts is minder dominant dit jaar."
    ),
    "pc": "De {orientation} as weerspiegelt de progressief-conservatieve tegenstelling.",
    "fallback": (
        "De {orientation} as weerspiegelt een empirisch stempatroon "
        "zonder duidelijke ideologische richting."
    ),
}


def _load_ideology(csv_path: Path) -> Dict[str, Dict[str, float]]:
    """Load party ideology scores from CSV.

    Returns {party_name: {"left_right": float, "progressive": float}}.
    Returns {} on any error (caller should treat empty as 'skip classification').
    """
    global _ideology_cache
    if _ideology_cache is not None:
        return _ideology_cache

    result: Dict[str, Dict[str, float]] = {}
    try:
        with open(csv_path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
        header = [h.strip() for h in lines[0].split(",")]
        lr_idx = header.index("left_right")
        pc_idx = header.index("progressive")
        for line in lines[1:]:
            if not line.strip():
                continue
            parts = [p.strip() for p in line.split(",")]
            if len(parts) <= max(lr_idx, pc_idx):
                continue
            result[parts[0]] = {
                "left_right": float(parts[lr_idx]),
                "progressive": float(parts[pc_idx]),
            }
    except FileNotFoundError:
        _logger.warning("party_ideologies.csv not found at %s — axis labels will be generic", csv_path)
        return {}
    except Exception as exc:
        _logger.warning("Failed to load party_ideologies.csv: %s", exc)
        return {}

    _ideology_cache = result
    return result


def _load_coalition(csv_path: Path) -> Dict[str, set]:
    """Load coalition membership from CSV.

    Returns {window_id: set_of_party_names}.
    Returns {} on any error (coalition dimension will be skipped).
    """
    global _coalition_cache
    if _coalition_cache is not None:
        return _coalition_cache

    result: Dict[str, set] = {}
    try:
        with open(csv_path, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
        for line in lines[1:]:
            if not line.strip():
                continue
            parts = [p.strip() for p in line.split(",")]
            if len(parts) < 2:
                continue
            wid, party = parts[0], parts[1]
            result.setdefault(wid, set()).add(party)
    except FileNotFoundError:
        _logger.warning(
            "coalition_membership.csv not found at %s — coalition axis detection disabled", csv_path
        )
        return {}
    except Exception as exc:
        _logger.warning("Failed to load coalition_membership.csv: %s", exc)
        return {}

    _coalition_cache = result
    return result


def _window_year(window_id: str) -> Optional[str]:
    """Extract year string from window_id. Returns None for 'current_parliament'.

    '2016' → '2016', '2016-Q3' → '2016'.
    """
    if window_id == "current_parliament":
        return None
    return window_id.split("-")[0]


def _pearsonr(x: List[float], y: List[float]) -> float:
    """Pearson r; returns 0.0 for degenerate input (< 3 points or zero variance)."""
    if len(x) < 3:
        return 0.0
    xa = np.array(x, dtype=float)
    ya = np.array(y, dtype=float)
    if xa.std() < 1e-12 or ya.std() < 1e-12:
        return 0.0
    return float(np.corrcoef(xa, ya)[0, 1])


def _assign_label(
    r_lr: float,
    r_co: float,
    r_pc: float,
    axis: str,
) -> Tuple[str, str, float]:
    """Assign label, interpretation and quality score for one axis.

    Priority: left-right > coalition > progressive > fallback.
    Returns (label, interpretation_string, quality_score).
    """
    orientation = "horizontale" if axis == "x" else "verticale"
    fallback_label = _LABELS["fallback_x"] if axis == "x" else _LABELS["fallback_y"]
    quality = max(abs(r_lr), abs(r_co), abs(r_pc))

    if abs(r_lr) >= _THRESHOLD:
        return (
            _LABELS["lr"],
            _INTERPRETATION_TEMPLATES["lr"].format(orientation=orientation),
            quality,
        )
    if abs(r_co) >= _THRESHOLD:
        return (
            _LABELS["co"],
            _INTERPRETATION_TEMPLATES["co"].format(orientation=orientation, r=r_co),
            quality,
        )
    if abs(r_pc) >= _THRESHOLD:
        return (
            _LABELS["pc"],
            _INTERPRETATION_TEMPLATES["pc"].format(orientation=orientation),
            quality,
        )
    return (
        fallback_label,
        _INTERPRETATION_TEMPLATES["fallback"].format(orientation=orientation),
        quality,
    )


def classify_axes(
    positions_by_window: Dict[str, Dict[str, Tuple[float, float]]],
    axes: dict,
    db_path: str,
) -> dict:
    """Classify compass axes by correlating per-party positions against ideology reference data.

    Enriches ``axes`` with:
        x_label, y_label     — global label (modal across annual windows)
        x_quality, y_quality — {window_id: float} max |r| for each window
        x_interpretation     — {window_id: str} Dutch explanation per window
        y_interpretation     — {window_id: str} Dutch explanation per window

    Returns the original ``axes`` dict unchanged if reference data is unavailable.
    """
    data_dir = Path(db_path).parent
    ideology = _load_ideology(data_dir / "party_ideologies.csv")
    if not ideology:
        return axes  # no reference data — preserve existing behaviour

    coalition = _load_coalition(data_dir / "coalition_membership.csv")

    x_quality: Dict[str, float] = {}
    y_quality: Dict[str, float] = {}
    x_interpretation: Dict[str, str] = {}
    y_interpretation: Dict[str, str] = {}
    annual_x_labels: List[str] = []
    annual_y_labels: List[str] = []

    for wid, pos_dict in positions_by_window.items():
        year = _window_year(wid)
        is_current = wid == "current_parliament"
        is_annual = not is_current and "-" not in wid  # e.g. "2016" not "2016-Q3"

        # Only use parties present in both the positions and the ideology reference.
        parties = [p for p in pos_dict if p in ideology]
        if len(parties) < 5:
            _logger.debug(
                "Skipping axis classification for %s: only %d reference parties (need 5)",
                wid,
                len(parties),
            )
            continue

        party_x = [pos_dict[p][0] for p in parties]
        party_y = [pos_dict[p][1] for p in parties]
        ref_lr = [ideology[p]["left_right"] for p in parties]
        ref_pc = [ideology[p]["progressive"] for p in parties]

        # Coalition dummy: +1 if in government that year, -1 otherwise.
        # current_parliament and windows with no coalition data use a neutral vector.
        if year and coalition and year in coalition:
            gov_set = coalition[year]
            ref_co = [1.0 if p in gov_set else -1.0 for p in parties]
        else:
            ref_co = [0.0] * len(parties)  # neutral — will never exceed threshold

        r_lr_x = _pearsonr(party_x, ref_lr)
        r_co_x = _pearsonr(party_x, ref_co)
        r_pc_x = _pearsonr(party_x, ref_pc)
        x_lbl, x_int, x_q = _assign_label(r_lr_x, r_co_x, r_pc_x, "x")

        r_lr_y = _pearsonr(party_y, ref_lr)
        r_co_y = _pearsonr(party_y, ref_co)
        r_pc_y = _pearsonr(party_y, ref_pc)
        y_lbl, y_int, y_q = _assign_label(r_lr_y, r_co_y, r_pc_y, "y")

        x_quality[wid] = x_q
        y_quality[wid] = y_q
        x_interpretation[wid] = x_int
        y_interpretation[wid] = y_int

        # Only annual windows vote on the global label (not quarterly, not current_parliament).
        if is_annual:
            annual_x_labels.append(x_lbl)
            annual_y_labels.append(y_lbl)

    def _modal(labels: List[str], fallback: str) -> str:
        if not labels:
            return fallback
        return Counter(labels).most_common(1)[0][0]

    enriched = dict(axes)
    enriched["x_label"] = _modal(annual_x_labels, "Links\u2013Rechts")
    enriched["y_label"] = _modal(annual_y_labels, "Progressief\u2013Conservatief")
    enriched["x_quality"] = x_quality
    enriched["y_quality"] = y_quality
    enriched["x_interpretation"] = x_interpretation
    enriched["y_interpretation"] = y_interpretation
    return enriched
```

- [ ] **Step 2: Run the three new tests**

```bash
uv run pytest tests/test_political_compass.py::test_axis_label_left_right tests/test_political_compass.py::test_axis_label_coalition_dominant tests/test_political_compass.py::test_axis_classifier_missing_csv -v
```

Expected: all 3 PASS

- [ ] **Step 3: Run the full test suite to confirm no regressions**

```bash
uv run pytest tests/test_political_compass.py -v
```

Expected: all tests PASS (5 original + 3 new = 8 total)

- [ ] **Step 4: Commit**

```bash
git add data/party_ideologies.csv data/coalition_membership.csv analysis/axis_classifier.py tests/test_political_compass.py
git commit -m "feat: add axis classifier with party ideology reference data

classify_axes() correlates per-party PCA positions against
party_ideologies.csv to assign honest dynamic labels (Links-Rechts,
Coalitie-Oppositie, etc.) instead of always assuming the first PCA
axis is left-right."
```

---

### Task 4: Wire classify_axes into load_positions

**Files:**
- Modify: `explorer.py:194-209`

- [ ] **Step 1: In `load_positions()`, add the classify_axes call after `compute_2d_axes` returns**

Find this block (lines 194–209):

```python
    positions_by_window, axis_def = compute_2d_axes(
        db_path,
        window_ids=all_available,
        method="pca",
        pca_residual=True,
        normalize_vectors=True,
    )

    # Filter displayed windows by window_size AFTER PCA computation.
    if window_size == "annual":
```

Replace with:

```python
    positions_by_window, axis_def = compute_2d_axes(
        db_path,
        window_ids=all_available,
        method="pca",
        pca_residual=True,
        normalize_vectors=True,
    )

    try:
        from analysis.axis_classifier import classify_axes
        axis_def = classify_axes(positions_by_window, axis_def, db_path)
    except Exception:
        import logging
        logging.getLogger(__name__).exception("classify_axes failed; using generic axis labels")

    # Filter displayed windows by window_size AFTER PCA computation.
    if window_size == "annual":
```

- [ ] **Step 2: Run the full test suite**

```bash
uv run pytest tests/test_political_compass.py -v
```

Expected: all 8 tests PASS

---

### Task 5: Use dynamic labels in the compass scatter plots

**Files:**
- Modify: `explorer.py:927-928` and `explorer.py:946`

The `axis_def` variable is already in scope in `build_compass_tab` (it's returned by `load_positions` at line 817).

- [ ] **Step 1: Add helper variables just before the first `px.scatter` call**

In `build_compass_tab`, the first scatter plot is the one whose title line reads `title=f"Politiek Kompas — {_window_label(window_idx)} (partijen)",` (around line 925). The helper variables must be defined after `axis_def` becomes available (line 817) and before that plot is created. Look for the block that starts building the figure (the `if level == "Partijen":` branch).
Add the two helper variables right before that `if`:

```python
    _x_label = axis_def.get("x_label", "Links\u2013Rechts")
    _y_label = axis_def.get("y_label", "Progressief\u2013Conservatief")
```

- [ ] **Step 2: Replace the hardcoded label in the party-level scatter (around line 927–928)**

Find:

```python
            labels={
                "x": "Links \u2190 \u2192 Rechts",
                "y": "Progressief / Conservatief",
                "n": "Kamerleden",
            },
```

Replace with:

```python
            labels={
                "x": _x_label,
                "y": _y_label,
                "n": "Kamerleden",
            },
```

- [ ] **Step 3: Replace the hardcoded label in the MP-level scatter (around line 946)**

Find:

```python
            labels={"x": "Links \u2190 \u2192 Rechts", "y": "Progressief / Conservatief"},
```

Replace with:

```python
            labels={"x": _x_label, "y": _y_label},
```

- [ ] **Step 4: Add the per-year interpretation caption after the chart is rendered**

Find (around line 955–959):

```python
    _add_y_direction_annotations(fig)

    with col1:
        st.plotly_chart(fig, use_container_width=True)
```

Replace with:

```python
    _add_y_direction_annotations(fig)

    with col1:
        st.plotly_chart(fig, use_container_width=True)
        _x_interp = axis_def.get("x_interpretation", {}).get(window_idx, "")
        _y_interp = axis_def.get("y_interpretation", {}).get(window_idx, "")
        if _x_interp and axis_def.get("x_quality", {}).get(window_idx, 1.0) < _THRESHOLD:
            st.caption(_x_interp)
        if _y_interp and axis_def.get("y_quality", {}).get(window_idx, 1.0) < _THRESHOLD:
            st.caption(_y_interp)
```

Also add the constant `_THRESHOLD = 0.65` near the top of `explorer.py`, with the other module-level constants (after the imports). Search for an existing `_SPARSE_YEARS` or similar constant to find the right location. If no suitable spot exists, add it right before `build_compass_tab`.

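The caption rule in Step 4 can also be written as a small pure function, which would make it testable without Streamlit. This is only a sketch: `captions_for_window` is a hypothetical helper that is not part of this plan, and it assumes `axis_def` has the shape produced by `classify_axes`.

```python
from typing import List

_THRESHOLD = 0.65  # same value this plan adds to explorer.py


def captions_for_window(axis_def: dict, window_id: str) -> List[str]:
    """Collect interpretation captions for axes whose quality is below threshold."""
    captions: List[str] = []
    for axis in ("x", "y"):
        text = axis_def.get(f"{axis}_interpretation", {}).get(window_id, "")
        # A missing quality score defaults to 1.0, so no caption is produced.
        quality = axis_def.get(f"{axis}_quality", {}).get(window_id, 1.0)
        if text and quality < _THRESHOLD:
            captions.append(text)
    return captions
```

Under that assumption, the `with col1:` block would only need to loop over the returned strings and pass each one to `st.caption`.
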
- [ ] **Step 5: Run the full test suite**

```bash
uv run pytest tests/test_political_compass.py -v
```

Expected: all 8 tests PASS

---

### Task 6: Update the trajectories chart labels

**Files:**
- Modify: `explorer.py:1050` and `explorer.py:1120-1121`

- [ ] **Step 1: In `build_trajectories_tab`, capture `axis_def` from `load_positions`**

Find (around line 1050):

```python
    positions_by_window, _ = load_positions(db_path, window_size)
```

Replace with:

```python
    positions_by_window, axis_def = load_positions(db_path, window_size)
```

- [ ] **Step 2: Replace hardcoded axis titles in the trajectories chart (around line 1120–1121)**

Find:

```python
        xaxis_title="Links \u2190 \u2192 Rechts",
        yaxis_title="Progressief / Conservatief",
```

Replace with:

```python
        xaxis_title=axis_def.get("x_label", "Links\u2013Rechts"),
        yaxis_title=axis_def.get("y_label", "Progressief\u2013Conservatief"),
```

- [ ] **Step 3: Run the full test suite one final time**

```bash
uv run pytest tests/test_political_compass.py -v
```

Expected: all 8 tests PASS

- [ ] **Step 4: Final commit**

```bash
git add explorer.py
git commit -m "feat: use dynamic axis labels in compass and trajectories UI

Replace hardcoded 'Links-Rechts' / 'Progressief-Conservatief' axis labels
with values from classify_axes(). Add per-year interpretation caption
when axis quality score is below the 0.65 correlation threshold."
```

---

## Self-Review

### Spec coverage check

| Spec requirement | Covered by |
|---|---|
| `analysis/axis_classifier.py` with `classify_axes()` | Task 3 |
| CSV paths derived from `Path(db_path).parent` | Task 3 (`data_dir = Path(db_path).parent` in `classify_axes`) |
| Pearson r for left_right, progressive, coalition dimensions | Task 3 (`_pearsonr`, `_assign_label`) |
| Priority: lr > coalition > progressive > fallback | Task 3 (`_assign_label`) |
| Global label = modal across annual windows | Task 3 (`_modal`, `is_annual` flag) |
| `current_parliament` excluded from modal vote | Task 3 (`is_current`, `is_annual` check) |
| Quarterly windows excluded from modal vote | Task 3 (`is_annual` = no `-` in wid) |
| Backward-compatible when CSVs missing | Task 3 (`_load_ideology` returns `{}`; `classify_axes` returns original `axes`) |
| `data/party_ideologies.csv` committed to git | Task 2 |
| `data/coalition_membership.csv` committed to git | Task 2 |
| `load_positions` calls `classify_axes` | Task 4 |
| Dynamic x/y labels in compass scatter | Task 5 Steps 2–3 |
| Per-year caption when quality < 0.65 | Task 5 Step 4 |
| Dynamic labels in trajectories chart | Task 6 |
| 3 tests: left_right, coalition, missing CSV | Task 1 |

All spec requirements covered. No gaps.

### Placeholder scan

No TBDs, TODOs, or vague steps present.

### Type consistency

- `classify_axes` returns `dict` with keys `x_label` (str), `y_label` (str), `x_quality` (dict), `y_quality` (dict), `x_interpretation` (dict), `y_interpretation` (dict); these are used consistently across Tasks 3, 4, 5, 6.
- `_THRESHOLD` is used in Task 5 Step 4; the constant is introduced in that same step.
- `axis_def.get("x_label", "Links–Rechts")` matches the key name `"x_label"` set in Task 3.
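
The key and type contract listed above could be spot-checked with a helper along these lines. It is a sketch only: `check_axes_schema` is a hypothetical name, not something this plan creates, and it checks only the six keys that `classify_axes` is described as adding.

```python
from typing import List

# Expected types for the keys classify_axes() adds, per the list above.
_EXPECTED_TYPES = {
    "x_label": str,
    "y_label": str,
    "x_quality": dict,
    "y_quality": dict,
    "x_interpretation": dict,
    "y_interpretation": dict,
}


def check_axes_schema(axis_def: dict) -> List[str]:
    """Return a list of problems; an empty list means the dict conforms."""
    problems: List[str] = []
    for key, expected in _EXPECTED_TYPES.items():
        if key not in axis_def:
            problems.append(f"missing key: {key}")
        elif not isinstance(axis_def[key], expected):
            actual = type(axis_def[key]).__name__
            problems.append(f"{key} should be {expected.__name__}, got {actual}")
    return problems
```

A check like this might sit in a test next to the three classifier tests, asserting that the result of `classify_axes` yields no problems whenever the reference CSVs are present.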