date: 2026-04-13
topic: topic-derived-svd-axis-labels

Topic-Derived SVD Axis Labels

Problem Frame

The current SVD axis labels in SVD_THEMES (config.py) describe which parties land where, not what policy dimension the axis captures. This produces misleading labels:

Axis 1 is labeled "Links: PvdD, GL-PvdA" ("Links" meaning "Left"), but PvdD and D66 vote the same way on the defining motions (Israel, rent, antipersonnel mines, gas extraction). D66 is generally regarded as centrist, not left-wing, so the label reflects party positions, not the actual policy divide.
The negative pole is named after parties that coincidentally vote together, not parties that define the axis.

Users want to understand what policy dimension each axis represents. A good label should therefore be derived from the topics of the motions that define each axis.

Requirements

Label Derivation

R1 Labels are derived from the content of the motions that define each axis, not from party positions.
R2 Use 50 motions per component (top 25 positive + top 25 negative by absolute loading) to capture the full topic breadth, not just the top 10 (which can show a misleadingly narrow slice).
R3 Derive the label using TF-IDF keyword extraction on motion titles (Dutch stopwords removed). Use the top 3-5 most distinctive keywords to form a short label.
R4 Also consider the policy_area field to validate or supplement the keyword-derived label.
R5 Labels should be reviewed manually before being applied to SVD_THEMES. The script outputs suggestions; a human validates them before committing.
R6 For each component, the output includes:
- Suggested short label (≤60 chars)
- Top 10 representative motions (5 from the positive pole + 5 from the negative pole)
- Top 10 TF-IDF keywords
- Dominant policy_area
- Current SVD_THEMES label for reference
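
The selection step in R2 and the policy_area check in R4 could look roughly like the sketch below. The `Loading` record and both function names are hypothetical, introduced only for illustration; the real shape of the rows coming out of svd_vectors may differ.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Loading(NamedTuple):
    """One motion's loading on one SVD component (hypothetical record shape)."""
    motion_id: int
    title: str
    policy_area: Optional[str]
    loading: float


def select_pole_motions(rows, per_pole=25):
    """Return (positive, negative) motions, strongest loadings first.

    Each pole is capped at `per_pole`, so the default yields up to 50 motions.
    """
    positive = sorted((r for r in rows if r.loading > 0),
                      key=lambda r: r.loading, reverse=True)[:per_pole]
    negative = sorted((r for r in rows if r.loading < 0),
                      key=lambda r: r.loading)[:per_pole]
    return positive, negative


def dominant_policy_area(rows):
    """Most frequent non-empty policy_area among the given motions, or None."""
    counts = Counter(r.policy_area for r in rows if r.policy_area)
    return counts.most_common(1)[0][0] if counts else None


# Made-up example rows, not real data.
rows = [
    Loading(1, "Motie over gaswinning", "Klimaat", 0.9),
    Loading(2, "Motie over huurprijzen", "Wonen", 0.4),
    Loading(3, "Motie over defensie", "Defensie", -0.7),
    Loading(4, "Motie over gaswinning in Groningen", "Klimaat", -0.2),
]
pos, neg = select_pole_motions(rows, per_pole=1)
```

Sorting each pole separately, instead of taking the top 50 by absolute loading overall, guarantees that both ends of the axis are represented even when one pole has systematically larger loadings.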

Tooling

R7 Create a new script scripts/derive_svd_labels.py that generates a review report (markdown) with label suggestions per component.

R8 The report is generated by running:

uv run python3 scripts/derive_svd_labels.py --db data/motions.db --window current_parliament

R9 After review, the validated labels are written to analysis/config.py (updating SVD_THEMES).
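
As a starting point for R7 and R8, the command-line surface of scripts/derive_svd_labels.py might be sketched as follows. Only `--db` and `--window` come from the command above; `--per-pole` and `--output` are hypothetical extras added for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the CLI for the label-derivation script (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Generate a markdown review report with SVD axis label suggestions."
    )
    # Documented in R8.
    parser.add_argument("--db", type=Path, required=True,
                        help="Path to the motions database")
    parser.add_argument("--window", required=True,
                        help="Analysis window, e.g. current_parliament")
    # Hypothetical options, not part of the documented command.
    parser.add_argument("--per-pole", type=int, default=25,
                        help="Motions to take from each pole of a component")
    parser.add_argument("--output", type=Path, default=None,
                        help="Report path; print to stdout when omitted")
    return parser.parse_args(argv)


args = parse_args(["--db", "data/motions.db", "--window", "current_parliament"])
```

Because the script only writes a report, it never needs write access to analysis/config.py; the update in R9 stays a manual step after review.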

Output Report Format

For each component (1-10), the review report includes:

Suggested label
TF-IDF keyword list
Dominant policy area
Top 5 positive-pole motion titles
Top 5 negative-pole motion titles
Current label for comparison
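
One possible way to render a single component's section of the report is sketched below. The `ComponentReport` dataclass, the `render_component` function, and the exact markdown layout are assumptions for illustration; only the list of fields and the 60-character limit come from the requirements.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LABEL_CHARS = 60  # limit from R6


@dataclass
class ComponentReport:
    """Everything the reviewer needs for one component (hypothetical shape)."""
    component: int
    suggested_label: str
    keywords: List[str]
    dominant_policy_area: Optional[str]
    positive_titles: List[str]
    negative_titles: List[str]
    current_label: str


def render_component(r):
    """Render one component as a markdown section, in the order listed above."""
    flag = ""
    if len(r.suggested_label) > MAX_LABEL_CHARS:
        flag = f" (over {MAX_LABEL_CHARS} chars, shorten)"
    lines = [
        f"## Component {r.component}",
        "",
        f"- Suggested label: {r.suggested_label}{flag}",
        f"- TF-IDF keywords: {', '.join(r.keywords)}",
        f"- Dominant policy area: {r.dominant_policy_area or 'n/a'}",
        "",
        "Top 5 positive-pole motions:",
        "",
    ]
    lines += [f"{i}. {t}" for i, t in enumerate(r.positive_titles[:5], start=1)]
    lines += ["", "Top 5 negative-pole motions:", ""]
    lines += [f"{i}. {t}" for i, t in enumerate(r.negative_titles[:5], start=1)]
    lines += ["", f"- Current label: {r.current_label}", ""]
    return "\n".join(lines)
```

Keeping the current label at the end of each section lets the reviewer form an opinion from the motions first and compare against the existing label afterwards.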

Success Criteria

Each axis label reflects the actual policy topics that define that axis
Labels are consistent and interpretable (e.g., "Buitenlandbeleid & Klimaat" not "Links vs Rechts")
The scores of PvdD and D66 on axis 1 make sense given the derived label
The review report makes it easy for a human to validate or correct labels

Scope Boundaries

In scope: Label derivation for axes 1-10, review workflow, updating config
Out of scope: Automatically applying labels without review, changing the SVD computation, modifying the UI
Not changing: The positive_pole / negative_pole fields in SVD_THEMES (those describe party coalitions, not topics — acceptable as-is)

Key Decisions

TF-IDF over LLM: TF-IDF is deterministic, fast, and sufficient for keyword extraction. No LLM dependency. Reviewer still validates output.
Static labels in config: After review, labels go into SVD_THEMES in config.py. This keeps the current architecture (no runtime derivation).
Large motion sample (≥50): 10 motions per component is too few — axis 1's 10 motions show a mix of Israel, rent, mines, gas that looks incoherent. ≥50 gives a clearer picture of what the axis truly captures.
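
To make the TF-IDF decision concrete, here is a minimal standard-library sketch. It makes several assumptions that the requirements do not fix: each component's selected titles are pooled into one document, IDF is computed across components with a smoothed formula, and the stopword list is a tiny illustrative subset (including the boilerplate word "motie"). All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative list; a real run would need a fuller Dutch stopword list.
DUTCH_STOPWORDS = {"de", "het", "een", "en", "van", "over", "in", "op",
                   "te", "voor", "met", "tot", "om", "bij", "motie"}


def tokenize(title):
    """Lowercase, split on non-letters, drop stopwords and very short tokens."""
    words = re.findall(r"[^\W\d_]+", title.lower())
    return [w for w in words if w not in DUTCH_STOPWORDS and len(w) > 2]


def tfidf_keywords(titles_by_component, top_n=10):
    """Map component -> top_n terms ranked by TF-IDF across components."""
    counts = {c: Counter(tok for t in titles for tok in tokenize(t))
              for c, titles in titles_by_component.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    result = {}
    for c, term_counts in counts.items():
        total = sum(term_counts.values()) or 1
        scores = {
            term: (n / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, n in term_counts.items()
        }
        # Sort by score, then alphabetically, so the output is deterministic.
        result[c] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return result


def suggest_label(keywords, max_words=3, max_len=60):
    """Join the top keywords into a draft label, dropping words until it fits."""
    words = [k.capitalize() for k in keywords[:max_words]]
    while words and len(" & ".join(words)) > max_len:
        words.pop()
    return " & ".join(words)


# Made-up titles, not real data.
docs = {
    1: ["Motie over gaswinning in Groningen", "Motie over gaswinning en klimaat"],
    2: ["Motie over huurprijzen", "Motie over klimaat en huurprijzen"],
}
keywords = tfidf_keywords(docs)
```

In this toy input "klimaat" appears in both components and is ranked below the terms unique to each one, which is the behavior the label derivation relies on. The draft from `suggest_label` is only a starting point for the manual review in R5.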

Dependencies / Assumptions

Motion titles in motions table are in Dutch and sufficiently descriptive
policy_area field has meaningful coverage
svd_vectors table contains all motion loadings for the window
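
These assumptions can be checked before any labels are derived. The sketch below assumes the database is SQLite and invents column names (`id`, `title`, `policy_area`, `motion_id`, `window_id`, `component`, `loading`); only the table names motions and svd_vectors and the policy_area field come from this document, so adjust to the real schema.

```python
import sqlite3


def check_assumptions(conn, window):
    """Report title/policy_area coverage and loadings per component.

    Column names here are hypothetical; only the table names are documented.
    """
    total, with_title, with_area = conn.execute(
        """
        SELECT COUNT(*),
               SUM(CASE WHEN TRIM(COALESCE(title, '')) <> '' THEN 1 ELSE 0 END),
               SUM(CASE WHEN TRIM(COALESCE(policy_area, '')) <> '' THEN 1 ELSE 0 END)
        FROM motions
        """
    ).fetchone()
    per_component = dict(conn.execute(
        """
        SELECT component, COUNT(*)
        FROM svd_vectors
        WHERE window_id = ?
        GROUP BY component
        ORDER BY component
        """,
        (window,),
    ).fetchall())
    denom = total or 1  # avoid division by zero on an empty table
    return {
        "motions": total,
        "title_coverage": (with_title or 0) / denom,
        "policy_area_coverage": (with_area or 0) / denom,
        "loadings_per_component": per_component,
    }


# In-memory demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE motions (id INTEGER PRIMARY KEY, title TEXT, policy_area TEXT);
CREATE TABLE svd_vectors (motion_id INTEGER, window_id TEXT,
                          component INTEGER, loading REAL);
INSERT INTO motions VALUES (1, 'Motie over gaswinning', 'Klimaat'),
                           (2, 'Motie over huur', NULL);
INSERT INTO svd_vectors VALUES (1, 'current_parliament', 1, 0.9),
                               (2, 'current_parliament', 1, -0.4);
""")
summary = check_assumptions(conn, "current_parliament")
```

A component with fewer than 50 loadings, or a low policy_area coverage, would be a signal to revisit R2 or R4 before trusting the suggestions.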

Outstanding Questions

Resolve Before Planning

(none)

Deferred to Planning

Tooling approach: Decide whether to use parallel subagents (one per axis), each analyzing its axis's 50 motions and producing a suggested label independently, rather than a single sequential script.

Next Steps

→ /ce:plan for structured implementation planning
