| date | topic |
|---|---|
| 2026-04-13 | topic-derived-svd-axis-labels |
Topic-Derived SVD Axis Labels
Problem Frame
The current SVD axis labels in SVD_THEMES (config.py) describe which parties land where, not what policy dimension the axis captures. This produces misleading labels:
- Axis 1: labeled "Links: PvdD, GL-PvdA" but PvdD and D66 vote the same way on the defining motions (Israel, rent, antipersonnel mines, gas extraction). D66 is known as centrist, not left. The label reflects party positions, not the actual policy divide.
- The negative pole is named after parties that coincidentally vote together, not parties that define the axis.
Users want to understand what policy dimension each axis represents. A good label should be topic-derived from the motions that define each axis.
Requirements
Label Derivation
- R1 Labels are derived from the content of the motions that define each axis, not from party positions.
- R2 Use 50 motions per component (top 25 positive + top 25 negative by absolute loading) to capture the full topic breadth, not just the top 10 (which can show a misleadingly narrow slice).
- R3 Derive the label using TF-IDF keyword extraction on motion titles (Dutch stopwords removed). Use the top 3-5 most distinctive keywords to form a short label.
- R4 Also consider the `policy_area` field to validate or supplement the keyword-derived label.
- R5 Labels should be reviewed manually before being applied to `SVD_THEMES`. The script outputs suggestions; a human validates them before committing.
- R6 For each component, the output includes:
- Suggested short label (≤60 chars)
- Top 10 representative motions (5 from the positive pole + 5 from the negative pole)
- Top 10 TF-IDF keywords
- Dominant `policy_area`
- Current SVD_THEMES label for reference
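The keyword step in R3 could be sketched as follows. This is a minimal, hand-rolled TF-IDF using only the standard library, not the script's actual implementation. It assumes term frequency is counted over the axis's selected motion titles and inverse document frequency over all titles in the window; the function names and the tiny stopword set are hypothetical, and a real run would need a fuller Dutch stopword list.

```python
import math
import re
from collections import Counter

# Illustrative subset only (assumption); a real run needs a full Dutch stopword list.
DUTCH_STOPWORDS = {
    "de", "het", "een", "en", "van", "voor", "over", "in", "op",
    "te", "tot", "om", "met", "bij", "motie", "lid",
}


def tokenize(title):
    """Lowercase a title and keep alphabetic tokens that are not stopwords."""
    words = re.findall(r"[^\W\d_]+", title.lower())
    return [w for w in words if w not in DUTCH_STOPWORDS and len(w) > 2]


def tfidf_keywords(axis_titles, background_titles, top_k=5):
    """Rank terms from axis_titles by TF-IDF.

    TF is the term count across the axis's titles; IDF is smoothed and
    computed over background_titles (assumed: every title in the window).
    """
    docs = [set(tokenize(t)) for t in background_titles]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    term_freq = Counter(term for t in axis_titles for term in tokenize(t))
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, count in term_freq.items()
    }
    # Sort by score descending, then alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

The alphabetical tie-break keeps the output deterministic, which matters if the reviewer is to compare reports across runs.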
Tooling
- R7 Create a new script `scripts/derive_svd_labels.py` that generates a review report (markdown) with label suggestions per component.
- R8 The report is generated by running: `uv run python3 scripts/derive_svd_labels.py --db data/motions.db --window current_parliament`
- R9 After review, the validated labels are written to `analysis/config.py` (updating `SVD_THEMES`).
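A possible command-line entry point for the script in R7 and R8 is sketched below. Only the `--db` and `--window` flags come from the command above; the function names, help strings, and the choice to make both flags required are assumptions.

```python
import argparse


def build_parser():
    """Build the argument parser for the (planned) derive_svd_labels.py script."""
    parser = argparse.ArgumentParser(
        prog="derive_svd_labels.py",
        description="Generate a markdown review report with suggested SVD axis labels.",
    )
    # Both flags appear in the R8 command; making them required is an assumption.
    parser.add_argument("--db", required=True, help="Path to the motions database")
    parser.add_argument("--window", required=True, help="Analysis window, e.g. current_parliament")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace with .db and .window."""
    return build_parser().parse_args(argv)
```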
Output Report Format
For each component (1-10), the review report includes:
- Suggested label
- TF-IDF keyword list
- Dominant policy area
- Top 5 positive-pole motion titles
- Top 5 negative-pole motion titles
- Current label for comparison
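One way the per-component section of the report might be rendered is shown below. The function name, the exact markdown layout, and the decision to reject labels longer than 60 characters (taken from the limit in R6) are assumptions, not a settled format.

```python
MAX_LABEL_LENGTH = 60  # from R6: suggested short label is at most 60 characters


def render_component(component, suggested_label, keywords, policy_area,
                     positive_titles, negative_titles, current_label):
    """Return one markdown section of the review report for a single component."""
    if len(suggested_label) > MAX_LABEL_LENGTH:
        raise ValueError(f"label exceeds {MAX_LABEL_LENGTH} characters: {suggested_label!r}")
    lines = [
        f"## Component {component}",
        "",
        f"- Suggested label: {suggested_label}",
        f"- TF-IDF keywords: {', '.join(keywords)}",
        f"- Dominant policy area: {policy_area}",
        f"- Current label: {current_label}",
        "",
        "Top 5 positive-pole motions:",
    ]
    lines += [f"{i}. {title}" for i, title in enumerate(positive_titles[:5], start=1)]
    lines += ["", "Top 5 negative-pole motions:"]
    lines += [f"{i}. {title}" for i, title in enumerate(negative_titles[:5], start=1)]
    return "\n".join(lines) + "\n"
```

Placing the suggested and current labels in the same bullet group is meant to make the comparison quick for the reviewer.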
Success Criteria
- Each axis label reflects the actual policy topics that define that axis
- Labels are consistent and interpretable (e.g., "Buitenlandbeleid & Klimaat" not "Links vs Rechts")
- PvdD and D66 scoring on axis 1 makes sense given the derived label
- The review report makes it easy for a human to validate or correct labels
Scope Boundaries
- In scope: Label derivation for axis 1-10, review workflow, updating config
- Out of scope: Automatically applying labels without review, changing the SVD computation, modifying the UI
- Not changing: The `positive_pole`/`negative_pole` fields in SVD_THEMES (those describe party coalitions, not topics; acceptable as-is)
Key Decisions
- TF-IDF over LLM: TF-IDF is deterministic, fast, and sufficient for keyword extraction. No LLM dependency. Reviewer still validates output.
- Static labels in config: After review, labels go into `SVD_THEMES` in config.py. This keeps the current architecture (no runtime derivation).
- Large motion sample (≥50): 10 motions per component is too few: axis 1's 10 motions show a mix of Israel, rent, mines, gas that looks incoherent. ≥50 gives a clearer picture of what the axis truly captures.
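The 25-per-pole sample from R2 could be selected along these lines. This is a sketch that assumes loadings arrive as `(motion_id, loading)` pairs for a single component; the function name and input shape are hypothetical.

```python
def select_pole_motions(loadings, per_pole=25):
    """Split (motion_id, loading) pairs into the strongest motions per pole.

    Returns (positive, negative), each sorted from largest to smallest
    absolute loading and capped at per_pole entries.
    """
    positive = sorted(
        (pair for pair in loadings if pair[1] > 0),
        key=lambda pair: -pair[1],
    )[:per_pole]
    negative = sorted(
        (pair for pair in loadings if pair[1] < 0),
        key=lambda pair: pair[1],
    )[:per_pole]
    return positive, negative
```

Selecting per pole, rather than taking the top 50 by absolute loading overall, guarantees that both ends of the axis are represented even when one pole has much larger loadings.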
Dependencies / Assumptions
- Motion titles in `motions` table are in Dutch and sufficiently descriptive
- `policy_area` field has meaningful coverage
- `svd_vectors` table contains all motion loadings for the window
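The `policy_area` assumption can be checked before relying on it. The sketch below assumes motions have already been loaded as dictionaries with a `policy_area` key (the loading step and the row shape are not specified here and are hypothetical); it reports the share of motions with a non-empty value and the most common area.

```python
from collections import Counter


def _clean_area(motion):
    """Return the stripped policy_area, or an empty string when missing."""
    return (motion.get("policy_area") or "").strip()


def policy_area_coverage(motions):
    """Fraction of motions with a non-empty policy_area (0.0 for an empty list)."""
    if not motions:
        return 0.0
    filled = sum(1 for m in motions if _clean_area(m))
    return filled / len(motions)


def dominant_policy_area(motions):
    """Most frequent non-empty policy_area, ties broken alphabetically; None if absent."""
    counts = Counter(_clean_area(m) for m in motions if _clean_area(m))
    if not counts:
        return None
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]
```

What counts as "meaningful coverage" is left open; a threshold would have to be chosen during planning.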
Outstanding Questions
Resolve Before Planning
(none)
Deferred to Planning
- Tooling approach: Use parallel subagents (one per axis) to analyze 50 motions each and derive labels, rather than a single sequential script. Each subagent produces a suggested label independently.
Next Steps
→ /ce:plan for structured implementation planning