---
title: SVD component labels incorrect due to semantic vs voting pattern mismatch
date: 2026-04-04
category: docs/solutions/logic-errors/
module: Stemwijzer Data Analysis
problem_type: logic_error
component: explorer
symptoms:
- Component 1 label "Sociale zekerheid vs economische liberalisering" did not match voting patterns
- Report analysis showed different party alignment than label suggested
- SVD components captured voting patterns but labels described semantic content
root_cause: logic_error
resolution_type: code_fix
severity: high
tags: [svd, voting-analysis, component-labels, logic-error]
---
# SVD component labels incorrect due to semantic vs voting pattern mismatch
## Problem
The SVD (Singular Value Decomposition) component labels in `explorer.py` were derived from semantic analysis of motion titles. The SVD, however, captures how parties vote, not which topics are discussed. As a result, the component labels were misleading and did not match the actual voting patterns.
## Symptoms
- Component 1 was labeled "Sociale zekerheid vs economische liberalisering" but actually captured coalition vs opposition voting
- Analysis report showed different party groupings than the labels suggested
- Report generation used an incorrect slice (`scored[:30]`) instead of separating parties into positive and negative lists
## What Didn't Work
- Semantic analysis of motion titles to determine component labels
- Assuming that topics discussed in motions matched how parties voted on them
- Report generation logic was inconsistent with JSON output logic
## Solution
### 1. Report Generation Bug Fix (commit bfe37c6)
Fixed the report generation to use positive/negative party lists correctly instead of `scored[:30]`:
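```python
# Before (incorrect): takes the first 30 entries regardless of score sign
scored[:30]
# After (correct): split the (party, score) pairs in `scored` by score sign
positive_parties = [p for p, s in scored if s > 0]
negative_parties = [p for p, s in scored if s < 0]
```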
### 2. Component 1 Label Fix (commit f7fc908)
Changed from semantic-based to voting-pattern-based label:
```python
# Before (incorrect)
"label": "Sociale zekerheid vs economische liberalisering"
# After (correct)
"label": "Rechts kabinetsbeleid vs links oppositiebeleid"
```
Root cause: Component 1 captures 9 coalition parties voting together vs 6 opposition parties voting together.
### 3. Components 2, 4, 5, 6 Label Updates (commit 92c3c0e)
- **Component 2**: "PVV/FVD-populisme versus mainstream-partijen" — Only PVV and FVD vote positively
- **Component 4**: "Mainstreampartijen versus FVD/DENK-oppositie" — Only FVD and DENK vote negatively
- **Component 5**: "Christelijk-sociaal en gemeenschapswaarden versus progressieve individuele rechten"
- **Component 6**: "Migratie en cultuur versus klimaat en progressieve inclusie"
### 4. Exclusive Motion Assignment (commit 33edb33)
Each motion now appears under only one component: the one on which it has the highest absolute loading.
```python
# Each motion assigned to component with highest absolute loading
# Backward compatible with --no-exclusive flag
```
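The assignment rule can be sketched as follows. This is a minimal illustration, not the code from commit 33edb33: the function name `assign_exclusive`, the `loadings` layout (one row per motion, one column per component), and the numbers are all assumptions made for the example.

```python
def assign_exclusive(loadings):
    """For each motion (row), return the index of the component (column)
    with the largest absolute loading. Ties go to the lowest index."""
    return [
        max(range(len(row)), key=lambda k: abs(row[k]))
        for row in loadings
    ]

# Hypothetical loadings: 4 motions x 3 components
loadings = [
    [0.9, -0.1, 0.2],
    [-0.2, 0.1, -0.8],
    [0.3, -0.7, 0.1],
    [-0.6, 0.5, 0.4],
]

assignment = assign_exclusive(loadings)

# Group motion indices by the single component each one was assigned to
motions_by_component = {k: [] for k in range(len(loadings[0]))}
for motion_idx, component_idx in enumerate(assignment):
    motions_by_component[component_idx].append(motion_idx)
```

Using the absolute value matters here: a motion with a strongly negative loading is just as characteristic of a component as one with a strongly positive loading, it simply sits on the opposite pole.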
## Why This Works
**Critical Insight**: SVD captures voting patterns, not semantic content. When labeling SVD components:
- Look at which parties vote positively vs negatively
- Don't assume semantics match voting patterns
- Coalition vs opposition is a strong voting dimension in parliamentary data
- Components may include motions from seemingly unrelated topics if parties vote the same way
The fix works because it aligns labels with actual voting data:
- Labels now describe the voting behavior of parties
- Positive/negative poles show which parties vote which way
- Explanations reference specific motions that illustrate the pattern
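A toy example may make the insight concrete. The sketch below assumes a small, made-up vote matrix (rows are parties, columns are motions, `+1` is a vote for and `-1` a vote against) in which hypothetical parties A, B and C mostly vote together and D and E mostly vote against them. It uses `numpy.linalg.svd`; the party names and votes are illustrative and are not taken from the project data.

```python
import numpy as np

# Hypothetical parties: A, B, C form one bloc; D, E form the other
parties = ["A", "B", "C", "D", "E"]

# Rows = parties, columns = motions; +1 = for, -1 = against.
# The first four motions split cleanly by bloc, the last two are noisier.
votes = np.array([
    [ 1,  1, -1,  1, -1,  1],
    [ 1,  1, -1,  1, -1, -1],
    [ 1,  1, -1,  1,  1,  1],
    [-1, -1,  1, -1,  1,  1],
    [-1, -1,  1, -1, -1, -1],
], dtype=float)

U, S, Vt = np.linalg.svd(votes, full_matrices=False)

# Party scores on the first component
scores = U[:, 0] * S[0]
scored = sorted(zip(parties, scores), key=lambda ps: ps[1], reverse=True)

# Same sign-based split as in the report generation fix
positive_parties = [p for p, s in scored if s > 0]
negative_parties = [p for p, s in scored if s < 0]

# The sign of an SVD component is arbitrary, so either bloc may end up
# on the positive pole; what is stable is that the blocs are separated.
print("positive pole:", positive_parties)
print("negative pole:", negative_parties)
```

Nothing in the matrix says what the motions are about. The first component separates the two blocs purely because they vote against each other, which is why a label has to be read off the party poles rather than the motion titles.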
## Prevention
1. **Always verify SVD labels against voting data** — Before finalizing labels, check which parties score positively and negatively on each component
2. **Test label-party alignment** — Add a test that verifies component labels match the party groupings in the data
3. **Document the semantic vs voting distinction** — Make this a known Gotcha in the codebase for future developers
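
A label-party alignment check along the lines of item 2 could look like the sketch below. It is an assumption-laden illustration: the helper `misaligned_parties`, the shape of the `scores` mapping, and the score values are hypothetical, and the actual structure of the `SVD_THEMES` dictionary is not shown here.

```python
def misaligned_parties(scores, expected_positive, expected_negative):
    """Return the parties whose score sign contradicts the pole
    that the component label assigns them to."""
    wrong = [p for p in expected_positive if scores.get(p, 0.0) <= 0]
    wrong += [p for p in expected_negative if scores.get(p, 0.0) >= 0]
    return sorted(wrong)

# Hypothetical scores for a component whose label puts PVV and FVD
# on the positive pole (values are made up for illustration)
scores = {"PVV": 0.8, "FVD": 0.6, "DENK": -0.2}

ok = misaligned_parties(scores, ["PVV", "FVD"], ["DENK"])

# A contradicting score is reported by party name
bad = misaligned_parties(
    {"PVV": -0.1, "FVD": 0.6, "DENK": -0.2}, ["PVV", "FVD"], ["DENK"]
)
```

Because the overall sign of an SVD component is arbitrary, a real test would probably also need to accept the case where both poles are flipped together.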
## Related Issues
- Analysis: `thoughts/explorer/top_svd_top_motions_report.md`
- JSON generator: `scripts/generate_svd_json.py`
- Labels source: `explorer.py:434-611` (SVD_THEMES dictionary)