title: SVD component labels incorrect due to semantic vs voting pattern mismatch
date: 2026-04-04
category: docs/solutions/logic-errors/
module: Stemwijzer Data Analysis
problem_type: logic_error
component: explorer
symptoms:
  • Component 1 label "Sociale zekerheid vs economische liberalisering" did not match voting patterns
  • Report analysis showed different party alignment than label suggested
  • SVD components captured voting patterns but labels described semantic content
root_cause: logic_error
resolution_type: code_fix
severity: high
tags: svd, voting-analysis, component-labels, logic-error

SVD component labels incorrect due to semantic vs voting pattern mismatch

Problem

The SVD (Singular Value Decomposition) component labels in explorer.py were based on semantic analysis of motion titles, but the SVD actually captures HOW parties vote, not WHAT topics are discussed. This resulted in misleading component labels that did not match actual voting patterns.

Symptoms

  • Component 1 was labeled "Sociale zekerheid vs economische liberalisering" but actually captured coalition vs opposition voting
  • Analysis report showed different party groupings than the labels suggested
  • Report generation used incorrect slice (scored[:30]) instead of positive/negative party separation

What Didn't Work

  • Semantic analysis of motion titles to determine component labels
  • Assuming that topics discussed in motions matched how parties voted on them
  • Report generation logic that was inconsistent with the JSON output logic

Solution

1. Report Generation Bug Fix (commit bfe37c6)

Fixed the report generation to use positive/negative party lists correctly instead of scored[:30]:

# Before (incorrect): a fixed-length slice ignores the sign of each score
scored[:30]

# After (correct): split parties by the sign of their score
positive_parties = [p for p, s in scored if s > 0]
negative_parties = [p for p, s in scored if s < 0]
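
As a minimal sketch of this separation, assume `scored` is a list of `(party, score)` pairs. The helper name `split_poles`, the placeholder party names, and the scores are invented for illustration. Ordering each pole by strength is an extra step that the original fix does not describe.

```python
def split_poles(scored):
    """Split (party, score) pairs into positive and negative poles.

    Parties with a score of exactly 0 belong to neither pole.
    Each pole is ordered from strongest to weakest score.
    """
    positive = sorted((ps for ps in scored if ps[1] > 0), key=lambda ps: -ps[1])
    negative = sorted((ps for ps in scored if ps[1] < 0), key=lambda ps: ps[1])
    return [p for p, _ in positive], [p for p, _ in negative]


# Invented example scores, not real analysis output.
scored = [("A", 0.42), ("B", -0.31), ("C", 0.05), ("D", 0.0), ("E", -0.58)]
positive_parties, negative_parties = split_poles(scored)
print(positive_parties)  # ['A', 'C']
print(negative_parties)  # ['E', 'B']
```

A slice such as `scored[:30]` returns the first 30 entries whatever their sign, so it cannot tell the two poles apart.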

2. Component 1 Label Fix (commit f7fc908)

Changed from semantic-based to voting-pattern-based label:

# Before (incorrect)
"label": "Sociale zekerheid vs economische liberalisering"

# After (correct)
"label": "Rechts kabinetsbeleid vs links oppositiebeleid"

Root cause: Component 1 captures 9 coalition parties voting together vs 6 opposition parties voting together.

3. Components 2, 4, 5, 6 Label Updates (commit 92c3c0e)

  • Component 2: "PVV/FVD-populisme versus mainstream-partijen" — Only PVV and FVD vote positively
  • Component 4: "Mainstreampartijen versus FVD/DENK-oppositie" — Only FVD and DENK vote negatively
  • Component 5: "Christelijk-sociaal en gemeenschapswaarden versus progressieve individuele rechten"
  • Component 6: "Migratie en cultuur versus klimaat en progressieve inclusie"

4. Exclusive Motion Assignment (commit 33edb33)

Each motion now appears on only one component (highest absolute loading):

# Each motion assigned to component with highest absolute loading
# Backward compatible with --no-exclusive flag
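
The assignment rule could be sketched as follows. This is an illustration under assumptions, not the project's implementation: the function name `assign_exclusive`, the shape of `loadings` (motion id mapped to one loading per component), and the tie-breaking rule are all hypothetical.

```python
def assign_exclusive(loadings):
    """Map each motion to the component with the highest absolute loading.

    `loadings` maps motion id -> list of loadings, one per component.
    Component numbers are 1-based; ties go to the lowest-numbered component.
    """
    assignment = {}
    for motion_id, values in loadings.items():
        # max() returns the first index that reaches the largest |loading|.
        best_index = max(range(len(values)), key=lambda i: abs(values[i]))
        assignment[motion_id] = best_index + 1
    return assignment


# Invented loadings for three motions on three components.
loadings = {
    "motion-1": [0.80, -0.10, 0.05],
    "motion-2": [0.20, -0.65, 0.30],
    "motion-3": [-0.15, 0.10, 0.40],
}
print(assign_exclusive(loadings))
# {'motion-1': 1, 'motion-2': 2, 'motion-3': 3}
```

Using the absolute value matters here: `motion-2` loads most strongly on component 2 even though that loading is negative.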

Why This Works

Critical Insight: SVD captures voting patterns, not semantic content. When labeling SVD components:

  • Look at which parties vote positively vs negatively
  • Don't assume semantics match voting patterns
  • Coalition vs opposition is a strong voting dimension in parliamentary data
  • Components may include motions from seemingly unrelated topics if parties vote the same way
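
A toy example may make this concrete. The sketch below assumes NumPy is available and uses an invented party-by-motion matrix with +1 for a vote in favour and -1 for a vote against. The encoding, the placeholder party names, and the data are assumptions for illustration, not the project's actual pipeline.

```python
import numpy as np

# Invented toy data: rows are parties, columns are motions.
# The motions could be about any topic; SVD only sees the votes.
coalition = ["Coalition A", "Coalition B", "Coalition C"]
opposition = ["Opposition X", "Opposition Y"]
parties = coalition + opposition
votes = np.array([
    [ 1,  1, -1,  1, -1,  1],
    [ 1,  1, -1,  1, -1, -1],
    [ 1,  1, -1,  1, -1,  1],
    [-1, -1,  1, -1,  1,  1],
    [-1, -1,  1, -1,  1, -1],
], dtype=float)

# Column 0 of U scaled by S[0] gives each party's score on the first component.
U, S, Vt = np.linalg.svd(votes, full_matrices=False)
party_scores = U[:, 0] * S[0]

# The sign of a singular vector is arbitrary, so only the grouping is meaningful.
positive = [p for p, s in zip(parties, party_scores) if s > 0]
negative = [p for p, s in zip(parties, party_scores) if s < 0]
print(positive)
print(negative)
```

In this toy matrix the first component separates the coalition rows from the opposition rows without any topic information. Which group ends up on the positive side depends on the arbitrary sign of the decomposition.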

The fix works because it aligns labels with actual voting data:

  • Labels now describe the voting behavior of parties
  • Positive/negative poles show which parties vote which way
  • Explanations reference specific motions that illustrate the pattern

Prevention

  1. Always verify SVD labels against voting data — Before finalizing labels, check which parties score positively and negatively on each component
  2. Test label-party alignment — Add a test that verifies component labels match the party groupings in the data
  3. Document the semantic vs voting distinction — Make this a known Gotcha in the codebase for future developers
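
The alignment test from item 2 might look roughly like this. It is a sketch only: the theme structure with `positive_parties` and `negative_parties` keys, the helper `find_pole_mismatches`, and the placeholder parties and scores are hypothetical, and the real `SVD_THEMES` entries may be shaped differently.

```python
def find_pole_mismatches(theme, scored):
    """Return parties whose score sign contradicts the pole the theme lists them under.

    A party missing from `scored` is treated as score 0 and reported as a mismatch.
    """
    scores = dict(scored)
    mismatches = []
    for party in theme["positive_parties"]:
        if scores.get(party, 0.0) <= 0:
            mismatches.append(party)
    for party in theme["negative_parties"]:
        if scores.get(party, 0.0) >= 0:
            mismatches.append(party)
    return mismatches


def test_component_label_matches_party_groupings():
    # Hypothetical theme entry and invented scores; real values would come
    # from the project's config and SVD output.
    theme = {
        "label": "Rechts kabinetsbeleid vs links oppositiebeleid",
        "positive_parties": ["Party A", "Party B"],
        "negative_parties": ["Party C"],
    }
    scored = [("Party A", 0.6), ("Party B", 0.2), ("Party C", -0.7)]
    assert find_pole_mismatches(theme, scored) == []


test_component_label_matches_party_groupings()
```

Because the sign of an SVD component is arbitrary, a real test may also need to accept the case where both poles are flipped together.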

Related Files

  • Analysis: thoughts/explorer/top_svd_top_motions_report.md
  • JSON generator: scripts/generate_svd_json.py
  • Labels source: analysis/config.py:67+ (SVD_THEMES dictionary)