diff --git a/thoughts/shared/designs/2026-04-16-glpvda-merger-svd-analysis-design.md b/thoughts/shared/designs/2026-04-16-glpvda-merger-svd-analysis-design.md
new file mode 100644
index 0000000..7ec8971
--- /dev/null
+++ b/thoughts/shared/designs/2026-04-16-glpvda-merger-svd-analysis-design.md
@@ -0,0 +1,113 @@
+---
+date: 2026-04-16
+topic: "GroenLinks-PvdA Merger Dynamics in SVD Space"
+status: validated
+---
+
+## Problem Statement
+
+We need concrete, data-driven findings about the GroenLinks-PvdA merger for a blog post. Four questions:
+1. How similar were GL and PvdA in SVD space before the merger?
+2. How cohesive is the merged party compared to others?
+3. When did GL and PvdA converge in SVD space?
+4. Which parties shifted the most, and how do GL/PvdA compare?
+
+## Constraints
+
+- Investigation-only: query the database, report findings, no code changes
+- Database: `data/motions.db` (DuckDB), SVD vectors in `svd_vectors` table
+- Party affiliation: use `mp_votes.party` for historical labels (not `mp_metadata` which only tracks current party)
+- SVD vectors are at MP level (`entity_type='mp'`); party centroids must be computed from MP vectors
+- Cross-window scale differences require normalization (distances as fraction of avg inter-party distance)
+
+## Approach
+
+**Method**: Compute party centroids from MP-level SVD vectors, grouped by `mp_votes.party` for historical party labels. Normalize all distances as fractions of the average inter-party distance in each window to make cross-window comparisons meaningful.
+
+**Data source**: `svd_vectors` table (MP vectors per window) joined with `mp_votes` (historical party labels per vote, majority party per MP per year).
+
+**Key discovery**: `mp_metadata` only tracks current party: pre-merger GL and PvdA MPs who merged are now labeled "GroenLinks-PvdA". We must use `mp_votes.party` for historical accuracy.
+
+## Architecture
+
+N/A: this is a data investigation, not a system design.
+
+## Components
+
+### Finding 1: GL-PvdA Pre-Merger Similarity
+
+GL and PvdA were already remarkably close in SVD space well before the merger:
+
+| Year | GL↔PvdA Distance | As % of Avg Inter-Party Distance | Nearest Other Party | Distance to Nearest |
+|------|------------------|----------------------------------|---------------------|---------------------|
+| 2019 | 2.10 | 10.5% | PvdD | 9.6 |
+| 2020 | 2.23 | 5.0% | CU | 28.7 |
+| 2021 | 1.46 | 4.4% | FVD | 11.6 |
+| 2022 | 1.16 | 2.8% | FVD | 6.5 |
+
+The nearest non-PvdA party to GL was between roughly 4.6x (2019) and 12.9x (2020) further away than PvdA itself. The normalized gap also narrowed every year, from 10.5% of the average inter-party distance in 2019 to 2.8% in 2022.
+
+### Finding 2: Post-Merger Cohesion
+
+| Year | GL-PvdA Spread | Avg Other Spread | Ratio (GL-PvdA / Other) | Cohesion Rank |
+|------|----------------|------------------|-------------------------|---------------|
+| 2023 | 1.50 | 19.95 | 0.08 | #1 most cohesive |
+| 2024 | 14.05 | 18.47 | 0.76 | Mid-pack |
+| 2025 | 28.09 | 18.09 | 1.55 | Below average |
+| Current | 43.30 | 28.05 | 1.54 | Below average |
+
+The merged party started as the most cohesive party in parliament (2023), but by 2025 its internal spread was 55% above the average of the other parties (ratio 1.55), and the current window shows the same picture (1.54). On this measure the merged party is internally more diverse than a typical Dutch party.
+
+### Finding 3: Merger Convergence Timeline
+
+| Window | GL↔PvdA Distance | As % of Avg Inter-Party Distance |
+|--------|------------------|----------------------------------|
+| 2019-Q3 | 0.98 | 25.5% |
+| 2020-Q1 | 1.38 | 18.6% |
+| 2021-Q1 | 1.58 | 19.3% |
+| 2022-Q3 | 0.86 | 9.4% |
+| 2023-Q1 | 0.58 | 7.1% |
+| **2023-Q3** | **0.37** | **4.5%** |
+| 2023-Q4 | 0.46 | 5.5% |
+
+By Q3 2023, just before the formal merger, the GL and PvdA centroids were only 4.5% of the average inter-party distance apart, essentially indistinguishable in voting-pattern space.
+
+### Finding 4: Large Positional Shifts
+
+GL and PvdA were the most stable parties in parliament (normalized drift per year):
+
+| Period | GL Drift | PvdA Drift | VVD Drift | D66 Drift | PVV Drift |
+|--------|----------|------------|-----------|-----------|-----------|
+| 2019→2020 | 14.5% | 16.6% | 140.2% | 145.6% | 121.9% |
+| 2020→2021 | 21.8% | 25.8% | 115.8% | 82.2% | 207.3% |
+| 2021→2022 | 11.6% | 10.8% | 70.8% | 91.2% | 51.9% |
+| 2022→2023 | 54.5% | 23.3% | 109.7% | 177.3% | 222.1% |
+
+While VVD and D66 moved 70-177% per year, GL and PvdA drifted 11-26% in most periods. The one exception is 2022→2023, when GL moved 54.5%, still well below VVD (109.7%), D66 (177.3%) and PVV (222.1%) in the same period. The merger partners stayed comparatively anchored while the rest of the landscape shifted.
+
+## Data Flow
+
+1. Query `svd_vectors` for MP vectors per window
+2. Join with `mp_votes` to determine each MP's majority party in that year
+3. Compute party centroids as mean of member vectors
+4. Compute pairwise distances and normalize by average inter-party distance
+5. Track convergence timeline using quarterly windows
+
+## Error Handling
+
+- Windows with insufficient MPs (<3 per party) are excluded from centroid calculations
+- The `mp_votes.party` column uses multiple label variants ("GroenLinks", "GL", "GroenLinks-PvdA"), which are normalized in queries
+- The 2023 transition year has mixed labels (some GL, some PvdA, some GL-PvdA), handled by majority-vote assignment per MP
+
+## Testing Strategy
+
+N/A: data investigation. Key validation checks:
+- Cross-reference MP counts with known parliament compositions
+- Verify that GL + PvdA MP counts match expected seat counts per year
+- Confirm that convergence timeline aligns with known political events (merger announcement Oct 2023)
+
+## Open Questions
+
+- Should we compute cosine similarity instead of Euclidean distance for cross-window normalization?
+- The 2025 and current_parliament windows show very different absolute scales. Should we normalize vectors before computing distances?
+- The few remaining "GL" (8) and "PvdA" (5) labeled MPs in 2025 may be artifacts. Should they be included in the GL-PvdA group?
\ No newline at end of file
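
Steps 2-4 of the Data Flow section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the queries actually run for the investigation: the helper names (`majority_party`, `party_centroids`, `normalized_distance`), the in-memory input shapes, and the toy 2-D vectors are all hypothetical, and the average inter-party distance is assumed to be the mean over all party pairs in the window, including the pair being measured (the design does not say which pairs are averaged).

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import dist
from statistics import fmean

MIN_MPS = 3  # parties with fewer MPs in a window are excluded (see Error Handling)


def majority_party(vote_rows):
    """Map each MP to the party label found on most of their votes.

    vote_rows is an iterable of (mp_id, party) pairs, one per vote (hypothetical shape).
    """
    counts = defaultdict(Counter)
    for mp_id, party in vote_rows:
        counts[mp_id][party] += 1
    return {mp_id: c.most_common(1)[0][0] for mp_id, c in counts.items()}


def party_centroids(mp_vectors, mp_party, min_mps=MIN_MPS):
    """Mean vector per party, skipping parties with fewer than min_mps members."""
    members = defaultdict(list)
    for mp_id, vec in mp_vectors.items():
        party = mp_party.get(mp_id)
        if party is not None:
            members[party].append(vec)
    return {
        party: [fmean(dim) for dim in zip(*vecs)]
        for party, vecs in members.items()
        if len(vecs) >= min_mps
    }


def normalized_distance(centroids, a, b):
    """Euclidean distance a<->b as a fraction of the mean inter-party distance.

    Assumption: the mean runs over every pair of parties that has a centroid.
    """
    pair_dists = [
        dist(centroids[p], centroids[q])
        for p, q in combinations(sorted(centroids), 2)
    ]
    return dist(centroids[a], centroids[b]) / fmean(pair_dists)


# Toy data only: 2-D vectors instead of real SVD vectors, made-up MP ids.
mp_vectors = {
    "a1": (0, 0), "a2": (0, 2), "a3": (0, 1),
    "b1": (1, 0), "b2": (1, 1), "b3": (1, 2),
    "c1": (10, 0), "c2": (10, 1), "c3": (10, 2),
    "d1": (5, 5), "d2": (6, 6),  # only two MPs, so this party is dropped
}
vote_rows = [
    ("a1", "GL"), ("a1", "GL"), ("a1", "GroenLinks-PvdA"),  # mixed labels
    ("a2", "GL"), ("a3", "GL"),
    ("b1", "PvdA"), ("b2", "PvdA"), ("b3", "PvdA"),
    ("c1", "VVD"), ("c2", "VVD"), ("c3", "VVD"),
    ("d1", "Tiny"), ("d2", "Tiny"),
]

labels = majority_party(vote_rows)
centroids = party_centroids(mp_vectors, labels)
ratio = normalized_distance(centroids, "GL", "PvdA")
print(f"GL<->PvdA: {ratio:.1%} of average inter-party distance")  # 15.0%
```

In the toy data the three centroid distances are 1, 10 and 9, so the GL↔PvdA distance of 1 comes out at 15% of the average. Dividing by a per-window average is what makes the percentages in the findings comparable across windows whose raw scales differ.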