+Mapping Dutch Democracy: Building a Political Compass from 29,000+ Parliamentary Votes
The raw data preserves the distinction carefully. From 2019 through mid-2023, the svd_vectors table lists GroenLinks and PvdA as separate entries per window. From late 2023 onwards — when the merger formally took effect in parliament — a single GroenLinks-PvdA entity appears. The pipeline tracks this faithfully: you can literally watch two separate points on the political compass drift together and then merge into one.
What's striking is how early the convergence is visible. By 2021 — two full years before the merger announcement — GroenLinks and PvdA coordinates in the SVD space are nearly overlapping. At the individual MP level, there was occasional divergence on defense and security votes (GroenLinks MPs pulling slightly away from the PvdA centroid), but at the party level they were practically indistinguishable.
-This created an interesting pipeline challenge: the party normalization step has a mapping that folds both names into GroenLinks-PvdA across the entire dataset. For the post-merger period that's correct; for the pre-merger period it's a simplification that hides the convergence story. The raw vectors still capture it — you just have to know to look.
Finding 2: When Left and Right Unite Against the Center
-During the Rutte IV cabinet (2022–2023), a recurring pattern emerged: PVV, FvD, and JA21 (right-wing) would vote with SP, GroenLinks-PvdA, PvdD, DENK, and Volt (left-wing) against the governing parties VVD, D66, CDA, and ChristenUnie. This isn't a one-off — it happened on dozens of motions.
-The topics tell the story:
-
-- Disability care bureaucracy: motions to reduce administrative burden in disability care. The populist right and the progressive left both opposed the coalition's market-oriented approach.
-- Respite care for intensive caregivers: same coalition of radical left and radical right, opposing centrist fiscal restraint.
-- Anti-fraud budget retention: the coalition wanted to maintain the anti-fraud apparatus (think: the toeslagenaffaire aftermath); both flanks pushed back.
-- Education funding: motions to increase fundamental education budgets. VVD and D66 voted against; PVV and SP voted together.
-- Regional infrastructure: train stations, Eindhoven connectivity, regional investment. Left+right voted for; coalition voted against.
-
This is the classic "horseshoe" pattern in political science — the extremes converging against the center — but it's remarkable to see it so clearly in the voting geometry. It's not ideological agreement between left and right; it's a shared opposition to the governing consensus.
- -Finding 3: BBB's Geometric Arrival
-When BBB (BoerBurgerBeweging) entered parliament after the 2023 provincial elections, their SVD position placed them between PVV and CDA — consistent with their policy profile: agrarian-nationalist populism with Catholic-provincial roots. New parties don't get to pick their geometric location; the voting record places them. That BBB landed exactly where you'd expect is a good validity check.
-What the geometry also shows: BBB started close to PVV on the nationalist axis, but drifted toward the CDA cluster over their first year in parliament — visible as a curved trajectory rather than a fixed point.
- -Finding 4: The Closest Votes in a Decade
- -The controversy score (1 − winning_margin) reveals the knife-edge votes. In the current fragmented parliament, the tightest split is a perfect 8–8 party-line tie — decided by the chamber chair's casting vote. These happened on:
What if you could take every motion voted on in the Dutch Parliament over the past decade and automatically plot parties and MPs on a political map — with zero manual labeling?
+That’s exactly what this project does. Here’s how I built it, what I had to solve along the way, and what it revealed about Dutch political dynamics.
++ +
Step 1: Turning Votes into Geometry
+Each motion is a snapshot of political alignment. For each motion, we know which MPs voted together and which voted apart. If every PvdA and GroenLinks MP votes the same way almost every time, that tells us something. If PVV and CDA MPs diverge consistently, that tells us something too.
+I represent this with Singular Value Decomposition (SVD) on the MP × motion matrix:
-
-- Family reunification for AMV status holders (Boomsma motion, 2025): immigration policy at its most contested
-- Nuclear weapons and NATO (Dobbe motion, 2025): whether to push for nuclear disarmament within the alliance
-- Long COVID research funding (Kostic motion, 2025): healthcare commitments that split parties along unexpected lines
-- Cormorant population management (Kostic motion, 2025): agricultural vs. ecological interests in a literal bird-counting exercise
+
+- Rows: individual MPs (and party actors for collective votes)
+- Columns: motions
+- Values: +1 (voor), -1 (tegen), 0 (absent/abstain)
The narrowest non-tie votes are razor-thin too: Wilders' asylum emergency stop motion lost by the slimmest margin (5 parties for, 16 tied — effectively blocked), while Marijnissen's motion against private equity in GP practices nearly flipped the other way (16 for, 5 tied). On a different day, a different MP showing up, Dutch immigration and healthcare policy could have shifted.
- -More broadly, over 15,000 motions had winning margins below 55% — these are the genuinely contested decisions, not the rubber stamps. At the other extreme, about 3,700 motions passed with 95%+ support: the uncontroversial consensus items that rarely make headlines.
- -The Pipeline Architecture
+SVD finds the dominant axes of variation — the directions along which the chamber disagrees most. The first component almost always corresponds to a left-right axis. The second typically captures something like progressive-traditionalist or libertarian-authoritarian. The key point: the axes emerge from the math, not from any labeling on my part.
+I request 50 SVD dimensions per window — but the actual dimensionality is constrained by min(n_MPs, n_motions) - 1. Sparse windows (early years, partial quarters) produce fewer meaningful dimensions. The pipeline handles this gracefully, storing whatever k_used is for each window so downstream fusion always works with the actual vector length.
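A minimal sketch of the per-window decomposition, assuming a dense NumPy vote matrix. The function name `window_svd` and its arguments are hypothetical, and NumPy's dense SVD is used here for simplicity rather than whatever solver the pipeline actually calls. It shows the rank cap described above and returns the resulting `k_used`.

```python
import numpy as np

def window_svd(votes: np.ndarray, k_requested: int = 50):
    """Return (mp_coords, k_used) for one window's MP x motion vote matrix.

    votes holds +1 (voor), -1 (tegen), 0 (absent/abstain).
    """
    n_mps, n_motions = votes.shape
    # Rank cap from the text: min(n_MPs, n_motions) - 1, never below 1.
    k_used = max(1, min(k_requested, min(n_mps, n_motions) - 1))
    # Thin SVD; singular values come back sorted in descending order.
    u, s, _vt = np.linalg.svd(votes, full_matrices=False)
    # Scale left singular vectors by singular values to get MP coordinates.
    mp_coords = u[:, :k_used] * s[:k_used]
    return mp_coords, k_used

# Toy window: 4 MPs, 3 motions, two opposing blocs.
votes = np.array([
    [ 1,  1, -1],
    [ 1,  1, -1],
    [-1, -1,  1],
    [-1,  0,  1],
], dtype=float)
coords, k_used = window_svd(votes)
```

In the toy window the request for 50 dimensions is clamped to 2, the two identical MPs get identical coordinates, and the two blocs land on opposite sides of the first axis. The sign of each axis is arbitrary, which is the problem the alignment step addresses.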
Making Windows Comparable: Procrustes Alignment
+Running SVD independently per window creates a subtle problem: SVD axes are arbitrarily oriented. The “left-right” axis from 2020-Q3 and the “left-right” axis from 2021-Q1 might point in completely different directions — even if the underlying politics barely changed. You can’t just stack the coordinates and call it a trajectory.
+The fix is Procrustes alignment: given two sets of party/MP positions across consecutive windows, find the rotation matrix R that best maps one onto the other (minimizing the Frobenius norm of the difference), using MPs who appear in both windows as anchors:
+R = argmin_R ||A - B @ R||_F, subject to R'R = I
+This is solved cleanly via SVD of the cross-covariance matrix (a nice piece of mathematical symmetry — SVD to build the space, SVD to align it). The result: a continuous track for every party from 2019 to 2026, where position changes reflect genuine political movement rather than axis flips.
+High Procrustes disparity between consecutive windows — where alignment is poor even with the best rotation — is itself a signal: it suggests a structural political shift, not just individual drift.
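The alignment step can be sketched as follows. This is an illustration of the standard orthogonal Procrustes solution, not the pipeline's own code: `procrustes_rotation` and `alignment_residual` are hypothetical names, both windows are assumed to have the same number of dimensions, and the residual shown is just one possible way to quantify disparity.

```python
import numpy as np

def procrustes_rotation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||a - b @ R||_F.

    a, b: (n_anchors, k) coordinates of the same MPs in the reference
    window (a) and in the window being aligned (b), in the same row order.
    """
    # Cross-covariance between the two coordinate sets.
    m = b.T @ a
    u, _s, vt = np.linalg.svd(m)
    # The product of the two singular-vector factors is the optimal R.
    return u @ vt

def alignment_residual(a: np.ndarray, b: np.ndarray, r: np.ndarray) -> float:
    """Frobenius norm of what the best orthogonal map still cannot explain."""
    return float(np.linalg.norm(a - b @ r))

# Toy check: transform a reference cloud by a known orthogonal matrix,
# then recover that matrix from the anchor points alone.
rng = np.random.default_rng(0)
a = rng.normal(size=(12, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
b = a @ q.T                                   # so that b @ q == a
r = procrustes_rotation(a, b)
residual = alignment_residual(a, b, r)
```

Note that the constraint R'R = I admits reflections as well as rotations, which is what allows the alignment to undo a flipped axis. SciPy ships the same solution as `scipy.linalg.orthogonal_procrustes`.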
++
Step 2: What Each Motion Is Actually About
+Voting patterns tell us who agrees, but not why. For that, I add text embeddings — dense vector representations of each motion’s content using a language model.
+I use qwen/qwen3-embedding-4b via OpenRouter — a 4-billion parameter multilingual model that produces 2560-dimensional vectors with strong Dutch-language support. For each motion, I embed the richest text available: full parliamentary body text when we have it (94% of the 29,570 motions after an enrichment pass against the Tweede Kamer API), falling back to the summary description or title otherwise.
This lets us do something powerful: find motions that are genuinely similar in topic, not just in voting pattern. Two motions about nitrogen policy from 2020 and 2023 might have very different vote splits (different coalitions, different political moment) but near-identical text embeddings. That’s a meaningful connection.
++
Step 3: Fused Embeddings — The Best of Both Worlds
+SVD gives the political-structural signal: how does this motion split the chamber? Text embeddings give the semantic signal: what is this motion about?
+I concatenate both into a fused vector per motion per window:
+fused = [svd_dims (typically 50)] + [text_dims (2560)] = typically 2610 dimensions
+The actual dimension varies slightly because SVD dimensionality adapts to window density — the code stores svd_dims and text_dims per row so nothing downstream has to assume a fixed size.
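A sketch of what one fused row might look like, assuming plain concatenation as described. The `fuse` helper and the dictionary layout are hypothetical; only the `svd_dims` and `text_dims` field names come from the text.

```python
import numpy as np

def fuse(svd_vec, text_vec) -> dict:
    """Concatenate the vote-based and text-based vectors for one motion."""
    svd_vec = np.asarray(svd_vec, dtype=np.float32)
    text_vec = np.asarray(text_vec, dtype=np.float32)
    return {
        "vector": np.concatenate([svd_vec, text_vec]),
        # Stored per row so consumers never assume a fixed width.
        "svd_dims": int(svd_vec.shape[0]),
        "text_dims": int(text_vec.shape[0]),
    }

# Typical case: 50 SVD dimensions plus 2560 text dimensions.
row = fuse(np.zeros(50), np.ones(2560))
# Sparse window: fewer SVD dimensions, same text width.
sparse_row = fuse(np.zeros(12), np.ones(2560))
```

Whether each block is rescaled before concatenation is not stated here. With raw concatenation, the block with the larger norm dominates any cosine similarity computed on the fused vector, so the relative scaling is a design choice worth checking.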
This fused representation powers the similarity search. Two motions are “close” only if they’re about a similar topic and they produce a similar political split. This filters out spurious matches — two motions might both be controversial (close 50/50 votes) but on completely unrelated things, and the text component separates them.
++
The Numbers: What We’re Working With
+After the full pipeline run:
+| Year | Motions |
+|---|---|
+| 2016 | 162 |
+| 2017 | 126 |
+| 2018 | 124 |
+| 2019 | 3,374 |
+| 2020 | 4,223 |
+| 2021 | 4,283 |
+| 2022 | 4,115 |
+| 2023 | 3,272 |
+| 2024 | 3,965 |
+| 2025 | 3,712 |
+| 2026 | 2,214 |
+| Total | 29,570 |
Motion volume stays above 4,000 per year from 2020 through 2022, peaking at 4,283 in 2021. 2022 was the year the Rutte IV coalition took office amid intense debates on energy prices, housing, the war in Ukraine, and the ongoing nitrogen crisis. 2023 is lighter at 3,272 motions and culminates in the November election that brought PVV to its historic first-place finish.
+Early years (2016–2018) use annual windows because the data is too sparse for meaningful quarterly SVD. From 2019 onwards, everything runs quarterly, giving us 41 windows in total.
+The similarity cache holds 409,938 precomputed pairs — top 10 neighbors per motion per window — making lookup instant at query time.
++
Interesting Findings
+The 2022–2023 Polarization Surge
+2022 and 2023 together account for about a quarter of all motions in the dataset (7,387 of 29,570). In the SVD positions for 2022, the distance between the governing coalition (VVD, D66, CDA, CU) and the opposition (PVV, SP, FvD) is near its maximum. The nitrogen crisis and energy policy debates forced unusually sharp coalition discipline, which shows up geometrically as well-separated clusters.
+2023 continued the intensity, and the Procrustes-aligned trajectory shows the party positions in 2023-Q4 and 2024-Q1 shifting noticeably as the new coalition began to form.
+BBB’s Geometric Arrival
+When BBB (BoerBurgerBeweging) broke through in 2023, winning a historic 16 Senate seats after the provincial elections and 7 seats in the Tweede Kamer that November, their SVD position placed them between PVV and CDA, exactly matching their policy profile: agrarian-nationalist populism with Catholic-provincial roots. The model found this without being told. That’s a good sanity check that the geometry is capturing something real.
+The Strange Case of “Verworpen.”
+Motions rejected without debate are recorded with the title “Verworpen.” (“Rejected.”). There are hundreds of these. Because they all share the same 10-character title, their text embeddings are identical: every “Verworpen.” has cosine similarity 1.0 to every other one in the cache. Technically correct; semantically meaningless. The UI layer filters these out.
+It’s a reminder that data quality surprises emerge at scale. I found three or four similar pathologies (motions withdrawn mid-session, duplicate API records) that required explicit handling.
+Party Cohesion as a Signal
+Party cohesion, meaning how often all MPs of a party vote identically, varies enormously. SGP and CU are near-perfect blocs. GroenLinks-PvdA (after the 2023 merger) is similarly tight. VVD shows the most internal variation, which tracks with what you’d expect from a governing party managing coalition discipline across conflicting wings.
+In earlier years (2019–2020), before the GroenLinks-PvdA merger, GroenLinks occasionally split on security and defense policy, visible in the SVD as individual MP positions diverging from the party centroid.
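One way to turn that definition into a number, as a hedged sketch: count the share of motions on which every MP of a party who actually voted cast the same vote. The function name `party_cohesion` and the choice to ignore absences are assumptions for illustration, not necessarily how the pipeline computes it.

```python
import numpy as np

def party_cohesion(votes: np.ndarray) -> float:
    """Share of motions where all voting MPs of one party agreed.

    votes: (n_mps, n_motions) for a single party, with +1 (voor),
    -1 (tegen), 0 (absent/abstain). Absences are ignored.
    """
    n_cast = (votes != 0).sum(axis=0)
    # |sum| equals the number of votes cast only if all cast votes match.
    unanimous = np.abs(votes.sum(axis=0)) == n_cast
    decided = n_cast > 0
    if not decided.any():
        return float("nan")
    return float(unanimous[decided].mean())

# Toy party: 3 MPs, 4 motions. Motion 2 splits, motion 4 has no votes cast.
toy_votes = np.array([
    [1,  1, -1, 0],
    [1, -1, -1, 0],
    [1,  1, -1, 0],
])
cohesion = party_cohesion(toy_votes)
```

The toy party is unanimous on two of the three motions it voted on, so its cohesion is 2/3.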
++
The Pipeline Architecture
Single DuckDB database, modular Python pipeline, no cloud infrastructure:
-API (Tweede Kamer OData)
- → download_past_year.py → motions table (28,304 rows)
-
+API (Tweede Kamer OData)
+ → download_past_year.py
+ → motions table (29,570 rows)
+
motions
- → extract_mp_votes.py → mp_votes table (508,765 rows)
- → sync_motion_content.py → body_text enrichment (~94% coverage)
- → svd_pipeline.py → svd_vectors table (73,165 rows, 41 windows)
-
-svd_vectors
- → similarity/compute.py → similarity_cache (top-10 per window)
-The similarity computation is pure NumPy: load all SVD vectors for a window, pad to uniform length, L2-normalize, compute the full cosine similarity matrix via a single matrix multiply, then extract top-k neighbors. For a 4,000-motion quarter, that's a 4000×4000 matrix operation — fast enough that batching isn't needed.
-The database sits at ~18 GB on disk — the full parliamentary text for 26,000+ motions accounts for most of that.
-
-What the Axes Actually Mean
-
-One of the trickiest problems was labeling the SVD axes. The first component reliably captures left-right economics. But components 3 through 10? The mathematical procedure is sound — SVD finds the directions of maximum variance — but the meaning of each axis has to be derived from the actual motions that load heavily on it.
-
-I solved this by extracting the top 50 motions per component (by absolute loading score), then analyzing their content. Some clear patterns emerged:
-
-
-- Component 1: Fiscal-economic policy vs. social welfare and international rights — the classic left-right split.
-- Component 2: Nationalist vs. multilateralist orientation — PVV/FvD on one side, Volt/GroenLinks-PvdA on the other.
-- Component 3: Welfare state vs. defense spending — flip of the usual axis (with SP/PvdD on the pro-welfare side, VVD/SGP on the pro-defense side).
-
-
-
-How much do the first two axes actually capture? In a single-window SVD (current parliament), PC1 explains ~29% of the variance and PC2 explains ~11.5% — together accounting for ~41% of all voting variation. PC3 adds another 8.6%, but from there it drops off sharply: PC4 is under 9%, and components 5–8 each contribute 3–6%. The classic "scree plot" elbow is clear: the first two dimensions carry the signal, the rest is real but diminishing. When looking across all time windows with Procrustes alignment, the picture flattens considerably — PC1 and PC2 each explain ~14.6% and ~13.1% respectively — because aligning 41 different windows distributes variance more evenly. The multi-window perspective is more conservative, but the message is the same: Dutch politics is largely two-dimensional.
-
-
-Scree plot across 41 aligned quarterly windows. PC1 = 14.6%, PC2 = 13.1%.
-
-
-What's Next
-
-Motion explorer: Given a motion, retrieve the 10 most politically similar ones from across the decade. Trace how a policy debate evolved — who championed it, how the coalitions shifted.
-
-Party trajectory animation: Procrustes-aligned positions, animated year by year. Watch GroenLinks-PvdA's pre-merger convergence, watch PVV consolidate its flank, watch new parties arrive and find their geometric home.
-
-Cross-party coalition patterns: Which topics produce unusual coalition configurations — motions where the normal left-right split breaks down and unexpected alliances form.
-
-Cabinet crisis detection: Track coalition cohesion over time. When do coalition parties start voting against each other? The Procrustes disparity between consecutive windows is itself a signal of structural political shifts.
-
-Reproducibility
-# Download historical data
-python scripts/download_past_year.py --start-date 2016-01-01 --end-date 2026-01-01
-
-# Run full pipeline (SVD, similarity cache)
-python -m pipeline.run_pipeline --db-path data/motions.db \
- --start-date 2016-01-01 --end-date 2026-01-01 \
- --window-size quarterly --text-batch-size 200
-
-# Enrich with full motion body text
-python scripts/sync_motion_content.py --db-path data/motions.db
-All computation — SVD, similarity — runs locally on a single machine. No cloud services, no GPU required.
-
+ → extract_mp_votes.py → mp_votes table (531,869 rows)
+ → sync_motion_content.py → body_text enrichment (~94%)
+ → text_pipeline.py → embeddings table (28,680 rows, qwen3-embedding-4b via OpenRouter)
+ → svd_pipeline.py → svd_vectors table (73,172 rows, 41 windows)
+
+svd_vectors + embeddings
+ → fusion.py → fused_embeddings table (41,422 rows)
+
+fused_embeddings
+ → similarity/compute.py → similarity_cache table (409,938 rows, top-10 per window)
+The similarity computation is pure NumPy: load all fused vectors for a window, pad to uniform length, L2-normalize, compute the full N×N cosine similarity matrix via a single matrix multiply (normalized @ normalized.T), then extract top-k neighbors per row with np.argpartition. For a 4,000-motion quarter, that’s a 4000×4000 matrix operation — fast enough that it’s not worth batching.
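The steps in that paragraph can be sketched like this. The helper names `pad_to_uniform` and `top_k_neighbors` are hypothetical, and excluding each motion from its own neighbor list is an assumption on my part; the sketch expects at least two vectors.

```python
import numpy as np

def pad_to_uniform(vectors) -> np.ndarray:
    """Zero-pad variable-length vectors into one (N, width) matrix."""
    width = max(len(v) for v in vectors)
    out = np.zeros((len(vectors), width), dtype=np.float32)
    for i, v in enumerate(vectors):
        out[i, :len(v)] = v
    return out

def top_k_neighbors(vectors: np.ndarray, k: int = 10):
    """Return (indices, scores), each (N, k), best match first, self excluded."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    normalized = vectors / np.where(norms == 0, 1.0, norms)
    sims = normalized @ normalized.T          # full N x N cosine matrix
    np.fill_diagonal(sims, -np.inf)           # a motion is not its own neighbor
    k = min(k, sims.shape[0] - 1)
    # argpartition puts the k largest entries of each row last, unordered.
    part = np.argpartition(sims, -k, axis=1)[:, -k:]
    part_scores = np.take_along_axis(sims, part, axis=1)
    # Sort only those k candidates, highest similarity first.
    order = np.argsort(-part_scores, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    scores = np.take_along_axis(part_scores, order, axis=1)
    return idx, scores

# Toy window with four 2-D "motions".
matrix = pad_to_uniform([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
idx, scores = top_k_neighbors(matrix, k=2)
```

Using `np.argpartition` avoids a full sort of every row: only the k retained candidates per row are sorted, which matters more as the window grows.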
The database sits at 18 GB on disk — up from ~3 GB before body text enrichment. The full parliamentary text for 28,000+ motions accounts for most of that growth.
++
What I Built On Top
+The pipeline above is the foundation. Here’s what it now powers:
+Overton Window analysis: Using the SVD compass and vote records, I tested whether the Dutch Overton window shifted after PVV’s November 2023 election victory. The answer: it widened, but through right-wing moderation rather than centrist conversion. Centrist support for right-wing motions rose from 25% to 51%, while centrists actually moved left on the SVD compass. The full analysis covers 3,030 classified right-wing motions, 2D extremity scoring, quarterly trajectories, and mechanism classification. Read the full report →
+2D extremity scoring: Every motion in the database has been scored by an LLM on two independent dimensions: stylistic extremity (rhetorical hostility) and material impact (policy consequence). They’re only moderately correlated (r = 0.43), which matters: right-wing motions post-2024 became milder on both dimensions, not just in tone.
+Streamlit Explorer: An interactive dashboard where you can browse the SVD compass, trace party trajectories over time, explore centrist support trends, and browse individual motions with their extremity scores and similarity matches. The same data and methods that drive the analysis reports power the live exploration interface.
++
Reproducibility
+# Download historical data
+python scripts/download_past_year.py --start-date 2016-01-01 --end-date 2026-01-01
+
+# Run full pipeline (SVD, text embeddings, fusion, similarity cache)
+python -m pipeline.run_pipeline --db-path data/motions.db \
+ --start-date 2016-01-01 --end-date 2026-01-01 \
+ --window-size quarterly --text-batch-size 200
+
+# Enrich with full motion body text
+python scripts/sync_motion_content.py --db-path data/motions.db
+The DB grows to ~18 GB for the full dataset including body text. All computation (SVD, fusion, similarity) runs locally on a single machine.
Democracy is more legible than it looks.
- - +