Mapping Dutch Democracy: Building a Political Compass from 25,000+ Parliamentary Votes

What if you could take every motion voted on in the Dutch Parliament over the past decade and automatically plot parties and MPs on a political map — with zero manual labeling?

That's exactly what this project does. Here's how we built it, what surprised us, and what it revealed about Dutch political dynamics.

The Starting Point: Open Data, Hidden Structure

The Dutch Parliament publishes every vote — every motie, every amendement, every besluit — in an open OData API. We're talking over 25,500 motions spanning 2016 to 2026, each with a record of how every party (and in many cases every individual MP) voted: voor (for), tegen (against), onthouden (abstained), or afwezig (absent).

This is an extraordinary dataset. But in raw form it's just a table of votes. The interesting question is: can we extract structure — left vs. right, progressive vs. conservative, governing vs. opposition — purely from the pattern of who votes with whom?

The answer is yes, and the method is surprisingly elegant.

Step 1: Turning Votes into Geometry

Each motion is a snapshot of political alignment. For each motion, we know which parties voted together and which voted apart. If PvdA and GroenLinks almost always vote the same way, that tells us something. If PVV and CDA frequently diverge, that tells us something too.

We represent this with Singular Value Decomposition (SVD) on the party-vote matrix:

Rows: parties (VVD, PVV, D66, CDA, PvdA, GroenLinks, SP, CU, SGP, FvD, BBB, ...)
Columns: motions
Values: vote encoded as +1 (voor), -1 (tegen), 0 (absent/abstain)

SVD finds the dominant axes of variation — the directions along which parties disagree most strongly. The first dimension almost always corresponds to a left-right axis. The second dimension typically captures something like a libertarian-authoritarian or progressive-traditionalist axis.
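A minimal sketch of this step, assuming NumPy and a tiny invented vote matrix (the party names are real, the votes are made up for illustration). The function name `party_coordinates` is hypothetical, and scaling the left singular vectors by the singular values is one common convention, not necessarily what the pipeline does:

```python
import numpy as np

# Toy party-by-motion matrix: +1 = voor, -1 = tegen, 0 = absent/abstain.
# These votes are invented for illustration only.
parties = ["VVD", "D66", "PvdA", "SP", "PVV"]
votes = np.array([
    [ 1,  1, -1, -1,  1,  0],
    [ 1,  1, -1,  1,  1, -1],
    [-1,  1,  1,  1, -1, -1],
    [-1, -1,  1,  1, -1,  1],
    [ 1, -1, -1, -1, -1,  1],
], dtype=float)


def party_coordinates(matrix: np.ndarray, n_dims: int = 2) -> np.ndarray:
    """Project each row (party) onto the top n_dims singular directions."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale each left singular vector by its singular value so that
    # stronger axes of disagreement spread the parties further apart.
    return u[:, :n_dims] * s[:n_dims]


coords = party_coordinates(votes)
for name, (x, y) in zip(parties, coords):
    print(f"{name:5s} {x:+.2f} {y:+.2f}")
```

The sign of each axis is arbitrary in SVD, so which side ends up "left" has to be read off from where known parties land.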

We run this per quarterly window (2019-Q1, 2019-Q2, ..., 2024-Q4) so we can track how positions shift over time at fine resolution.
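Assigning a motion to its window only needs the vote date. A small sketch, where `quarter_label` is a hypothetical helper name and not taken from the pipeline:

```python
from datetime import date


def quarter_label(day: date) -> str:
    """Return a window label such as '2019-Q1' for a calendar date."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}-Q{quarter}"


print(quarter_label(date(2019, 2, 14)))   # 2019-Q1
print(quarter_label(date(2024, 12, 31)))  # 2024-Q4
```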

The Result: A 2D Political Compass

The output is coordinates for every party in 2D space — computed purely from voting behavior, with no labels or assumptions from us. When you plot it, recognizable structure emerges immediately:

Left bloc (PvdA, GroenLinks, SP) cluster tightly together
Liberals (VVD on the right, D66 in the progressive centre) sit in a distinct quadrant
Religious right (SGP, CU) form their own coherent group
Populist right (PVV, FvD in later years) occupy a distant extreme
BBB (farmers' party, 2022 onwards) lands between PVV and CDA

The political axis emerges from the math — not our intuitions.

Step 2: What Each Motion Is Actually About

Voting patterns tell us who agrees, but not why. For that, we add text embeddings — dense vector representations of each motion's title and description using a language model.

This lets us do something powerful: if a new motion comes in about nitrogen emissions, we can find the 20 most similar past motions by meaning, not just keywords. And if a motion shows the same party-line split as another motion from 2022, the text embeddings can confirm that the two are genuinely related, or reveal that the matching vote pattern is coincidental (parties can split the same way on unrelated issues for similar structural reasons).

We compute these using OpenAI-compatible embeddings via OpenRouter, processing 25,640 motions in batches of 200.
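The batching itself can be sketched independently of the embedding provider. In the example below `embed_batch` is a placeholder for whatever function performs the real API call, and `fake_embed_batch` is a stand-in so the sketch runs offline. Both names are assumptions for illustration:

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 200,
) -> List[Vector]:
    """Embed texts batch by batch, preserving input order."""
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        result = embed_batch(batch)
        # Guard against a partial response silently misaligning rows.
        if len(result) != len(batch):
            raise ValueError("embedding call returned a different number of vectors")
        vectors.extend(result)
    return vectors


# Stand-in for the real API call: it records batch sizes and returns
# a one-dimensional dummy vector per text.
batch_sizes: List[int] = []


def fake_embed_batch(batch: Sequence[str]) -> List[Vector]:
    batch_sizes.append(len(batch))
    return [[float(len(text))] for text in batch]


titles = [f"motie {i}" for i in range(25_640)]
vectors = embed_in_batches(titles, fake_embed_batch)
print(len(vectors), len(batch_sizes), batch_sizes[-1])  # 25640 129 40
```

At 200 per batch, 25,640 motions come to 128 full batches plus a final batch of 40.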

Step 3: Fused Embeddings — The Best of Both Worlds

SVD gives us the political-structural signal: how does this motion split the chamber? Text embeddings give us semantic signal: what is this motion about?

We concatenate both into a fused vector per motion per window:

fused = [svd_dims (50)] + [text_dims (2560)] = 2610 dimensions

This fused representation powers the similarity search. Two motions are considered "close" if they're both about a similar topic and they produce a similar political split. This filters out spurious matches — two motions might both be controversial (splitting 50/50) but about completely unrelated things.
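A sketch of the concatenation, assuming NumPy. Normalizing each half to unit length before joining is our assumption here (the text above only says the two parts are concatenated), and the helper names are hypothetical:

```python
import numpy as np

SVD_DIMS = 50
TEXT_DIMS = 2560


def _unit(vec: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length, leaving all-zero vectors untouched."""
    norm = np.linalg.norm(vec)
    return vec if norm == 0 else vec / norm


def fuse(svd_vec, text_vec) -> np.ndarray:
    """Concatenate the political and semantic vectors for one motion."""
    svd_vec = np.asarray(svd_vec, dtype=float)
    text_vec = np.asarray(text_vec, dtype=float)
    if svd_vec.shape != (SVD_DIMS,) or text_vec.shape != (TEXT_DIMS,):
        raise ValueError("unexpected input dimensions")
    # Assumption: normalize each half so neither signal dominates
    # simply because it has more dimensions or larger values.
    return np.concatenate([_unit(svd_vec), _unit(text_vec)])


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
svd_a, svd_b = rng.normal(size=(2, SVD_DIMS))
text_a, text_b = rng.normal(size=(2, TEXT_DIMS))
fused_a, fused_b = fuse(svd_a, text_a), fuse(svd_b, text_b)
print(fused_a.shape)  # (2610,)
```

With both halves at unit length, the cosine similarity of two fused vectors is the average of the SVD cosine and the text cosine. Two motions that match on only one of the two signals therefore score about 0.5 at best, which is the filtering behaviour described above.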

The Numbers: What We're Working With

After the full pipeline run:

Year	Motions
2016	132
2017	30
2018	100
2019	3,374
2020	4,228
2021	4,289
2022	4,116
2023	621
2024	3,968
2025	3,715
2026	948

Volume peaks between 2020 and 2022, with more than 4,000 motions in each of those years and a high of 4,289 in 2021. The 2022 total covers the year the Rutte IV coalition took office amid intense debates on energy prices, housing, the war in Ukraine, and the ongoing nitrogen crisis.

Our similarity cache now holds 627,272 precomputed pairs (top 20 neighbors per motion per window), making similarity lookup instant at query time.

Interesting Findings

The 2022 Polarization Surge

The 2022 cohort is among the largest in the dataset. Looking at the SVD positions for that year, the distance between the governing coalition (VVD, D66, CDA, CU) and the opposition (PVV, SP, FvD) is near its maximum. The nitrogen crisis and energy policy debates forced unusually sharp coalition discipline.

BBB's Geometric Arrival

BBB (BoerBurgerBeweging) entered the Tweede Kamer in 2021 with a single seat, then broke through in 2023 with 16 Senate seats and 7 seats in the November general election. Their SVD position places them between PVV and CDA, as their policy profile would suggest: agrarian-nationalist populism with Catholic-provincial roots. The model found this without being told.

The Strange Case of "Verworpen."

Motions that are rejected without debate are recorded with the title "Verworpen." ("Rejected."). There are hundreds of these. Because they all share the same 10-character title, their text embeddings are identical, so every "Verworpen." has text cosine similarity 1.0 to every other "Verworpen." This is technically correct (they are textually identical) but semantically meaningless. The similarity cache contains these spurious pairs, which the UI layer needs to filter out.

It's a good reminder that data quality surprises emerge at scale.
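One way the UI layer could drop these pairs, sketched under the assumption that each cached neighbor carries its title and score. The set `PLACEHOLDER_TITLES` and both function names are hypothetical:

```python
from typing import List, Tuple

# Titles that carry no topical information. Compared case-insensitively.
PLACEHOLDER_TITLES = {"verworpen."}


def is_placeholder(title: str) -> bool:
    return title.strip().lower() in PLACEHOLDER_TITLES


def filter_neighbors(
    query_title: str, neighbors: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Remove neighbors that only match because of a placeholder title."""
    if is_placeholder(query_title):
        # A placeholder query has no meaningful text neighbors at all.
        return []
    return [(title, score) for title, score in neighbors if not is_placeholder(title)]


cached = [("Verworpen.", 0.41), ("Motie over stikstofreductie", 0.87)]
print(filter_neighbors("Motie over stikstofbeleid", cached))
# [('Motie over stikstofreductie', 0.87)]
```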

Party Cohesion as a Signal

A subtle finding: party cohesion (how often all members of a party vote the same way) varies enormously. SGP and CU have near-perfect cohesion: they vote as a bloc on almost everything. PvdA/GroenLinks (post-merger) has similarly high cohesion. But in earlier years (2019-2020), before the merger, GroenLinks occasionally split on specific issues around security policy.

VVD shows the most internal variation — governing parties develop fissures.
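Cohesion as defined above can be computed directly from individual MP votes. The sketch below assumes records of the form (motion, party, MP, vote) and ignores absences and abstentions. That record layout and the sample data are assumptions for illustration, not the actual `mp_votes` schema:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str, str]  # (motion_id, party, mp, vote)


def party_cohesion(records: Iterable[Record]) -> Dict[str, float]:
    """Share of motions on which all voting members of a party agreed."""
    cast = defaultdict(set)
    for motion_id, party, _mp, vote in records:
        if vote in ("voor", "tegen"):  # skip onthouden / afwezig
            cast[(party, motion_id)].add(vote)

    unanimous: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for (party, _motion_id), distinct_votes in cast.items():
        total[party] += 1
        if len(distinct_votes) == 1:
            unanimous[party] += 1
    return {party: unanimous[party] / total[party] for party in total}


# Invented sample data.
sample = [
    ("m1", "SGP", "a", "voor"), ("m1", "SGP", "b", "voor"),
    ("m2", "SGP", "a", "tegen"), ("m2", "SGP", "b", "tegen"),
    ("m1", "VVD", "c", "voor"), ("m1", "VVD", "d", "tegen"),
    ("m2", "VVD", "c", "voor"), ("m2", "VVD", "d", "voor"),
    ("m2", "VVD", "e", "afwezig"),
]
print(party_cohesion(sample))  # {'SGP': 1.0, 'VVD': 0.5}
```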

The Pipeline Architecture

The system is built around a single DuckDB database and a modular Python pipeline:

API (Tweede Kamer OData) 
  → download_past_year.py 
  → motions table (25,500+ rows)
  
motions
  → extract_mp_votes.py → mp_votes table (200k rows)
  → text_pipeline.py   → embeddings table (25,640 rows, via OpenRouter)
  → svd_pipeline.py    → svd_vectors table (50,779 rows, quarterly windows)
  
svd_vectors + embeddings
  → fusion.py          → fused_embeddings table (35,872 rows)
  
fused_embeddings
  → similarity/compute.py → similarity_cache table (627k rows, top-20 per window)

Everything runs locally. The only external call is to the OpenRouter API for text embeddings. The similarity computation (627k pairs) is pure NumPy: load vectors, normalize, matrix multiply, take top-k. For a window of 4,000 motions, that's a 4000×4000 cosine similarity matrix computed in seconds.
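The load, normalize, multiply, top-k sequence can be sketched as follows. This is an illustration on random vectors, not the contents of `similarity/compute.py`, and `top_k_neighbors` is a hypothetical name:

```python
import numpy as np


def top_k_neighbors(vectors: np.ndarray, k: int = 20):
    """Return (indices, scores) of the k most cosine-similar rows per row."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit.T                # full cosine similarity matrix
    np.fill_diagonal(sims, -np.inf)     # a motion is not its own neighbor
    k = min(k, len(vectors) - 1)
    idx = np.argsort(-sims, axis=1)[:, :k]          # best match first
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


# Random stand-in for one window of fused vectors (smaller than a real
# window to keep the example light).
rng = np.random.default_rng(42)
window = rng.normal(size=(1000, 2610))
idx, scores = top_k_neighbors(window)
print(idx.shape)  # (1000, 20)
```

A full sort per row is the simplest option. For much larger windows, `np.argpartition` would avoid sorting entries that never make the top 20.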

What's Next

The similarity cache and political compass open up several directions:

Motion explorer: Given a motion you care about, find the 20 most politically and semantically similar motions from across the decade. Trace how a policy debate evolved from 2019 to 2025.

Party trajectory plots: Animate party positions on the 2D compass year by year. Watch D66 drift, watch PVV consolidate, watch the new parties arrive and find their position.

Cross-party coalition predictor: Given a new motion's text and expected vote split, predict which parties will support it based on past patterns.

The "controversy index": We already compute 1 - winning_margin as a controversy score. The most controversial motions (close votes, high stakes topics) tell a story about where Dutch politics is genuinely undecided vs. where it's performing conflict for the cameras.
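The controversy score is simple enough to show in full. The sketch assumes `winning_margin` is the absolute difference between votes for and against, divided by the votes cast, so that it lies between 0 and 1. The text above only gives the `1 - winning_margin` part:

```python
def controversy(votes_for: int, votes_against: int) -> float:
    """1 - winning_margin: 1.0 is a dead heat, 0.0 is unanimous."""
    total = votes_for + votes_against
    if total == 0:
        raise ValueError("no votes cast")
    # Assumption: margin normalized by the number of votes cast.
    winning_margin = abs(votes_for - votes_against) / total
    return 1.0 - winning_margin


print(round(controversy(76, 74), 3))  # 0.987
print(controversy(150, 0))            # 0.0
```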

Reproducibility

The full pipeline is open and runs on a single machine with no cloud infrastructure:

# Download historical data
python scripts/download_past_year.py --start-date 2016-01-01 --end-date 2026-01-01

# Run full pipeline (extract votes, compute SVD, embed text, fuse, build similarity cache)
python -m pipeline.run_pipeline --db-path data/motions.db \
    --start-date 2016-01-01 --end-date 2026-01-01 \
    --window-size annual --text-batch-size 200

The DB grows to ~3.6GB for the full dataset (mostly embeddings and vote records). Everything else — the SVD, fusion, similarity cache — fits comfortably in memory during computation.

Democracy is more legible than it looks.
