Mapping Dutch Democracy: Building a Political Compass from 28,000+ Parliamentary Votes

What if you could take every motion voted on in the Dutch Parliament over the past decade and automatically plot parties and MPs on a political map — with zero manual labeling?

That's exactly what this project does. Here's how I built it, what I had to solve along the way, and what it revealed about Dutch political dynamics.


The Starting Point: Open Data, Hidden Structure

The Dutch Parliament publishes every vote, on every motie (motion), amendement (amendment), and besluit (decision), through an open OData API. We're talking about more than 28,000 motions spanning 2016 to 2026, with a record of how every individual MP voted: voor (for), tegen (against), onthouden (abstained), or afwezig (absent). That's over 506,000 individual vote records.

This is an extraordinary dataset. But in raw form it's just a table of votes. The interesting question is: can we extract structure — left vs. right, progressive vs. conservative, governing vs. opposition — purely from the pattern of who votes with whom?

The answer is yes, and the method is surprisingly elegant.


Step 1: Turning Votes into Geometry

Each motion is a snapshot of political alignment. For each motion, we know which MPs voted together and which voted apart. If every PvdA and GroenLinks MP votes the same way almost every time, that tells us something. If PVV and CDA MPs diverge consistently, that tells us something too.

I represent this with Singular Value Decomposition (SVD) on the MP × motion matrix:

  • Rows: individual MPs (and party actors for collective votes)
  • Columns: motions
  • Values: +1 (voor), -1 (tegen), 0 (absent/abstain)

SVD finds the dominant axes of variation — the directions along which the chamber disagrees most. The first component almost always corresponds to a left-right axis. The second typically captures something like progressive-traditionalist or libertarian-authoritarian. The key point: the axes emerge from the math, not from any labeling on my part.

I request 50 SVD dimensions per window — but the actual dimensionality is constrained by min(n_MPs, n_motions) - 1. Sparse windows (early years, partial quarters) produce fewer meaningful dimensions. The pipeline handles this gracefully, storing whatever k_used is for each window so downstream fusion always works with the actual vector length.
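Here is a minimal NumPy sketch of the per-window SVD step. It assumes votes arrive as (actor, motion, vote) triples. The names vote_matrix, window_svd and VOTE_VALUE are hypothetical and are not taken from the pipeline's code. The sketch encodes votes as described above and caps the component count the same way the text does. One plausible origin of the "- 1" is scipy.sparse.linalg.svds, a common truncated solver: with its default ARPACK solver it only accepts k < min(matrix shape).

```python
import numpy as np

# Vote encoding from the post: +1 voor, -1 tegen, 0 for abstain/absent.
VOTE_VALUE = {"voor": 1.0, "tegen": -1.0, "onthouden": 0.0, "afwezig": 0.0}


def vote_matrix(votes, actors, motions):
    """Build the actor x motion matrix from (actor, motion, vote) triples."""
    row = {a: i for i, a in enumerate(actors)}
    col = {m: j for j, m in enumerate(motions)}
    X = np.zeros((len(actors), len(motions)))
    for actor, motion, vote in votes:
        X[row[actor], col[motion]] = VOTE_VALUE.get(vote, 0.0)
    return X


def window_svd(X, k_requested=50):
    """SVD truncated to k_used = min(k_requested, min(X.shape) - 1) components.

    The matrix is used as-is (no centering), following the encoding literally.
    """
    k_used = max(1, min(k_requested, min(X.shape) - 1))
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    actor_coords = U[:, :k_used] * s[:k_used]   # one row per MP / party actor
    motion_coords = Vt[:k_used].T * s[:k_used]  # one row per motion
    return actor_coords, motion_coords, k_used


# Toy window: 3 actors x 4 motions, so k_used = min(3, 4) - 1 = 2.
votes = [
    ("mp_a", "m1", "voor"), ("mp_a", "m2", "tegen"), ("mp_a", "m3", "voor"),
    ("mp_b", "m1", "voor"), ("mp_b", "m2", "tegen"), ("mp_b", "m4", "onthouden"),
    ("mp_c", "m1", "tegen"), ("mp_c", "m2", "voor"), ("mp_c", "m3", "afwezig"),
]
X = vote_matrix(votes, ["mp_a", "mp_b", "mp_c"], ["m1", "m2", "m3", "m4"])
actor_coords, motion_coords, k_used = window_svd(X)
```

In the toy data, mp_b and mp_c vote exactly opposite each other, so their coordinates come out as mirror images. That is the kind of structure the real chamber-level SVD picks up.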

Making Windows Comparable: Procrustes Alignment

Running SVD independently per window creates a subtle problem: SVD axes are arbitrarily oriented. The "left-right" axis from 2020-Q3 and the "left-right" axis from 2021-Q1 might point in completely different directions — even if the underlying politics barely changed. You can't just stack the coordinates and call it a trajectory.

The fix is Procrustes alignment. Take the MPs who appear in both of two consecutive windows as anchors. Then find the orthogonal matrix R (a rotation, possibly combined with a reflection) that best maps one window's positions onto the other's, minimizing the Frobenius norm of the difference:

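$$
R^{*} = \arg\min_{R}\, \lVert A - BR \rVert_F \quad \text{subject to} \quad R^{\top} R = I
$$

Here row i of A and row i of B hold the same anchor MP's coordinates, A in the reference window and B in the window being aligned.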

This has a closed-form solution via SVD of the cross-covariance matrix: if $B^{\top}A = U\Sigma V^{\top}$, then $R = UV^{\top}$. It's a nice piece of mathematical symmetry: SVD to build the space, SVD to align it. The result is a continuous track for every party from 2019 to 2026, where position changes reflect genuine political movement rather than axis flips.

High Procrustes disparity between consecutive windows — where alignment is poor even with the best rotation — is itself a signal: it suggests a structural political shift, not just individual drift.
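The closed-form solution fits in a few lines of NumPy. This is a sketch, not the pipeline's code. procrustes_align is a hypothetical name, and the disparity score is one reasonable definition among several. For comparison, scipy.spatial.procrustes reports a related disparity, but it also centres and rescales both point sets, which this version deliberately does not do. scipy.linalg.orthogonal_procrustes solves the same rotation problem, with its arguments in the order (B, A) relative to the formula above.

```python
import numpy as np


def procrustes_align(A, B):
    """Orthogonal R minimising ||A - B @ R||_F subject to R.T @ R = I.

    A: anchor MPs' coordinates in the reference (earlier) window.
    B: the same anchor MPs, in the same row order, in the window being aligned.
    Returns R and a disparity score (hypothetical definition: the share of
    A's squared norm left unexplained after alignment, 0 = perfect fit).
    """
    k = min(A.shape[1], B.shape[1])  # windows may have different k_used
    A, B = A[:, :k], B[:, :k]
    U, _, Vt = np.linalg.svd(B.T @ A)  # SVD of the cross-covariance matrix
    R = U @ Vt
    disparity = np.linalg.norm(A - B @ R) ** 2 / np.linalg.norm(A) ** 2
    return R, disparity


# Toy check: B is A with its axes rotated/reflected by a random orthogonal Q,
# so the best alignment should recover R = Q.T with near-zero disparity.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
B = A @ Q
R, disparity = procrustes_align(A, B)
# Every MP/party in the new window is then mapped with coords[:, :k] @ R.
```

In real data the disparity never reaches zero. What matters is how it moves from one pair of windows to the next: a jump is the structural-shift signal described above.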


Step 2: What Each Motion Is Actually About

Voting patterns tell us who agrees, but not why. For that, I add text embeddings — dense vector representations of each motion's content using a language model.

I use qwen/qwen3-embedding-4b via OpenRouter — a 4-billion parameter multilingual model that produces 2560-dimensional vectors with strong Dutch-language support. For each motion, I embed the richest text available: full parliamentary body text when we have it (94% of the 28,172 motions after an enrichment pass against the Tweede Kamer API), falling back to the summary description or title otherwise.
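The fallback order can be sketched as below. The field names (body_text, description, title) and both helpers are assumptions for illustration. The batch size of 200 mirrors the --text-batch-size flag in the reproducibility commands. The actual embedding request to OpenRouter is left out rather than guessed.

```python
def richest_text(motion):
    """Return the richest available text: body text, then description, then title."""
    for field in ("body_text", "description", "title"):
        text = (motion.get(field) or "").strip()
        if text:
            return text
    return ""


def batches(items, size=200):
    """Yield consecutive slices of at most `size` items (one request each)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


motions = [
    {"title": "Motion A", "body_text": "Full body text of motion A"},
    {"title": "Motion B", "description": "Short summary of B", "body_text": None},
    {"title": "Motion C"},
]
texts = [richest_text(m) for m in motions]
# texts == ["Full body text of motion A", "Short summary of B", "Motion C"]
```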

This lets us do something powerful: find motions that are genuinely similar in topic, not just in voting pattern. Two motions about nitrogen policy from 2020 and 2023 might have very different vote splits (different coalitions, different political moment) but near-identical text embeddings. That's a meaningful connection.


Step 3: Fused Embeddings — The Best of Both Worlds

SVD gives the political-structural signal: how does this motion split the chamber? Text embeddings give the semantic signal: what is this motion about?

I concatenate both into a fused vector per motion per window:

fused = concat(svd_vector, text_vector): typically 50 + 2560 = 2610 dimensions

The actual dimension varies slightly because SVD dimensionality adapts to window density — the code stores svd_dims and text_dims per row so nothing downstream has to assume a fixed size.

This fused representation powers the similarity search. The SVD and text parts both contribute to the cosine score, so two motions rank as "close" when they are about a similar topic and also produce a similar political split. This filters out spurious matches: two motions might both be controversial (close 50/50 votes) but on completely unrelated things, and the text component pulls them apart.
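A toy sketch of the concatenation described above shows how the text half separates motions with the same vote split. fuse and cosine are hypothetical helpers, and the tiny 2- and 3-dimensional vectors stand in for the real 50 + 2560.

```python
import numpy as np


def fuse(svd_vec, text_vec):
    """Concatenate SVD and text parts, recording each part's length per row."""
    svd_vec = np.asarray(svd_vec, dtype=float)
    text_vec = np.asarray(text_vec, dtype=float)
    return {
        "vector": np.concatenate([svd_vec, text_vec]),
        "svd_dims": svd_vec.size,
        "text_dims": text_vec.size,
    }


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# a and b split the chamber identically but are about different topics;
# a and c split identically and are about the same topic.
a = fuse([1.0, 0.0], [1.0, 0.0, 0.0])
b = fuse([1.0, 0.0], [0.0, 1.0, 0.0])
c = fuse([1.0, 0.0], [1.0, 0.0, 0.0])
print(round(cosine(a["vector"], b["vector"]), 3))  # 0.5: same split, different topic
print(round(cosine(a["vector"], c["vector"]), 3))  # 1.0: same split, same topic
```

With plain concatenation, whichever block has the larger norm dominates the cosine score. Normalizing or weighting each block before concatenating is a common variant. The post doesn't say whether the pipeline does this, so treat the unweighted version here as an assumption.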


The Numbers: What We're Working With

After the full pipeline run:

| Year | Motions |
|------|--------:|
| 2016 | 132 |
| 2017 | 30 |
| 2018 | 100 |
| 2019 | 3,374 |
| 2020 | 4,228 |
| 2021 | 4,289 |
| 2022 | 4,116 |
| 2023 | 3,272 |
| 2024 | 3,968 |
| 2025 | 3,715 |
| 2026 | 948 |
| Total | 28,172 |

Volume is highest from 2020 through 2022, with more than 4,000 motions in each of those years; 2021 tops the table at 4,289. 2022 was the year the Rutte IV coalition took office amid intense debates on energy prices, housing, the war in Ukraine, and the ongoing nitrogen crisis. 2023 is lower at 3,272 motions, but it was an eventful year, ending with the November election that gave PVV its historic first-place finish.

Early years (2016–2018) use annual windows because the data is too sparse for meaningful quarterly SVD. From 2019 onwards, everything runs quarterly, giving us 38 windows in total.
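Assigning a motion to its window can be sketched as follows. The labels follow the window names used earlier in this post, such as 2020-Q3. The function window_label is hypothetical, and the sketch does not try to reproduce the exact window count.

```python
from datetime import date


def window_label(d: date) -> str:
    """Annual windows for 2016-2018, quarterly windows from 2019 onwards."""
    if d.year < 2019:
        return str(d.year)
    quarter = (d.month - 1) // 3 + 1
    return f"{d.year}-Q{quarter}"


print(window_label(date(2017, 6, 1)))   # 2017
print(window_label(date(2020, 8, 15)))  # 2020-Q3
```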

The similarity cache holds 405,216 precomputed pairs — top 10 neighbors per motion per window — making lookup instant at query time.


Interesting Findings

The 2022–2023 Polarization Surge

2022 and 2023 together account for more than a quarter of all motions in the dataset. In the SVD positions for 2022, the distance between the governing coalition (VVD, D66, CDA, CU) and the opposition (PVV, SP, FvD) is near its maximum. The nitrogen crisis and energy policy debates forced unusually sharp coalition discipline — which shows up geometrically as well-separated clusters.

2023 continued the intensity, and the Procrustes-aligned trajectory shows the party positions in 2023-Q4 and 2024-Q1 shifting noticeably as the new coalition began to form.
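One simple way to put a number on "distance between coalition and opposition" is the gap between the two blocs' centroids in the aligned SVD space, computed per window. This is an assumption about the measurement and not necessarily the post's exact metric. bloc_gap is a hypothetical helper, and the coordinates below are toy values, not real party positions.

```python
import numpy as np


def bloc_gap(positions, coalition, opposition, dims=2):
    """Euclidean distance between the coalition and opposition centroids.

    positions: dict party -> aligned coordinate vector for one window.
    Only the first `dims` components (the leading SVD axes) are used.
    """
    coal = np.mean([np.asarray(positions[p])[:dims] for p in coalition], axis=0)
    opp = np.mean([np.asarray(positions[p])[:dims] for p in opposition], axis=0)
    return float(np.linalg.norm(coal - opp))


# Toy coordinates: centroids land at (1, 0) and (1, 4), so the gap is 4.0.
positions = {"P1": [0.0, 0.0], "P2": [2.0, 0.0], "P3": [0.0, 4.0], "P4": [2.0, 4.0]}
gap = bloc_gap(positions, coalition=["P1", "P2"], opposition=["P3", "P4"])
```

Evaluating this on every Procrustes-aligned window gives a time series. Its peaks mark the periods when the blocs were most clearly separated.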

BBB's Geometric Arrival

BBB (BoerBurgerBeweging) entered the Tweede Kamer in 2021 with a single seat and grew to seven after the November 2023 election; its 16-seat breakthrough that year was in the Eerste Kamer. Its SVD position places it between PVV and CDA, exactly matching its policy profile: agrarian-nationalist populism with Catholic-provincial roots. The model found this without being told. That's a good sanity check that the geometry is capturing something real.

The Strange Case of "Verworpen."

Motions rejected without debate are recorded with the title "Verworpen." (Rejected.). There are hundreds of these. Because they share an identical title, their text embeddings are identical, with a cosine similarity of 1.0 between any two of them. Technically correct; semantically meaningless. The UI layer filters these out.

It's a reminder that data quality surprises emerge at scale. I found three or four similar pathologies (motions withdrawn mid-session, duplicate API records) that required explicit handling.
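Here is a sketch of the kind of filtering involved. It assumes cached neighbours come back as (motion_id, score) pairs and that titles can be looked up by id. The helper names and the dedupe key are hypothetical. What counts as a duplicate API record depends on the data, so the caller chooses the key.

```python
# Placeholder titles whose text carries no topic information.
# "verworpen." comes from the post; any further entries would be assumptions.
PLACEHOLDER_TITLES = {"verworpen."}


def is_placeholder(title):
    return (title or "").strip().lower() in PLACEHOLDER_TITLES


def filter_neighbors(neighbors, titles):
    """Drop cached neighbours whose only text is a placeholder title.

    neighbors: list of (motion_id, score) pairs from the similarity cache.
    titles: dict motion_id -> title.
    """
    return [(mid, s) for mid, s in neighbors if not is_placeholder(titles.get(mid))]


def dedupe(records, key):
    """Keep the first record for each key value, dropping duplicate API rows."""
    seen, unique = set(), []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique
```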

Party Cohesion as a Signal

Party cohesion — how often all MPs of a party vote identically — varies enormously. SGP and CU are near-perfect blocs. PvdA/GroenLinks (post-2023 merger) is similarly tight. VVD shows the most internal variation, which tracks with what you'd expect from a governing party managing coalition discipline across conflicting wings.

In earlier years (2019–2020), before the GroenLinks-PvdA merger, GroenLinks occasionally split on security and defense policy. This shows up in the SVD as individual MP positions diverging from the party centroid.
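Here is a sketch of the cohesion measure as defined above, assuming one row per MP vote with the MP's party attached. It treats absent MPs as not voting and abstentions as a distinct choice; both are assumptions. party_cohesion is a hypothetical name. Political scientists often use the Rice index instead, which scores a party by |voor - tegen| / (voor + tegen). This version sticks to the all-identical definition.

```python
from collections import defaultdict


def party_cohesion(mp_votes):
    """Share of motions on which every present MP of a party voted the same way.

    mp_votes: iterable of (party, motion_id, vote) rows. 'afwezig' (absent)
    rows are ignored, so a party is unanimous if all *present* MPs agree.
    """
    choices = defaultdict(set)  # (party, motion) -> distinct votes cast
    for party, motion, vote in mp_votes:
        if vote != "afwezig":
            choices[(party, motion)].add(vote)
    totals, unanimous = defaultdict(int), defaultdict(int)
    for (party, _motion), votes in choices.items():
        totals[party] += 1
        unanimous[party] += int(len(votes) == 1)
    return {party: unanimous[party] / totals[party] for party in totals}


rows = [
    ("P1", "m1", "voor"), ("P1", "m1", "voor"),      # P1 unanimous on m1
    ("P1", "m2", "voor"), ("P1", "m2", "tegen"),     # P1 split on m2
    ("P2", "m1", "tegen"), ("P2", "m1", "afwezig"),  # absent MP ignored
]
cohesion = party_cohesion(rows)  # {"P1": 0.5, "P2": 1.0}
```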


The Pipeline Architecture

Single DuckDB database, modular Python pipeline, no cloud infrastructure:

```
API (Tweede Kamer OData)
  → download_past_year.py
  → motions table (28,172 rows)

motions
  → extract_mp_votes.py      → mp_votes table (506,336 rows)
  → sync_motion_content.py   → body_text enrichment (26,447 motions, ~94%)
  → text_pipeline.py         → embeddings table (28,172 rows, qwen3-embedding-4b via OpenRouter)
  → svd_pipeline.py          → svd_vectors table (54,150 rows, 38 windows)

svd_vectors + embeddings
  → fusion.py                → fused_embeddings table (40,522 rows)

fused_embeddings
  → similarity/compute.py    → similarity_cache table (405,216 rows, top-10 per window)
```

The similarity computation is pure NumPy. Load all fused vectors for a window, pad them to a uniform length, and L2-normalize them. Compute the full N×N cosine similarity matrix with a single matrix multiply (normalized @ normalized.T), then extract the top-k neighbors per row with np.argpartition. Even a 4,000-motion window, roughly a full year's worth, only needs a 4,000×4,000 matrix (16 million entries). That is fast enough that it's not worth batching.
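The steps above translate almost line for line into NumPy. This is a sketch under assumptions: top_k_neighbors is a hypothetical name, the vectors for one window arrive as a list of 1-D arrays, and excluding each motion from its own neighbour list is assumed here rather than stated in the post. np.argpartition returns the k best per row in arbitrary order, so the sketch sorts them afterwards.

```python
import numpy as np


def top_k_neighbors(vectors, k=10):
    """Top-k cosine neighbours per row for one window's fused vectors."""
    # Pad ragged vectors with zeros to a uniform length.
    width = max(len(v) for v in vectors)
    X = np.zeros((len(vectors), width))
    for i, v in enumerate(vectors):
        X[i, :len(v)] = v
    # L2-normalise rows, guarding against all-zero rows.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)
    # Full N x N cosine similarity via one matrix multiply.
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)  # a motion is not its own neighbour
    k = min(k, len(vectors) - 1)
    # argpartition puts the k largest similarities first (unordered) ...
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    # ... so sort those k by descending similarity.
    order = np.argsort(-np.take_along_axis(sims, idx, axis=1), axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    return idx, np.take_along_axis(sims, idx, axis=1)


vecs = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
        np.array([0.0, 1.0]), np.array([-1.0, 0.0])]
idx, scores = top_k_neighbors(vecs, k=2)
```

At N = 4,000 the float64 similarity matrix takes 16 million × 8 bytes, about 128 MB. That comfortably fits in memory on one machine.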

The database sits at 15 GB on disk — up from ~3 GB before body text enrichment. The full parliamentary text for 26,000+ motions accounts for most of that growth.


What's Next

Motion explorer: Given a motion, retrieve the 10 most politically and semantically similar ones from across the decade. Trace how a policy debate evolved — who championed it, how the coalitions shifted.

Party trajectory animation: Procrustes-aligned positions, animated year by year. Watch D66 drift post-2021, watch PVV consolidate its flank, watch new parties arrive and find their geometric home.

Cross-party coalition patterns: The fused embeddings let us ask which topics produce unusual coalition configurations — motions where the normal left-right split breaks down and unexpected alliances form.

The controversy index: 1 - winning_margin gives a controversy score per motion. The most contested votes — close margins, high-salience topics — tell a different story than the headline political narratives.
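Once winning_margin is pinned down, the controversy score is a one-liner. The post doesn't define it, so this sketch assumes it is the vote gap divided by the total of voor and tegen votes. Those votes can be counted in MPs or seats, with abstentions and absences excluded, which keeps both numbers in [0, 1]. Both function names are hypothetical.

```python
def winning_margin(voor, tegen):
    """Assumed definition: |voor - tegen| / (voor + tegen), in [0, 1]."""
    total = voor + tegen
    if total == 0:
        raise ValueError("no voor/tegen votes recorded")
    return abs(voor - tegen) / total


def controversy(voor, tegen):
    """1 - winning_margin: 1.0 for a dead heat, 0.0 for a unanimous vote."""
    return 1.0 - winning_margin(voor, tegen)


print(controversy(75, 75))            # 1.0
print(controversy(150, 0))            # 0.0
print(round(controversy(76, 74), 3))  # 0.987
```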


Reproducibility

```bash
# Download historical data
python scripts/download_past_year.py --start-date 2016-01-01 --end-date 2026-01-01

# Enrich with full motion body text (before embedding, so text embeddings can use it)
python scripts/sync_motion_content.py --db-path data/motions.db

# Run full pipeline (SVD, text embeddings, fusion, similarity cache)
python -m pipeline.run_pipeline --db-path data/motions.db \
    --start-date 2016-01-01 --end-date 2026-01-01 \
    --window-size quarterly --text-batch-size 200
```

The DB grows to ~15 GB for the full dataset including body text. All computation — SVD, fusion, similarity — runs locally on a single machine.

Democracy is more legible than it looks.