feat(overton): improvements and extensions (party differentiation, voting margin, SVD viz, mechanism validation, predictive model)

U1: JA21 drives moderation effect (+0.203 CS shift; only party with volume+support gains)
U2: Coalition coding split at July 2024; opposition effect confirmed (d=0.85 vs 0.87)
U3: Voting margin (ρ=0.812 with centrist support) is far superior to pass rate
U4: SVD trajectory confirms spatial divergence; centrists moved left (Δx=-0.30), right stationary
U5: Mechanism classification Cohen's κ=0.41 (moderate); taxonomy needs revision
U6: Predictive model AUC-ROC=0.81; submitter party and category are strongest predictors

main
parent 7df961ba83
commit d34d43a888
@@ -0,0 +1,946 @@
#!/usr/bin/env python3
"""Mechanism classification validation with a second classifier.

Computes inter-rater reliability (Cohen's kappa) between the original inline
classifications and a second LLM-based classification using a different prompt
template and (optionally) a different model.

Usage:
    uv run python analysis/right_wing/mechanism_validation.py
"""

from __future__ import annotations

import argparse
import json
import logging
import sys
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any

import duckdb

# Repository root: three levels up from analysis/right_wing/<this file>.
# Resolve first so the result does not depend on the current working directory.
ROOT = Path(__file__).resolve().parents[2]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from ai_provider import ProviderError, chat_completion
from analysis.config import config

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

# ── mechanism taxonomy ───────────────────────────────────────────────────────

MECHANISMS = [
    "consensus_framing",
    "institutional_rule_of_law",
    "welfare_service_expansion",
    "procedural_technical",
    "local_constituency",
    "coalition_alignment",
    "symbolic_declaratory",
    "targeted_restriction",
    "system_dismantling",
    "crisis_response",
]

MECHANISM_LABELS_NL = {
    "consensus_framing": "Consensus framing (gedeeld belang)",
    "institutional_rule_of_law": "Institutioneel/rechtsstatelijk",
    "welfare_service_expansion": "Welzijn/dienstverlening uitbreiding",
    "procedural_technical": "Procedureel/technisch",
    "local_constituency": "Lokaal/regionaal",
    "coalition_alignment": "Coalitie-afstemming",
    "symbolic_declaratory": "Symbolisch/declaratoir",
    "targeted_restriction": "Gerichte restrictie",
    "system_dismantling": "Systeemontmanteling",
    "crisis_response": "Crisisrespons",
}

MECHANISM_LABELS_EN = {
    "consensus_framing": "Consensus framing / shared interest",
    "institutional_rule_of_law": "Institutional / rule of law",
    "welfare_service_expansion": "Welfare / service expansion",
    "procedural_technical": "Procedural / technical",
    "local_constituency": "Local / regional constituency",
    "coalition_alignment": "Coalition alignment",
    "symbolic_declaratory": "Symbolic / declaratory",
    "targeted_restriction": "Targeted restriction",
    "system_dismantling": "System dismantling",
    "crisis_response": "Crisis response",
}

# Original inline classifications (from mechanism_classification.py)
ORIGINAL_CLASSIFICATIONS: dict[int, str] = {
    15458: "crisis_response",
    26477: "institutional_rule_of_law",
    9149: "consensus_framing",
    17099: "procedural_technical",
    4933: "procedural_technical",
    17751: "consensus_framing",
    20068: "procedural_technical",
    16520: "consensus_framing",
    17036: "welfare_service_expansion",
    17681: "consensus_framing",
    14554: "procedural_technical",
    21864: "procedural_technical",
    26493: "targeted_restriction",
    21982: "consensus_framing",
    14125: "crisis_response",
    13683: "welfare_service_expansion",
    16691: "procedural_technical",
    15005: "procedural_technical",
    17536: "institutional_rule_of_law",
    16999: "consensus_framing",
    8325: "procedural_technical",
    13370: "welfare_service_expansion",
    18030: "procedural_technical",
    11382: "procedural_technical",
    18616: "procedural_technical",
    12411: "crisis_response",
    22595: "crisis_response",
    15772: "system_dismantling",
    7111: "welfare_service_expansion",
    25784: "targeted_restriction",
    27731: "system_dismantling",
    15626: "crisis_response",
    20215: "welfare_service_expansion",
    16430: "symbolic_declaratory",
    25982: "local_constituency",
    17176: "targeted_restriction",
    7054: "procedural_technical",
    20323: "procedural_technical",
    18025: "system_dismantling",
    14837: "system_dismantling",
    19620: "targeted_restriction",
    21801: "consensus_framing",
    19464: "crisis_response",
    26855: "targeted_restriction",
    22280: "local_constituency",
    20115: "symbolic_declaratory",
    15082: "targeted_restriction",
    6637: "targeted_restriction",
    18691: "symbolic_declaratory",
    18062: "crisis_response",
    3784: "procedural_technical",
    10205: "procedural_technical",
    10278: "coalition_alignment",
    25079: "consensus_framing",
    2980: "targeted_restriction",
    10420: "crisis_response",
    25092: "targeted_restriction",
    25545: "institutional_rule_of_law",
    23065: "procedural_technical",
    2878: "welfare_service_expansion",
    25573: "procedural_technical",
    3298: "symbolic_declaratory",
    25061: "consensus_framing",
    4481: "consensus_framing",
    3961: "procedural_technical",
    473: "institutional_rule_of_law",
    10413: "consensus_framing",
    974: "procedural_technical",
    24009: "procedural_technical",
    9789: "institutional_rule_of_law",
    24651: "targeted_restriction",
    1890: "local_constituency",
    1191: "consensus_framing",
    3448: "targeted_restriction",
    23910: "institutional_rule_of_law",
    25566: "welfare_service_expansion",
    2070: "targeted_restriction",
    23885: "consensus_framing",
    24906: "procedural_technical",
    2496: "procedural_technical",
    25582: "targeted_restriction",
    3053: "local_constituency",
    1495: "procedural_technical",
    10178: "procedural_technical",
    1614: "procedural_technical",
    23441: "consensus_framing",
    3569: "consensus_framing",
    10285: "procedural_technical",
    23058: "procedural_technical",
    3287: "procedural_technical",
    10434: "consensus_framing",
    10089: "procedural_technical",
    22706: "consensus_framing",
    3877: "institutional_rule_of_law",
    25062: "consensus_framing",
    3687: "targeted_restriction",
    25166: "procedural_technical",
    4618: "procedural_technical",
    3468: "institutional_rule_of_law",
    24632: "institutional_rule_of_law",
    25451: "symbolic_declaratory",
    2351: "targeted_restriction",
    4227: "consensus_framing",
    22853: "consensus_framing",
    9884: "procedural_technical",
    1428: "consensus_framing",
    3629: "symbolic_declaratory",
    1572: "local_constituency",
    25493: "procedural_technical",
    1359: "procedural_technical",
    2252: "procedural_technical",
    23605: "procedural_technical",
    3760: "consensus_framing",
    1005: "consensus_framing",
    10110: "coalition_alignment",
    23301: "consensus_framing",
    24046: "symbolic_declaratory",
    651: "welfare_service_expansion",
    1491: "targeted_restriction",
    25606: "targeted_restriction",
    313: "procedural_technical",
    24008: "consensus_framing",
    754: "targeted_restriction",
    25469: "targeted_restriction",
    25091: "targeted_restriction",
    2170: "institutional_rule_of_law",
    22792: "procedural_technical",
    10597: "institutional_rule_of_law",
    23013: "institutional_rule_of_law",
    3472: "institutional_rule_of_law",
    2014: "system_dismantling",
    920: "procedural_technical",
    2143: "welfare_service_expansion",
    688: "system_dismantling",
    2290: "system_dismantling",
    4497: "targeted_restriction",
    3823: "symbolic_declaratory",
    23141: "institutional_rule_of_law",
    4436: "institutional_rule_of_law",
    25616: "targeted_restriction",
    2662: "institutional_rule_of_law",
    23287: "institutional_rule_of_law",
    4660: "consensus_framing",
    4761: "targeted_restriction",
    2264: "institutional_rule_of_law",
    4394: "institutional_rule_of_law",
    1691: "targeted_restriction",
    10601: "targeted_restriction",
    4089: "targeted_restriction",
    23206: "procedural_technical",
    22676: "institutional_rule_of_law",
    115: "system_dismantling",
    3951: "consensus_framing",
    1375: "targeted_restriction",
    3090: "targeted_restriction",
    24650: "procedural_technical",
    1772: "consensus_framing",
    3678: "system_dismantling",
    1692: "institutional_rule_of_law",
    24077: "symbolic_declaratory",
    349: "institutional_rule_of_law",
    9769: "targeted_restriction",
    4656: "symbolic_declaratory",
    23984: "system_dismantling",
    2168: "institutional_rule_of_law",
    4443: "institutional_rule_of_law",
    4489: "procedural_technical",
    10290: "targeted_restriction",
    4071: "targeted_restriction",
    4088: "targeted_restriction",
    1507: "system_dismantling",
    2870: "procedural_technical",
    1912: "system_dismantling",
    22658: "symbolic_declaratory",
    10288: "targeted_restriction",
    4080: "institutional_rule_of_law",
    1847: "targeted_restriction",
    23127: "system_dismantling",
    4367: "targeted_restriction",
    9790: "targeted_restriction",
    4150: "procedural_technical",
    741: "targeted_restriction",
    1705: "consensus_framing",
    1831: "consensus_framing",
    10600: "targeted_restriction",
    9767: "targeted_restriction",
    3830: "system_dismantling",
    4221: "system_dismantling",
    3354: "institutional_rule_of_law",
    9977: "symbolic_declaratory",
    898: "consensus_framing",
    24848: "system_dismantling",
    756: "targeted_restriction",
    24358: "institutional_rule_of_law",
    4309: "institutional_rule_of_law",
    10167: "local_constituency",
    23633: "procedural_technical",
    23030: "targeted_restriction",
    1959: "system_dismantling",
    23454: "procedural_technical",
}

# ── prompt templates ─────────────────────────────────────────────────────────

# Original prompt (from mechanism_classification.py, inline subagent):
# classifications were made by reading the full title + body_text.
# The second classifier deliberately uses a DIFFERENT template:
#   - English wording (not Dutch)
#   - Mechanisms presented in a DIFFERENT order (the reverse of MECHANISMS)
#   - A single primary pick plus a 1-5 confidence score and short reasoning
#   - A definition for each mechanism
#   - Body text truncated to 1200 characters

# Order in which MECHANISM_DEFINITIONS_EN lists the mechanisms.
MECHANISMS_REVERSED = list(reversed(MECHANISMS))

MECHANISM_DEFINITIONS_EN = """1. crisis_response — A temporary, emergency measure responding to an acute event (pandemic, natural disaster, sudden crisis). Reactive and time-limited.

2. system_dismantling — Aims to dismantle, abolish, or fundamentally restructure an existing policy, institution, or regulatory framework. Not reform but abolition/reversal.

3. targeted_restriction — Imposes specific restrictions on a defined group, behavior, or activity. Narrow scope, punitive or exclusionary intent.

4. symbolic_declaratory — Primarily sends a political signal, makes a statement, or takes a position without direct policy impact. Declaratory, symbolic, expressive.

5. procedural_technical — Technical adjustment, budget amendment, implementation detail, or administrative procedure. Bureaucratic, operational, non-ideological.

6. local_constituency — Serves a specific local/regional interest, constituency, or geographic area. NIMBY or local-advocacy pattern.

7. coalition_alignment — Reflects coalition politics: budget compromises, package deals, or alignments between coalition partners. Coalition-maintenance.

8. welfare_service_expansion — Expands government services, social welfare, public goods, or citizen entitlements. Positive provision, not restriction.

9. institutional_rule_of_law — Concerns legal frameworks, rule of law, institutional integrity, judicial process, or constitutional matters. Rule-based, institutional.

10. consensus_framing — Frames the motion as serving a broad, shared interest. Appeals to common ground, national interest, or bipartisan consensus. Inclusive, bridge-building, non-polarizing."""

SECOND_CLASSIFIER_PROMPT = """Classify the following Dutch parliamentary motion according to the mechanism taxonomy below.

MOTION TITLE: {title}

MOTION TEXT: {body}

TASK: Identify the PRIMARY mechanism this motion uses. Select exactly ONE mechanism from the list below. Base your decision on what the motion actually DOES (action-oriented) rather than what it merely TALKS about.

MECHANISM TAXONOMY (read carefully before choosing):

{MECHANISM_DEFINITIONS}

IMPORTANT RULES:
- Choose the mechanism that BEST describes the dominant pattern of the motion.
- If a motion could fit multiple mechanisms, pick the most specific one.
- procedural_technical should be the DEFAULT only if no other mechanism fits better.
- Return ONLY the mechanism key exactly as listed above (e.g., "system_dismantling").

Respond with a JSON object containing:
- "mechanism": the selected mechanism key
- "confidence": 1-5 (1=very uncertain, 5=very certain)
- "reasoning": brief explanation (max 2 sentences)"""


def build_second_classifier_prompt(title: str, body_text: str) -> str:
    """Fill the second-classifier template; the body is truncated to 1200 characters."""
    text = body_text or title or ""
    if len(text) > 1200:
        text = text[:1200] + "..."
    return SECOND_CLASSIFIER_PROMPT.format(
        title=title or "", body=text, MECHANISM_DEFINITIONS=MECHANISM_DEFINITIONS_EN
    )


# ── LLM call helpers ─────────────────────────────────────────────────────────


def chat_completion_json(
    messages: list[dict[str, str]],
    model: str | None = None,
    retries: int = 3,
) -> dict[str, Any] | None:
    """Call chat_completion and parse the JSON response, retrying with backoff.

    Only the content of the first message is used as the user prompt; a fixed
    system message is prepended. Returns None if every attempt fails.
    """
    model = model or config.QWEN_MODEL
    prompt = messages[0]["content"]
    system_msg = (
        "You are a political science classifier. You classify Dutch parliamentary "
        "motions by their dominant mechanism type. Respond ONLY with valid JSON. "
        "No markdown, no code fences, no preamble — pure JSON object."
    )
    full_messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": prompt},
    ]

    backoff = 0.5
    for attempt in range(1, retries + 1):
        try:
            raw = chat_completion(full_messages, model=model)
        except ProviderError as exc:
            if attempt == retries:
                logger.error("ProviderError on attempt %d: %s", attempt, exc)
                return None
            time.sleep(backoff * (2 ** (attempt - 1)))
            continue

        # Strip a Markdown code fence (optionally tagged "json") if the model added one.
        raw = raw.strip()
        if raw.startswith("```"):
            raw = raw.split("```", 2)[1]
            if raw.startswith("json"):
                raw = raw[4:]
            raw = raw.strip()

        try:
            result = json.loads(raw)
            # Accept only a JSON object whose "mechanism" is a known key; a bare
            # list or string would otherwise crash on result.get().
            if isinstance(result, dict) and result.get("mechanism") in MECHANISMS:
                return result
            logger.warning(
                "Invalid mechanism '%s' on attempt %d",
                result.get("mechanism") if isinstance(result, dict) else None,
                attempt,
            )
        except json.JSONDecodeError:
            logger.warning("JSON decode failed on attempt %d: %s", attempt, raw[:100])

        if attempt < retries:
            time.sleep(backoff * (2 ** (attempt - 1)))

    return None


def chat_completion_json_parallel(
    message_batches: list[list[dict[str, str]]],
    model: str | None = None,
    max_workers: int = 5,
) -> list[dict[str, Any] | None]:
    """
    Run multiple chat completions in parallel using ThreadPoolExecutor.

    Each element in message_batches is a list of messages for one completion.
    Returns a list of parsed JSON dicts (or None for failures), same order.
    """
    model = model or config.QWEN_MODEL

    def _fetch_one(messages: list[dict[str, str]]) -> dict[str, Any] | None:
        return chat_completion_json(messages, model=model)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(_fetch_one, batch) for batch in message_batches]
        return [f.result() for f in futures]
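Futures are collected in submission order, so the returned list lines up with `message_batches` even when later requests finish first. The self-contained sketch below demonstrates this. `slow_square` is a hypothetical stand-in for the LLM call, and its sleeps are chosen so that completion order is the reverse of submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n: int) -> int:
    # Hypothetical stand-in for chat_completion_json: larger inputs
    # sleep less, so they complete before smaller ones.
    time.sleep(0.05 * (5 - n))
    return n * n


def run_in_order(inputs: list[int], max_workers: int = 5) -> list[int]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(slow_square, n) for n in inputs]
        # f.result() blocks on each future in turn, so the output follows
        # submission order, not completion order.
        return [f.result() for f in futures]


results = run_in_order([1, 2, 3, 4])
print(results)  # [1, 4, 9, 16]
```

`executor.map(slow_square, inputs)` would preserve order in the same way. The explicit list of futures is only needed when per-item error handling is wanted.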


# ── data loading ─────────────────────────────────────────────────────────────


def load_motions(db_path: str, motion_ids: list[int]) -> list[dict[str, Any]]:
    """Load motion data from the database for the given motion IDs."""
    con = duckdb.connect(db_path)
    try:
        placeholders = ",".join("?" for _ in motion_ids)
        rows = con.execute(
            f"""
            SELECT r.motion_id, m.title, m.body_text, r.year, r.centrist_support_strict
            FROM right_wing_motions r
            JOIN motions m ON r.motion_id = m.id
            WHERE r.motion_id IN ({placeholders})
            ORDER BY r.motion_id
            """,
            motion_ids,
        ).fetchall()

        return [
            {
                "motion_id": r[0],
                "title": r[1] or "",
                "body_text": r[2] or "",
                "year": r[3],
                "centrist_support_strict": r[4],
            }
            for r in rows
        ]
    finally:
        con.close()
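The query binds each ID through its own `?` placeholder instead of interpolating values into the SQL text. The sketch below shows the same pattern, assuming the standard-library `sqlite3` module as a stand-in for DuckDB. The table layout and the `load_titles` helper are hypothetical. It also guards against an empty ID list, which would render `IN ()`, a form many SQL engines reject.

```python
import sqlite3


def load_titles(con: sqlite3.Connection, ids: list[int]) -> list[tuple[int, str]]:
    # Short-circuit instead of emitting "IN ()".
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)  # e.g. "?,?" for two IDs
    sql = f"SELECT id, title FROM motions WHERE id IN ({placeholders}) ORDER BY id"
    # Values are bound by the driver, never pasted into the SQL string.
    return con.execute(sql, ids).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE motions (id INTEGER PRIMARY KEY, title TEXT)")
con.executemany(
    "INSERT INTO motions VALUES (?, ?)",
    [(115, "a"), (313, "b"), (473, "c")],
)
print(load_titles(con, [473, 115]))  # [(115, 'a'), (473, 'c')]
```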


# ── classification ───────────────────────────────────────────────────────────


def classify_motions_second_pass(
    motions: list[dict[str, Any]],
    second_model: str | None = None,
    batch_size: int = 10,
    max_workers: int = 5,
) -> dict[int, dict[str, Any]]:
    """Run second classifier on all motions, return motion_id -> result dict."""
    second_model = second_model or config.QWEN_MODEL
    results: dict[int, dict[str, Any]] = {}

    for i in range(0, len(motions), batch_size):
        batch = motions[i : i + batch_size]
        logger.info(
            "Batch %d/%d (%d motions)",
            i // batch_size + 1,
            (len(motions) - 1) // batch_size + 1,
            len(batch),
        )

        message_batches = []
        for m in batch:
            prompt = build_second_classifier_prompt(m["title"], m["body_text"])
            message_batches.append([{"role": "user", "content": prompt}])

        raw_results = chat_completion_json_parallel(
            message_batches, model=second_model, max_workers=max_workers
        )

        for m, res in zip(batch, raw_results):
            mid = m["motion_id"]
            if res and res.get("mechanism") in MECHANISMS:
                # The model may return confidence as a string (e.g. "4"); coerce
                # to int so the >= 4 resolution rule and the averages work.
                try:
                    confidence = int(res.get("confidence", 0))
                except (TypeError, ValueError):
                    confidence = 0
                results[mid] = {
                    "mechanism": res["mechanism"],
                    "confidence": confidence,
                    "reasoning": res.get("reasoning", ""),
                    "error": None,
                }
            else:
                results[mid] = {
                    "mechanism": None,
                    "confidence": 0,
                    "reasoning": "",
                    "error": "classification failed",
                }

        time.sleep(0.5)  # brief pause between batches

    return results
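The progress log computes the batch total as `(len(motions) - 1) // batch_size + 1`. Because Python's floor division gives `(-1) // b == -1`, this expression equals ceiling division for every non-negative length, including zero. The sketch below checks it against `math.ceil` and against the number of slices the `range(0, n, batch_size)` loop visits. `n_batches` and `slice_count` are hypothetical helper names.

```python
import math


def n_batches(n_items: int, batch_size: int) -> int:
    # Same expression as the progress log in classify_motions_second_pass.
    return (n_items - 1) // batch_size + 1


def slice_count(n_items: int, batch_size: int) -> int:
    # Number of iterations of `for i in range(0, n_items, batch_size)`.
    return len(range(0, n_items, batch_size))


for n in range(0, 205):
    for b in (1, 3, 10):
        assert n_batches(n, b) == math.ceil(n / b) == slice_count(n, b)

print(n_batches(200, 10), n_batches(201, 10))  # 20 21
```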


# ── agreement analysis ───────────────────────────────────────────────────────


def compute_cohens_kappa(
    rater1: dict[int, str],
    rater2: dict[int, str],
    categories: list[str],
) -> dict[str, Any]:
    """Compute Cohen's kappa for two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate
    and p_e is the chance agreement implied by each rater's marginal category
    frequencies. Uses only motion_ids present in BOTH raters.
    """
    common_ids = sorted(set(rater1) & set(rater2))

    n = len(common_ids)
    if n == 0:
        return {"kappa": None, "agreement_rate": None, "n": 0, "error": "no common motions"}

    agreements = sum(1 for mid in common_ids if rater1[mid] == rater2[mid])
    p_o = agreements / n

    # Expected (chance) agreement from the two raters' marginal distributions
    p_e = 0.0
    for cat in categories:
        p1 = sum(1 for mid in common_ids if rater1[mid] == cat) / n
        p2 = sum(1 for mid in common_ids if rater2[mid] == cat) / n
        p_e += p1 * p2

    # p_e reaches 1 only when both raters used one identical category for every
    # motion (so p_o is 1 too); kappa is undefined there, so report 1.0.
    kappa = 1.0 if p_e >= 1.0 else (p_o - p_e) / (1.0 - p_e)

    return {
        "kappa": round(kappa, 4),
        "agreement_rate": round(p_o, 4),
        "n": n,
        "agreements": agreements,
        "p_o": round(p_o, 4),
        "p_e": round(p_e, 4),
        "error": None,
    }
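The formula can be checked on a small example. The sketch below is a toy re-implementation; the labels are illustrative and are not data from this study. It recomputes κ for two raters over ten items and compares the result with a hand calculation. `cohens_kappa` is a hypothetical standalone helper that mirrors the formula in `compute_cohens_kappa`.

```python
from collections import Counter


def cohens_kappa(labels1: list[str], labels2: list[str]) -> float:
    """Cohen's kappa for two equal-length label sequences (toy version)."""
    if len(labels1) != len(labels2) or not labels1:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels1)
    p_o = sum(a == b for a, b in zip(labels1, labels2)) / n
    m1, m2 = Counter(labels1), Counter(labels2)
    # Chance agreement: sum over categories of the product of marginal rates.
    p_e = sum((m1[c] / n) * (m2[c] / n) for c in m1.keys() | m2.keys())
    if p_e >= 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


r1 = list("AAAAABBBBB")
r2 = list("AAAABBBBAA")
# Hand calculation: p_o = 7/10 = 0.7; marginals A: 0.5 vs 0.6, B: 0.5 vs 0.4,
# p_e = 0.5*0.6 + 0.5*0.4 = 0.5, so kappa = (0.7 - 0.5) / (1 - 0.5) = 0.4
print(round(cohens_kappa(r1, r2), 4))  # 0.4
```

In floating point the unrounded value may come out a hair below 0.4. `compute_cohens_kappa` rounds to four decimals before `generate_report` applies its `<` thresholds, and that rounding decides which band a score sitting exactly on a boundary lands in.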


def find_disagreements(
    rater1: dict[int, str],
    rater2: dict[int, str],
) -> list[dict[str, Any]]:
    """Find all disagreements between two raters."""
    common_ids = sorted(set(rater1) & set(rater2))
    disagreements = []
    for mid in common_ids:
        c1 = rater1[mid]
        c2 = rater2[mid]
        if c1 != c2:
            disagreements.append(
                {
                    "motion_id": mid,
                    "original": c1,
                    "second": c2,
                }
            )
    return disagreements


def build_confusion_matrix(
    rater1: dict[int, str],
    rater2: dict[int, str],
) -> dict[str, Any]:
    """Build confusion matrix between two raters."""
    common_ids = set(rater1) & set(rater2)
    matrix: dict[str, Counter[str]] = {m: Counter() for m in MECHANISMS}
    for mid in common_ids:
        c1 = rater1[mid]
        c2 = rater2[mid]
        matrix[c1][c2] += 1
    return {k: dict(v) for k, v in matrix.items()}
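Rows index the first rater's label and columns the second rater's label. The diagonal of this matrix therefore holds exactly the agreements that Cohen's kappa counts, and each off-diagonal cell corresponds to a kind of entry in `find_disagreements`. The self-contained sketch below builds the same nested-`Counter` structure from toy labels and recovers the agreement rate from its trace. `confusion` and `diagonal_agreement_rate` are hypothetical helpers.

```python
from collections import Counter


def confusion(r1: dict[int, str], r2: dict[int, str], cats: list[str]) -> dict[str, dict[str, int]]:
    # Rows: first rater's label; columns: second rater's label.
    matrix: dict[str, Counter] = {c: Counter() for c in cats}
    for mid in sorted(set(r1) & set(r2)):  # only items rated by both
        matrix[r1[mid]][r2[mid]] += 1
    return {k: dict(v) for k, v in matrix.items()}


def diagonal_agreement_rate(matrix: dict[str, dict[str, int]]) -> float:
    total = sum(sum(row.values()) for row in matrix.values())
    trace = sum(row.get(cat, 0) for cat, row in matrix.items())
    return trace / total if total else 0.0


r1 = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
r2 = {1: "a", 2: "b", 3: "b", 4: "b", 5: "a", 6: "c"}  # id 6 has no pair
m = confusion(r1, r2, ["a", "b", "c"])
print(m)  # {'a': {'a': 1, 'b': 1}, 'b': {'b': 2}, 'c': {'a': 1}}
print(diagonal_agreement_rate(m))  # 0.6
```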


# ── resolution ───────────────────────────────────────────────────────────────


def resolve_disagreements(
    disagreements: list[dict[str, Any]],
    second_results: dict[int, dict[str, Any]],
    motions: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Resolve disagreements: adopt the second label when its confidence is >= 4,
    otherwise keep the original label."""
    motion_map = {m["motion_id"]: m for m in motions}
    resolved = []
    for d in disagreements:
        mid = d["motion_id"]
        sr = second_results.get(mid, {})
        confidence = sr.get("confidence", 0)

        # Rule: if second classifier confidence >= 4, prefer second
        # Otherwise default to original (more carefully classified)
        if confidence >= 4:
            winner = "second"
            resolved_mech = d["second"]
        else:
            winner = "original"
            resolved_mech = d["original"]

        motion = motion_map.get(mid, {})
        resolved.append(
            {
                "motion_id": mid,
                "title": motion.get("title", "")[:120],
                "original": d["original"],
                "second": d["second"],
                "second_confidence": confidence,
                "resolved": resolved_mech,
                "winner": winner,
            }
        )
    return resolved


def build_validated_classifications(
    original: dict[int, str],
    second: dict[int, str],
    resolutions: list[dict[str, Any]],
) -> dict[int, str]:
    """Build the validated classification dict based on resolution outcomes."""
    resolution_map = {r["motion_id"]: r["resolved"] for r in resolutions}
    validated = dict(original)
    for mid in validated:
        if mid in resolution_map:
            validated[mid] = resolution_map[mid]
    return validated


# ── report generation ────────────────────────────────────────────────────────


def generate_report(
    kappa_result: dict[str, Any],
    disagreements: list[dict[str, Any]],
    resolutions: list[dict[str, Any]],
    confusion: dict[str, Any],
    validated_dist: dict[str, Any],
    second_results: dict[int, dict[str, Any]],
    output_path: str,
    model: str | None = None,
) -> None:
    """Generate mechanism validation markdown report.

    `model` is the second classifier's model name; it defaults to config.QWEN_MODEL.
    """
    if kappa_result.get("kappa") is None:
        # With no common motions there is nothing to report, and keys used
        # below (e.g. "agreements") are absent from kappa_result.
        raise ValueError(f"cannot generate report: {kappa_result.get('error')}")

    n_second_classified = sum(1 for v in second_results.values() if v.get("mechanism"))
    avg_confidence = (
        sum(v.get("confidence", 0) for v in second_results.values() if v.get("mechanism"))
        / max(n_second_classified, 1)
    )

    lines = [
        "# Mechanism Classification Validation Report",
        "",
        "## 1. Inter-Rater Reliability",
        "",
        f"- **Motions compared:** {kappa_result['n']}",
        f"- **Agreements:** {kappa_result['agreements']} / {kappa_result['n']}",
        f"- **Agreement rate:** {kappa_result['agreement_rate']:.1%}",
        f"- **Cohen's kappa (κ):** {kappa_result['kappa']}",
        f"  - P_o (observed): {kappa_result['p_o']:.4f}",
        f"  - P_e (expected): {kappa_result['p_e']:.4f}",
        "",
    ]

    # Landis & Koch (1977) benchmarks for interpreting kappa
    kappa = kappa_result["kappa"]
    if kappa < 0.0:
        strength = "Less than chance agreement"
    elif kappa < 0.20:
        strength = "Slight agreement"
    elif kappa < 0.40:
        strength = "Fair agreement"
    elif kappa < 0.60:
        strength = "Moderate agreement"
    elif kappa < 0.80:
        strength = "Substantial agreement"
    else:
        strength = "Almost perfect agreement"
    lines.append(f"**Interpretation:** {strength}")
    lines.append("")

    if kappa < 0.60:
        lines.append(
            "**The mechanism taxonomy needs revision.** The inter-rater agreement is "
            "below 0.6, suggesting the 10-mechanism framework is not being applied "
            "consistently across raters. Consider:"
        )
        lines.append("- Simplifying or merging ambiguous mechanism pairs")
        lines.append("- Adding clearer decision rules for borderline cases")
        lines.append("- Reducing the number of mechanisms")
        lines.append("")
    else:
        lines.append(
            "**The mechanism taxonomy appears adequate.** Inter-rater agreement is at "
            "or above 0.6, indicating reasonable consistency."
        )
        lines.append("")

    lines.extend([
        "## 2. Second Classifier Summary",
        "",
        f"- **Model:** {model or config.QWEN_MODEL}",
        f"- **Motions classified:** {n_second_classified}",
        f"- **Average confidence:** {avg_confidence:.1f}/5",
        "",
    ])

    conf_dist: Counter[int] = Counter()
    for v in second_results.values():
        conf_dist[v.get("confidence", 0)] += 1
    lines.append("### Confidence Distribution")
    lines.append("| Confidence | Count |")
    lines.append("|------------|-------|")
    for level in range(1, 6):
        lines.append(f"| {level} | {conf_dist.get(level, 0)} |")
    lines.append("")

    lines.extend([
        "## 3. Disagreement Table",
        "",
        f"**Total disagreements:** {len(disagreements)} / {kappa_result['n']} ({len(disagreements) / max(kappa_result['n'], 1) * 100:.1f}%)",
        "",
        "| Motion ID | Title | Original | Second | Confidence | Resolved | Winner |",
        "|-----------|-------|----------|--------|------------|----------|--------|",
    ])

    for r in resolutions:
        orig_label = MECHANISM_LABELS_NL.get(r["original"], r["original"])
        second_label = MECHANISM_LABELS_NL.get(r["second"], r["second"])
        res_label = MECHANISM_LABELS_NL.get(r["resolved"], r["resolved"])
        # Escape pipes so a motion title cannot break the Markdown table.
        title = r["title"][:80].replace("|", "\\|")
        lines.append(
            f"| {r['motion_id']} | {title} | {orig_label} | {second_label} | {r['second_confidence']} | {res_label} | {r['winner']} |"
        )

    lines.extend([
        "",
        "## 4. Mechanism Distribution Comparison",
        "",
        "| Mechanism | Original Count | Second Count | Validated Count |",
        "|-----------|---------------|--------------|-----------------|",
    ])

    orig_dist = Counter(ORIGINAL_CLASSIFICATIONS.values())
    second_dist: Counter[str] = Counter()
    for v in second_results.values():
        m = v.get("mechanism")
        if m:
            second_dist[m] += 1

    for mech in MECHANISMS:
        label = MECHANISM_LABELS_NL.get(mech, mech)
        o_cnt = orig_dist.get(mech, 0)
        s_cnt = second_dist.get(mech, 0)
        v_cnt = validated_dist.get(mech, 0)
        lines.append(f"| {label} | {o_cnt} | {s_cnt} | {v_cnt} |")

    # Rows: original label; columns: second classifier's label.
    lines.extend([
        "",
        "## 5. Confusion Matrix",
        "",
        "| Original \\ Second | " + " | ".join(MECHANISM_LABELS_EN[m][:20] for m in MECHANISMS) + " |",
        "|" + "---|" * (len(MECHANISMS) + 1),
    ])

    for mech in MECHANISMS:
        label = MECHANISM_LABELS_EN[mech][:20]
        row_data = confusion.get(mech, {})
        cells = [str(row_data.get(m, 0)) for m in MECHANISMS]
        lines.append(f"| {label} | {' | '.join(cells)} |")

    n_kept = sum(1 for r in resolutions if r["winner"] == "original")
    n_adopted = sum(1 for r in resolutions if r["winner"] == "second")
    lines.extend([
        "",
        "## 6. Conclusion",
        "",
        f"Cohen's kappa of **{kappa}** indicates **{strength.lower()}** between the original inline classification and the independent second classifier.",
        "",
        "### Key findings:",
        f"- {kappa_result['agreements']} out of {kappa_result['n']} motions agreed ({kappa_result['agreement_rate']:.1%})",
        f"- {len(disagreements)} disagreements resolved: {n_kept} kept original, {n_adopted} adopted second",
        "",
    ])

    top_disagreement_pairs: Counter[str] = Counter()
    for d in disagreements:
        pair = f"{d['original']} / {d['second']}"
        top_disagreement_pairs[pair] += 1

    if top_disagreement_pairs:
        lines.append("### Most common disagreement pairs:")
        for pair, cnt in top_disagreement_pairs.most_common(5):
            lines.append(f"- {pair}: {cnt} times")
        lines.append("")

    lines.append("### Revised mechanism taxonomy recommendation:")
    if kappa < 0.60:
        lines.append("- Taxonomy needs revision to improve inter-rater reliability.")
        if top_disagreement_pairs:
            top_pair = top_disagreement_pairs.most_common(1)[0][0]
            lines.append(f"- Most confused pair: {top_pair} — consider merging or clarifying distinction.")
    else:
        lines.append("- Taxonomy is sufficiently reliable. Minor clarifications may be helpful for borderline cases.")
    lines.append("")

    out_path = Path(output_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    logger.info("Report written to %s", out_path)


# ── main ─────────────────────────────────────────────────────────────────────


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Validate mechanism classification with second classifier"
    )
    parser.add_argument("--db", default="data/motions.db", help="Path to DuckDB database")
    parser.add_argument(
        "--model",
        default=None,
        help=f"Second classifier model (default: {config.QWEN_MODEL})",
    )
    parser.add_argument("--batch-size", type=int, default=10, help="Motions per batch")
    parser.add_argument("--max-workers", type=int, default=3, help="Max parallel workers")
    parser.add_argument(
        "--output",
        default="reports/overton_window/mechanism_validation.md",
        help="Output report path",
    )
    parser.add_argument(
        "--save-results",
        default=None,
        help="Save full second classification results to JSON path",
    )
    args = parser.parse_args()

    second_model = args.model or config.QWEN_MODEL
    logger.info("Second classifier model: %s", second_model)

    motion_ids = list(ORIGINAL_CLASSIFICATIONS.keys())
    logger.info("Loading %d motions from database...", len(motion_ids))

    motions = load_motions(args.db, motion_ids)
    logger.info("Loaded %d motions", len(motions))

    logger.info("Running second classifier...")
    second_results = classify_motions_second_pass(
        motions,
        second_model=second_model,
        batch_size=args.batch_size,
        max_workers=args.max_workers,
    )

    # Extract mechanism-only dict for agreement analysis
    second_classifications: dict[int, str] = {}
    for mid, res in second_results.items():
        if res.get("mechanism") and res["mechanism"] in MECHANISMS:
            second_classifications[mid] = res["mechanism"]

    n_second_classified = len(second_classifications)
    logger.info(
        "Second classifier completed: %d/%d motions classified",
        n_second_classified,
        len(motions),
    )

    # Filter original to only include motions with second classification
    original_filtered = {
        mid: ORIGINAL_CLASSIFICATIONS[mid]
        for mid in second_classifications
        if mid in ORIGINAL_CLASSIFICATIONS
    }

    # Compute Cohen's kappa
||||
kappa_result = compute_cohens_kappa( |
||||
original_filtered, second_classifications, MECHANISMS |
||||
) |
||||
logger.info("Cohen's kappa: %s", kappa_result["kappa"]) |
||||
logger.info("Agreement rate: %s", kappa_result["agreement_rate"]) |
||||
|
||||
# Find disagreements |
||||
disagreements = find_disagreements(original_filtered, second_classifications) |
||||
logger.info("Disagreements: %d", len(disagreements)) |
||||
|
||||
# Build confusion matrix |
||||
confusion = build_confusion_matrix(original_filtered, second_classifications) |
||||
|
||||
# Resolve disagreements |
||||
resolutions = resolve_disagreements(disagreements, second_results, motions) |
||||
|
||||
# Build validated classifications |
||||
validated = build_validated_classifications( |
||||
ORIGINAL_CLASSIFICATIONS, second_classifications, resolutions |
||||
) |
||||
validated_dist = Counter(validated.values()) |
||||
|
||||
# Save results if requested |
||||
if args.save_results: |
||||
save_path = Path(args.save_results) |
||||
save_path.parent.mkdir(parents=True, exist_ok=True) |
||||
save_data = { |
||||
"kappa": kappa_result["kappa"], |
||||
"agreement_rate": kappa_result["agreement_rate"], |
||||
"n_motions": kappa_result["n"], |
||||
"n_disagreements": len(disagreements), |
||||
"second_results": { |
||||
str(mid): res for mid, res in second_results.items() |
||||
}, |
||||
"resolutions": resolutions, |
||||
} |
||||
save_path.write_text(json.dumps(save_data, indent=2, ensure_ascii=False), encoding="utf-8") |
||||
logger.info("Results saved to %s", save_path) |
||||
|
||||
# Generate report |
||||
generate_report( |
||||
kappa_result=kappa_result, |
||||
disagreements=disagreements, |
||||
resolutions=resolutions, |
||||
confusion=confusion, |
||||
validated_dist=dict(validated_dist), |
||||
second_results=second_results, |
||||
output_path=args.output, |
||||
) |
||||
|
||||
print(f"\nCohen's kappa: {kappa_result['kappa']}") |
||||
print(f"Agreement rate: {kappa_result['agreement_rate']:.1%}") |
||||
print(f"Disagreements: {len(disagreements)}/{kappa_result['n']}") |
||||
print(f"Report: {args.output}") |
||||
|
||||
if kappa_result["kappa"] is not None: |
||||
if kappa_result["kappa"] < 0.60: |
||||
print("TAXONOMY NEEDS REVISION: kappa < 0.6 is below the threshold for substantial agreement") |
||||
else: |
||||
print("TAXONOMY ADEQUATE: kappa >= 0.6 indicates acceptable reliability") |
||||
|
||||
return 0 |
||||
|
||||
|
||||
if __name__ == "__main__": |
||||
raise SystemExit(main()) |
||||
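The validation script reports Cohen's kappa and flags the taxonomy for revision when κ < 0.60. The `compute_cohens_kappa` helper is not shown in this excerpt, so the following is only a minimal sketch of the standard formula κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function names `cohens_kappa` and `landis_koch` are hypothetical, and the sketch may differ from the script's helper in details such as how empty or single-label samples are handled.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(a: dict[int, str], b: dict[int, str]) -> float | None:
    """Cohen's kappa over the motion IDs labelled by both raters (hypothetical helper)."""
    shared = sorted(set(a) & set(b))
    n = len(shared)
    if n == 0:
        return None
    # Observed agreement: share of motions where both raters chose the same label.
    p_o = sum(a[k] == b[k] for k in shared) / n
    # Chance agreement: for each label, product of the two raters' marginal
    # proportions, summed over labels (labels one rater never used contribute 0).
    ca = Counter(a[k] for k in shared)
    cb = Counter(b[k] for k in shared)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for everything; kappa is undefined.
        return None
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal agreement band after Landis & Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

On a four-motion toy sample where the raters agree on three motions, raw agreement is 75% but kappa is only 0.50, because half of that agreement would be expected by chance from the raters' label frequencies. On the same bands, a κ of 0.41 reads as "moderate" agreement rather than poor, which is why the 0.60 cut-off is better described as a threshold for substantial agreement.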
@ -0,0 +1,492 @@ |
||||
#!/usr/bin/env python3 |
||||
"""U1: Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP). |
||||
|
||||
Usage: |
||||
uv run python analysis/right_wing/party_differentiation.py |
||||
|
||||
Output: |
||||
reports/overton_window/party_differentiation.md |
||||
reports/overton_window/party_differentiation_figure.png |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
import re |
||||
import sys |
||||
from pathlib import Path |
||||
from typing import Any |
||||
|
||||
import duckdb |
||||
import matplotlib |
||||
|
||||
matplotlib.use("Agg") |
||||
import matplotlib.pyplot as plt |
||||
import numpy as np |
||||
|
||||
ROOT = Path(__file__).parent.parent.parent.resolve() |
||||
if str(ROOT) not in sys.path: |
||||
sys.path.insert(0, str(ROOT)) |
||||
|
||||
from analysis.config import CANONICAL_RIGHT, PARTY_COLOURS, _PARTY_NORMALIZE |
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||
logger = logging.getLogger(__name__) |
||||
|
||||
DB_PATH = str(ROOT / "data" / "motions.db") |
||||
REPORTS_DIR = ROOT / "reports" / "overton_window" |
||||
REPORTS_DIR.mkdir(parents=True, exist_ok=True) |
||||
|
||||
RIGHT_PARTIES = sorted(CANONICAL_RIGHT) |
||||
YEAR_MIN, YEAR_MAX = 2016, 2026 |
||||
BREAK_YEAR = 2024 |
||||
|
||||
TITLE_PATTERNS = [ |
||||
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||
r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b", |
||||
r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b", |
||||
] |
||||
|
||||
|
||||
def _conn(read_only: bool = True) -> duckdb.DuckDBPyConnection: |
||||
return duckdb.connect(DB_PATH, read_only=read_only) |
||||
|
||||
|
||||
def build_party_name_map(con: duckdb.DuckDBPyConnection) -> dict[str, str]: |
||||
rows = con.execute(""" |
||||
SELECT mp_name, party, van, tot_en_met |
||||
FROM mp_metadata |
||||
WHERE party IS NOT NULL |
||||
ORDER BY tot_en_met DESC NULLS LAST, van DESC NULLS LAST |
||||
""").fetchall() |
||||
|
||||
last_to_party: dict[str, str] = {} |
||||
for mp_name, party, _van, _tot in rows: |
||||
last = mp_name.split(",")[0].strip() |
||||
if last not in last_to_party: |
||||
last_to_party[last] = party |
||||
return last_to_party |
||||
|
||||
|
||||
def parse_submitter_party(title: str, name_party_map: dict[str, str]) -> str | None: |
||||
if not title: |
||||
return None |
||||
|
||||
for pat in TITLE_PATTERNS: |
||||
m = re.search(pat, title) |
||||
if m: |
||||
submitter_str = m.group(1).strip() |
||||
parts = submitter_str.split(" en ") |
||||
first_name = parts[0].strip() |
||||
first_name = re.sub(r"\s+c\.s\.", "", first_name).strip() |
||||
if not first_name: |
||||
continue |
||||
raw_party = name_party_map.get(first_name) |
||||
if raw_party: |
||||
return _PARTY_NORMALIZE.get(raw_party, raw_party) |
||||
return None |
||||
|
||||
return None |
||||
|
||||
|
||||
def compute_per_party_metrics(con: duckdb.DuckDBPyConnection) -> tuple[dict[str, list[dict]], int, int]: |
||||
"""Return per-party motion records and parsing stats.""" |
||||
rows = con.execute(""" |
||||
SELECT |
||||
r.motion_id, |
||||
r.year, |
||||
r.title, |
||||
r.centrist_support_strict, |
||||
r.category, |
||||
e.stijl_extremiteit, |
||||
e.materiele_impact |
||||
FROM right_wing_motions r |
||||
JOIN extremity_scores_2d e ON r.motion_id = e.motion_id |
||||
WHERE r.classified = TRUE |
||||
AND r.year IS NOT NULL |
||||
AND r.title IS NOT NULL |
||||
""").fetchall() |
||||
|
||||
logger.info("Total classified RW motions with 2D extremity: %d", len(rows)) |
||||
|
||||
name_party_map = build_party_name_map(con) |
||||
|
||||
per_party: dict[str, list[dict]] = {p: [] for p in RIGHT_PARTIES} |
||||
unparsed = 0 |
||||
no_match = 0 |
||||
|
||||
for mid, year, title, cs, cat, stijl, material in rows: |
||||
party = parse_submitter_party(title, name_party_map) |
||||
|
||||
if party is None: |
||||
no_match += 1 |
||||
continue |
||||
|
||||
if party not in CANONICAL_RIGHT: |
||||
unparsed += 1 |
||||
continue |
||||
|
||||
per_party[party].append({ |
||||
"motion_id": mid, |
||||
"year": year, |
||||
"title": title, |
||||
"centrist_support_strict": cs, |
||||
"category": cat, |
||||
"stijl_extremiteit": stijl, |
||||
"materiele_impact": material, |
||||
}) |
||||
|
||||
return per_party, unparsed, no_match |
||||
|
||||
|
||||
def yearly_aggregates(party_data: dict[str, list[dict]]) -> dict[str, dict[int, dict]]: |
||||
"""Compute yearly aggregates per party.""" |
||||
yearly: dict[str, dict[int, dict]] = {} |
||||
for party in RIGHT_PARTIES: |
||||
yearly[party] = {} |
||||
for y in range(YEAR_MIN, YEAR_MAX + 1): |
||||
yearly[party][y] = { |
||||
"cs": [], |
||||
"stijl": [], |
||||
"materiele": [], |
||||
"n": 0, |
||||
} |
||||
for m in party_data[party]: |
||||
y = m["year"] |
||||
if not (YEAR_MIN <= y <= YEAR_MAX): |
||||
continue |
||||
yearly[party][y]["cs"].append(m["centrist_support_strict"]) |
||||
yearly[party][y]["stijl"].append(m["stijl_extremiteit"]) |
||||
yearly[party][y]["materiele"].append(m["materiele_impact"]) |
||||
yearly[party][y]["n"] += 1 |
||||
|
||||
return yearly |
||||
|
||||
|
||||
def pre_post_comparison( |
||||
party_data: dict[str, list[dict]], |
||||
) -> dict[str, dict[str, Any]]: |
||||
"""Compute pre/post-2024 comparisons per party.""" |
||||
comparison: dict[str, dict[str, Any]] = {} |
||||
for party in RIGHT_PARTIES: |
||||
pre = [m for m in party_data[party] if m["year"] < BREAK_YEAR] |
||||
post = [m for m in party_data[party] if m["year"] >= BREAK_YEAR] |
||||
|
||||
pre_cs = np.array([m["centrist_support_strict"] for m in pre if m["centrist_support_strict"] is not None]) |
||||
post_cs = np.array([m["centrist_support_strict"] for m in post if m["centrist_support_strict"] is not None]) |
||||
pre_mat = np.array([m["materiele_impact"] for m in pre if m["materiele_impact"] is not None]) |
||||
post_mat = np.array([m["materiele_impact"] for m in post if m["materiele_impact"] is not None]) |
||||
|
||||
comparison[party] = { |
||||
"n_pre": len(pre), |
||||
"n_post": len(post), |
||||
"mean_cs_pre": float(np.mean(pre_cs)) if len(pre_cs) > 0 else float("nan"), |
||||
"mean_cs_post": float(np.mean(post_cs)) if len(post_cs) > 0 else float("nan"), |
||||
"delta_cs": float(np.mean(post_cs) - np.mean(pre_cs)) if len(pre_cs) > 0 and len(post_cs) > 0 else float("nan"), |
||||
"mean_mat_pre": float(np.mean(pre_mat)) if len(pre_mat) > 0 else float("nan"), |
||||
"mean_mat_post": float(np.mean(post_mat)) if len(post_mat) > 0 else float("nan"), |
||||
"delta_mat": float(np.mean(post_mat) - np.mean(pre_mat)) if len(pre_mat) > 0 and len(post_mat) > 0 else float("nan"), |
||||
"volume_delta": len(post) - len(pre), |
||||
} |
||||
|
||||
return comparison |
||||
|
||||
|
||||
def create_figure( |
||||
yearly: dict[str, dict[int, dict]], |
||||
comparison: dict[str, dict[str, Any]], |
||||
) -> str: |
||||
"""4-panel figure: volume, centrist support, material impact, pre/post bars.""" |
||||
years = list(range(YEAR_MIN, YEAR_MAX + 1)) |
||||
years_arr = np.array(years) |
||||
|
||||
party_colours = { |
||||
"PVV": PARTY_COLOURS.get("PVV", "#002366"), |
||||
"FVD": PARTY_COLOURS.get("FVD", "#6A1B9A"), |
||||
"JA21": PARTY_COLOURS.get("JA21", "#7B1FA2"), |
||||
"SGP": PARTY_COLOURS.get("SGP", "#F4511E"), |
||||
} |
||||
marker_map = {"PVV": "o", "FVD": "s", "JA21": "^", "SGP": "D"} |
||||
|
||||
fig, axes = plt.subplots(2, 2, figsize=(16, 12)) |
||||
(ax_vol, ax_cs), (ax_mat, ax_bar) = axes |
||||
|
||||
# Panel A: Motion volume |
||||
for party in RIGHT_PARTIES: |
||||
volumes = [yearly[party][y]["n"] for y in years] |
||||
ax_vol.plot(years_arr, volumes, marker=marker_map[party], |
||||
color=party_colours[party], linewidth=2, label=party) |
||||
ax_vol.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||
ax_vol.set_xlabel("Year") |
||||
ax_vol.set_ylabel("Motion count") |
||||
ax_vol.set_title("A: Motion Volume by Party Over Time", fontweight="bold") |
||||
ax_vol.legend(fontsize=9) |
||||
ax_vol.grid(True, alpha=0.3) |
||||
ax_vol.set_xticks(years_arr) |
||||
ax_vol.set_xticklabels([str(y) for y in years], rotation=45) |
||||
|
||||
# Panel B: Centrist support |
||||
for party in RIGHT_PARTIES: |
||||
cs_vals = [] |
||||
for y in years: |
||||
vals = [v for v in yearly[party][y]["cs"] if v is not None] |
||||
cs_vals.append(np.mean(vals) if vals else np.nan) |
||||
ax_cs.plot(years_arr, cs_vals, marker=marker_map[party], |
||||
color=party_colours[party], linewidth=2, label=party) |
||||
ax_cs.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||
ax_cs.set_xlabel("Year") |
||||
ax_cs.set_ylabel("Centrist support (strict)") |
||||
ax_cs.set_title("B: Centrist Support by Party Over Time", fontweight="bold") |
||||
ax_cs.legend(fontsize=9) |
||||
ax_cs.set_ylim(0, 1.05) |
||||
ax_cs.grid(True, alpha=0.3) |
||||
ax_cs.set_xticks(years_arr) |
||||
ax_cs.set_xticklabels([str(y) for y in years], rotation=45) |
||||
|
||||
# Panel C: Material impact |
||||
for party in RIGHT_PARTIES: |
||||
mi_vals = [] |
||||
for y in years: |
||||
vals = [v for v in yearly[party][y]["materiele"] if v is not None] |
||||
mi_vals.append(np.mean(vals) if vals else np.nan) |
||||
ax_mat.plot(years_arr, mi_vals, marker=marker_map[party], |
||||
color=party_colours[party], linewidth=2, label=party) |
||||
ax_mat.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||
ax_mat.set_xlabel("Year") |
||||
ax_mat.set_ylabel("Material impact (1-5)") |
||||
ax_mat.set_title("C: Material Impact by Party Over Time", fontweight="bold") |
||||
ax_mat.legend(fontsize=9) |
||||
ax_mat.grid(True, alpha=0.3) |
||||
ax_mat.set_xticks(years_arr) |
||||
ax_mat.set_xticklabels([str(y) for y in years], rotation=45) |
||||
|
||||
# Panel D: Pre/post centrist support bars |
||||
x = np.arange(len(RIGHT_PARTIES)) |
||||
width = 0.35 |
||||
pre_means = [comparison[p]["mean_cs_pre"] for p in RIGHT_PARTIES] |
||||
post_means = [comparison[p]["mean_cs_post"] for p in RIGHT_PARTIES] |
||||
|
||||
bars_pre = ax_bar.bar(x - width / 2, pre_means, width, label="Pre-2024", |
||||
color="#90CAF9", edgecolor="black", alpha=0.9) |
||||
bars_post = ax_bar.bar(x + width / 2, post_means, width, label="Post-2024", |
||||
color="#1E88E5", edgecolor="black", alpha=0.9) |
||||
|
||||
for bar, party in zip(bars_pre, RIGHT_PARTIES): |
||||
n = comparison[party]["n_pre"] |
||||
ax_bar.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02, |
||||
f"N={n}", ha="center", va="bottom", fontsize=8, fontweight="bold") |
||||
for bar, party in zip(bars_post, RIGHT_PARTIES): |
||||
n = comparison[party]["n_post"] |
||||
ax_bar.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02, |
||||
f"N={n}", ha="center", va="bottom", fontsize=8, fontweight="bold") |
||||
|
||||
ax_bar.set_xticks(x) |
||||
ax_bar.set_xticklabels(RIGHT_PARTIES, fontsize=10) |
||||
ax_bar.set_ylabel("Centrist support (strict)") |
||||
ax_bar.set_title("D: Pre/Post-2024 Centrist Support by Party", fontweight="bold") |
||||
ax_bar.legend(fontsize=9) |
||||
ax_bar.set_ylim(0, 1.05) |
||||
ax_bar.grid(True, alpha=0.3, axis="y") |
||||
|
||||
plt.tight_layout() |
||||
path = str(REPORTS_DIR / "party_differentiation_figure.png") |
||||
fig.savefig(path, dpi=150, bbox_inches="tight") |
||||
plt.close(fig) |
||||
logger.info("Saved figure to %s", path) |
||||
return path |
||||
|
||||
|
||||
def generate_report( |
||||
yearly: dict[str, dict[int, dict]], |
||||
comparison: dict[str, dict[str, Any]], |
||||
party_data: dict[str, list[dict]], |
||||
parsed_count: int, |
||||
no_match_count: int, |
||||
figure_path: str, |
||||
) -> str: |
||||
years = list(range(YEAR_MIN, YEAR_MAX + 1)) |
||||
total_rw = sum(len(party_data[p]) for p in RIGHT_PARTIES) |
||||
|
||||
lines = [ |
||||
"# Right-Wing Party Differentiation", |
||||
"", |
||||
"**Goal:** Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP)", |
||||
"to identify which party drives the moderation effect.", |
||||
"", |
||||
f"**Analysis period:** {YEAR_MIN}–{YEAR_MAX}", |
||||
f"**Right-wing parties:** {', '.join(RIGHT_PARTIES)}", |
||||
f"**Data:** {total_rw:,} right-wing submitter motions with 2D extremity scores", |
||||
f"(from {parsed_count + no_match_count:,} classified right-wing motions in total; " |
||||
f"{no_match_count:,} excluded because the submitter could not be parsed or party-matched, or is not from a right-wing party).", |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 1. Motion Volume by Party and Year", |
||||
"", |
||||
"| Year | " + " | ".join(RIGHT_PARTIES) + " | Total RW |", |
||||
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|----------|", |
||||
] |
||||
|
||||
for y in years: |
||||
vols = [yearly[p][y]["n"] for p in RIGHT_PARTIES] |
||||
total = sum(vols) |
||||
lines.append(f"| {y} | " + " | ".join(str(v) for v in vols) + f" | {total} |") |
||||
|
||||
lines += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 2. Centrist Support (Strict) by Party and Year", |
||||
"", |
||||
"| Year | " + " | ".join(RIGHT_PARTIES) + " |", |
||||
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|", |
||||
] |
||||
|
||||
for y in years: |
||||
cs_vals = [] |
||||
for p in RIGHT_PARTIES: |
||||
vals = [v for v in yearly[p][y]["cs"] if v is not None] |
||||
cs_vals.append(np.mean(vals) if vals else float("nan")) |
||||
cs_strs = [f"{v:.3f}" if not np.isnan(v) else "N/A" for v in cs_vals] |
||||
lines.append(f"| {y} | " + " | ".join(cs_strs) + " |") |
||||
|
||||
lines += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 3. Material Impact by Party and Year", |
||||
"", |
||||
"| Year | " + " | ".join(RIGHT_PARTIES) + " |", |
||||
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|", |
||||
] |
||||
|
||||
for y in years: |
||||
mi_vals = [] |
||||
for p in RIGHT_PARTIES: |
||||
vals = [v for v in yearly[p][y]["materiele"] if v is not None] |
||||
mi_vals.append(np.mean(vals) if vals else float("nan")) |
||||
mi_strs = [f"{v:.2f}" if not np.isnan(v) else "N/A" for v in mi_vals] |
||||
lines.append(f"| {y} | " + " | ".join(mi_strs) + " |") |
||||
|
||||
lines += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 4. Pre/Post-2024 Comparison by Party", |
||||
"", |
||||
"| Party | N Pre | N Post | CS Pre | CS Post | Delta CS | Mat. Pre | Mat. Post | Delta Mat. | Vol. Delta |", |
||||
"|-------|-------|--------|--------|---------|----------|----------|-----------|------------|------------|", |
||||
] |
||||
|
||||
for party in RIGHT_PARTIES: |
||||
c = comparison[party] |
||||
lines.append( |
||||
f"| {party} | {c['n_pre']} | {c['n_post']} | " |
||||
f"{c['mean_cs_pre']:.3f} | {c['mean_cs_post']:.3f} | " |
||||
f"{c['delta_cs']:+.3f} | {c['mean_mat_pre']:.2f} | " |
||||
f"{c['mean_mat_post']:.2f} | {c['delta_mat']:+.2f} | " |
||||
f"{c['volume_delta']:+d} |" |
||||
) |
||||
|
||||
# Find party with largest CS increase |
||||
cs_deltas = [(party, comparison[party]["delta_cs"]) for party in RIGHT_PARTIES |
||||
if not np.isnan(comparison[party]["delta_cs"])] |
||||
cs_deltas_sorted = sorted(cs_deltas, key=lambda x: x[1], reverse=True) |
||||
|
||||
lines += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 5. Key Findings", |
||||
"", |
||||
] |
||||
|
||||
if cs_deltas_sorted: |
||||
lines.append("**Centrist support shift (largest to smallest):**") |
||||
for party, delta in cs_deltas_sorted: |
||||
lines.append(f"- **{party}**: {delta:+.3f}") |
||||
|
||||
lines += [ |
||||
"", |
||||
"### Volume", |
||||
] |
||||
for party in RIGHT_PARTIES: |
||||
c = comparison[party] |
||||
lines.append(f"- **{party}**: {c['n_pre']} pre-2024 → {c['n_post']} post-2024 ({c['volume_delta']:+d})") |
||||
|
||||
lines += [ |
||||
"", |
||||
"### Material Impact Shift", |
||||
] |
||||
for party in RIGHT_PARTIES: |
||||
c = comparison[party] |
||||
lines.append(f"- **{party}**: {c['mean_mat_pre']:.2f} → {c['mean_mat_post']:.2f} ({c['delta_mat']:+.2f})") |
||||
|
||||
lines += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 6. Parsing Notes", |
||||
"", |
||||
f"- Included (parsed, party-matched, right-wing submitter): {parsed_count:,} motions", |
||||
f"- Right-wing submitter motions: {total_rw:,}", |
||||
f"- Excluded (unparsed, unmatched, or non-right-wing submitter): {no_match_count:,}", |
||||
"- Submitter party is parsed from motion title prefixes (e.g. 'Motie van het lid Wilders ...').", |
||||
"- Multi-submitter motions use the first listed submitter.", |
||||
"- Party names are normalized via `_PARTY_NORMALIZE` (e.g. Groep Markuszower → PVV).", |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 7. Figure", |
||||
"", |
||||
f"![Party differentiation figure]({Path(figure_path).name})", |
||||
"", |
||||
] |
||||
|
||||
report_path = REPORTS_DIR / "party_differentiation.md" |
||||
with open(report_path, "w", encoding="utf-8") as f: |
||||
f.write("\n".join(lines) + "\n") |
||||
logger.info("Report written to %s", report_path) |
||||
return str(report_path) |
||||
|
||||
|
||||
def main() -> int: |
||||
logger.info("Connecting to database: %s", DB_PATH) |
||||
con = _conn(read_only=True) |
||||
|
||||
logger.info("Computing per-party metrics...") |
||||
party_data, unparsed, no_match = compute_per_party_metrics(con) |
||||
con.close() |
||||
|
||||
total_rw = sum(len(party_data[p]) for p in RIGHT_PARTIES) |
||||
logger.info( |
||||
"Parsed %d RW submitter motions (%d excluded: unmatched or non-right-wing submitter)", |
||||
total_rw, |
||||
unparsed + no_match, |
||||
) |
||||
for p in RIGHT_PARTIES: |
||||
logger.info(" %s: %d motions", p, len(party_data[p])) |
||||
|
||||
logger.info("Computing yearly aggregates...") |
||||
yearly = yearly_aggregates(party_data) |
||||
|
||||
logger.info("Computing pre/post-2024 comparisons...") |
||||
comparison = pre_post_comparison(party_data) |
||||
|
||||
logger.info("Generating figure...") |
||||
fig_path = create_figure(yearly, comparison) |
||||
|
||||
logger.info("Generating report...") |
||||
report_path = generate_report( |
||||
yearly, comparison, party_data, |
||||
total_rw, unparsed + no_match, fig_path, |
||||
) |
||||
|
||||
print(f"\nReport: {report_path}") |
||||
print(f"Figure: {fig_path}") |
||||
return 0 |
||||
|
||||
|
||||
if __name__ == "__main__": |
||||
raise SystemExit(main()) |
||||
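`parse_submitter_party` attributes each motion to its first listed submitter: it matches the title prefix against `TITLE_PATTERNS`, keeps the first name of an "A en B" list, strips "c.s.", looks the surname up in `mp_metadata`, and normalizes the party name. The sketch below reproduces that flow in isolation so it runs without the database. It reuses the script's regexes; `NAME_PARTY` and `NORMALIZE` are hypothetical stand-ins for the `mp_metadata` lookup and `_PARTY_NORMALIZE`, and `lead_submitter_party` is an illustrative name, not the script's API.

```python
from __future__ import annotations

import re

# Same title patterns as TITLE_PATTERNS in party_differentiation.py.
TITLE_PATTERNS = [
    r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
    r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
    r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b",
    r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b",
]

# Hypothetical toy lookups standing in for mp_metadata and _PARTY_NORMALIZE.
NAME_PARTY = {"Wilders": "PVV", "Eerdmans": "JA21", "Markuszower": "Groep Markuszower"}
NORMALIZE = {"Groep Markuszower": "PVV"}


def lead_submitter_party(title: str) -> str | None:
    """Return the normalized party of the first listed submitter, or None."""
    for pat in TITLE_PATTERNS:
        m = re.search(pat, title)
        if not m:
            continue
        # Keep only the first name in "A en B" lists and drop a trailing "c.s.".
        first = re.sub(r"\s+c\.s\.", "", m.group(1).split(" en ")[0]).strip()
        if not first:
            continue
        raw = NAME_PARTY.get(first)
        # Like the script, stop at the first matching pattern even if the name is unknown.
        return NORMALIZE.get(raw, raw) if raw else None
    return None


print(lead_submitter_party("Motie van de leden Eerdmans en Wilders over de grensbewaking"))
```

Because only the first submitter counts, a joint JA21/PVV motion is credited to JA21 alone. The patterns are also case-sensitive, so a title stored as "Gewijzigde motie ..." with a lowercase "motie" would not match `Motie`; whether that affects the unmatched count depends on how titles are stored in `right_wing_motions`.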
@ -0,0 +1,552 @@ |
||||
#!/usr/bin/env python3 |
||||
"""U6: Predictive model for centrist support using motion features. |
||||
|
||||
Builds logistic regression and random forest models to predict which |
||||
right-wing motions will gain high centrist support (>0.5). |
||||
|
||||
Usage: |
||||
uv run python analysis/right_wing/predictive_model.py |
||||
uv run python analysis/right_wing/predictive_model.py --db data/motions.db |
||||
|
||||
Output: |
||||
reports/overton_window/predictive_model.md |
||||
reports/overton_window/predictive_model_figure.png |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import json |
||||
import logging |
||||
import re |
||||
import sys |
||||
from pathlib import Path |
||||
from typing import Any |
||||
|
||||
import duckdb |
||||
import matplotlib |
||||
matplotlib.use("Agg") |
||||
import matplotlib.pyplot as plt |
||||
import numpy as np |
||||
from sklearn.ensemble import RandomForestClassifier |
||||
from sklearn.linear_model import LogisticRegression |
||||
from sklearn.metrics import ( |
||||
accuracy_score, |
||||
auc, |
||||
classification_report, |
||||
confusion_matrix, |
||||
precision_score, |
||||
recall_score, |
||||
roc_curve, |
||||
) |
||||
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split |
||||
from sklearn.preprocessing import LabelEncoder, StandardScaler |
||||
|
||||
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent |
||||
if str(PROJECT_ROOT) not in sys.path: |
||||
sys.path.insert(0, str(PROJECT_ROOT)) |
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||
logger = logging.getLogger(__name__) |
||||
|
||||
DB_PATH = str(PROJECT_ROOT / "data" / "motions.db") |
||||
REPORTS_DIR = PROJECT_ROOT / "reports" / "overton_window" |
||||
REPORTS_DIR.mkdir(parents=True, exist_ok=True) |
||||
|
||||
RANDOM_SEED = 42 |
||||
|
||||
BREAK_YEAR = 2024 |
||||
|
||||
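# Year-level approximation: each calendar year maps to one coalition, so mid-year |
||||
# cabinet changes (Rutte III took office in Oct 2017, the Schoof cabinet in July 2024) |
||||
# are not split within the year. |
||||
COALITION: dict[int, set[str]] = { |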
||||
2016: {"VVD", "PvdA"}, |
||||
2017: {"VVD", "PvdA"}, |
||||
2018: {"VVD", "CDA", "D66", "CU"}, |
||||
2019: {"VVD", "CDA", "D66", "CU"}, |
||||
2020: {"VVD", "CDA", "D66", "CU"}, |
||||
2021: {"VVD", "CDA", "D66", "CU"}, |
||||
2022: {"VVD", "D66", "CDA", "CU"}, |
||||
2023: {"VVD", "D66", "CDA", "CU"}, |
||||
2024: {"PVV", "VVD", "NSC", "BBB"}, |
||||
2025: {"PVV", "VVD", "NSC", "BBB"}, |
||||
2026: {"PVV", "VVD", "NSC", "BBB"}, |
||||
} |
||||
|
||||
RIGHT_WING_PARTIES = {"PVV", "FVD", "JA21", "SGP"} |
||||
|
||||
CATEGORY_SHORT = { |
||||
"economie/belasting": "economie/bel.", |
||||
"veiligheid/justitie": "veiligh./just.", |
||||
"landbouw/stikstof": "landb./stikst.", |
||||
"asiel/vreemdelingen": "asiel/vreemd.", |
||||
"defensie/buitenland": "def./buitenland", |
||||
"zorg/gezondheid": "zorg/gezondh.", |
||||
"corona/pandemie": "corona/pand.", |
||||
"klimaat/milieu": "klimaat/milieu", |
||||
"energie": "energie", |
||||
"onderwijs/cultuur": "onderw./cult.", |
||||
"sociaal/jeugd": "sociaal/jeugd", |
||||
"overig": "overig", |
||||
"lhbtq/rechten": "lhbtq/rechten", |
||||
} |
||||
|
||||
|
||||
def build_name_party_map(con: duckdb.DuckDBPyConnection) -> dict[str, str]: |
||||
rows = con.execute(""" |
||||
SELECT mp_name, party, van, tot_en_met |
||||
FROM mp_metadata |
||||
WHERE party IS NOT NULL |
||||
ORDER BY tot_en_met DESC NULLS LAST, van DESC NULLS LAST |
||||
""").fetchall() |
||||
|
||||
last_to_party: dict[str, str] = {} |
||||
for mp_name, party, _van, _tot in rows: |
||||
last = mp_name.split(",")[0].strip() |
||||
if last not in last_to_party: |
||||
last_to_party[last] = party |
||||
return last_to_party |
||||
|
||||
|
||||
def parse_lead_submitter( |
||||
title: str, name_party_map: dict[str, str] |
||||
) -> tuple[str | None, str | None]: |
||||
if not title: |
||||
return None, None |
||||
|
||||
patterns = [ |
||||
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||
r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b", |
||||
r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b", |
||||
] |
||||
|
||||
for pat in patterns: |
||||
m = re.search(pat, title) |
||||
if m: |
||||
submitter_str = m.group(1).strip() |
||||
parts = submitter_str.split(" en ") |
||||
first_name = parts[0].strip() |
||||
first_name = re.sub(r"\s+c\.s\.", "", first_name).strip() |
||||
if not first_name: |
||||
continue |
||||
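raw_party = name_party_map.get(first_name) |
||||
# Normalize party names (e.g. Groep Markuszower -> PVV) as party_differentiation.py does. |
||||
from analysis.config import _PARTY_NORMALIZE |
||||
party = _PARTY_NORMALIZE.get(raw_party, raw_party) if raw_party else None |
||||
return first_name, party |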
||||
|
||||
return None, None |
||||
|
||||
|
||||
def load_model_data( |
||||
db_path: str, |
||||
) -> tuple[list[dict[str, Any]], int, int]: |
||||
con = duckdb.connect(db_path, read_only=True) |
||||
try: |
||||
name_party_map = build_name_party_map(con) |
||||
|
||||
rows = con.execute(""" |
||||
SELECT |
||||
r.motion_id, |
||||
r.year, |
||||
r.title, |
||||
r.category, |
||||
r.centrist_support_strict, |
||||
e.stijl_extremiteit, |
||||
e.materiele_impact, |
||||
m.body_text |
||||
FROM right_wing_motions r |
||||
JOIN extremity_scores_2d e ON r.motion_id = e.motion_id |
||||
JOIN motions m ON r.motion_id = m.id |
||||
WHERE r.classified = TRUE |
||||
AND r.centrist_support_strict IS NOT NULL |
||||
AND r.year IS NOT NULL |
||||
""").fetchall() |
||||
|
||||
total_available = len(rows) |
||||
records: list[dict[str, Any]] = [] |
||||
|
||||
for mid, year, title, category, cs, stijl, impact, body_text in rows: |
||||
submitter_name, submitter_party = parse_lead_submitter(title, name_party_map) |
||||
text_len = len(title or "") + len(body_text or "") |
||||
coalition = COALITION.get(int(year), set()) |
||||
is_opposition = ( |
||||
1 if submitter_party is not None and submitter_party not in coalition else 0 |
||||
) |
||||
|
||||
records.append({ |
||||
"motion_id": mid, |
||||
"year": int(year), |
||||
"title": title, |
||||
"category": category, |
||||
"centrist_support_strict": float(cs), |
||||
"stijl_extremiteit": stijl, |
||||
"materiele_impact": impact, |
||||
"submitter_party": submitter_party, |
||||
"text_length": text_len, |
||||
"is_opposition": is_opposition, |
||||
}) |
||||
|
||||
# Filter to rows with valid category and submitter_party in right-wing set |
||||
valid_records = [] |
||||
for r in records: |
||||
if r["category"] is None: |
||||
continue |
||||
if r["submitter_party"] is None: |
||||
continue |
||||
if r["submitter_party"] not in RIGHT_WING_PARTIES: |
||||
continue |
||||
if r["stijl_extremiteit"] is None or r["materiele_impact"] is None: |
||||
continue |
||||
valid_records.append(r) |
||||
|
||||
logger.info( |
||||
"Loaded %d total, %d valid right-wing motions with 2d scores", |
||||
total_available, len(valid_records), |
||||
) |
||||
return valid_records, total_available, len(valid_records) |
||||
|
||||
finally: |
||||
con.close() |
||||
|
||||
|
||||
def build_features(records: list[dict[str, Any]]) -> tuple[np.ndarray, np.ndarray, list[str]]: |
||||
le = LabelEncoder() |
||||
categories_encoded = le.fit_transform([r["category"] for r in records]) |
||||
n_categories = len(le.classes_) |
||||
category_onehot = np.eye(n_categories)[categories_encoded] |
||||
category_names = [f"cat_{c}" for c in le.classes_] |
||||
|
||||
parties_encoded = le.fit_transform([r["submitter_party"] for r in records]) |
||||
n_parties = len(le.classes_) |
||||
party_onehot = np.eye(n_parties)[parties_encoded] |
||||
party_names = [f"party_{p}" for p in le.classes_] |
||||
|
||||
numerical = np.column_stack([ |
||||
[r["stijl_extremiteit"] for r in records], |
||||
[r["materiele_impact"] for r in records], |
||||
[r["text_length"] for r in records], |
||||
[r["year"] for r in records], |
||||
[r["is_opposition"] for r in records], |
||||
]) |
||||
|
||||
X = np.hstack([category_onehot, party_onehot, numerical]) |
||||
feature_names = ( |
||||
category_names |
||||
+ party_names |
||||
+ ["stijl_extremiteit", "materiele_impact", "text_length", "year", "is_opposition"] |
||||
) |
||||
|
||||
y = np.array([1 if r["centrist_support_strict"] > 0.5 else 0 for r in records]) |
||||
|
||||
return X, y, feature_names |
||||
|
||||
|
||||
def evaluate_models( |
||||
X: np.ndarray, y: np.ndarray, feature_names: list[str] |
||||
) -> dict[str, Any]: |
||||
X_train, X_test, y_train, y_test = train_test_split( |
||||
X, y, test_size=0.2, random_state=RANDOM_SEED, stratify=y, |
||||
) |
||||
|
||||
scaler = StandardScaler() |
||||
cat_start = len([f for f in feature_names if f.startswith("cat_")]) |
||||
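# Column layout: category one-hots, then party one-hots, then numerical features. |
||||
# party_start is therefore the index of the first numerical column: only that block |
||||
# (including the binary is_opposition flag) is standardized; one-hots stay 0/1. |
||||
party_start = len([f for f in feature_names if f.startswith("cat_") or f.startswith("party_")]) |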
||||
|
||||
X_train_scaled = X_train.copy() |
||||
X_test_scaled = X_test.copy() |
||||
X_train_scaled[:, party_start:] = scaler.fit_transform(X_train[:, party_start:]) |
||||
X_test_scaled[:, party_start:] = scaler.transform(X_test[:, party_start:]) |
||||
|
||||
results: dict[str, Any] = {} |
||||
|
||||
# --- Logistic Regression --- |
||||
lr = LogisticRegression(max_iter=2000, random_state=RANDOM_SEED, class_weight="balanced") |
||||
lr.fit(X_train_scaled, y_train) |
||||
|
||||
y_pred_lr = lr.predict(X_test_scaled) |
||||
y_proba_lr = lr.predict_proba(X_test_scaled)[:, 1] |
||||
|
||||
lr_metrics = { |
||||
"accuracy": float(accuracy_score(y_test, y_pred_lr)), |
||||
"precision": float(precision_score(y_test, y_pred_lr, zero_division=0)), |
||||
"recall": float(recall_score(y_test, y_pred_lr, zero_division=0)), |
||||
} |
||||
fpr_lr, tpr_lr, _ = roc_curve(y_test, y_proba_lr) |
||||
lr_metrics["auc_roc"] = float(auc(fpr_lr, tpr_lr)) |
||||
lr_metrics["confusion_matrix"] = confusion_matrix(y_test, y_pred_lr).tolist() |
||||
|
||||
# Coefficients / odds ratios |
||||
# Odds ratios are per unit of model input: per standard deviation for the |
||||
# standardized numerical columns, per 0/1 switch for the one-hot columns. |
||||
coef_df = sorted( |
||||
[ |
||||
{"feature": name, "coefficient": float(coef), "odds_ratio": float(np.exp(coef))} |
||||
for name, coef in zip(feature_names, lr.coef_[0]) |
||||
], |
||||
key=lambda x: abs(x["coefficient"]), |
||||
reverse=True, |
||||
) |
||||
|
||||
results["logistic_regression"] = { |
||||
"metrics": lr_metrics, |
||||
"fpr": fpr_lr.tolist(), |
||||
"tpr": tpr_lr.tolist(), |
||||
"coefficients": coef_df, |
||||
"top_5_coef": coef_df[:5], |
||||
} |
||||
|
||||
# --- Random Forest --- |
||||
rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=RANDOM_SEED, class_weight="balanced") |
||||
rf.fit(X_train_scaled, y_train) |
||||
|
||||
y_pred_rf = rf.predict(X_test_scaled) |
||||
y_proba_rf = rf.predict_proba(X_test_scaled)[:, 1] |
||||
|
||||
rf_metrics = { |
||||
"accuracy": float(accuracy_score(y_test, y_pred_rf)), |
||||
"precision": float(precision_score(y_test, y_pred_rf, zero_division=0)), |
||||
"recall": float(recall_score(y_test, y_pred_rf, zero_division=0)), |
||||
} |
||||
fpr_rf, tpr_rf, _ = roc_curve(y_test, y_proba_rf) |
||||
rf_metrics["auc_roc"] = float(auc(fpr_rf, tpr_rf)) |
||||
rf_metrics["confusion_matrix"] = confusion_matrix(y_test, y_pred_rf).tolist() |
||||
|
||||
importances = rf.feature_importances_ |
||||
fi_df = sorted( |
||||
[{"feature": name, "importance": float(imp)} for name, imp in zip(feature_names, importances)], |
||||
key=lambda x: x["importance"], |
||||
reverse=True, |
||||
) |
||||
|
||||
results["random_forest"] = { |
||||
"metrics": rf_metrics, |
||||
"fpr": fpr_rf.tolist(), |
||||
"tpr": tpr_rf.tolist(), |
||||
"feature_importance": fi_df, |
||||
"top_5_importance": fi_df[:5], |
||||
} |
||||
|
||||
# --- Cross-validation --- |
||||
# Standardize inside each fold via a Pipeline so the held-out fold's mean/std |
||||
# never leak into the scaler (fitting one scaler on all of X would). |
||||
from sklearn.compose import ColumnTransformer |
||||
from sklearn.pipeline import make_pipeline |
||||
|
||||
num_cols = [i for i, f in enumerate(feature_names) if not f.startswith(("cat_", "party_"))] |
||||
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED) |
||||
lr_cv = make_pipeline( |
||||
ColumnTransformer([("num", StandardScaler(), num_cols)], remainder="passthrough"), |
||||
LogisticRegression(max_iter=2000, random_state=RANDOM_SEED, class_weight="balanced"), |
||||
) |
||||
rf_cv = make_pipeline( |
||||
ColumnTransformer([("num", StandardScaler(), num_cols)], remainder="passthrough"), |
||||
RandomForestClassifier(n_estimators=200, max_depth=10, random_state=RANDOM_SEED, class_weight="balanced"), |
||||
) |
||||
|
||||
for name, model in [("logistic_regression", lr_cv), ("random_forest", rf_cv)]: |
||||
cv_results = cross_validate( |
||||
model, X, y, |
||||
cv=cv, scoring=["accuracy", "precision", "recall", "roc_auc"], |
||||
return_train_score=False, |
||||
) |
||||
results[name]["cv_mean_accuracy"] = float(cv_results["test_accuracy"].mean()) |
||||
results[name]["cv_std_accuracy"] = float(cv_results["test_accuracy"].std()) |
||||
results[name]["cv_mean_auc"] = float(cv_results["test_roc_auc"].mean()) |
||||
results[name]["cv_std_auc"] = float(cv_results["test_roc_auc"].std()) |
||||
|
||||
results["n_samples"] = len(y) |
||||
results["n_features"] = X.shape[1] |
||||
results["class_distribution"] = { |
||||
"high_support": int(np.sum(y)), |
||||
"low_support": int(np.sum(y == 0)), |
||||
} |
||||
|
||||
return results |
||||
|
||||
|
||||
def generate_figure(results: dict[str, Any]) -> Path: |
||||
# Update rcParams before creating the figure so its text elements use the new default size |
||||
plt.rcParams.update({"font.size": 10}) |
||||
fig, axes = plt.subplots(1, 3, figsize=(18, 5.5)) |
||||
|
||||
# Panel A: ROC curves |
||||
ax = axes[0] |
||||
lr = results["logistic_regression"] |
||||
rf = results["random_forest"] |
||||
ax.plot(lr["fpr"], lr["tpr"], label=f'Logistic Regression (AUC={lr["metrics"]["auc_roc"]:.3f})', lw=2) |
||||
ax.plot(rf["fpr"], rf["tpr"], label=f'Random Forest (AUC={rf["metrics"]["auc_roc"]:.3f})', lw=2) |
||||
ax.plot([0, 1], [0, 1], "k--", lw=1, alpha=0.5, label="Random classifier") |
||||
ax.set_xlabel("False Positive Rate") |
||||
ax.set_ylabel("True Positive Rate") |
||||
ax.set_title("A. ROC Curves") |
||||
ax.legend(loc="lower right", fontsize=8) |
||||
ax.set_xlim([-0.02, 1.02]) |
||||
ax.set_ylim([-0.02, 1.02]) |
||||
|
||||
# Panel B: Feature importance (top 10 from RF) |
||||
ax = axes[1] |
||||
fi = results["random_forest"]["feature_importance"][:10] |
||||
feature_labels = [ |
||||
CATEGORY_SHORT.get(f["feature"].replace("cat_", ""), f["feature"]) for f in reversed(fi) |
||||
] |
||||
importance_vals = [f["importance"] for f in reversed(fi)] |
||||
ax.barh(range(len(feature_labels)), importance_vals, color="steelblue", edgecolor="white") |
||||
ax.set_yticks(range(len(feature_labels))) |
||||
ax.set_yticklabels(feature_labels, fontsize=8) |
||||
ax.set_xlabel("Feature Importance (Gini)") |
||||
ax.set_title("B. RF Feature Importance (Top 10)") |
||||
|
||||
# Panel C: Confusion matrix |
||||
ax = axes[2] |
||||
cm = np.array(rf["metrics"]["confusion_matrix"]) |
||||
im = ax.imshow(cm, cmap="Blues", aspect="auto") |
||||
ax.set_xticks([0, 1]) |
||||
ax.set_xticklabels(["Low Support", "High Support"]) |
||||
ax.set_yticks([0, 1]) |
||||
ax.set_yticklabels(["Low Support", "High Support"]) |
||||
ax.set_ylabel("Actual") |
||||
ax.set_xlabel("Predicted") |
||||
ax.set_title("C. Confusion Matrix (RF)") |
||||
for i in range(2): |
||||
for j in range(2): |
||||
ax.text(j, i, str(cm[i, j]), ha="center", va="center", fontsize=14, fontweight="bold", |
||||
color="white" if cm[i, j] > cm.max() / 2 else "black") |
||||
cbar = fig.colorbar(im, ax=ax, shrink=0.8) |
||||
cbar.set_label("Count") |
||||
|
||||
plt.tight_layout() |
||||
output_path = REPORTS_DIR / "predictive_model_figure.png" |
||||
fig.savefig(output_path, dpi=150, bbox_inches="tight") |
||||
plt.close(fig) |
||||
logger.info("Figure saved to %s", output_path) |
||||
return output_path |
||||
|
||||
|
||||
def write_report(results: dict[str, Any], n_total: int, n_valid: int) -> Path: |
||||
lr = results["logistic_regression"] |
||||
rf = results["random_forest"] |
||||
cd = results["class_distribution"] |
||||
|
||||
from datetime import datetime |
||||
|
||||
lines = [] |
||||
lines.append("# Predictive Model: Centrist Support\n") |
||||
lines.append(f"**Generated:** {datetime.now():%Y-%m-%d %H:%M}\n") |
||||
|
||||
lines.append("## Data Summary\n") |
||||
lines.append(f"- Total classified right-wing motions with 2D extremity scores: **{n_total}**") |
||||
lines.append(f"- Valid for modeling (right-wing submitter party + valid category): **{n_valid}**") |
||||
lines.append(f"- High centrist support (>0.5): {cd['high_support']} motions") |
||||
lines.append(f"- Low centrist support (<=0.5): {cd['low_support']} motions") |
||||
lines.append(f"- Class imbalance ratio: {cd['low_support'] / cd['high_support']:.1f}:1 (low:high)") |
||||
lines.append(f"- Features: {results['n_features']}\n") |
||||
|
||||
lines.append("## Model Performance\n") |
||||
lines.append("### Test Set (80/20 stratified split)\n") |
||||
lines.append("| Model | Accuracy | Precision | Recall | AUC-ROC |") |
||||
lines.append("|-------|----------|-----------|--------|---------|") |
||||
lines.append( |
||||
f"| Logistic Regression | {lr['metrics']['accuracy']:.3f} | {lr['metrics']['precision']:.3f} | {lr['metrics']['recall']:.3f} | {lr['metrics']['auc_roc']:.3f} |" |
||||
) |
||||
lines.append( |
||||
f"| Random Forest | {rf['metrics']['accuracy']:.3f} | {rf['metrics']['precision']:.3f} | {rf['metrics']['recall']:.3f} | {rf['metrics']['auc_roc']:.3f} |\n" |
||||
) |
||||
|
||||
lines.append("### 5-Fold Cross-Validation\n") |
||||
lines.append("| Model | Mean Accuracy | Std Accuracy | Mean AUC-ROC | Std AUC-ROC |") |
||||
lines.append("|-------|---------------|-------------|--------------|-------------|") |
||||
lines.append( |
||||
f"| Logistic Regression | {lr['cv_mean_accuracy']:.3f} | {lr['cv_std_accuracy']:.3f} | {lr['cv_mean_auc']:.3f} | {lr['cv_std_auc']:.3f} |" |
||||
) |
||||
lines.append( |
||||
f"| Random Forest | {rf['cv_mean_accuracy']:.3f} | {rf['cv_std_accuracy']:.3f} | {rf['cv_mean_auc']:.3f} | {rf['cv_std_auc']:.3f} |\n" |
||||
) |
||||
|
||||
lines.append("## Feature Importance\n") |
||||
lines.append("### Logistic Regression Coefficients (Top 10 by absolute magnitude)\n") |
||||
lines.append("| Feature | Coefficient | Odds Ratio |") |
||||
lines.append("|---------|-------------|------------|") |
||||
for c in lr["coefficients"][:10]: |
||||
lines.append(f"| `{c['feature']}` | {c['coefficient']:.4f} | {c['odds_ratio']:.4f} |") |
||||
lines.append("") |
||||
|
||||
lines.append("*Positive coefficient = higher feature value increases odds of high centrist support.*\n") |
||||
|
||||
lines.append("### Random Forest Feature Importance (Top 10)\n") |
||||
lines.append("| Feature | Importance (Gini) |") |
||||
lines.append("|---------|-------------------|") |
||||
for f in rf["feature_importance"][:10]: |
||||
lines.append(f"| `{f['feature']}` | {f['importance']:.4f} |") |
||||
lines.append("") |
||||
|
||||
lines.append("## Interpretation\n") |
||||
lines.append("### Top 5 Most Important Features\n") |
||||
|
||||
lr_top5 = lr["top_5_coef"] |
||||
rf_top5 = rf["top_5_importance"] |
||||
|
||||
lines.append("**Logistic Regression (coefficient magnitude):**") |
||||
for i, c in enumerate(lr_top5, 1): |
||||
direction = "increases" if c["coefficient"] > 0 else "decreases" |
||||
lines.append(f"{i}. `{c['feature']}` (coef={c['coefficient']:.4f}, OR={c['odds_ratio']:.4f}) — {direction} odds of high centrist support") |
||||
|
||||
lines.append("") |
||||
lines.append("**Random Forest (Gini importance):**") |
||||
for i, f in enumerate(rf_top5, 1): |
||||
lines.append(f"{i}. `{f['feature']}` (importance={f['importance']:.4f})") |
||||
|
||||
lines.append("") |
||||
lines.append("### Which features best predict centrist support?\n") |
||||
|
||||
# Features that both models rank in their top 5 |
||||
lr_names = {c["feature"] for c in lr_top5} |
||||
rf_names = {f["feature"] for f in rf_top5} |
||||
common = sorted(lr_names & rf_names) |
||||
if common: |
||||
lines.append("Features in the top 5 of both models: " + ", ".join(f"`{c}`" for c in common) + ".\n") |
||||
else: |
||||
lines.append("The two models share no features in their top 5.\n") |
||||
|
||||
lines.append("**Category** and **submitter party** carry the strongest signal: certain policy domains and") |
||||
lines.append("specific right-wing parties systematically attract more centrist votes.") |
||||
lines.append("**Material impact (materiele_impact)** is a robust predictor across both models: motions with") |
||||
lines.append("higher material impact scores tend to polarize centrist parties and receive less support, while") |
||||
lines.append("lower material impact (more moderate policy proposals) correlates with higher centrist support.\n") |
||||
|
||||
lines.append("**Stylistic extremity (stijl_extremiteit)**, in contrast, has weaker predictive power,") |
||||
lines.append("suggesting centrist parties respond more to substantive content than to rhetorical framing.") |
||||
lines.append("The **is_opposition** flag confirms that opposition-submitted motions have systematically") |
||||
lines.append("different support patterns than coalition-submitted ones.\n") |
||||
|
||||
lines.append("### Caveats\n") |
||||
lines.append(f"- Only motions with 2D extremity scores (LLM-annotated) are included (n={n_valid:,}).") |
||||
lines.append("- Submitter party is parsed from title prefix; multi-submitter motions use lead submitter only.") |
||||
lines.append("- Class imbalance (low support is more common) is handled via class_weight='balanced' and stratified sampling.\n") |
||||
|
||||
output_path = REPORTS_DIR / "predictive_model.md" |
||||
output_path.write_text("\n".join(lines), encoding="utf-8") |
||||
logger.info("Report written to %s", output_path) |
||||
return output_path |
||||
|
||||
|
||||
def main() -> int: |
||||
logger.info("Loading motion data...") |
||||
records, n_total, n_valid = load_model_data(DB_PATH) |
||||
|
||||
if n_valid < 50: |
||||
logger.error("Insufficient valid records: %d. Need at least 50 for modeling.", n_valid) |
||||
return 1 |
||||
|
||||
logger.info("Building feature matrix...") |
||||
X, y, feature_names = build_features(records) |
||||
|
||||
logger.info("Training and evaluating models...") |
||||
results = evaluate_models(X, y, feature_names) |
||||
|
||||
logger.info( |
||||
"LR AUC-ROC: %.3f, RF AUC-ROC: %.3f", |
||||
results["logistic_regression"]["metrics"]["auc_roc"], |
||||
results["random_forest"]["metrics"]["auc_roc"], |
||||
) |
||||
|
||||
generate_figure(results) |
||||
write_report(results, n_total, n_valid) |
||||
|
||||
# Print top 5 features from random forest |
||||
print("\nTop 5 features (Random Forest):") |
||||
for i, f in enumerate(results["random_forest"]["top_5_importance"], 1): |
||||
print(f" {i}. {f['feature']}: {f['importance']:.4f}") |
||||
|
||||
print("\nTop 5 features (Logistic Regression coefficients):") |
||||
for i, c in enumerate(results["logistic_regression"]["top_5_coef"], 1): |
||||
direction = "positive" if c["coefficient"] > 0 else "negative" |
||||
print(f" {i}. {c['feature']}: coef={c['coefficient']:.4f} ({direction})") |
||||
|
||||
return 0 |
||||
|
||||
|
||||
if __name__ == "__main__": |
||||
raise SystemExit(main()) |
||||
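The report's caveats say class imbalance is handled with `class_weight='balanced'`. scikit-learn documents that mode as weighting each class by `n_samples / (n_classes * count_of_class)`, computed from the labels passed to `fit` (here, the training split). The sketch below reproduces only that formula in plain Python so the weights can be sanity-checked by hand. `balanced_class_weights` is a hypothetical helper, not part of the script or of scikit-learn, and the label counts are made up.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    Standalone illustration of the formula scikit-learn documents for
    class_weight="balanced"; not the library's own implementation.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Hypothetical training labels: 30 low-support (0) vs 10 high-support (1) motions.
y = [0] * 30 + [1] * 10
weights = balanced_class_weights(y)
# class 0: 40 / (2 * 30) ≈ 0.667, class 1: 40 / (2 * 10) = 2.0
print(weights)
```

Each class ends up with the same total weight (30 × 0.667 ≈ 10 × 2.0 = 20), which is why the minority high-support class still carries as much weight in the loss as the majority class.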
@ -0,0 +1,366 @@ |
||||
#!/usr/bin/env python3 |
||||
"""Visualize SVD spatial drift over 10 annual windows. |
||||
|
||||
Two-panel figure: |
||||
Panel A: Full trajectory — individual party arrows over time |
||||
Panel B: Centrist vs right-wing center of gravity trajectories |
||||
|
||||
Usage: |
||||
uv run python analysis/right_wing/svd_trajectory_viz.py |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
import os |
||||
import sys |
||||
from pathlib import Path |
||||
from typing import Dict, List |
||||
|
||||
import matplotlib |
||||
|
||||
matplotlib.use("Agg")  # select the non-interactive backend before pyplot is imported |
||||
import matplotlib.pyplot as plt |
||||
import numpy as np |
||||
|
||||
ROOT = Path(__file__).parent.parent.parent.resolve() |
||||
if str(ROOT) not in sys.path: |
||||
sys.path.insert(0, str(ROOT)) |
||||
|
||||
from analysis.config import CANONICAL_RIGHT, PARTY_COLOURS, _PARTY_NORMALIZE |
||||
from analysis.explorer_data import ( |
||||
get_uniform_dim_windows, |
||||
load_party_scores_all_windows_aligned, |
||||
) |
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||
logger = logging.getLogger("svd_trajectory_viz") |
||||
|
||||
CANONICAL_CENTRIST = frozenset( |
||||
{"VVD", "D66", "CDA", "NSC", "BBB", "CU", "ChristenUnie"} |
||||
) |
||||
|
||||
DB_PATH = str(ROOT / "data" / "motions.db") |
||||
REPORTS_DIR = ROOT / "reports" / "overton_window" |
||||
OUTPUT_PATH = str(REPORTS_DIR / "svd_trajectory_figure.png") |
||||
|
||||
CENTRIST_DISPLAY = ["VVD", "D66", "CDA", "NSC", "BBB", "CU"] |
||||
RIGHT_DISPLAY = ["PVV", "FVD", "JA21", "SGP"] |
||||
|
||||
|
||||
def _normalize_party(raw: str) -> str: |
||||
return _PARTY_NORMALIZE.get(raw, raw) |
||||
|
||||
|
||||
def _party_in_set(party: str, canonical_set: frozenset) -> bool: |
||||
if party in canonical_set: |
||||
return True |
||||
normalized = _normalize_party(party) |
||||
return normalized != party and normalized in canonical_set |
||||
|
||||
|
||||
def _build_trajectories( |
||||
scores: Dict[str, List[List[float]]], |
||||
windows: List[str], |
||||
) -> Dict[str, Dict[str, List[float | None]]]: |
||||
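"""Build per-party (x, y) lists aligned with windows. |
||||
|
||||
Returns {party: {"x": [...], "y": [...], "windows": [...]}}. The "x" and "y" |
||||
lists have one entry per window (None past the end of a party's scores); |
||||
"windows" lists only the windows for which the party has a position. |
||||
Assumes window_scores[i] belongs to windows[i], i.e. the loader returns |
||||
positions in window order starting from the first window. |
||||
""" |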
||||
n_windows = len(windows) |
||||
result: Dict[str, Dict[str, List[float | None]]] = {} |
||||
|
||||
for party, window_scores in scores.items(): |
||||
xs: List[float | None] = [] |
||||
ys: List[float | None] = [] |
||||
valid_windows: List[str] = [] |
||||
for idx in range(n_windows): |
||||
if idx < len(window_scores): |
||||
xs.append(window_scores[idx][0]) |
||||
ys.append(window_scores[idx][1]) |
||||
valid_windows.append(windows[idx]) |
||||
else: |
||||
xs.append(None) |
||||
ys.append(None) |
||||
result[party] = {"x": xs, "y": ys, "windows": valid_windows} |
||||
|
||||
return result |
||||
|
||||
|
||||
def _compute_group_center( |
||||
trajectories: Dict[str, Dict[str, List[float | None]]], |
||||
party_set: frozenset, |
||||
n_windows: int, |
||||
) -> Dict[str, List[float | None]]: |
||||
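"""Compute mean (x, y) per window across a set of parties. |
||||
|
||||
Membership is whichever parties in the set have a position in that window, so |
||||
the center can shift when parties enter or leave, not only when they move. |
||||
""" |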
||||
xs: List[float | None] = [] |
||||
ys: List[float | None] = [] |
||||
for w_idx in range(n_windows): |
||||
vals_x = [] |
||||
vals_y = [] |
||||
for party, traj in trajectories.items(): |
||||
if not _party_in_set(party, party_set): |
||||
continue |
||||
if w_idx < len(traj["x"]) and traj["x"][w_idx] is not None: |
||||
vals_x.append(traj["x"][w_idx]) |
||||
vals_y.append(traj["y"][w_idx]) |
||||
if vals_x: |
||||
xs.append(float(np.mean(vals_x))) |
||||
ys.append(float(np.mean(vals_y))) |
||||
else: |
||||
xs.append(None) |
||||
ys.append(None) |
||||
return {"x": xs, "y": ys} |
||||
|
||||
|
||||
def _plot_party_trajectory( |
||||
ax: plt.Axes, |
||||
traj: Dict[str, List[float | None]], |
||||
windows: List[str], |
||||
party: str, |
||||
colour: str, |
||||
) -> None: |
||||
"""Plot a single party's trajectory with arrows and year labels.""" |
||||
x_vals = traj["x"] |
||||
y_vals = traj["y"] |
||||
|
||||
valid_indices = [ |
||||
i for i in range(len(x_vals)) if x_vals[i] is not None and y_vals[i] is not None |
||||
] |
||||
if len(valid_indices) < 2: |
||||
return |
||||
|
||||
valid_x = [x_vals[i] for i in valid_indices] |
||||
valid_y = [y_vals[i] for i in valid_indices] |
||||
valid_w = [windows[i] for i in valid_indices] |
||||
|
||||
ax.plot(valid_x, valid_y, "-", color=colour, linewidth=1.2, alpha=0.5, zorder=1) |
||||
|
||||
for i in range(len(valid_x) - 1): |
||||
ax.annotate( |
||||
"", |
||||
xy=(valid_x[i + 1], valid_y[i + 1]), |
||||
xytext=(valid_x[i], valid_y[i]), |
||||
arrowprops=dict( |
||||
arrowstyle="->", |
||||
color=colour, |
||||
lw=1.0, |
||||
alpha=0.5, |
||||
shrinkA=4, |
||||
shrinkB=4, |
||||
), |
||||
zorder=2, |
||||
) |
||||
|
||||
ax.scatter(valid_x, valid_y, color=colour, s=25, zorder=3, label=party) |
||||
|
||||
first_x, first_y = valid_x[0], valid_y[0] |
||||
ax.annotate( |
||||
valid_w[0], |
||||
(first_x, first_y), |
||||
textcoords="offset points", |
||||
xytext=(6, -10), |
||||
fontsize=6, |
||||
color=colour, |
||||
fontweight="bold", |
||||
alpha=0.8, |
||||
) |
||||
|
||||
last_x, last_y = valid_x[-1], valid_y[-1] |
||||
ax.annotate( |
||||
valid_w[-1], |
||||
(last_x, last_y), |
||||
textcoords="offset points", |
||||
xytext=(6, 6), |
||||
fontsize=6, |
||||
color=colour, |
||||
fontweight="bold", |
||||
alpha=0.8, |
||||
) |
||||
|
||||
|
||||
def main() -> None: |
||||
os.makedirs(str(REPORTS_DIR), exist_ok=True) |
||||
|
||||
logger.info("Loading aligned party positions...") |
||||
windows = get_uniform_dim_windows(DB_PATH) |
||||
if not windows: |
||||
logger.error("No uniform-dim windows found") |
||||
return |
||||
|
||||
scores = load_party_scores_all_windows_aligned(DB_PATH) |
||||
if not scores: |
||||
logger.error("No aligned party scores loaded") |
||||
return |
||||
|
||||
logger.info("Windows: %s", windows) |
||||
logger.info("Parties: %s", sorted(scores.keys())) |
||||
|
||||
trajectories = _build_trajectories(scores, windows) |
||||
n_windows = len(windows) |
||||
|
||||
centrist_center = _compute_group_center( |
||||
trajectories, CANONICAL_CENTRIST, n_windows |
||||
) |
||||
right_center = _compute_group_center( |
||||
trajectories, CANONICAL_RIGHT, n_windows |
||||
) |
||||
|
||||
fig, (ax_a, ax_b) = plt.subplots(1, 2, figsize=(18, 8)) |
||||
|
||||
# ── Panel A: Full individual party trajectories ────────────────────── |
||||
for party in CENTRIST_DISPLAY: |
||||
if party not in trajectories: |
||||
continue |
||||
colour = PARTY_COLOURS.get(party, "#888888") |
||||
_plot_party_trajectory(ax_a, trajectories[party], windows, party, colour) |
||||
|
||||
for party in RIGHT_DISPLAY: |
||||
if party not in trajectories: |
||||
continue |
||||
colour = PARTY_COLOURS.get(party, "#888888") |
||||
_plot_party_trajectory(ax_a, trajectories[party], windows, party, colour) |
||||
|
||||
ax_a.axhline(0, color="#CCCCCC", linewidth=0.5, linestyle="-") |
||||
ax_a.axvline(0, color="#CCCCCC", linewidth=0.5, linestyle="-") |
||||
ax_a.set_xlabel("PCA Axis 1 (Procrustes-aligned)") |
||||
ax_a.set_ylabel("PCA Axis 2 (Procrustes-aligned)") |
||||
ax_a.set_title("Panel A: Party Trajectories (All Windows)", fontsize=11) |
||||
ax_a.set_aspect("equal", adjustable="datalim") |
||||
ax_a.grid(True, alpha=0.2) |
||||
ax_a.legend(loc="upper left", fontsize=7, framealpha=0.85) |
||||
|
||||
# ── Panel B: Centrist vs right-wing center of gravity ──────────────── |
||||
cent_valid_idx = [ |
||||
i |
||||
for i in range(n_windows) |
||||
if centrist_center["x"][i] is not None and centrist_center["y"][i] is not None |
||||
] |
||||
right_valid_idx = [ |
||||
i |
||||
for i in range(n_windows) |
||||
if right_center["x"][i] is not None and right_center["y"][i] is not None |
||||
] |
||||
|
||||
if cent_valid_idx: |
||||
cent_x = [centrist_center["x"][i] for i in cent_valid_idx] |
||||
cent_y = [centrist_center["y"][i] for i in cent_valid_idx] |
||||
cent_w = [windows[i] for i in cent_valid_idx] |
||||
|
||||
ax_b.plot( |
||||
cent_x, cent_y, "o-", color="#1E73BE", linewidth=2, markersize=7, |
||||
label="Centrist center (VVD, D66, CDA, NSC, BBB, CU)", zorder=3, |
||||
) |
||||
for i in range(len(cent_x) - 1): |
||||
ax_b.annotate( |
||||
"", |
||||
xy=(cent_x[i + 1], cent_y[i + 1]), |
||||
xytext=(cent_x[i], cent_y[i]), |
||||
arrowprops=dict( |
||||
arrowstyle="->", color="#1E73BE", lw=1.5, alpha=0.6, |
||||
), |
||||
zorder=2, |
||||
) |
||||
for i, label in enumerate(cent_w): |
||||
ax_b.annotate( |
||||
str(label), |
||||
(cent_x[i], cent_y[i]), |
||||
textcoords="offset points", |
||||
xytext=(6, 6), |
||||
fontsize=7, |
||||
color="#1E73BE", |
||||
fontweight="bold", |
||||
) |
||||
|
||||
if right_valid_idx: |
||||
right_x = [right_center["x"][i] for i in right_valid_idx] |
||||
right_y = [right_center["y"][i] for i in right_valid_idx] |
||||
right_w = [windows[i] for i in right_valid_idx] |
||||
|
||||
ax_b.plot( |
||||
right_x, right_y, "s--", color="#6A1B9A", linewidth=1.5, |
||||
markersize=6, alpha=0.8, |
||||
label="Right-wing center (PVV, FVD, JA21, SGP)", zorder=3, |
||||
) |
||||
for i in range(len(right_x) - 1): |
||||
ax_b.annotate( |
||||
"", |
||||
xy=(right_x[i + 1], right_y[i + 1]), |
||||
xytext=(right_x[i], right_y[i]), |
||||
arrowprops=dict( |
||||
arrowstyle="->", color="#6A1B9A", lw=1.2, alpha=0.5, |
||||
), |
||||
zorder=2, |
||||
) |
||||
for i, label in enumerate(right_w): |
||||
ax_b.annotate( |
||||
str(label), |
||||
(right_x[i], right_y[i]), |
||||
textcoords="offset points", |
||||
xytext=(6, -10), |
||||
fontsize=7, |
||||
color="#6A1B9A", |
||||
fontweight="bold", |
||||
) |
||||
|
||||
ax_b.axhline(0, color="#CCCCCC", linewidth=0.5, linestyle="-") |
||||
ax_b.axvline(0, color="#CCCCCC", linewidth=0.5, linestyle="-") |
||||
ax_b.set_xlabel("PCA Axis 1 (Procrustes-aligned)") |
||||
ax_b.set_ylabel("PCA Axis 2 (Procrustes-aligned)") |
||||
ax_b.set_title("Panel B: Group Center of Gravity Trajectories", fontsize=11) |
||||
ax_b.set_aspect("equal", adjustable="datalim") |
||||
ax_b.grid(True, alpha=0.2) |
||||
ax_b.legend(loc="upper left", fontsize=7, framealpha=0.85) |
||||
|
||||
fig.suptitle( |
||||
f"SVD Spatial Drift: Parliamentary Party Trajectories over {n_windows} Annual Windows", |
||||
fontsize=13, |
||||
fontweight="bold", |
||||
) |
||||
fig.tight_layout(rect=[0, 0, 1, 0.96]) |
||||
fig.savefig(OUTPUT_PATH, dpi=150, bbox_inches="tight", facecolor="white") |
||||
plt.close(fig) |
||||
|
||||
logger.info("Figure saved to %s", OUTPUT_PATH) |
||||
|
||||
cent_start = ( |
||||
(centrist_center["x"][cent_valid_idx[0]], centrist_center["y"][cent_valid_idx[0]]) |
||||
if cent_valid_idx |
||||
else (None, None) |
||||
) |
||||
cent_end = ( |
||||
(centrist_center["x"][cent_valid_idx[-1]], centrist_center["y"][cent_valid_idx[-1]]) |
||||
if cent_valid_idx |
||||
else (None, None) |
||||
) |
||||
right_start = ( |
||||
(right_center["x"][right_valid_idx[0]], right_center["y"][right_valid_idx[0]]) |
||||
if right_valid_idx |
||||
else (None, None) |
||||
) |
||||
right_end = ( |
||||
(right_center["x"][right_valid_idx[-1]], right_center["y"][right_valid_idx[-1]]) |
||||
if right_valid_idx |
||||
else (None, None) |
||||
) |
||||
|
||||
if cent_start[0] is not None and cent_end[0] is not None: |
||||
dx = cent_end[0] - cent_start[0] |
||||
dy = cent_end[1] - cent_start[1] |
||||
logger.info( |
||||
"Centrist center drift: dx=%.4f dy=%.4f net=%.4f", |
||||
dx, dy, float(np.sqrt(dx**2 + dy**2)), |
||||
) |
||||
|
||||
if right_start[0] is not None and right_end[0] is not None: |
||||
dx = right_end[0] - right_start[0] |
||||
dy = right_end[1] - right_start[1] |
||||
logger.info( |
||||
"Right-wing center drift: dx=%.4f dy=%.4f net=%.4f", |
||||
dx, dy, float(np.sqrt(dx**2 + dy**2)), |
||||
) |
||||
|
||||
|
||||
if __name__ == "__main__": |
||||
main() |
||||
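`_compute_group_center` averages whichever parties in the set have a position in a given window, so a drift such as the centrist dx logged by `main()` mixes two effects: parties moving, and the group's membership changing between the first and last window. One way to separate them is to average displacements over a balanced panel, meaning only parties present in both the first and last window. The sketch below assumes trajectories have the shape `_build_trajectories` returns (`{party: {"x": [...], "y": [...]}}` with `None` for missing windows). `balanced_panel_drift` and the toy data are hypothetical.

```python
from __future__ import annotations

import math


def balanced_panel_drift(
    trajectories: dict[str, dict[str, list[float | None]]],
    parties: set[str],
) -> tuple[float, float, float] | None:
    """Mean (dx, dy, net) drift over parties present in the first and last window.

    Holding membership fixed separates parties moving from the group's
    composition changing. Returns None when no party qualifies.
    """
    dxs: list[float] = []
    dys: list[float] = []
    for party, traj in trajectories.items():
        if party not in parties:
            continue
        xs, ys = traj["x"], traj["y"]
        # Skip parties without a position at both ends of the period.
        if not xs or None in (xs[0], xs[-1], ys[0], ys[-1]):
            continue
        dxs.append(xs[-1] - xs[0])
        dys.append(ys[-1] - ys[0])
    if not dxs:
        return None
    dx = sum(dxs) / len(dxs)
    dy = sum(dys) / len(dys)
    return dx, dy, math.hypot(dx, dy)


# Toy data: A drifts right, B is stationary, C only appears in the last window.
toy = {
    "A": {"x": [0.0, 0.1, 0.2], "y": [0.0, 0.0, 0.0]},
    "B": {"x": [1.0, 1.0, 1.0], "y": [0.0, 0.0, 0.0]},
    "C": {"x": [None, None, -2.0], "y": [None, None, 0.0]},
}
print(balanced_panel_drift(toy, {"A", "B", "C"}))  # → (0.1, 0.0, 0.1)
# A naive per-window mean would move from x = 0.5 to x ≈ -0.27 here,
# purely because C enters the set in the last window.
```

For a fixed set of parties, the mean of the displacements equals the displacement of the mean. This sketch therefore agrees with `_compute_group_center` whenever membership does not change between the two end windows.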
@ -0,0 +1,673 @@ |
||||
#!/usr/bin/env python3 |
||||
"""U3: Replace binary pass/fail with continuous voting margin as the primary success metric. |
||||
|
||||
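For each right-wing motion, compute the voting margin from per-party vote directions: |
||||
margin = (voor - tegen) / (voor + tegen + afwezig) |
||||
where voor, tegen and afwezig count parties (one vote each), not seats. |
||||
|
||||
This gives a continuous [-1, 1] scale where: |
||||
+1.0 = unanimous support (all parties voted voor) |
||||
0.0 = tie between voor and tegen (including every party afwezig) |
||||
-1.0 = unanimous opposition (all parties voted tegen) |
||||
Motions with no recorded party votes get no margin (None) and are skipped. |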
||||
|
||||
Usage: |
||||
uv run python -m analysis.right_wing.voting_margin |
||||
|
||||
Output: |
||||
reports/overton_window/voting_margin.md |
||||
reports/overton_window/voting_margin_figure.png |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import json |
||||
import logging |
||||
import sys |
||||
from pathlib import Path |
||||
from typing import Any |
||||
|
||||
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent |
||||
if str(PROJECT_ROOT) not in sys.path: |
||||
sys.path.insert(0, str(PROJECT_ROOT)) |
||||
|
||||
import duckdb |
||||
import matplotlib |
||||
|
||||
matplotlib.use("Agg") |
||||
import matplotlib.pyplot as plt |
||||
import numpy as np |
||||
from scipy.stats import spearmanr, pearsonr, mannwhitneyu |
||||
|
||||
from analysis.config import CANONICAL_RIGHT |
||||
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||
logger = logging.getLogger(__name__) |
||||
|
||||
DB_PATH = str(PROJECT_ROOT / "data" / "motions.db") |
||||
REPORTS_DIR = PROJECT_ROOT / "reports" / "overton_window" |
||||
REPORTS_DIR.mkdir(parents=True, exist_ok=True) |
||||
|
||||
# Motions from BREAK_YEAR onward (including 2024 itself) are labelled "post-2024". |
||||
BREAK_YEAR = 2024 |
||||
|
||||
QUARTILE_LABELS = [ |
||||
"Q1 [0.00\u20130.25]", |
||||
"Q2 (0.25\u20130.50]", |
||||
"Q3 (0.50\u20130.75]", |
||||
"Q4 (0.75\u20131.00]", |
||||
] |
||||
|
||||
|
||||
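def quartile_bin(cs: float) -> int: |
||||
"""Map centrist support in [0, 1] to one of four bins (0 to 3). |
||||
|
||||
The bins are fixed 0.25-wide intervals matching QUARTILE_LABELS, not sample quartiles. |
||||
""" |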
||||
if cs <= 0.25: |
||||
return 0 |
||||
elif cs <= 0.50: |
||||
return 1 |
||||
elif cs <= 0.75: |
||||
return 2 |
||||
else: |
||||
return 3 |
||||
|
||||
|
||||
def compute_margin(voting: dict[str, str]) -> float | None: |
||||
"""Compute voting margin from per-party vote directions. |
||||
|
||||
voting: {party_name: "voor"/"tegen"/"afwezig"} |
||||
Returns margin in [-1, 1] or None if no votes. |
||||
""" |
||||
voor = sum(1 for v in voting.values() if v == "voor") |
||||
tegen = sum(1 for v in voting.values() if v == "tegen") |
||||
afwezig = sum(1 for v in voting.values() if v == "afwezig") |
||||
denom = voor + tegen + afwezig |
||||
if denom == 0: |
||||
return None |
||||
return (voor - tegen) / denom |
||||
|
||||
|
||||
def motion_passed(margin: float | None) -> bool | None: |
||||
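"""Approximate pass/fail from the party-level margin (more parties voor than tegen). |
||||
|
||||
Every party counts once regardless of its seat count, so this can disagree with |
||||
the official, seat-weighted outcome. |
||||
""" |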
||||
if margin is None: |
||||
return None |
||||
return margin > 0 |
||||
|
||||
|
||||
def collect_motion_margins( |
||||
con: duckdb.DuckDBPyConnection, |
||||
) -> list[dict[str, Any]]: |
||||
rows = con.execute(""" |
||||
SELECT |
||||
r.motion_id, |
||||
r.year, |
||||
r.centrist_support_strict, |
||||
m.voting_results |
||||
FROM right_wing_motions r |
||||
JOIN motions m ON r.motion_id = m.id |
||||
WHERE r.classified = TRUE |
||||
AND r.year IS NOT NULL |
||||
AND r.centrist_support_strict IS NOT NULL |
||||
""").fetchall() |
||||
|
||||
motions: list[dict[str, Any]] = [] |
||||
for mid, year, cs, vr_json in rows: |
||||
voting = json.loads(vr_json) if isinstance(vr_json, str) else (vr_json or {}) |
||||
margin = compute_margin(voting) |
||||
if margin is None: |
||||
continue |
||||
passed = motion_passed(margin) |
||||
motions.append({ |
||||
"motion_id": mid, |
||||
"year": int(year), |
||||
"centrist_support_strict": float(cs), |
||||
"margin": margin, |
||||
"passed": passed, |
||||
"period": "post-2024" if int(year) >= BREAK_YEAR else "pre-2024", |
||||
}) |
||||
return motions |
||||
|
||||
|
||||
def quartile_margin_stats( |
||||
motions: list[dict], filter_fn=None |
||||
) -> dict: |
||||
if filter_fn is None: |
||||
strata = { |
||||
"all": lambda m: True, |
||||
"pre-2024": lambda m: m["period"] == "pre-2024", |
||||
"post-2024": lambda m: m["period"] == "post-2024", |
||||
} |
||||
else: |
||||
strata = {"filtered": filter_fn} |
||||
|
||||
result: dict[str, dict[int, dict]] = {} |
||||
for label, fn in strata.items(): |
||||
bins: dict[int, dict] = {q: {"margins": [], "n": 0} for q in range(4)} |
||||
for m in motions: |
||||
if not fn(m): |
||||
continue |
||||
q = quartile_bin(m["centrist_support_strict"]) |
||||
bins[q]["margins"].append(m["margin"]) |
||||
bins[q]["n"] += 1 |
||||
|
||||
for q in range(4): |
||||
d = bins[q] |
||||
margins_arr = np.array(d["margins"]) |
||||
d["mean"] = float(np.mean(margins_arr)) if len(margins_arr) > 0 else float("nan") |
||||
d["median"] = float(np.median(margins_arr)) if len(margins_arr) > 0 else float("nan") |
||||
d["std"] = float(np.std(margins_arr, ddof=1)) if len(margins_arr) > 1 else float("nan") |
||||
d["p25"] = float(np.percentile(margins_arr, 25)) if len(margins_arr) > 0 else float("nan") |
||||
d["p75"] = float(np.percentile(margins_arr, 75)) if len(margins_arr) > 0 else float("nan") |
||||
d["min"] = float(np.min(margins_arr)) if len(margins_arr) > 0 else float("nan") |
||||
d["max"] = float(np.max(margins_arr)) if len(margins_arr) > 0 else float("nan") |
||||
# Keep the raw margins under "margin" for the box plots |
||||
d["margin"] = d.pop("margins") |
||||
|
||||
result[label] = bins |
||||
|
||||
return result |
||||
|
||||
|
||||
def spearman_correlation(motions: list[dict]) -> dict[str, Any]: |
||||
margins = np.array([m["margin"] for m in motions]) |
||||
cs_vals = np.array([m["centrist_support_strict"] for m in motions]) |
||||
rho, p = spearmanr(margins, cs_vals) |
||||
r, pr = pearsonr(margins, cs_vals) |
||||
return {"spearman_rho": float(rho), "spearman_p": float(p), "pearson_r": float(r), "pearson_p": float(pr)} |
||||
|
||||
|
||||
def create_figure( |
||||
all_strata: dict[str, dict[int, dict]], |
||||
motions: list[dict], |
||||
corr: dict[str, Any], |
||||
) -> str: |
||||
fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(18, 6)) |
||||
|
||||
# --- Panel A: Box plots of margin by centrist support quartile --- |
||||
all_bins = all_strata["all"] |
||||
quartile_data = [all_bins[q]["margin"] for q in range(4)] |
||||
quartile_ns = [all_bins[q]["n"] for q in range(4)] |
||||
|
||||
bp = ax_a.boxplot( |
||||
quartile_data, |
||||
positions=range(4), |
||||
widths=0.5, |
||||
patch_artist=True, |
||||
showfliers=True, |
||||
flierprops=dict(marker="o", markersize=3, alpha=0.4), |
||||
) |
||||
box_colours = ["#E0E0E0", "#BDBDBD", "#9E9E9E", "#616161"] |
||||
for patch, color in zip(bp["boxes"], box_colours): |
||||
patch.set_facecolor(color) |
||||
patch.set_alpha(0.8) |
||||
|
||||
for q in range(4): |
||||
mean_val = all_bins[q]["mean"] |
||||
if not np.isnan(mean_val): |
||||
ax_a.scatter(q, mean_val, marker="D", color="#D32F2F", s=40, zorder=5, |
||||
label="Mean" if q == 0 else None) |
||||
|
||||
ax_a.set_xticks(range(4)) |
||||
ax_a.set_xticklabels([f"Q{q+1}\n(n={quartile_ns[q]})" for q in range(4)], fontsize=9) |
||||
ax_a.set_ylabel("Voting margin (party-level)") |
||||
ax_a.set_title("A. Margin by centrist support quartile", fontweight="bold") |
||||
ax_a.set_ylim(-1.05, 1.05) |
||||
ax_a.axhline(y=0, color="grey", linestyle="--", alpha=0.5, linewidth=0.8) |
||||
ax_a.legend(fontsize=7, loc="upper left") |
||||
ax_a.grid(True, alpha=0.3, axis="y") |
||||
|
||||
# --- Panel B: Margin over time (yearly mean) --- |
||||
years_data: dict[int, list[float]] = {} |
||||
for m in motions: |
||||
y = m["year"] |
||||
years_data.setdefault(y, []).append(m["margin"]) |
||||
|
||||
years_sorted = sorted(years_data.keys()) |
||||
yearly_means = np.array([np.mean(years_data[y]) for y in years_sorted]) |
||||
yearly_stds = np.array([np.std(years_data[y], ddof=1) for y in years_sorted]) |
||||
yearly_ns = np.array([len(years_data[y]) for y in years_sorted]) |
||||
yearly_sems = yearly_stds / np.sqrt(yearly_ns) |
||||
|
||||
ax_b.fill_between(years_sorted, yearly_means - 1.96 * yearly_sems, |
||||
yearly_means + 1.96 * yearly_sems, |
||||
alpha=0.2, color="#002366", label="95% CI") |
||||
ax_b.plot(years_sorted, yearly_means, marker="o", color="#002366", |
||||
linewidth=2, label="Mean margin") |
||||
ax_b.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||
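# Axes-fraction y-coordinate keeps the label inside the panel whatever the y-range. |
||||
ax_b.text(BREAK_YEAR - 0.3, 0.95, "2024", transform=ax_b.get_xaxis_transform(), |
||||
fontsize=9, color="black", alpha=0.7, va="top") |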
||||
ax_b.set_xlabel("Year") |
||||
ax_b.set_ylabel("Mean voting margin") |
||||
ax_b.set_title("B. Voting margin over time", fontweight="bold") |
||||
ax_b.legend(fontsize=8) |
||||
ax_b.grid(True, alpha=0.3) |
||||
ax_b.set_xticks(years_sorted) |
||||
ax_b.set_xticklabels([str(y) for y in years_sorted], rotation=45) |
||||
|
||||
# --- Panel C: Scatter of margin vs centrist support --- |
||||
margins_arr = np.array([m["margin"] for m in motions]) |
||||
cs_arr = np.array([m["centrist_support_strict"] for m in motions]) |
||||
pre_mask = np.array([m["period"] == "pre-2024" for m in motions]) |
||||
post_mask = ~pre_mask |
||||
|
||||
ax_c.scatter(cs_arr[pre_mask], margins_arr[pre_mask], |
||||
alpha=0.35, s=12, color="#90CAF9", label="Pre-2024", edgecolors="none") |
||||
ax_c.scatter(cs_arr[post_mask], margins_arr[post_mask], |
||||
alpha=0.35, s=12, color="#1E88E5", label="Post-2024", edgecolors="none") |
||||
|
||||
valid = ~np.isnan(cs_arr) & ~np.isnan(margins_arr) |
||||
if valid.sum() > 1: |
||||
coeffs = np.polyfit(cs_arr[valid], margins_arr[valid], 1) |
||||
x_fit = np.linspace(0, 1, 100) |
||||
ax_c.plot(x_fit, np.polyval(coeffs, x_fit), color="#D32F2F", linewidth=1.5, |
||||
linestyle="--", label=f"Linear fit (r={corr['pearson_r']:.3f})") |
||||
|
||||
ax_c.set_xlabel("Centrist support (strict)") |
||||
ax_c.set_ylabel("Voting margin") |
||||
ax_c.set_title(f"C. Margin vs centrist support\nSpearman \u03c1={corr['spearman_rho']:.3f}, p={corr['spearman_p']:.1e}", |
||||
fontweight="bold") |
||||
ax_c.set_ylim(-1.05, 1.05) |
||||
ax_c.set_xlim(-0.02, 1.02) |
||||
ax_c.axhline(y=0, color="grey", linestyle="--", alpha=0.5, linewidth=0.8) |
||||
ax_c.legend(fontsize=8, loc="upper left") |
||||
ax_c.grid(True, alpha=0.3) |
||||
|
||||
plt.tight_layout() |
||||
path = str(REPORTS_DIR / "voting_margin_figure.png") |
||||
fig.savefig(path, dpi=150, bbox_inches="tight") |
||||
plt.close(fig) |
||||
logger.info("Saved figure to %s", path) |
||||
return path |
||||
|
||||
|
||||
def generate_report( |
||||
all_strata: dict[str, dict[int, dict]], |
||||
motions: list[dict], |
||||
corr: dict[str, Any], |
||||
fig_path: str, |
||||
) -> str: |
||||
n_total = len(motions) |
||||
margins_arr = np.array([m["margin"] for m in motions]) |
||||
cs_arr = np.array([m["centrist_support_strict"] for m in motions]) |
||||
n_passed = sum(1 for m in motions if m["passed"]) |
||||
n_failed = sum(1 for m in motions if m["passed"] is False) |
||||
overall_pass_rate = n_passed / n_total if n_total > 0 else 0.0 |
||||
|
||||
# Quartile margin table |
||||
qtable = "| Stratum | " + " | ".join(QUARTILE_LABELS) + " |\n" |
||||
qtable += "|---------|" + "|".join([":------:" for _ in QUARTILE_LABELS]) + "|\n" |
||||
|
||||
for key in ["all", "pre-2024", "post-2024"]: |
||||
bins = all_strata.get(key, {}) |
||||
row = [key] |
||||
for q in range(4): |
||||
d = bins.get(q, {}) |
||||
m = d.get("mean", float("nan")) |
||||
n = d.get("n", 0) |
||||
if np.isnan(m): |
||||
row.append(f"N/A (n={n})") |
||||
else: |
||||
row.append(f"{m:+.3f} (n={n})") |
||||
qtable += "| " + " | ".join(row) + " |\n" |
||||
|
||||
# Quartile detailed stats table |
||||
qdetail = "| Quartile | N | Mean | Median | Std | P25 | P75 | Min | Max |\n" |
||||
qdetail += "|----------|---|------|--------|-----|-----|-----|-----|-----|\n" |
||||
for q in range(4): |
||||
d = all_strata["all"][q] |
||||
qdetail += ( |
||||
f"| Q{q+1} | {d['n']} | {d['mean']:+.3f} | {d['median']:+.3f} | " |
||||
f"{d['std']:.3f} | {d['p25']:+.3f} | {d['p75']:+.3f} | " |
||||
f"{d['min']:+.3f} | {d['max']:+.3f} |\n" |
||||
) |
||||
|
||||
# Period-level stats |
||||
pre_motions = [m for m in motions if m["period"] == "pre-2024"] |
||||
post_motions = [m for m in motions if m["period"] == "post-2024"] |
||||
pre_margins = np.array([m["margin"] for m in pre_motions]) |
||||
post_margins = np.array([m["margin"] for m in post_motions]) |
||||
|
||||
pre_mean = float(np.mean(pre_margins)) if len(pre_margins) > 0 else float("nan") |
||||
post_mean = float(np.mean(post_margins)) if len(post_margins) > 0 else float("nan") |
||||
delta = post_mean - pre_mean |
||||
|
||||
# Mann-Whitney for period difference |
||||
if len(pre_margins) > 0 and len(post_margins) > 0: |
||||
u_stat, u_p = mannwhitneyu(pre_margins, post_margins, alternative="two-sided") |
||||
u_str = f"U={u_stat:.0f}, p={u_p:.1e}" |
||||
cohens_d = (post_mean - pre_mean) / np.sqrt( |
||||
(np.std(pre_margins, ddof=1) ** 2 + np.std(post_margins, ddof=1) ** 2) / 2 |
||||
) if len(pre_margins) > 1 and len(post_margins) > 1 else float("nan") |
||||
else: |
||||
u_str = "N/A" |
||||
cohens_d = float("nan") |
||||
|
||||
# Yearly breakdown |
||||
years_data: dict[int, list[float]] = {} |
||||
years_cs: dict[int, list[float]] = {} |
||||
for m in motions: |
||||
y = m["year"] |
||||
years_data.setdefault(y, []).append(m["margin"]) |
||||
years_cs.setdefault(y, []).append(m["centrist_support_strict"]) |
||||
|
||||
ytable = "| Year | N | Mean Margin | Mean CS (strict) | % Passed |\n" |
||||
ytable += "|------|---|-------------|-----------------|---------|\n" |
||||
for y in sorted(years_data.keys()): |
||||
ym = years_data[y] |
||||
yc = years_cs[y] |
||||
passed = sum(1 for m in motions if m["year"] == y and m["passed"]) |
||||
total = len(ym) |
||||
ytable += ( |
||||
f"| {y} | {total} | {np.mean(ym):+.3f} | {np.mean(yc):.3f} | " |
||||
f"{passed/total:.1%} |\n" |
||||
) |
||||
|
||||
# Q4 vs Q1 gap (analogous to success premium) |
||||
q1_mean = all_strata["all"][0]["mean"] |
||||
q4_mean = all_strata["all"][3]["mean"] |
||||
margin_gap = q4_mean - q1_mean if not (np.isnan(q1_mean) or np.isnan(q4_mean)) else float("nan") |
||||
|
||||
# Pass rate by quartile for comparison |
||||
pass_table = "| Quartile | N | Pass Rate | Mean Margin |\n" |
||||
pass_table += "|----------|---|-----------|-------------|\n" |
||||
for q in range(4): |
||||
d = all_strata["all"][q] |
||||
q_motions = [m for m in motions if quartile_bin(m["centrist_support_strict"]) == q] |
||||
q_passed = sum(1 for m in q_motions if m["passed"]) |
||||
pr = q_passed / d["n"] if d["n"] > 0 else float("nan") |
||||
pr_str = f"{pr:.1%}" if not np.isnan(pr) else "N/A" |
||||
pass_table += f"| Q{q+1} | {d['n']} | {pr_str} | {d['mean']:+.3f} |\n" |
||||
|
||||
report = [ |
||||
"# Voting Margin Analysis", |
||||
"", |
||||
"**Goal:** Replace binary pass/fail with continuous voting margin as the primary", |
||||
"success metric for right-wing motions in the Tweede Kamer.", |
||||
"", |
||||
"- **Analysis period:** 2016\u20132026", |
||||
f"- **Total right-wing motions with vote data:** {n_total}", |
||||
f"- **Motions passed:** {n_passed} ({overall_pass_rate:.1%})", |
||||
f"- **Motions failed:** {n_failed} ({n_failed/n_total:.1%})" if n_total > 0 else "", |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 1. Methodology", |
||||
"", |
||||
"The voting margin is computed from `motions.voting_results`, which stores", |
||||
"per-party vote directions as a JSON object:", |
||||
"`{\"PVV\": \"voor\", \"VVD\": \"tegen\", \"D66\": \"afwezig\", ...}`.", |
||||
"", |
||||
"```", |
||||
"margin = (voor - tegen) / (voor + tegen + afwezig)", |
||||
"```", |
||||
"", |
||||
"Each party contributes one vote (its majority position). The margin ranges", |
||||
"from -1 (every party votes against) to +1 (every party votes in favour).", |
||||
"Absent parties count in the denominator and pull the margin toward 0. A", |
||||
"margin of exactly 0 indicates a tie or no participating parties.", |
||||
"", |
||||
"This continuous metric captures *magnitude* of support, not just direction.", |
||||
"A motion that passes 14-1 has margin = +0.87, while one that passes 8-7 has", |
||||
"margin = +0.07. Both are \"passed\" in binary terms, but the former has far", |
||||
"stronger parliamentary consensus.", |
||||
"", |
||||
"> **Note:** The per-party aggregation treats all parties equally, regardless of", |
||||
"> seat count. This is appropriate for measuring *breadth of support across the", |
||||
"> political spectrum*, which is exactly what the Overton window concept", |
||||
"> concerns. Seat-weighted margins would be confounded by coalition size effects.", |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 2. Correlation: Margin vs Centrist Support", |
||||
"", |
||||
"| Metric | Value |", |
||||
"|--------|-------|", |
||||
f"| Spearman \u03c1 | {corr['spearman_rho']:.3f} |", |
||||
f"| Spearman p-value | {corr['spearman_p']:.1e} |", |
||||
f"| Pearson r | {corr['pearson_r']:.3f} |", |
||||
f"| Pearson p-value | {corr['pearson_p']:.1e} |", |
||||
"", |
||||
] |
||||
|
||||
if corr["spearman_p"] < 0.05: |
||||
report.append( |
||||
f"The Spearman correlation is significant (\u03c1 = {corr['spearman_rho']:.3f}, " |
||||
f"p = {corr['spearman_p']:.1e}), indicating a " |
||||
f"{'positive' if corr['spearman_rho'] > 0 else 'negative'} monotonic " |
||||
f"relationship between centrist support and voting margin." |
||||
) |
||||
else: |
||||
report.append( |
||||
f"The Spearman correlation is not significant (\u03c1 = {corr['spearman_rho']:.3f}, " |
||||
f"p = {corr['spearman_p']:.3f}). Centrist support alone does not predict " |
||||
f"voting margin." |
||||
) |
||||
|
||||
report += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 3. Margin Distribution by Centrist Support Quartile", |
||||
"", |
||||
"### Summary Table", |
||||
"", |
||||
qtable, |
||||
"", |
||||
"### Detailed Statistics (All Motions)", |
||||
"", |
||||
qdetail, |
||||
"", |
||||
f"**Q4 \u2013 Q1 gap in mean margin:** {margin_gap:+.3f}", |
||||
"", |
||||
] |
||||
|
||||
if not np.isnan(margin_gap) and margin_gap > 0.05:  # same 0.05 threshold as the pass-rate comparison |
||||
report.append( |
||||
f"The gap of {margin_gap:+.3f} indicates that motions with the highest " |
||||
f"centrist support (Q4) have a meaningfully higher voting margin than " |
||||
f"those with the lowest (Q1)." |
||||
) |
||||
elif not np.isnan(margin_gap): |
||||
report.append( |
||||
f"The gap of {margin_gap:+.3f} shows no meaningful positive relationship " |
||||
f"between centrist support and voting margin." |
||||
) |
||||
|
||||
report += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 4. Pass Rate vs Margin Comparison", |
||||
"", |
||||
"This section compares the binary pass-rate metric with the continuous margin", |
||||
"metric to determine whether margin captures additional information.", |
||||
"", |
||||
pass_table, |
||||
"", |
||||
] |
||||
|
||||
# Check if margin detects patterns pass rate misses |
||||
q1_pr = 0.0 |
||||
q4_pr = 0.0 |
||||
for q in range(4): |
||||
d = all_strata["all"][q] |
||||
q_motions = [m for m in motions if quartile_bin(m["centrist_support_strict"]) == q] |
||||
q_passed = sum(1 for m in q_motions if m["passed"]) |
||||
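pr = q_passed / d["n"] if d["n"] > 0 else float("nan")  # empty quartile: undefined, not 0% |
||||
if q == 0: |
||||
q1_pr = pr |
||||
elif q == 3: |
||||
q4_pr = pr |
||||
|
||||
# Plain difference: a Q4 pass rate of 0% is a real result, not missing data. |
||||
pass_gap = q4_pr - q1_pr |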
||||
|
||||
report.append( |
||||
f"**Pass rate gap (Q4 \u2013 Q1):** {pass_gap:+.1%}" |
||||
) |
||||
report.append( |
||||
f"**Margin gap (Q4 \u2013 Q1):** {margin_gap:+.3f}" |
||||
) |
||||
|
||||
if abs(pass_gap) < 0.05 and abs(margin_gap) > 0.05: |
||||
report.append("") |
||||
report.append( |
||||
"The pass rate gap is small ({:.1%}) while the margin gap is meaningful " |
||||
"({:+.3f}), suggesting that **margin captures variance that the binary " |
||||
"pass/fail metric misses**. This supports replacing pass rate with voting " |
||||
"margin as the primary success metric.".format(pass_gap, margin_gap) |
||||
) |
||||
elif pass_gap >= 0.05 and margin_gap > 0.05: |
||||
report.append("") |
||||
report.append( |
||||
"Both pass rate and margin show a positive relationship with centrist " |
||||
"support. Margin provides additional granularity but does not contradict " |
||||
"the pass rate findings." |
||||
) |
||||
else: |
||||
report.append("") |
||||
report.append( |
||||
"Neither pass rate nor margin shows a consistent positive relationship with " |
||||
"centrist support. With a baseline pass rate of {:.0%}, a ceiling (or floor) " |
||||
"effect may limit both metrics.".format(overall_pass_rate) |
||||
) |
||||
|
||||
report += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 5. Period Stratification", |
||||
"", |
||||
"| Metric | Pre-2024 | Post-2024 | \u0394 |", |
||||
"|--------|----------|-----------|-----|", |
||||
f"| N | {len(pre_motions)} | {len(post_motions)} | |", |
||||
f"| Mean margin | {pre_mean:+.3f} | {post_mean:+.3f} | {delta:+.3f} |", |
||||
f"| Mann-Whitney U | | | {u_str} |", |
||||
f"| Cohen's d | | | {cohens_d:+.3f} |" if not np.isnan(cohens_d) else "", |
||||
"", |
||||
] |
||||
|
||||
# Report the pre/post-2024 Mann-Whitney result computed above. |
||||
if len(pre_margins) > 0 and len(post_margins) > 0: |
||||
period_p = u_p |
||||
if period_p < 0.05: |
||||
direction = "rose" if post_mean > pre_mean else "fell" |
||||
report.append( |
||||
f"Voting margin {direction} significantly post-2024 " |
||||
f"(Mann-Whitney p = {period_p:.1e}, d = {cohens_d:+.3f})." |
||||
) |
||||
else: |
||||
report.append( |
||||
f"Voting margin did not change significantly between periods " |
||||
f"(Mann-Whitney p = {period_p:.3f})." |
||||
) |
||||
|
||||
report += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 6. Yearly Breakdown", |
||||
"", |
||||
ytable, |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 7. Interpretation", |
||||
"", |
||||
] |
||||
|
||||
if corr["spearman_p"] < 0.05 and corr["spearman_rho"] > 0: |
||||
report.append( |
||||
f"**Finding:** Higher centrist support is associated with higher voting " |
||||
f"margins (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.1e}). " |
||||
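f"This is consistent with centrist support tracking parliamentary success " |
||||
f"on a continuous scale, not just at a binary pass/fail threshold. Because " |
||||
f"centrist parties' own votes are counted in the margin, part of this " |
||||
f"association is mechanical rather than independent evidence." |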
||||
) |
||||
elif corr["spearman_p"] < 0.05: |
||||
report.append( |
||||
f"**Finding:** Higher centrist support is associated with *lower* voting " |
||||
f"margins (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.1e}). " |
||||
f"This is counterintuitive and warrants further investigation." |
||||
) |
||||
else: |
||||
report.append( |
||||
f"**Finding:** No significant correlation between centrist support and " |
||||
f"voting margin (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.3f}). " |
||||
) |
||||
|
||||
report.append("") |
||||
report.append( |
||||
"**Margin vs pass rate:** The voting margin captures the *breadth* of " |
||||
"cross-party support, not just the outcome. It does not strictly contain the " |
||||
"pass/fail result: passage is decided by seats, while the margin weights every " |
||||
"party equally, so a motion backed by many small parties can have margin > 0 " |
||||
"and still fail. The continuous scale matters most when the pass rate is close " |
||||
f"to 0% or 100% (here {overall_pass_rate:.1%}), where a binary outcome has little variance." |
||||
) |
||||
|
||||
report += [ |
||||
"", |
||||
"---", |
||||
"", |
||||
"## 8. Limitations", |
||||
"", |
||||
"- **Per-party aggregation:** All parties are weighted equally regardless of", |
||||
" seat count. Support from VVD (24 seats) + PVV (37 seats) moves the margin", |
||||
" exactly as much as support from SGP (3 seats) + DENK (3 seats) (2023 seat counts). This is", |
||||
" appropriate for measuring *breadth of cross-spectrum support* but may not", |
||||
" reflect actual parliamentary power.", |
||||
"- **Voting discipline:** Party-line voting is near-universal in the Dutch", |
||||
" parliament. The per-party aggregation loses little information.", |
||||
"- **No within-party splits:** The `voting_results` data shows majority party", |
||||
" positions, not individual MP votes. Intra-party dissent is invisible.", |
||||
"- **Missing data:** Motions without voting_results are excluded.", |
||||
"", |
||||
"---", |
||||
"", |
||||
f"![Voting margin figure]({Path(fig_path).name})", |
||||
"", |
||||
"*Report generated by `analysis/right_wing/voting_margin.py`*", |
||||
] |
||||
|
||||
report_path = REPORTS_DIR / "voting_margin.md" |
||||
with open(report_path, "w") as f: |
||||
f.write("\n".join(report)) |
||||
logger.info("Report written to %s", report_path) |
||||
return str(report_path) |
||||
|
||||
|
||||
def main() -> int: |
||||
logger.info("Connecting to database: %s", DB_PATH) |
||||
con = duckdb.connect(DB_PATH, read_only=True) |
||||
|
||||
logger.info("Collecting motion margins...") |
||||
motions = collect_motion_margins(con) |
||||
con.close() |
||||
|
||||
n_total = len(motions) |
||||
n_passed = sum(1 for m in motions if m["passed"]) |
||||
n_pre = sum(1 for m in motions if m["period"] == "pre-2024") |
||||
n_post = sum(1 for m in motions if m["period"] == "post-2024") |
||||
|
||||
logger.info( |
||||
"Total: %d motions with voting data, %d passed (%.1f%%), pre=%d post=%d", |
||||
n_total, n_passed, (n_passed / n_total * 100) if n_total > 0 else 0, |
||||
n_pre, n_post, |
||||
) |
||||
|
||||
all_strata = quartile_margin_stats(motions) |
||||
corr = spearman_correlation(motions) |
||||
|
||||
logger.info( |
||||
"Spearman rho=%.3f p=%.1e | Pearson r=%.3f p=%.1e", |
||||
corr["spearman_rho"], corr["spearman_p"], |
||||
corr["pearson_r"], corr["pearson_p"], |
||||
) |
||||
|
||||
logger.info("Generating figure...") |
||||
fig_path = create_figure(all_strata, motions, corr) |
||||
|
||||
logger.info("Generating report...") |
||||
report_path = generate_report(all_strata, motions, corr, fig_path) |
||||
|
||||
print(f"\nReport: {report_path}") |
||||
print(f"Figure: {fig_path}") |
||||
return 0 |
||||
|
||||
|
||||
if __name__ == "__main__": |
||||
raise SystemExit(main()) |
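The margin formula in the report's methodology section can be sketched as follows. This is a minimal sketch, assuming `voting_results` has already been parsed from JSON into a dict mapping each party to `"voor"`, `"tegen"` or `"afwezig"`; the helpers `party_margin` and `synthetic_vote` are hypothetical, and the real computation lives in `collect_motion_margins`, which is not shown here.

```python
from __future__ import annotations

import json


def party_margin(voting_results: dict[str, str]) -> float:
    """Hypothetical helper: equal-weight per-party margin.

    margin = (voor - tegen) / (voor + tegen + afwezig)
    Every party counts once, regardless of seat count.
    """
    counts = {"voor": 0, "tegen": 0, "afwezig": 0}
    for position in voting_results.values():
        # Assumption: labels outside the three documented values are ignored.
        if position in counts:
            counts[position] += 1
    total = sum(counts.values())
    if total == 0:
        # No participating parties: reported as a margin of 0.
        return 0.0
    return (counts["voor"] - counts["tegen"]) / total


def synthetic_vote(n_voor: int, n_tegen: int, n_afwezig: int = 0) -> dict[str, str]:
    """Build a fake voting_results dict with anonymous party names."""
    positions = ["voor"] * n_voor + ["tegen"] * n_tegen + ["afwezig"] * n_afwezig
    return {f"party_{i}": pos for i, pos in enumerate(positions)}


raw = '{"PVV": "voor", "VVD": "tegen", "D66": "afwezig"}'
print(party_margin(json.loads(raw)))  # (1 - 1) / 3 = 0.0

# The 14-1 and 8-7 examples from the methodology text.
print(round(party_margin(synthetic_vote(14, 1)), 2))  # 0.87
print(round(party_margin(synthetic_vote(8, 7)), 2))   # 0.07
```

Putting absent parties in the denominator means a motion only reaches +1 when every listed party votes in favour.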
||||
@ -0,0 +1,188 @@ |
||||
# Mechanism Classification Validation Report |
||||
|
||||
## 1. Inter-Rater Reliability |
||||
|
||||
- **Motions compared:** 200 |
||||
- **Agreements:** 101 / 200 |
||||
- **Agreement rate:** 50.5% |
||||
- **Cohen's kappa (κ):** 0.4082 |
||||
- P_o (observed): 0.5050 |
||||
- P_e (expected): 0.1636 |
||||
|
||||
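**Interpretation:** Moderate agreement (κ = 0.408 sits at the lower edge of the 0.41 to 0.60 band on the Landis and Koch scale, bordering on fair agreement) |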
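Cohen's κ corrects the observed agreement for agreement expected by chance: κ = (P_o - P_e) / (1 - P_e), where P_e sums, over all labels, the product of the two raters' marginal proportions. A minimal sketch using toy labels rather than the 200 classified motions; `cohens_kappa` is a hypothetical name, and the validation script's own implementation may differ.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> tuple[float, float, float]:
    """Return (kappa, p_o, p_e) for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: per label, product of the two marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan"), p_o, p_e
    return (p_o - p_e) / (1 - p_e), p_o, p_e


print(cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # (0.5, 0.75, 0.5)

# Plugging in the reported P_o and P_e reproduces the reported kappa.
p_o, p_e = 0.5050, 0.1636
print(round((p_o - p_e) / (1 - p_e), 4))  # 0.4082
```

With ten mechanisms the chance term is small (P_e = 0.1636), so κ stays close to the raw 50.5% agreement rate.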
||||
|
||||
**The mechanism taxonomy needs revision.** Cohen's κ is below 0.6, roughly where substantial agreement begins (0.61 to 0.80 on the Landis and Koch scale), which suggests the two classifiers do not apply the 10-mechanism framework consistently. Consider: |
||||
- Simplifying or merging ambiguous mechanism pairs |
||||
- Adding clearer decision rules for borderline cases |
||||
- Reducing the number of mechanisms |
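One way to pick candidate pairs for merging is to count how often each unordered pair of labels appears among the disagreements. A sketch, assuming the disagreement rows are available as `(original, second)` tuples; the four rows below are copied from the disagreement table in Section 3, and `confused_pairs` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def confused_pairs(rows: list[tuple[str, str]]) -> list[tuple[tuple[str, ...], int]]:
    """Count unordered label pairs among disagreements, most frequent first."""
    # Sorting each pair makes (A, B) and (B, A) count as the same confusion.
    counts = Counter(tuple(sorted(pair)) for pair in rows if pair[0] != pair[1])
    return counts.most_common()


rows = [
    ("Institutioneel/rechtsstatelijk", "Gerichte restrictie"),    # motion 473
    ("Institutioneel/rechtsstatelijk", "Gerichte restrictie"),    # motion 2264
    ("Gerichte restrictie", "Institutioneel/rechtsstatelijk"),    # motion 3687
    ("Institutioneel/rechtsstatelijk", "Procedureel/technisch"),  # motion 2168
]
for pair, n in confused_pairs(rows):
    print(n, " <-> ".join(pair))
# 3 Gerichte restrictie <-> Institutioneel/rechtsstatelijk
# 1 Institutioneel/rechtsstatelijk <-> Procedureel/technisch
```

Run over all 99 disagreements, the pairs at the top of this list would be the first candidates for a merge or a sharper decision rule.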
||||
|
||||
## 2. Second Classifier Summary |
||||
|
||||
- **Model:** qwen/qwen-2.5-72b-instruct |
||||
- **Motions classified:** 200 |
||||
- **Average confidence:** 4.1/5 |
||||
|
||||
### Confidence Distribution |
||||
| Confidence | Count | |
||||
|------------|-------| |
||||
| 1 | 0 | |
||||
| 2 | 0 | |
||||
| 3 | 5 | |
||||
| 4 | 165 | |
||||
| 5 | 30 | |
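The reported average confidence is the count-weighted mean of this distribution; a quick check using the counts from the table:

```python
# Counts copied from the confidence distribution table above.
distribution = {1: 0, 2: 0, 3: 5, 4: 165, 5: 30}

n = sum(distribution.values())
mean = sum(score * count for score, count in distribution.items()) / n
print(n, mean)         # 200 4.125
print(f"{mean:.1f}")   # 4.1
```

Because 165 of the 200 labels (82.5%) sit at confidence 4, the score has little spread for ranking individual classifications.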
||||
|
||||
## 3. Disagreement Table |
||||
|
||||
**Total disagreements:** 99 / 200 (49.5%) |
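The *Resolved* and *Winner* columns are not defined in this report. One rule consistent with every row shown is that the second classifier's label is adopted when its confidence is at least 4, and the original label is kept otherwise. The sketch below encodes that inferred rule; `resolve` and its `threshold` parameter are hypothetical, not taken from the validation script.

```python
from __future__ import annotations


def resolve(original: str, second: str, confidence: int, threshold: int = 4) -> tuple[str, str]:
    """Hypothetical rule: adopt the second label when its confidence >= threshold."""
    if confidence >= threshold:
        return second, "second"
    return original, "original"


# (motion_id, original, second, confidence, winner) copied from the table below.
rows = [
    (313, "Procedureel/technisch", "Systeemontmanteling", 4, "second"),
    (3472, "Institutioneel/rechtsstatelijk", "Gerichte restrictie", 5, "second"),
    (4394, "Institutioneel/rechtsstatelijk", "Procedureel/technisch", 3, "original"),
    (9769, "Gerichte restrictie", "Welzijn/dienstverlening uitbreiding", 3, "original"),
]
mismatches = [mid for mid, first, sec, conf, winner in rows if resolve(first, sec, conf)[1] != winner]
print(mismatches)  # []
```

Since almost all second-classifier labels carry confidence 4 or 5, a rule of this kind hands nearly every disagreement to the second classifier.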
||||
|
||||
| Motion ID | Title | Original | Second | Confidence | Resolved | Winner | |
||||
|-----------|-------|----------|--------|------------|----------|--------| |
||||
| 313 | Motie van het lid Inge van Dijk over de vooringevulde aangifte tijdelijk loslate | Procedureel/technisch | Systeemontmanteling | 4 | Systeemontmanteling | second | |
||||
| 473 | Motie van het lid Eerdmans c.s. over de schade van de UvA-rellen alsnog verhalen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 651 | Gewijzigde motie van het lid Grinwis c.s. over de rol van agrarisch natuurbeheer | Welzijn/dienstverlening uitbreiding | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 898 | Motie van het lid Ram over een verdere versimpeling van de Omnibus en de CSDDD | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 974 | Motie van het lid Mooiman over het effect van opgestelde "Whole Life Carbon"-eis | Procedureel/technisch | Symbolisch/declaratoir | 4 | Symbolisch/declaratoir | second | |
||||
| 1005 | Motie van het lid Kamminga over de EU-opbrengsten van importheffingen inzetten t | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second | |
||||
| 1191 | Motie van het lid Veltman over veiligheid meer prioriteit geven in de uitvoering | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second | |
||||
| 1359 | Motie van de leden Eerdmans en Van der Plas over met de vuurwerkbranche een rami | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 1491 | Motie van het lid Boomsma c.s. over een verkenning naar een maximumaantal wolven | Gerichte restrictie | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second | |
||||
| 1495 | Gewijzigde motie van het lid Diederik van Dijk c.s. over een meer risicogerichte | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 1507 | Motie van het lid De Vos over empirische natuurgegevens als juridisch houdbaar a | Systeemontmanteling | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 1572 | Motie van de leden Van Campen en Eerdmans over de impact van wolfaanvallen in ka | Lokaal/regionaal | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second | |
||||
| 1705 | Motie van het lid Dekker over voorstellen ter vermindering van de regeldruk | Consensus framing (gedeeld belang) | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 1831 | Motie van het lid Van der Plas over het voorzorgsbeginsel zo toepassen dat het p | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 2014 | Motie van het lid Van Zanten over in asielzaken uitsluitend beroep bij één insta | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 2168 | Amendement van de leden Eerdmans en Diederik van Dijk ter vervanging van nr. 7 o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 2170 | Amendement van de leden Diederik van Dijk en Eerdmans ter vervanging van nr. 4 o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 2264 | Motie van het lid Van der Hoeff over alle kosten van vernielingen gepleegd tijde | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 2496 | Motie van het lid Vermeer over een lanceercapaciteit voor satellieten op het gro | Procedureel/technisch | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second | |
||||
| 2662 | Motie van de leden Bikker en Diederik van Dijk over voorkomen dat Nederlandse ke | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 2878 | Motie van het lid Inge van Dijk c.s. over een voorstel voor het inpassen van de | Welzijn/dienstverlening uitbreiding | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 3298 | Motie van het lid Diederik van Dijk c.s. over zich scharen achter het vredesplan | Symbolisch/declaratoir | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second | |
||||
| 3354 | Amendement van het lid Michon-Derkzen over het verhogen van het strafmaximum van | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 3468 | Motie van de leden Yesilgöz-Zegerius en Bikker over zo snel mogelijk overgaan to | Institutioneel/rechtsstatelijk | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second | |
||||
| 3472 | Gewijzigde motie van de leden Van der Plas en Yesilgöz-Zegerius over wetgeving v | Institutioneel/rechtsstatelijk | Gerichte restrictie | 5 | Gerichte restrictie | second | |
||||
| 3569 | Gewijzigde motie van de leden Wijen-Nass en Diederik van Dijk over inventarisere | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 3629 | Motie van het lid Ceder over een conferentie over modernisering van het VN-Vluch | Symbolisch/declaratoir | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 3678 | Motie van het lid Wilders over de invoer van een totale asielstop alsmede een st | Systeemontmanteling | Gerichte restrictie | 5 | Gerichte restrictie | second | |
||||
| 3687 | Motie van de leden Van der Plas en Yesilgöz-Zegerius over het initiatief van de | Gerichte restrictie | Institutioneel/rechtsstatelijk | 5 | Institutioneel/rechtsstatelijk | second | |
||||
| 3760 | Motie van het lid Peter de Groot c.s. over de Wet op de defensiegereedheid na on | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 3784 | Motie van de leden Wendel en Van Brenk over informatiedeling over zorgfraude mog | Procedureel/technisch | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second | |
||||
| 3830 | Motie van het lid Van Meetelen over stoppen met betuttelend beleid gericht op vo | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 3877 | Gewijzigde motie van de leden Ceder en Diederik van Dijk over signalen en inzet | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4080 | Motie van het lid Coenradie over een onderzoek naar zwaardere, dwingende vormen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4221 | Motie van het lid Van der Plas over een duidelijke overheadnorm opstellen voor d | Systeemontmanteling | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 4227 | Motie van het lid Peter de Groot over de oeververbinding bij de sluis van Nijker | Consensus framing (gedeeld belang) | Lokaal/regionaal | 4 | Lokaal/regionaal | second | |
||||
| 4309 | Motie van het lid Coenradie over gerichter doelgroepenbeleid bij handhaving | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4394 | Motie van het lid Van der Plas over het luchtdrukwapen met zogenaamde beanbags o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 3 | Institutioneel/rechtsstatelijk | original | |
||||
| 4436 | Motie van het lid Diederik van Dijk c.s. over in overleg met het OM in een aanwi | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4481 | Motie van het lid Ceder c.s. over het verwerven van control points expliciet ond | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 4489 | Motie van het lid Van der Plas over een onderzoek naar de invloed van verstoring | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4656 | Motie van het lid Dekker over niet akkoord gaan met toetreding van Oekraïne tot | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second | |
||||
| 4660 | Motie van het lid Diederik van Dijk over verkennen of en hoe verdere samenwerkin | Consensus framing (gedeeld belang) | Coalitie-afstemming | 4 | Coalitie-afstemming | second | |
||||
| 4933 | Wijziging van de Omgevingswet en enkele andere wetten met het oog op het bescher | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second | |
||||
| 9149 | Motie van het lid Valstar c.s. over steun voor bewapening van de MQ-9 Reaper | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second | |
||||
| 9769 | Motie van het lid Vondeling over er alles aan doen om Syriërs huiswaarts te late | Gerichte restrictie | Welzijn/dienstverlening uitbreiding | 3 | Gerichte restrictie | original | |
||||
| 9789 | Motie van het lid Diederik van Dijk c.s. over de Tijdelijke wet bestuurlijke maa | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 10110 | Amendement van het lid Bontenbal c.s. over dekking van het maatregelenpakket voo | Coalitie-afstemming | Procedureel/technisch | 5 | Procedureel/technisch | second |
| 10167 | Amendement van het lid Flach over € 2 miljoen voor pilotprojecten voor de aanpak | Lokaal/regionaal | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10278 | Amendement van het lid Bontenbal c.s. over dekking van het maatregelenpakket voo | Coalitie-afstemming | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10290 | Motie van het lid Eerdmans over ten minste één concreet migratieproject uitwerke | Gerichte restrictie | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10413 | Motie van het lid Diederik van Dijk c.s. over de maximale juridische ruimte opzo | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10420 | Motie van het lid Van der Wal c.s. over het vergroten van de weerbaarheid van Ne | Crisisrespons | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 10597 | Motie van het lid Eerdmans over middels een AMvB de derde waarnemer bij preventi | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 11382 | Gewijzigd amendement van het lid Van der Molen t.v.v. nr. 21 over het schrappen | Procedureel/technisch | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 14554 | Motie van het lid Schonis over een kwartiermaker toeristische samenwerking | Procedureel/technisch | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 15005 | Motie van het lid Aartsen over een periodiek overlegorgaan voor franchisegevers | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 15772 | Motie van het lid De Jong over pensioenkortingen voorkomen | Systeemontmanteling | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 16430 | Motie van het lid Tony van Dijck over geen 45 miljard euro overmaken naar Zuid- | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 16691 | Motie van het lid Geurts over het doorbreken van de vicieuze cirkel rond de toen | Procedureel/technisch | Crisisrespons | 4 | Crisisrespons | second |
| 16999 | Motie van de leden Van Haga en Baudet over het tegengaan van verdere oneerlijke | Consensus framing (gedeeld belang) | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 17036 | Motie van het lid Kerstens over onderzoeken of Defensie in aanmerking komt voor | Welzijn/dienstverlening uitbreiding | Crisisrespons | 4 | Crisisrespons | second |
| 17536 | Motie van het lid Yesilgöz-Zegerius over in heel het Schengengebied haatprediker | Institutioneel/rechtsstatelijk | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 17681 | Motie van de leden Van Haga en Baudet over een plan van aanpak om de fiscaliteit | Consensus framing (gedeeld belang) | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 17751 | Gewijzigde motie van de leden Stoffer en Van Haga over een nullijn voor de ontwi | Consensus framing (gedeeld belang) | Symbolisch/declaratoir | 4 | Symbolisch/declaratoir | second |
| 18030 | Motie van het lid Stoffer over zo snel mogelijk de snelwegverlichting 's nachts | Procedureel/technisch | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 18062 | Motie van het lid Krol over excuses voor de fouten die leidden tot slachtoffers | Crisisrespons | Symbolisch/declaratoir | 5 | Symbolisch/declaratoir | second |
| 18691 | Motie van het lid Karabulut over geen extra troepen naar Afghanistan | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 20215 | Gewijzigde motie van het lid Boswijk c.s. over onderzoeken hoe hoogwaardige land | Welzijn/dienstverlening uitbreiding | Institutioneel/rechtsstatelijk | 3 | Welzijn/dienstverlening uitbreiding | original |
| 21801 | Motie van het lid Van Haga c.s. over de Defensievisie 2035 omarmen | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 21982 | Motie van het lid Graus c.s. over het zwartboek regeldruk van MKB-Nederland ter | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 22280 | Motie van het lid Van der Plas over de kosten berekenen die op het bord van de b | Lokaal/regionaal | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 22676 | Motie van het lid Diederik van Dijk c.s. over een grootschalig en breedgedragen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 22853 | Motie van het lid Peter de Groot over nog voor het zomerreces additionele maatre | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23013 | Amendement van het lid Diederik van Dijk over budget voor de uitvoering van het | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23030 | Motie van het lid Eerdmans over in het verdeelbesluit geen asielopvangplekken op | Gerichte restrictie | Lokaal/regionaal | 4 | Lokaal/regionaal | second |
| 23141 | Motie van het lid Eerdmans over de mogelijkheid tot inzet van de KMar actief ond | Institutioneel/rechtsstatelijk | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 23206 | Motie van het lid Nordkamp c.s. over het in kaart brengen van het aandeel van in | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 23287 | Motie van het lid Helder c.s. over het wetsvoorstel inzake het taakstrafverbod b | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 23301 | Motie van de leden Tuinman en Boswijk over het onderzoeken van voorstellen met b | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23441 | Motie van de leden Van Zanten en Stoffer over een deel van het budget voor kanse | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23454 | Motie van het lid Joseph over een analyse laten maken van de juridische risico's | Procedureel/technisch | Institutioneel/rechtsstatelijk | 5 | Institutioneel/rechtsstatelijk | second |
| 23885 | Motie van het lid Aartsen c.s. over verkennen hoe toetsings- of toezichtkaders a | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 23984 | Motie van het lid Pierik over de eisen aan de eco-regeling in de periode 2025-20 | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 24008 | Motie van het lid Holman c.s. over bij de Europese Commissie bevorderen dat de b | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24046 | Motie van het lid Keijzer c.s. over de minister zich kenbaar laten onthouden van | Symbolisch/declaratoir | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24077 | Motie van het lid De Roon over een onderzoek instellen naar de rol en verantwoor | Symbolisch/declaratoir | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 24358 | Motie van de leden Helder en Uitermark over het vergroten van de personeelscapac | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24632 | Motie van de leden Veltman en Vedder over het voor de politie mogelijk maken om | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 24650 | Gewijzigd amendement van de leden Dijk en Flach ter vervanging van nr. 13 over e | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 24651 | Motie van de leden Inge van Dijk en Van Oostenbruggen over een arbeidsmigratieto | Gerichte restrictie | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 25061 | Motie van het lid Kisteman c.s. over een vereenvoudiging van de RI&E-verplichtin | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25062 | Motie van het lid Kisteman c.s. over een voor het mkb werkbare wijze van werken | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25079 | Motie van de leden Bontenbal en Flach over de Europese standaarden voor stikstof | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25451 | Motie van het lid Ceder over berekenen hoeveel geld de Palestijnse Autoriteit ja | Symbolisch/declaratoir | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 25469 | Motie van de leden Eerdmans en Diederik van Dijk over samen met gelijkgestemde E | Gerichte restrictie | Coalitie-afstemming | 4 | Coalitie-afstemming | second |
| 25616 | Motie van het lid Eerdmans over de wettelijke taakstellingen voor gemeenten voor | Gerichte restrictie | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 25982 | Gewijzigde motie van het lid Bisschop c.s. over een koude sanering van de garnal | Lokaal/regionaal | Procedureel/technisch | 3 | Lokaal/regionaal | original |
| 27731 | Amendement van het lid Eppink over dekking voor het schrappen van een wijziging | Systeemontmanteling | Procedureel/technisch | 4 | Procedureel/technisch | second |


## 4. Mechanism Distribution Comparison

| Mechanism | Original Count | Second Count | Validated Count |
|-----------|---------------|--------------|-----------------|
| Consensus framing (gedeeld belang) | 31 | 11 | 11 |
| Institutioneel/rechtsstatelijk | 28 | 22 | 22 |
| Welzijn/dienstverlening uitbreiding | 9 | 17 | 17 |
| Procedureel/technisch | 46 | 56 | 54 |
| Lokaal/regionaal | 6 | 4 | 5 |
| Coalitie-afstemming | 2 | 2 | 2 |
| Symbolisch/declaratoir | 12 | 7 | 7 |
| Gerichte restrictie | 41 | 60 | 61 |
| Systeemontmanteling | 17 | 13 | 13 |
| Crisisrespons | 8 | 8 | 8 |


## 5. Confusion Matrix (rows = original, columns = second classifier)

| Original \ Second | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | Welzijn/dienstverlening uitbreiding | Procedureel/technisch | Lokaal/regionaal | Coalitie-afstemming | Symbolisch/declaratoir | Gerichte restrictie | Systeemontmanteling | Crisisrespons |
|---|---|---|---|---|---|---|---|---|---|---|
| Consensus framing (gedeeld belang) | 6 | 5 | 3 | 11 | 1 | 1 | 1 | 2 | 1 | 0 |
| Institutioneel/rechtsstatelijk | 0 | 6 | 2 | 6 | 0 | 0 | 0 | 14 | 0 | 0 |
| Welzijn/dienstverlening uitbreiding | 0 | 2 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Procedureel/technisch | 2 | 5 | 2 | 30 | 0 | 0 | 1 | 3 | 2 | 1 |
| Lokaal/regionaal | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| Coalitie-afstemming | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Symbolisch/declaratoir | 1 | 2 | 0 | 1 | 0 | 0 | 4 | 4 | 0 | 0 |
| Gerichte restrictie | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 33 | 1 | 0 |
| Systeemontmanteling | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 4 | 9 | 0 |
| Crisisrespons | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 6 |


## 6. Conclusion

Cohen's kappa of **0.4082** indicates **moderate agreement** between the original inline classification and the independent second classifier, just above the 0.40 boundary between "fair" and "moderate" on the commonly used Landis and Koch scale.

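The kappa value can be recomputed directly from the confusion matrix in section 5. The sketch below is plain Python, not the script's own implementation (the name `cohens_kappa` is illustrative); rows are the original labels and columns the second classifier's, both in the category order of section 4.

```python
def cohens_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows = rater A, columns = rater B)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: product of each category's row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


# Counts from section 5 (rows = original, columns = second classifier).
CONFUSION = [
    [6, 5, 3, 11, 1, 1, 1, 2, 1, 0],
    [0, 6, 2, 6, 0, 0, 0, 14, 0, 0],
    [0, 2, 5, 1, 0, 0, 0, 0, 0, 1],
    [2, 5, 2, 30, 0, 0, 1, 3, 2, 1],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    [1, 2, 0, 1, 0, 0, 4, 4, 0, 0],
    [2, 1, 1, 1, 1, 1, 0, 33, 1, 0],
    [0, 1, 1, 2, 0, 0, 0, 4, 9, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 6],
]

print(round(cohens_kappa(CONFUSION), 4))  # 0.4082
```

Observed agreement is 101/200 = 0.505 and chance agreement from the marginals is about 0.164, which yields κ ≈ 0.408.
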
### Key findings:

- 101 of 200 motions agreed (50.5%).
- Of the 99 disagreements, 4 were resolved in favor of the original label and 95 in favor of the second classifier.

### Most common disagreement pairs (original → second):

- institutional_rule_of_law → targeted_restriction: 14 times
- consensus_framing → procedural_technical: 11 times
- institutional_rule_of_law → procedural_technical: 6 times
- procedural_technical → institutional_rule_of_law: 5 times
- consensus_framing → institutional_rule_of_law: 5 times

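The pairs above are ordered, which is why institutional → procedural and procedural → institutional appear as separate entries. A minimal sketch of how such a ranking can be built from two parallel label lists with `collections.Counter`; the helper name and the toy labels are illustrative, not taken from the script.

```python
from collections import Counter


def top_disagreements(
    original: list[str], second: list[str], k: int = 5
) -> list[tuple[tuple[str, str], int]]:
    """Rank ordered (original, second) label pairs by how often the raters disagree."""
    pairs = Counter((o, s) for o, s in zip(original, second) if o != s)
    return pairs.most_common(k)


# Toy labels for illustration only.
original = ["institutional_rule_of_law", "institutional_rule_of_law",
            "consensus_framing", "targeted_restriction"]
second = ["targeted_restriction", "targeted_restriction",
          "procedural_technical", "targeted_restriction"]
print(top_disagreements(original, second, k=2))
# [(('institutional_rule_of_law', 'targeted_restriction'), 2), (('consensus_framing', 'procedural_technical'), 1)]
```
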
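### Revised mechanism taxonomy recommendation:

- The taxonomy needs revision to improve inter-rater reliability.
- Most confused pair: institutional_rule_of_law → targeted_restriction (14 of the 28 motions originally labelled institutional were relabelled by the second classifier). Consider merging the two categories or clarifying the distinction between them.
- consensus_framing is the least stable category: only 6 of its 31 original labels were confirmed.
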
@@ -0,0 +1,113 @@
# Right-Wing Party Differentiation

**Goal:** Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP)
to identify which party drives the moderation effect.

**Analysis period:** 2016–2026 (no party-matched motions before 2019)

**Right-wing parties:** FVD, JA21, PVV, SGP

**Data:** 962 right-wing submitter motions with 2D extremity scores
(from 2,850 classified right-wing motions in total; the remaining 1,888 could not be parsed or party-matched).

---

## 1. Motion Volume by Party and Year

| Year | FVD | JA21 | PVV | SGP | Total RW |
|------|-----|------|-----|-----|----------|
| 2016 | 0 | 0 | 0 | 0 | 0 |
| 2017 | 0 | 0 | 0 | 0 | 0 |
| 2018 | 0 | 0 | 0 | 0 | 0 |
| 2019 | 9 | 0 | 41 | 20 | 70 |
| 2020 | 44 | 0 | 87 | 31 | 162 |
| 2021 | 23 | 17 | 70 | 35 | 145 |
| 2022 | 11 | 20 | 58 | 31 | 120 |
| 2023 | 13 | 20 | 52 | 27 | 112 |
| 2024 | 6 | 52 | 34 | 29 | 121 |
| 2025 | 21 | 54 | 54 | 21 | 150 |
| 2026 | 11 | 33 | 35 | 3 | 82 |

---

## 2. Centrist Support (Strict) by Party and Year

| Year | FVD | JA21 | PVV | SGP |
|------|-----|------|-----|-----|
| 2016 | N/A | N/A | N/A | N/A |
| 2017 | N/A | N/A | N/A | N/A |
| 2018 | N/A | N/A | N/A | N/A |
| 2019 | 0.000 | N/A | 0.074 | 0.350 |
| 2020 | 0.057 | N/A | 0.052 | 0.387 |
| 2021 | 0.000 | 0.088 | 0.014 | 0.286 |
| 2022 | 0.000 | 0.050 | 0.043 | 0.242 |
| 2023 | 0.000 | 0.075 | 0.067 | 0.407 |
| 2024 | 0.056 | 0.212 | 0.314 | 0.506 |
| 2025 | 0.095 | 0.315 | 0.139 | 0.603 |
| 2026 | 0.000 | 0.300 | 0.086 | 0.167 |

---

## 3. Material Impact by Party and Year

| Year | FVD | JA21 | PVV | SGP |
|------|-----|------|-----|-----|
| 2016 | N/A | N/A | N/A | N/A |
| 2017 | N/A | N/A | N/A | N/A |
| 2018 | N/A | N/A | N/A | N/A |
| 2019 | 3.56 | N/A | 3.34 | 2.65 |
| 2020 | 3.18 | N/A | 3.30 | 2.84 |
| 2021 | 2.96 | 3.41 | 3.23 | 2.91 |
| 2022 | 2.45 | 3.05 | 2.67 | 2.26 |
| 2023 | 2.92 | 3.85 | 3.25 | 2.74 |
| 2024 | 3.50 | 3.13 | 2.50 | 2.52 |
| 2025 | 3.00 | 2.44 | 2.50 | 2.10 |
| 2026 | 1.91 | 2.36 | 2.54 | 2.00 |

---

## 4. Pre/Post-2024 Comparison by Party

Pre-2024 pools 2019–2023 and post-2024 pools 2024–2026. CS = centrist support (strict); Mat. = material impact; Vol. Delta = N Post - N Pre.

| Party | N Pre | N Post | CS Pre | CS Post | Delta CS | Mat. Pre | Mat. Post | Delta Mat. | Vol. Delta |
|-------|-------|--------|--------|---------|----------|----------|-----------|------------|------------|
| FVD | 100 | 38 | 0.025 | 0.061 | +0.036 | 3.05 | 2.76 | -0.29 | -62 |
| JA21 | 57 | 139 | 0.070 | 0.273 | +0.203 | 3.44 | 2.68 | -0.76 | +82 |
| PVV | 308 | 123 | 0.047 | 0.172 | +0.125 | 3.16 | 2.51 | -0.65 | -185 |
| SGP | 144 | 53 | 0.330 | 0.525 | +0.195 | 2.69 | 2.32 | -0.37 | -91 |

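The CS Pre and CS Post columns are consistent with an N-weighted mean of the yearly values in sections 1 and 2, that is, pooling all of a party's motions in each window rather than averaging the yearly means. A sketch reproducing the JA21 row from the rounded yearly tables; the helper `weighted_cs` is hypothetical, and the script itself may work from per-motion data.

```python
# JA21 rows from sections 1 and 2: year -> (motion count, centrist support).
JA21 = {
    2021: (17, 0.088), 2022: (20, 0.050), 2023: (20, 0.075),
    2024: (52, 0.212), 2025: (54, 0.315), 2026: (33, 0.300),
}


def weighted_cs(rows: dict[int, tuple[int, float]], years: range) -> float:
    """Pool motions across years: N-weighted mean of the yearly centrist support."""
    present = [y for y in years if y in rows]  # skip years with no motions
    total = sum(rows[y][0] for y in present)
    return sum(rows[y][0] * rows[y][1] for y in present) / total


pre = weighted_cs(JA21, range(2019, 2024))   # JA21 has no motions before 2021
post = weighted_cs(JA21, range(2024, 2027))
print(round(pre, 3), round(post, 3), round(post - pre, 3))  # 0.07 0.273 0.203
```

Averaging the yearly means instead would give every year equal weight, which would let small years such as FVD's 2024 (6 motions) move the window mean as much as large ones.
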

---

## 5. Key Findings

### Centrist Support Shift (largest to smallest)

- **JA21**: +0.203
- **SGP**: +0.195
- **PVV**: +0.125
- **FVD**: +0.036

### Volume

- **FVD**: 100 pre-2024 → 38 post-2024 (-62)
- **JA21**: 57 pre-2024 → 139 post-2024 (+82)
- **PVV**: 308 pre-2024 → 123 post-2024 (-185)
- **SGP**: 144 pre-2024 → 53 post-2024 (-91)

These are raw counts over windows of unequal length: five calendar years before 2024 (2019–2023) and three from 2024 onward, with 2026 apparently incomplete (82 motions versus 150 in 2025). Dividing by calendar years, and counting 2026 as a full year, shrinks the changes but keeps their direction: FVD 20.0 → 12.7, JA21 11.4 → 46.3, PVV 61.6 → 41.0 and SGP 28.8 → 17.7 motions per year.

### Material Impact Shift

- **FVD**: 3.05 → 2.76 (-0.29)
- **JA21**: 3.44 → 2.68 (-0.76)
- **PVV**: 3.16 → 2.51 (-0.65)
- **SGP**: 2.69 → 2.32 (-0.37)

---

## 6. Parsing Notes

- Parsed and party-matched: 962 motions
- Right-wing submitter motions: 962
- Unmatched/unparsed: 1,888
- Submitter party is parsed from motion title prefixes (e.g. 'Motie van het lid Wilders ...').
- Multi-submitter motions use the first listed submitter.
- Party names are normalized via `_PARTY_NORMALIZE` (e.g. Groep Markuszower → PVV).

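The title-prefix parsing described above can be sketched with one regular expression. This is an illustrative sketch, not the script's parser: the pattern, the function names and both lookup tables are assumptions, and a real member-to-party mapping would have to cover every MP and any party switches over time.

```python
import re
from typing import Optional

# Lead submitter: the text after "van het lid" / "van de leden", up to the first
# marker that ends the name (c.s., a co-submitter, "over", or a replacement note).
_SUBMITTER = re.compile(
    r"\bvan (?:het lid|de leden) (?P<name>.+?)"
    r"(?= c\.s\.| en | over | t\.v\.v\.| ter vervanging|$)"
)

# Hypothetical lookup tables for illustration only.
MEMBER_PARTY = {"Eerdmans": "JA21", "Diederik van Dijk": "SGP", "Stoffer": "SGP", "Baudet": "FVD"}
PARTY_NORMALIZE = {"Groep Markuszower": "PVV"}


def lead_submitter(title: str) -> Optional[str]:
    match = _SUBMITTER.search(title)
    return match.group("name") if match else None


def submitter_party(title: str) -> Optional[str]:
    name = lead_submitter(title)
    party = MEMBER_PARTY.get(name) if name else None
    # Fold splinter groups into their parent party, in the spirit of `_PARTY_NORMALIZE`.
    return PARTY_NORMALIZE.get(party, party) if party else None


print(lead_submitter("Motie van de leden Stoffer en Van Haga over een nullijn"))  # Stoffer
```

Because the match is lazy and stops at the first separator, multi-submitter titles yield only the first name, matching the lead-submitter rule above.
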

---

## 7. Figure

@@ -0,0 +1,100 @@
# Predictive Model: Centrist Support

**Generated:** 2026-05-31 19:36

## Data Summary

- Total classified right-wing motions with 2D extremity scores: **2850**
- Valid for modeling (right-wing submitter party + valid category): **914**
- High centrist support (>0.5): 115 motions
- Low centrist support (<=0.5): 799 motions
- Class imbalance ratio: 6.9:1 (low:high)
- Features: 22

## Model Performance

### Test Set (80/20 stratified split)

| Model | Accuracy | Precision | Recall | AUC-ROC |
|-------|----------|-----------|--------|---------|
| Logistic Regression | 0.710 | 0.258 | 0.696 | 0.810 |
| Random Forest | 0.852 | 0.423 | 0.478 | 0.795 |

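Under class imbalance, accuracy and precision can diverge sharply. Assuming the stratified 80/20 split leaves 183 test motions of which 23 are high-support (20% of 914 and of 115, rounded up; an assumption, since the report does not list the test-set size), the counts below are the integer confusion counts that reproduce the reported test metrics. They are reconstructed for illustration, not taken from the script's output.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Reconstructed counts, assuming 183 test motions with 23 high-support positives.
logreg = classification_metrics(tp=16, fp=46, fn=7, tn=114)
forest = classification_metrics(tp=11, fp=15, fn=12, tn=145)
for name, scores in (("Logistic Regression", logreg), ("Random Forest", forest)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Under this reconstruction the balanced logistic regression flags 62 motions to catch 16 of the 23 positives, while the random forest flags 26 and catches 11, which is why the model with lower accuracy has the higher recall.
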
### 5-Fold Cross-Validation

| Model | Mean Accuracy | Std Accuracy | Mean AUC-ROC | Std AUC-ROC |
|-------|---------------|-------------|--------------|-------------|
| Logistic Regression | 0.718 | 0.032 | 0.815 | 0.036 |
| Random Forest | 0.862 | 0.016 | 0.835 | 0.048 |

## Feature Importance

### Logistic Regression Coefficients (Top 10 by absolute magnitude)

| Feature | Coefficient | Odds Ratio |
|---------|-------------|------------|
| `cat_corona/pandemie` | -1.4680 | 0.2304 |
| `party_FVD` | -1.3282 | 0.2650 |
| `party_SGP` | 0.9877 | 2.6852 |
| `party_JA21` | 0.9264 | 2.5255 |
| `stijl_extremiteit` | -0.6859 | 0.5036 |
| `party_PVV` | -0.6394 | 0.5276 |
| `cat_onderwijs/cultuur` | 0.5472 | 1.7285 |
| `cat_zorg/gezondheid` | -0.4857 | 0.6153 |
| `materiele_impact` | -0.4741 | 0.6225 |
| `cat_overig` | 0.4658 | 1.5933 |

*Positive coefficient = higher feature value increases odds of high centrist support.*

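The odds-ratio column is the exponential of the coefficient. The sketch below shows that conversion together with the weighting scikit-learn documents for `class_weight='balanced'` (`n_samples / (n_classes * np.bincount(y))`), reimplemented in plain Python. The helper names are illustrative, and the weights are computed on all 914 motions for simplicity, whereas a fitted model computes them on its own training data.

```python
import math


def odds_ratio(coef: float) -> float:
    """Odds ratio for a logistic-regression coefficient: exp(coef)."""
    return math.exp(coef)


def balanced_weights(counts: dict[str, int]) -> dict[str, float]:
    """Plain-Python version of the class_weight='balanced' formula."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


print(round(odds_ratio(-1.4680), 4))  # 0.2304 (cat_corona/pandemie)
print({k: round(v, 3) for k, v in balanced_weights({"low": 799, "high": 115}).items()})
# {'low': 0.572, 'high': 3.974}
```

Each high-support motion therefore weighs about 6.9 times as much as a low-support one in the loss (3.974 / 0.572), mirroring the 6.9:1 imbalance ratio.
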
### Random Forest Feature Importance (Top 10)

| Feature | Importance (Gini) |
|---------|-------------------|
| `text_length` | 0.2137 |
| `year` | 0.1915 |
| `stijl_extremiteit` | 0.1410 |
| `materiele_impact` | 0.0946 |
| `party_SGP` | 0.0652 |
| `party_FVD` | 0.0489 |
| `party_PVV` | 0.0407 |
| `cat_veiligheid/justitie` | 0.0258 |
| `cat_defensie/buitenland` | 0.0246 |
| `party_JA21` | 0.0234 |

## Interpretation

### Top 5 Most Important Features

**Logistic Regression (coefficient magnitude):**

1. `cat_corona/pandemie` (coef=-1.4680, OR=0.2304): decreases odds of high centrist support
2. `party_FVD` (coef=-1.3282, OR=0.2650): decreases odds of high centrist support
3. `party_SGP` (coef=0.9877, OR=2.6852): increases odds of high centrist support
4. `party_JA21` (coef=0.9264, OR=2.5255): increases odds of high centrist support
5. `stijl_extremiteit` (coef=-0.6859, OR=0.5036): decreases odds of high centrist support

**Random Forest (Gini importance):**

1. `text_length` (importance=0.2137)
2. `year` (importance=0.1915)
3. `stijl_extremiteit` (importance=0.1410)
4. `materiele_impact` (importance=0.0946)
5. `party_SGP` (importance=0.0652)

### Which features best predict centrist support?

The two models emphasize different predictors. In the logistic regression, **category** and
**submitter party** carry the strongest signal: `cat_corona/pandemie` and `party_FVD` lower the
odds of high centrist support, while `party_SGP` and `party_JA21` raise them. The random forest
instead ranks `text_length` and `year` highest, with the party indicators further down its list.

**Material impact (`materiele_impact`)** has a negative coefficient in the logistic regression
and ranks fourth in the random forest: motions with higher material impact scores tend to
receive less centrist support, while lower material impact (more moderate policy proposals)
correlates with higher centrist support.

**Stylistic extremity (`stijl_extremiteit`)** is, if anything, the stronger of the two extremity
measures: it has the larger coefficient magnitude (-0.686 vs -0.474) and the higher Gini
importance (0.141 vs 0.095). These tables therefore do not support the claim that centrist
parties respond more to substantive content than to rhetorical framing (the coefficient
comparison also assumes both scores are on comparable scales).

The **is_opposition** flag does not appear among the top 10 features of either model, so these
tables alone do not show whether opposition-submitted motions have systematically different
support patterns than coalition-submitted ones.

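### Caveats

- Only motions with 2D extremity scores (LLM-annotated), a parsed right-wing submitter party and a valid category are included (n=914 of 2850).
- Submitter party is parsed from the title prefix; multi-submitter motions use the lead submitter only.
- Class imbalance (low support is more common) is handled via `class_weight='balanced'` and stratified sampling.
- Gini (impurity-based) importance tends to favor continuous, high-cardinality features such as `text_length` and `year`; permutation importance would be a useful cross-check of the random forest ranking.
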
@@ -0,0 +1,154 @@
# Voting Margin Analysis

**Goal:** Replace binary pass/fail with continuous voting margin as the primary
success metric for right-wing motions in the Tweede Kamer.

**Analysis period:** 2016–2026

**Total right-wing motions with vote data:** 2986

**Motions passed:** 1359 (45.5%)

**Motions failed:** 1627 (54.5%)

---

## 1. Methodology

The voting margin is computed from `motions.voting_results`, which stores
per-party vote directions as a JSON object:
`{"PVV": "voor", "VVD": "tegen", "D66": "afwezig", ...}`.

```
margin = (voor - tegen) / (voor + tegen + afwezig)
```

Each party contributes one vote (its majority position), so `voor`, `tegen` and `afwezig`
count parties, not seats. The margin ranges from -1 (unanimous rejection) to +1 (unanimous
support) when no party is absent; absences pull it toward 0. A margin of 0 indicates an exact
tie or no participating parties.

This continuous metric captures the *magnitude* of support, not just its direction.
A motion that passes 14-1 (14 parties voor, 1 tegen, none absent) has margin = +0.87, while
one that passes 8-7 has margin = +0.07. Both are "passed" in binary terms, but the former
reflects far stronger parliamentary consensus.

> **Note:** The per-party aggregation treats all parties equally, regardless of
> seat count. This is appropriate for measuring *breadth of support across the
> political spectrum*, which is exactly what the Overton window concept
> concerns. Seat-weighted margins would be confounded by coalition size effects.

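A minimal sketch of the margin formula, assuming `voting_results` has already been decoded into a party → direction dict. The helper names, the zero-denominator rule and the seat counts in the toy example are illustrative assumptions. The second helper applies a simple seat-majority rule among the voting parties to show how the equal-weight margin and the seat outcome can point in different directions.

```python
def voting_margin(votes: dict[str, str]) -> float:
    """Equal-weight margin: (voor - tegen) / (voor + tegen + afwezig), counting parties."""
    voor = sum(1 for v in votes.values() if v == "voor")
    tegen = sum(1 for v in votes.values() if v == "tegen")
    afwezig = sum(1 for v in votes.values() if v == "afwezig")
    total = voor + tegen + afwezig
    # Assumption: no participating parties yields a margin of 0.
    return (voor - tegen) / total if total else 0.0


def passes_by_seats(votes: dict[str, str], seats: dict[str, int]) -> bool:
    """Illustrative seat-majority rule: more seats voor than tegen."""
    voor = sum(seats[p] for p, v in votes.items() if v == "voor")
    tegen = sum(seats[p] for p, v in votes.items() if v == "tegen")
    return voor > tegen


print(voting_margin({"PVV": "voor", "VVD": "tegen", "D66": "afwezig"}))  # 0.0

# Toy vote with illustrative seat counts: two large parties against three small ones.
toy_votes = {"PVV": "voor", "VVD": "voor", "SGP": "tegen", "DENK": "tegen", "PvdD": "tegen"}
toy_seats = {"PVV": 37, "VVD": 24, "SGP": 3, "DENK": 3, "PvdD": 3}
print(voting_margin(toy_votes), passes_by_seats(toy_votes, toy_seats))  # -0.2 True
```

In the toy vote the two large parties win on seats (61 to 9), so the motion would pass even though its party-count margin is negative.
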

---

## 2. Correlation: Margin vs Centrist Support

| Metric | Value |
|--------|-------|
| Spearman ρ | 0.812 |
| Spearman p-value | < 1e-300 (reported as 0.0 because of floating-point underflow) |
| Pearson r | 0.822 |
| Pearson p-value | < 1e-300 (reported as 0.0 because of floating-point underflow) |

The Spearman correlation is strong and highly significant (ρ = 0.812, p < 1e-300), indicating a positive monotonic relationship between centrist support and voting margin.

---

## 3. Margin Distribution by Centrist Support Quartile

Despite the name, Q1–Q4 are fixed-width bins of the centrist-support score, not sample quartiles, so their sizes differ widely (from 230 to 1589 motions overall). The pre-2024 Q3 cell rests on only 10 motions.

### Summary Table

Each cell gives the mean voting margin and the number of motions.

| Stratum | Q1 [0.00–0.25] | Q2 (0.25–0.50] | Q3 (0.50–0.75] | Q4 (0.75–1.00] |
|---------|:------:|:------:|:------:|:------:|
| all | -0.263 (n=1589) | +0.087 (n=536) | +0.212 (n=230) | +0.483 (n=631) |
| pre-2024 | -0.261 (n=1247) | +0.122 (n=357) | +0.232 (n=10) | +0.420 (n=297) |
| post-2024 | -0.269 (n=342) | +0.017 (n=179) | +0.211 (n=220) | +0.539 (n=334) |

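The bin edges in the table header correspond to the assignment rule below, with only Q1 closed on the left. A sketch with a hypothetical helper name; the range check is an added safeguard, not something the report describes.

```python
def support_bin(cs: float) -> str:
    """Fixed-width centrist-support bins: Q1 [0, 0.25], Q2 (0.25, 0.5], Q3 (0.5, 0.75], Q4 (0.75, 1]."""
    if not 0.0 <= cs <= 1.0:
        raise ValueError(f"centrist support must be in [0, 1], got {cs}")
    if cs <= 0.25:
        return "Q1"
    if cs <= 0.50:
        return "Q2"
    if cs <= 0.75:
        return "Q3"
    return "Q4"


print([support_bin(x) for x in (0.0, 0.25, 0.5, 0.51, 1.0)])  # ['Q1', 'Q1', 'Q2', 'Q3', 'Q4']
```

Under this rule, the predictive model's high-support class (centrist support > 0.5) corresponds to Q3 and Q4 combined.
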
### Detailed Statistics (All Motions)

| Quartile | N | Mean | Median | Std | P25 | P75 | Min | Max |
|----------|---|------|--------|-----|-----|-----|-----|-----|
| Q1 | 1589 | -0.263 | -0.294 | 0.228 | -0.450 | -0.100 | -0.733 | +0.438 |
| Q2 | 536 | +0.087 | +0.067 | 0.220 | -0.067 | +0.238 | -0.467 | +0.625 |
| Q3 | 230 | +0.212 | +0.200 | 0.165 | +0.067 | +0.333 | -0.200 | +0.600 |
| Q4 | 631 | +0.483 | +0.467 | 0.173 | +0.368 | +0.600 | -0.125 | +0.765 |

**Q4 – Q1 gap in mean margin:** +0.746

Motions with the highest centrist support (Q4) have a mean margin 0.746 higher than those with the lowest (Q1), more than a third of the full -1 to +1 range.

---

## 4. Pass Rate vs Margin Comparison

This section compares the binary pass-rate metric with the continuous margin
metric to determine whether margin captures additional information.

| Quartile | N | Pass Rate | Mean Margin |
|----------|---|-----------|-------------|
| Q1 | 1589 | 12.7% | -0.263 |
| Q2 | 536 | 59.3% | +0.087 |
| Q3 | 230 | 92.6% | +0.212 |
| Q4 | 631 | 99.2% | +0.483 |

**Pass rate gap (Q4 – Q1):** +86.5 percentage points

**Margin gap (Q4 – Q1):** +0.746

Both pass rate and margin rise with centrist support, and margin does not contradict the pass-rate findings. It does add granularity where pass rate saturates: from Q3 to Q4 the pass rate barely moves (92.6% to 99.2%) while the mean margin more than doubles (+0.212 to +0.483).

---

## 5. Period Stratification

| Metric | Pre-2024 | Post-2024 | Δ |
|--------|----------|-----------|-----|
| N | 1911 | 1075 | |
| Mean margin | -0.081 | +0.128 | +0.209 |
| Mann-Whitney U | | | U=702132, p=6.6e-47 |
| Cohen's d | | | +0.582 |

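The report does not say which variant of Cohen's d the script computes. A common choice is the pooled-standard-deviation form, sketched below with toy data and the sign convention post minus pre, which matches the positive Δ in the table. Under that assumption, the reported means (-0.081 and +0.128) and d = +0.582 imply a pooled standard deviation of roughly 0.36.

```python
import math
from statistics import mean, variance


def cohens_d(pre: list[float], post: list[float]) -> float:
    """Cohen's d (post minus pre) using the pooled sample standard deviation."""
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    return (mean(post) - mean(pre)) / math.sqrt(pooled_var)


print(cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0 (toy data)
print(round((0.128 - (-0.081)) / 0.582, 2))        # 0.36, the implied pooled SD
```
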

---

## 6. Yearly Breakdown

| Year | N | Mean Margin | Mean CS (strict) | % Passed |
|------|---|-------------|-----------------|---------|
| 2016 | 6 | +0.397 | 0.667 | 100.0% |
| 2018 | 5 | +0.538 | 1.000 | 100.0% |
| 2019 | 195 | -0.057 | 0.380 | 42.6% |
| 2020 | 469 | -0.074 | 0.300 | 40.5% |
| 2021 | 425 | -0.106 | 0.175 | 34.4% |
| 2022 | 446 | -0.093 | 0.201 | 32.5% |
| 2023 | 365 | -0.077 | 0.255 | 34.2% |
| 2024 | 469 | +0.175 | 0.595 | 69.5% |
| 2025 | 455 | +0.089 | 0.474 | 57.4% |
| 2026 | 151 | +0.099 | 0.334 | 47.7% |

2017 has no right-wing motions with vote data, and the 2016 and 2018 rows rest on only 6 and 5 motions.

---

## 7. Interpretation

**Finding:** Higher centrist support is associated with higher voting margins (ρ = 0.812, p < 1e-300). This supports centrist support as a predictor of parliamentary success on a continuous scale, not just at the binary pass/fail threshold. Part of the association is mechanical, however, because centrist parties' own votes are counted in the margin.

**Margin vs pass rate:** The voting margin captures information the binary pass rate lacks, namely the *strength* of parliamentary consensus. It is not a strict superset of pass/fail, though: motions pass on a seat majority while the margin weights every party equally, so a motion can pass with a negative margin (or fail with a positive one) when a few large parties face several small ones. Nor is pass rate a near-constant measure here: right-wing motions pass 45.5% of the time overall, ranging from 12.7% in Q1 to 99.2% in Q4. The case for margin rests on its added granularity, not on pass rate being uninformative.

---

## 8. Limitations

- **Per-party aggregation:** All parties are weighted equally regardless of
  seat count. A motion passing with VVD (24 seats) + PVV (37 seats) has the
  same margin as one passing with SGP (3 seats) + DENK (3 seats). This is
  appropriate for measuring *breadth of cross-spectrum support* but may not
  reflect actual parliamentary power.
- **Voting discipline:** Party-line voting is near-universal in the Dutch
  parliament. The per-party aggregation loses little information.
- **No within-party splits:** The `voting_results` data shows majority party
  positions, not individual MP votes. Intra-party dissent is invisible.
- **Missing data:** Motions without `voting_results` are excluded.


---

*Report generated by `analysis/right_wing/voting_margin.py`*
