feat(overton): improvements and extensions — party differentiation, voting margin, SVD viz, mechanism validation, predictive model
U1: JA21 drives the moderation effect (+0.203 CS shift; the only party with both volume and support gains)
U2: Coalition coding split at July 2024; opposition effect confirmed (d=0.85 vs 0.87)
U3: Voting margin (ρ=0.812 with centrist support) is far superior to pass rate
U4: SVD trajectory confirms spatial divergence: centrists moved left (Δx=-0.30), right stationary
U5: Mechanism classification Cohen's κ=0.41 (moderate); taxonomy needs revision
U6: Predictive model AUC-ROC=0.81; submitter party and category are the strongest predictors
main
parent 7df961ba83
commit d34d43a888
@@ -0,0 +1,946 @@
#!/usr/bin/env python3
"""Mechanism classification validation with a second classifier.

Computes inter-rater reliability (Cohen's kappa) between the original inline
classifications and a second LLM-based classification using a different prompt
template and (optionally) a different model.

Usage:
    uv run python analysis/right_wing/mechanism_validation.py
"""

from __future__ import annotations

import argparse
import json
import logging
import sys
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any

import duckdb

ROOT = Path(__file__).parent.parent.parent.resolve()
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from ai_provider import ProviderError, chat_completion
from analysis.config import config

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

|
# ── mechanism taxonomy ───────────────────────────────────────────────────────

MECHANISMS = [
    "consensus_framing",
    "institutional_rule_of_law",
    "welfare_service_expansion",
    "procedural_technical",
    "local_constituency",
    "coalition_alignment",
    "symbolic_declaratory",
    "targeted_restriction",
    "system_dismantling",
    "crisis_response",
]

MECHANISM_LABELS_NL = {
    "consensus_framing": "Consensus framing (gedeeld belang)",
    "institutional_rule_of_law": "Institutioneel/rechtsstatelijk",
    "welfare_service_expansion": "Welzijn/dienstverlening uitbreiding",
    "procedural_technical": "Procedureel/technisch",
    "local_constituency": "Lokaal/regionaal",
    "coalition_alignment": "Coalitie-afstemming",
    "symbolic_declaratory": "Symbolisch/declaratoir",
    "targeted_restriction": "Gerichte restrictie",
    "system_dismantling": "Systeemontmanteling",
    "crisis_response": "Crisisrespons",
}

MECHANISM_LABELS_EN = {
    "consensus_framing": "Consensus framing / shared interest",
    "institutional_rule_of_law": "Institutional / rule of law",
    "welfare_service_expansion": "Welfare / service expansion",
    "procedural_technical": "Procedural / technical",
    "local_constituency": "Local / regional constituency",
    "coalition_alignment": "Coalition alignment",
    "symbolic_declaratory": "Symbolic / declaratory",
    "targeted_restriction": "Targeted restriction",
    "system_dismantling": "System dismantling",
    "crisis_response": "Crisis response",
}

|
# Original inline classifications (from mechanism_classification.py)
ORIGINAL_CLASSIFICATIONS: dict[int, str] = {
    15458: "crisis_response",
    26477: "institutional_rule_of_law",
    9149: "consensus_framing",
    17099: "procedural_technical",
    4933: "procedural_technical",
    17751: "consensus_framing",
    20068: "procedural_technical",
    16520: "consensus_framing",
    17036: "welfare_service_expansion",
    17681: "consensus_framing",
    14554: "procedural_technical",
    21864: "procedural_technical",
    26493: "targeted_restriction",
    21982: "consensus_framing",
    14125: "crisis_response",
    13683: "welfare_service_expansion",
    16691: "procedural_technical",
    15005: "procedural_technical",
    17536: "institutional_rule_of_law",
    16999: "consensus_framing",
    8325: "procedural_technical",
    13370: "welfare_service_expansion",
    18030: "procedural_technical",
    11382: "procedural_technical",
    18616: "procedural_technical",
    12411: "crisis_response",
    22595: "crisis_response",
    15772: "system_dismantling",
    7111: "welfare_service_expansion",
    25784: "targeted_restriction",
    27731: "system_dismantling",
    15626: "crisis_response",
    20215: "welfare_service_expansion",
    16430: "symbolic_declaratory",
    25982: "local_constituency",
    17176: "targeted_restriction",
    7054: "procedural_technical",
    20323: "procedural_technical",
    18025: "system_dismantling",
    14837: "system_dismantling",
    19620: "targeted_restriction",
    21801: "consensus_framing",
    19464: "crisis_response",
    26855: "targeted_restriction",
    22280: "local_constituency",
    20115: "symbolic_declaratory",
    15082: "targeted_restriction",
    6637: "targeted_restriction",
    18691: "symbolic_declaratory",
    18062: "crisis_response",
    3784: "procedural_technical",
    10205: "procedural_technical",
    10278: "coalition_alignment",
    25079: "consensus_framing",
    2980: "targeted_restriction",
    10420: "crisis_response",
    25092: "targeted_restriction",
    25545: "institutional_rule_of_law",
    23065: "procedural_technical",
    2878: "welfare_service_expansion",
    25573: "procedural_technical",
    3298: "symbolic_declaratory",
    25061: "consensus_framing",
    4481: "consensus_framing",
    3961: "procedural_technical",
    473: "institutional_rule_of_law",
    10413: "consensus_framing",
    974: "procedural_technical",
    24009: "procedural_technical",
    9789: "institutional_rule_of_law",
    24651: "targeted_restriction",
    1890: "local_constituency",
    1191: "consensus_framing",
    3448: "targeted_restriction",
    23910: "institutional_rule_of_law",
    25566: "welfare_service_expansion",
    2070: "targeted_restriction",
    23885: "consensus_framing",
    24906: "procedural_technical",
    2496: "procedural_technical",
    25582: "targeted_restriction",
    3053: "local_constituency",
    1495: "procedural_technical",
    10178: "procedural_technical",
    1614: "procedural_technical",
    23441: "consensus_framing",
    3569: "consensus_framing",
    10285: "procedural_technical",
    23058: "procedural_technical",
    3287: "procedural_technical",
    10434: "consensus_framing",
    10089: "procedural_technical",
    22706: "consensus_framing",
    3877: "institutional_rule_of_law",
    25062: "consensus_framing",
    3687: "targeted_restriction",
    25166: "procedural_technical",
    4618: "procedural_technical",
    3468: "institutional_rule_of_law",
    24632: "institutional_rule_of_law",
    25451: "symbolic_declaratory",
    2351: "targeted_restriction",
    4227: "consensus_framing",
    22853: "consensus_framing",
    9884: "procedural_technical",
    1428: "consensus_framing",
    3629: "symbolic_declaratory",
    1572: "local_constituency",
    25493: "procedural_technical",
    1359: "procedural_technical",
    2252: "procedural_technical",
    23605: "procedural_technical",
    3760: "consensus_framing",
    1005: "consensus_framing",
    10110: "coalition_alignment",
    23301: "consensus_framing",
    24046: "symbolic_declaratory",
    651: "welfare_service_expansion",
    1491: "targeted_restriction",
    25606: "targeted_restriction",
    313: "procedural_technical",
    24008: "consensus_framing",
    754: "targeted_restriction",
    25469: "targeted_restriction",
    25091: "targeted_restriction",
    2170: "institutional_rule_of_law",
    22792: "procedural_technical",
    10597: "institutional_rule_of_law",
    23013: "institutional_rule_of_law",
    3472: "institutional_rule_of_law",
    2014: "system_dismantling",
    920: "procedural_technical",
    2143: "welfare_service_expansion",
    688: "system_dismantling",
    2290: "system_dismantling",
    4497: "targeted_restriction",
    3823: "symbolic_declaratory",
    23141: "institutional_rule_of_law",
    4436: "institutional_rule_of_law",
    25616: "targeted_restriction",
    2662: "institutional_rule_of_law",
    23287: "institutional_rule_of_law",
    4660: "consensus_framing",
    4761: "targeted_restriction",
    2264: "institutional_rule_of_law",
    4394: "institutional_rule_of_law",
    1691: "targeted_restriction",
    10601: "targeted_restriction",
    4089: "targeted_restriction",
    23206: "procedural_technical",
    22676: "institutional_rule_of_law",
    115: "system_dismantling",
    3951: "consensus_framing",
    1375: "targeted_restriction",
    3090: "targeted_restriction",
    24650: "procedural_technical",
    1772: "consensus_framing",
    3678: "system_dismantling",
    1692: "institutional_rule_of_law",
    24077: "symbolic_declaratory",
    349: "institutional_rule_of_law",
    9769: "targeted_restriction",
    4656: "symbolic_declaratory",
    23984: "system_dismantling",
    2168: "institutional_rule_of_law",
    4443: "institutional_rule_of_law",
    4489: "procedural_technical",
    10290: "targeted_restriction",
    4071: "targeted_restriction",
    4088: "targeted_restriction",
    1507: "system_dismantling",
    2870: "procedural_technical",
    1912: "system_dismantling",
    22658: "symbolic_declaratory",
    10288: "targeted_restriction",
    4080: "institutional_rule_of_law",
    1847: "targeted_restriction",
    23127: "system_dismantling",
    4367: "targeted_restriction",
    9790: "targeted_restriction",
    4150: "procedural_technical",
    741: "targeted_restriction",
    1705: "consensus_framing",
    1831: "consensus_framing",
    10600: "targeted_restriction",
    9767: "targeted_restriction",
    3830: "system_dismantling",
    4221: "system_dismantling",
    3354: "institutional_rule_of_law",
    9977: "symbolic_declaratory",
    898: "consensus_framing",
    24848: "system_dismantling",
    756: "targeted_restriction",
    24358: "institutional_rule_of_law",
    4309: "institutional_rule_of_law",
    10167: "local_constituency",
    23633: "procedural_technical",
    23030: "targeted_restriction",
    1959: "system_dismantling",
    23454: "procedural_technical",
}

|
# ── prompt templates ─────────────────────────────────────────────────────────

# Original prompt (from mechanism_classification.py; inline subagent):
# classifications were made by reading the full title + body_text.
# The second classifier uses a DIFFERENT template:
# - English wording (not Dutch)
# - Mechanisms presented in a DIFFERENT order (roughly the reverse of MECHANISMS)
# - Asks for a single pick plus a 1-5 confidence score and a short justification
# - Includes a definition for each mechanism

# Reverse of MECHANISMS. MECHANISM_DEFINITIONS_EN below is written out by hand
# and does not use this list.
MECHANISMS_SHUFLLED = list(reversed(MECHANISMS))

MECHANISM_DEFINITIONS_EN = """1. crisis_response — A temporary, emergency measure responding to an acute event (pandemic, natural disaster, sudden crisis). Reactive and time-limited.

2. system_dismantling — Aims to dismantle, abolish, or fundamentally restructure an existing policy, institution, or regulatory framework. Not reform but abolition/reversal.

3. targeted_restriction — Imposes specific restrictions on a defined group, behavior, or activity. Narrow scope, punitive or exclusionary intent.

4. symbolic_declaratory — Primarily sends a political signal, makes a statement, or takes a position without direct policy impact. Declaratory, symbolic, expressive.

5. procedural_technical — Technical adjustment, budget amendment, implementation detail, or administrative procedure. Bureaucratic, operational, non-ideological.

6. local_constituency — Serves a specific local/regional interest, constituency, or geographic area. NIMBY or local-advocacy pattern.

7. coalition_alignment — Reflects coalition politics: budget compromises, package deals, or alignments between coalition partners. Coalition-maintenance.

8. welfare_service_expansion — Expands government services, social welfare, public goods, or citizen entitlements. Positive provision, not restriction.

9. institutional_rule_of_law — Concerns legal frameworks, rule of law, institutional integrity, judicial process, or constitutional matters. Rule-based, institutional.

10. consensus_framing — Frames the motion as serving a broad, shared interest. Appeals to common ground, national interest, or bipartisan consensus. Inclusive, bridge-building, non-polarizing."""

SECOND_CLASSIFIER_PROMPT = """Classify the following Dutch parliamentary motion according to the mechanism taxonomy below.

MOTION TITLE: {title}

MOTION TEXT: {body}

TASK: Identify the PRIMARY mechanism this motion uses. Select exactly ONE mechanism from the list below. Base your decision on what the motion actually DOES (action-oriented) rather than what it merely TALKS about.

MECHANISM TAXONOMY (read carefully before choosing):

{MECHANISM_DEFINITIONS}

IMPORTANT RULES:
- Choose the mechanism that BEST describes the dominant pattern of the motion.
- If a motion could fit multiple mechanisms, pick the most specific one.
- procedural_technical should be the DEFAULT only if no other mechanism fits better.
- Return ONLY the mechanism key exactly as listed above (e.g., "system_dismantling").

Respond with a JSON object containing:
- "mechanism": the selected mechanism key
- "confidence": 1-5 (1=very uncertain, 5=very certain)
- "reasoning": brief explanation (max 2 sentences)"""


def build_second_classifier_prompt(title: str, body_text: str) -> str:
    # Fall back to the title when the body is empty; cap the text at 1200 chars.
    text = body_text or title or ""
    if len(text) > 1200:
        text = text[:1200] + "..."
    return SECOND_CLASSIFIER_PROMPT.format(
        title=title or "", body=text, MECHANISM_DEFINITIONS=MECHANISM_DEFINITIONS_EN
    )

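`build_second_classifier_prompt` fills the template with `str.format`, so any literal brace added to the template later (for example, a JSON sample of the expected answer) has to be written as `{{` / `}}`; otherwise `format` raises an error. The sketch below shows the same truncate-then-format pattern in isolation. `TEMPLATE` and `build_prompt` are hypothetical stand-ins, not the script's own objects.

```python
# Hypothetical stand-ins for SECOND_CLASSIFIER_PROMPT / build_second_classifier_prompt.
TEMPLATE = (
    "MOTION TITLE: {title}\n"
    "MOTION TEXT: {body}\n"
    # Doubled braces survive str.format as single literal braces.
    'Respond like {{"mechanism": "...", "confidence": 3}}'
)

MAX_CHARS = 1200  # same cut-off the script applies to the motion text


def build_prompt(title: str, body_text: str, max_chars: int = MAX_CHARS) -> str:
    # Fall back to the title when the body is empty.
    text = body_text or title or ""
    if len(text) > max_chars:
        text = text[:max_chars] + "..."
    return TEMPLATE.format(title=title or "", body=text)
```

For instance, `build_prompt("Motie X", "a" * 1500)` keeps the first 1,200 characters of the body, appends `...`, and ends with the literal line `Respond like {"mechanism": "...", "confidence": 3}`.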
|
# ── LLM call helpers ─────────────────────────────────────────────────────────


def chat_completion_json(
    messages: list[dict[str, str]],
    model: str | None = None,
    retries: int = 3,
) -> dict[str, Any] | None:
    """Call chat_completion and parse the JSON response, with retries.

    Only the content of the first message is used; it is sent as the user turn
    after a fixed system prompt. Returns None if every attempt fails.
    """
    model = model or config.QWEN_MODEL
    prompt = messages[0]["content"]
    system_msg = (
        "You are a political science classifier. You classify Dutch parliamentary "
        "motions by their dominant mechanism type. Respond ONLY with valid JSON. "
        "No markdown, no code fences, no preamble — pure JSON object."
    )
    full_messages = [
        {"role": "system", "content": system_msg},
        {"role": "user", "content": prompt},
    ]

    backoff = 0.5
    for attempt in range(1, retries + 1):
        try:
            raw = chat_completion(full_messages, model=model)
        except ProviderError as exc:
            if attempt == retries:
                logger.error("ProviderError on attempt %d: %s", attempt, exc)
                return None
            # Exponential backoff: 0.5 s, 1 s, 2 s, ...
            time.sleep(backoff * (2 ** (attempt - 1)))
            continue

        # Strip an optional markdown code fence (and "json" language tag)
        # in case the model ignores the system prompt.
        raw = raw.strip()
        if raw.startswith("```"):
            raw = raw.split("```", 2)[1]
            if raw.startswith("json"):
                raw = raw[4:]
            raw = raw.strip()

        try:
            result = json.loads(raw)
            # Accept only a JSON object whose mechanism is a known key.
            if isinstance(result, dict) and result.get("mechanism") in MECHANISMS:
                return result
            logger.warning(
                "Invalid mechanism '%s' on attempt %d",
                result.get("mechanism") if isinstance(result, dict) else None,
                attempt,
            )
        except json.JSONDecodeError:
            logger.warning("JSON decode failed on attempt %d: %s", attempt, raw[:100])

        if attempt < retries:
            time.sleep(backoff * (2 ** (attempt - 1)))

    return None

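The fence-stripping and validation steps in `chat_completion_json` do not depend on the LLM call, so they can be pulled into a pure function and tested offline. A minimal sketch: `parse_mechanism_reply` and the three-key `ALLOWED` set are hypothetical, and the `isinstance` check covers replies that are valid JSON but not an object (for example a bare list).

```python
from __future__ import annotations

import json
from typing import Any

FENCE = "`" * 3  # three backticks; models sometimes wrap JSON in a markdown fence

# Hypothetical subset of the taxonomy, for this example only.
ALLOWED = {"crisis_response", "procedural_technical", "consensus_framing"}


def parse_mechanism_reply(raw: str, allowed: set[str]) -> dict[str, Any] | None:
    """Return the parsed reply if it is a JSON object with an allowed mechanism."""
    raw = raw.strip()
    if raw.startswith(FENCE):
        # Keep only the text between the first two fences, minus a "json" tag.
        raw = raw.split(FENCE, 2)[1]
        if raw.startswith("json"):
            raw = raw[4:]
        raw = raw.strip()
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(result, dict) and result.get("mechanism") in allowed:
        return result
    return None
```

Keeping parsing separate from the network call means the retry loop only has to decide whether to try again, and the parsing rules can be exercised with canned strings.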
|
def chat_completion_json_parallel(
    message_batches: list[list[dict[str, str]]],
    model: str | None = None,
    max_workers: int = 5,
) -> list[dict[str, Any] | None]:
    """
    Run multiple chat completions in parallel using ThreadPoolExecutor.

    Each element in message_batches is a list of messages for one completion.
    Returns a list of parsed JSON dicts (or None for failures) in input order.
    """
    model = model or config.QWEN_MODEL

    def _fetch_one(messages: list[dict[str, str]]) -> dict[str, Any] | None:
        return chat_completion_json(messages, model=model)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(_fetch_one, batch) for batch in message_batches]
        return [f.result() for f in futures]

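`chat_completion_json_parallel` relies on collecting `f.result()` in submission order: the list comes back in input order even when later requests finish first, which is what lets `classify_motions_second_pass` `zip` the results back onto its batch. A self-contained sketch, with a hypothetical `slow_square` worker standing in for the LLM call:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x: int) -> int:
    # Hypothetical worker: larger inputs sleep less, so they tend to finish first.
    time.sleep(0.01 * max(0, 5 - x))
    return x * x


def run_in_order(items: list[int], max_workers: int = 5) -> list[int]:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(slow_square, x) for x in items]
        # Reading results in submission order restores input order;
        # f.result() blocks until that particular future is done.
        return [f.result() for f in futures]
```

`run_in_order([0, 1, 2, 3, 4])` returns `[0, 1, 4, 9, 16]`; `executor.map(slow_square, items)` gives the same ordering guarantee. One caveat: an exception raised inside a worker is re-raised at `f.result()`. `chat_completion_json` handles `ProviderError` and JSON decode errors itself, but any other exception would abort the whole batch.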
|
# ── data loading ─────────────────────────────────────────────────────────────


def load_motions(db_path: str, motion_ids: list[int]) -> list[dict[str, Any]]:
    """Load motion data from the database for the given motion IDs."""
    if not motion_ids:
        # Nothing to fetch; also avoids building an empty IN () clause.
        return []
    con = duckdb.connect(db_path)
    try:
        # One "?" placeholder per ID; the IDs are bound as parameters.
        placeholders = ",".join("?" for _ in motion_ids)
        rows = con.execute(
            f"""
            SELECT r.motion_id, m.title, m.body_text, r.year, r.centrist_support_strict
            FROM right_wing_motions r
            JOIN motions m ON r.motion_id = m.id
            WHERE r.motion_id IN ({placeholders})
            ORDER BY r.motion_id
            """,
            motion_ids,
        ).fetchall()

        return [
            {
                "motion_id": r[0],
                "title": r[1] or "",
                "body_text": r[2] or "",
                "year": r[3],
                "centrist_support_strict": r[4],
            }
            for r in rows
        ]
    finally:
        con.close()

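`load_motions` builds one `?` placeholder per ID and passes the IDs as bound parameters, so no motion ID is ever interpolated into the SQL string. The sketch below shows the same pattern with Python's built-in `sqlite3` module instead of DuckDB, so it runs without the project database; the `motions` table and the `fetch_by_ids` helper are hypothetical.

```python
import sqlite3


def fetch_by_ids(con: sqlite3.Connection, ids: list[int]) -> list[tuple[int, str]]:
    # Return early on an empty list: many SQL dialects reject "IN ()"
    # (SQLite happens to accept it), so don't rely on it.
    if not ids:
        return []
    # One "?" per id; the values themselves are bound separately.
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM motions WHERE id IN ({placeholders}) ORDER BY id"
    return con.execute(sql, ids).fetchall()
```

Only the number of placeholders is formatted into the query text; the values travel through the driver's parameter binding.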
|
# ── classification ───────────────────────────────────────────────────────────


def classify_motions_second_pass(
    motions: list[dict[str, Any]],
    second_model: str | None = None,
    batch_size: int = 10,
    max_workers: int = 5,
) -> dict[int, dict[str, Any]]:
    """Run the second classifier on all motions; return motion_id -> result dict."""
    second_model = second_model or config.QWEN_MODEL
    results: dict[int, dict[str, Any]] = {}

    for i in range(0, len(motions), batch_size):
        batch = motions[i : i + batch_size]
        logger.info(
            "Batch %d/%d (%d motions)",
            i // batch_size + 1,
            (len(motions) - 1) // batch_size + 1,
            len(batch),
        )

        message_batches = []
        for m in batch:
            prompt = build_second_classifier_prompt(m["title"], m["body_text"])
            message_batches.append([{"role": "user", "content": prompt}])

        raw_results = chat_completion_json_parallel(
            message_batches, model=second_model, max_workers=max_workers
        )

        for m, res in zip(batch, raw_results):
            mid = m["motion_id"]
            if res and res.get("mechanism") in MECHANISMS:
                # The model may return confidence as a string or omit it;
                # store an int so later comparisons (>= 4) and counts work.
                try:
                    confidence = int(res.get("confidence", 0))
                except (TypeError, ValueError):
                    confidence = 0
                results[mid] = {
                    "mechanism": res["mechanism"],
                    "confidence": confidence,
                    "reasoning": res.get("reasoning", ""),
                    "error": None,
                }
            else:
                results[mid] = {
                    "mechanism": None,
                    "confidence": 0,
                    "reasoning": "",
                    "error": "classification failed",
                }

        # Short pause between batches to go easy on the provider.
        time.sleep(0.5)

    return results

|
# ── agreement analysis ───────────────────────────────────────────────────────


def compute_cohens_kappa(
    rater1: dict[int, str],
    rater2: dict[int, str],
    categories: list[str],
) -> dict[str, Any]:
    """Compute Cohen's kappa for two raters.

    Uses only motion_ids present in BOTH raters.
    """
    common_ids = sorted(set(rater1) & set(rater2))

    n = len(common_ids)
    if n == 0:
        return {"kappa": None, "agreement_rate": None, "n": 0, "error": "no common motions"}

    agreements = 0
    for mid in common_ids:
        if rater1[mid] == rater2[mid]:
            agreements += 1

    p_o = agreements / n

    # Expected (chance) agreement: sum over categories of the product of
    # each rater's marginal proportion.
    p_e = 0.0
    for cat in categories:
        p1 = sum(1 for mid in common_ids if rater1[mid] == cat) / n
        p2 = sum(1 for mid in common_ids if rater2[mid] == cat) / n
        p_e += p1 * p2

    # p_e reaches 1 only when both raters used the same single category
    # throughout; kappa is undefined there and is reported as 1.0.
    kappa = 1.0 if p_e >= 1.0 else (p_o - p_e) / (1.0 - p_e)

    return {
        "kappa": round(kappa, 4),
        "agreement_rate": round(p_o, 4),
        "n": n,
        "agreements": agreements,
        "p_o": round(p_o, 4),
        "p_e": round(p_e, 4),
        "error": None,
    }

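For reference, `compute_cohens_kappa` implements κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement rate and p_e = Σ_k p1(k)·p2(k) is the agreement expected by chance from each rater's label frequencies. A compact sketch of the same computation, using `collections.Counter` for the marginals; `cohens_kappa` is a hypothetical helper, not part of the script.

```python
from collections import Counter


def cohens_kappa(r1: dict[int, str], r2: dict[int, str]) -> float:
    """Cohen's kappa over the ids both raters labelled."""
    ids = sorted(set(r1) & set(r2))
    n = len(ids)
    if n == 0:
        raise ValueError("no common items")
    # Observed agreement: share of items with identical labels.
    p_o = sum(r1[i] == r2[i] for i in ids) / n
    # Chance agreement: sum over labels of the product of the two marginals.
    c1 = Counter(r1[i] for i in ids)
    c2 = Counter(r2[i] for i in ids)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label; same convention as the script.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

With labels `AAAAABBBBB` versus `AAAABBBBBA`, the raters agree on 8 of 10 items (p_o = 0.8) and each uses A and B five times (p_e = 0.5), so κ = 0.3 / 0.5 = 0.6. Five of those eight agreements are what chance alone would produce, which is why the report shows both the raw agreement rate and κ.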
|
def find_disagreements(
    rater1: dict[int, str],
    rater2: dict[int, str],
) -> list[dict[str, Any]]:
    """Find all disagreements between two raters."""
    common_ids = sorted(set(rater1) & set(rater2))
    disagreements = []
    for mid in common_ids:
        c1 = rater1[mid]
        c2 = rater2[mid]
        if c1 != c2:
            disagreements.append(
                {
                    "motion_id": mid,
                    "original": c1,
                    "second": c2,
                }
            )
    return disagreements

|
def build_confusion_matrix(
    rater1: dict[int, str],
    rater2: dict[int, str],
) -> dict[str, Any]:
    """Build a confusion matrix: rows are rater1 labels, columns rater2 labels."""
    common_ids = set(rater1) & set(rater2)
    matrix: dict[str, Counter[str]] = {m: Counter() for m in MECHANISMS}
    for mid in common_ids:
        c1 = rater1[mid]
        c2 = rater2[mid]
        matrix[c1][c2] += 1
    return {k: dict(v) for k, v in matrix.items()}

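The confusion matrix already holds everything kappa needs: the diagonal gives the agreements, and the row and column totals give each rater's marginals. A small cross-check sketch, assuming the nested `{original: {second: count}}` shape that `build_confusion_matrix` returns (including empty rows for unused mechanisms); `kappa_from_confusion` is hypothetical. On the same motions it should match `compute_cohens_kappa` up to rounding.

```python
def kappa_from_confusion(matrix: dict[str, dict[str, int]]) -> float:
    """Cohen's kappa from a nested {row_label: {col_label: count}} matrix."""
    labels = set(matrix) | {col for row in matrix.values() for col in row}
    n = sum(sum(row.values()) for row in matrix.values())
    if n == 0:
        raise ValueError("empty confusion matrix")
    # Agreements sit on the diagonal.
    p_o = sum(matrix.get(k, {}).get(k, 0) for k in labels) / n
    # Row totals are rater 1's marginals, column totals rater 2's.
    row_tot = {k: sum(matrix.get(k, {}).values()) for k in labels}
    col_tot = {k: sum(row.get(k, 0) for row in matrix.values()) for k in labels}
    p_e = sum(row_tot[k] * col_tot[k] for k in labels) / (n * n)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)
```

For `{"A": {"A": 4, "B": 1}, "B": {"A": 1, "B": 4}, "C": {}}` this gives κ ≈ 0.6; the empty `C` row contributes nothing, just like an unused mechanism row in the real matrix.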
|
# ── resolution ───────────────────────────────────────────────────────────────


def resolve_disagreements(
    disagreements: list[dict[str, Any]],
    second_results: dict[int, dict[str, Any]],
    motions: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Resolve disagreements: adopt the second label only when its confidence is >= 4."""
    motion_map = {m["motion_id"]: m for m in motions}
    resolved = []
    for d in disagreements:
        mid = d["motion_id"]
        sr = second_results.get(mid, {})
        confidence = sr.get("confidence", 0)

        # Rule: if the second classifier's confidence is >= 4, prefer it.
        # Otherwise default to the original (more carefully classified).
        if confidence >= 4:
            winner = "second"
            resolved_mech = d["second"]
        else:
            winner = "original"
            resolved_mech = d["original"]

        motion = motion_map.get(mid, {})
        resolved.append(
            {
                "motion_id": mid,
                "title": motion.get("title", "")[:120],
                "original": d["original"],
                "second": d["second"],
                "second_confidence": confidence,
                "resolved": resolved_mech,
                "winner": winner,
            }
        )
    return resolved

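Resolution is a single threshold on the second classifier's self-reported confidence, so it is worth checking how many disagreements each threshold would hand to the second classifier before settling on 4. A minimal sketch: `resolve` and `override_counts` are hypothetical helpers, and the confidences in the example are made up.

```python
from collections import Counter


def resolve(original: str, second: str, confidence: int, threshold: int = 4) -> tuple[str, str]:
    """Return (resolved_mechanism, winner) for a single disagreement."""
    if confidence >= threshold:
        return second, "second"
    return original, "original"


def override_counts(confidences: list[int]) -> dict[int, int]:
    """How many disagreements the second classifier would win at each threshold 1-5."""
    hist = Counter(confidences)
    return {t: sum(n for c, n in hist.items() if c >= t) for t in range(1, 6)}
```

For confidences `[5, 4, 4, 2, 0]`, `override_counts` gives `{1: 4, 2: 4, 3: 3, 4: 3, 5: 1}`: lowering the threshold from 4 to 3 changes nothing here, while raising it to 5 keeps two more original labels.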
|
def build_validated_classifications(
    original: dict[int, str],
    second: dict[int, str],
    resolutions: list[dict[str, Any]],
) -> dict[int, str]:
    """Build the validated classification dict based on resolution outcomes."""
    resolution_map = {r["motion_id"]: r["resolved"] for r in resolutions}
    validated = dict(original)
    for mid in validated:
        if mid in resolution_map:
            validated[mid] = resolution_map[mid]
    return validated

|
# ── report generation ──────────────────────────────────────────────────────── |
||||||
|
|
||||||
|
|
||||||
|
def generate_report( |
||||||
|
kappa_result: dict[str, Any], |
||||||
|
disagreements: list[dict[str, Any]], |
||||||
|
resolutions: list[dict[str, Any]], |
||||||
|
confusion: dict[str, Any], |
||||||
|
validated_dist: dict[str, Any], |
||||||
|
second_results: dict[int, dict[str, Any]], |
||||||
|
output_path: str, |
||||||
|
) -> None: |
||||||
|
"""Generate mechanism validation markdown report.""" |
||||||
|
n_second_classified = sum(1 for v in second_results.values() if v.get("mechanism")) |
||||||
|
avg_confidence = ( |
||||||
|
sum(v.get("confidence", 0) for v in second_results.values() if v.get("mechanism")) |
||||||
|
/ max(n_second_classified, 1) |
||||||
|
) |
||||||
|
|
||||||
|
lines = [ |
||||||
|
"# Mechanism Classification Validation Report", |
||||||
|
"", |
||||||
|
"## 1. Inter-Rater Reliability", |
||||||
|
"", |
||||||
|
f"- **Motions compared:** {kappa_result['n']}", |
||||||
|
f"- **Agreements:** {kappa_result['agreements']} / {kappa_result['n']}", |
||||||
|
f"- **Agreement rate:** {kappa_result['agreement_rate']:.1%}", |
||||||
|
f"- **Cohen's kappa (κ):** {kappa_result['kappa']}", |
||||||
|
f" - P_o (observed): {kappa_result['p_o']:.4f}", |
||||||
|
f" - P_e (expected): {kappa_result['p_e']:.4f}", |
||||||
|
"", |
||||||
|
] |
||||||
|
|
||||||
|
kappa = kappa_result["kappa"] |
||||||
|
if kappa is not None: |
||||||
|
if kappa < 0.0: |
||||||
|
strength = "Less than chance agreement" |
||||||
|
elif kappa < 0.20: |
||||||
|
strength = "Slight agreement" |
||||||
|
elif kappa < 0.40: |
||||||
|
strength = "Fair agreement" |
||||||
|
elif kappa < 0.60: |
||||||
|
strength = "Moderate agreement" |
||||||
|
elif kappa < 0.80: |
||||||
|
strength = "Substantial agreement" |
||||||
|
else: |
||||||
|
strength = "Almost perfect agreement" |
||||||
|
lines.append(f"**Interpretation:** {strength}") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
if kappa is not None and kappa < 0.60: |
||||||
|
lines.append("**The mechanism taxonomy needs revision.** The inter-rater agreement is below 0.6, suggesting the 10-mechanism framework is not being applied consistently across raters. Consider:") |
||||||
|
lines.append("- Simplifying or merging ambiguous mechanism pairs") |
||||||
|
lines.append("- Adding clearer decision rules for borderline cases") |
||||||
|
lines.append("- Reducing the number of mechanisms") |
||||||
|
lines.append("") |
||||||
|
elif kappa is not None: |
||||||
|
lines.append("**The mechanism taxonomy appears adequate.** Inter-rater agreement is at or above 0.6, indicating reasonable consistency.") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
lines.extend([ |
||||||
|
"## 2. Second Classifier Summary", |
||||||
|
"", |
||||||
|
f"- **Model:** {config.QWEN_MODEL}", |
||||||
|
f"- **Motions classified:** {n_second_classified}", |
||||||
|
f"- **Average confidence:** {avg_confidence:.1f}/5", |
||||||
|
"", |
||||||
|
]) |
||||||
|
|
||||||
|
conf_dist = Counter() |
||||||
|
for v in second_results.values(): |
||||||
|
conf_dist[v.get("confidence", 0)] += 1 |
||||||
|
lines.append("### Confidence Distribution") |
||||||
|
lines.append("| Confidence | Count |") |
||||||
|
lines.append("|------------|-------|") |
||||||
|
for level in range(1, 6): |
||||||
|
lines.append(f"| {level} | {conf_dist.get(level, 0)} |") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
lines.extend([ |
||||||
|
"## 3. Disagreement Table", |
||||||
|
"", |
||||||
|
f"**Total disagreements:** {len(disagreements)} / {kappa_result['n']} ({len(disagreements) / max(kappa_result['n'], 1) * 100:.1f}%)", |
||||||
|
"", |
||||||
|
"| Motion ID | Title | Original | Second | Confidence | Resolved | Winner |", |
||||||
|
"|-----------|-------|----------|--------|------------|----------|--------|", |
||||||
|
]) |
||||||
|
|
||||||
|
for r in resolutions: |
||||||
|
orig_label = MECHANISM_LABELS_NL.get(r["original"], r["original"]) |
||||||
|
second_label = MECHANISM_LABELS_NL.get(r["second"], r["second"]) |
||||||
|
res_label = MECHANISM_LABELS_NL.get(r["resolved"], r["resolved"]) |
||||||
|
lines.append( |
||||||
|
f"| {r['motion_id']} | {r['title'][:80]} | {orig_label} | {second_label} | {r['second_confidence']} | {res_label} | {r['winner']} |" |
||||||
|
) |
||||||
|
|
||||||
|
lines.extend([ |
||||||
|
"", |
||||||
|
"## 4. Mechanism Distribution Comparison", |
||||||
|
"", |
||||||
|
"| Mechanism | Original Count | Second Count | Validated Count |", |
||||||
|
"|-----------|---------------|--------------|-----------------|", |
||||||
|
]) |
||||||
|
|
||||||
|
orig_dist = Counter(ORIGINAL_CLASSIFICATIONS.values()) |
||||||
|
second_dist = Counter() |
||||||
|
for v in second_results.values(): |
||||||
|
m = v.get("mechanism") |
||||||
|
if m: |
||||||
|
second_dist[m] += 1 |
||||||
|
|
||||||
|
for mech in MECHANISMS: |
||||||
|
label = MECHANISM_LABELS_NL.get(mech, mech) |
||||||
|
o_cnt = orig_dist.get(mech, 0) |
||||||
|
s_cnt = second_dist.get(mech, 0) |
||||||
|
v_cnt = validated_dist.get(mech, 0) |
||||||
|
lines.append(f"| {label} | {o_cnt} | {s_cnt} | {v_cnt} |") |
||||||
|
|
||||||
|
lines.extend([ |
||||||
|
"", |
||||||
|
"## 5. Confusion Matrix (rows: original, columns: second)",
||||||
|
"", |
||||||
|
"| Original \\ Second | " + " | ".join(MECHANISM_LABELS_EN[m][:20] for m in MECHANISMS) + " |", |
||||||
|
"|" + "---|" * (len(MECHANISMS) + 1), |
||||||
|
]) |
||||||
|
|
||||||
|
for mech in MECHANISMS: |
||||||
|
label = MECHANISM_LABELS_EN[mech][:20] |
||||||
|
row_data = confusion.get(mech, {}) |
||||||
|
cells = [str(row_data.get(m, 0)) for m in MECHANISMS] |
||||||
|
lines.append(f"| {label} | {' | '.join(cells)} |") |
||||||
|
|
||||||
|
lines.extend([ |
||||||
|
"", |
||||||
|
"## 6. Conclusion", |
||||||
|
"", |
||||||
|
f"Cohen's kappa of **{kappa}** indicates **{strength.lower()}** between the original inline classification and the independent second classifier.", |
||||||
|
"", |
||||||
|
"### Key findings:", |
||||||
|
f"- {kappa_result['agreements']} out of {kappa_result['n']} motions agreed ({kappa_result['agreement_rate']:.1%})", |
||||||
|
f"- {len(disagreements)} disagreements resolved: {sum(1 for r in resolutions if r['winner'] == 'original')} kept original, {sum(1 for r in resolutions if r['winner'] == 'second')} adopted second", |
||||||
|
"", |
||||||
|
]) |
||||||
|
|
||||||
|
top_disagreement_pairs = Counter() |
||||||
|
for d in disagreements: |
||||||
|
pair = f"{d['original']} / {d['second']}" |
||||||
|
top_disagreement_pairs[pair] += 1 |
||||||
|
|
||||||
|
if top_disagreement_pairs: |
||||||
|
lines.append("### Most common disagreement pairs:") |
||||||
|
for pair, cnt in top_disagreement_pairs.most_common(5): |
||||||
|
lines.append(f"- {pair}: {cnt} times") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
lines.append("### Revised mechanism taxonomy recommendation:") |
||||||
|
if kappa is not None and kappa < 0.60: |
||||||
|
lines.append("- Taxonomy needs revision to improve inter-rater reliability.") |
||||||
|
if top_disagreement_pairs: |
||||||
|
top_pair = top_disagreement_pairs.most_common(1)[0][0] |
||||||
|
lines.append(f"- Most confused pair: {top_pair} — consider merging or clarifying distinction.") |
||||||
|
elif kappa is not None:
||||||
|
lines.append("- Taxonomy is sufficiently reliable. Minor clarifications may be helpful for borderline cases.") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
out_path = Path(output_path) |
||||||
|
out_path.parent.mkdir(parents=True, exist_ok=True) |
||||||
|
out_path.write_text("\n".join(lines) + "\n", encoding="utf-8") |
||||||
|
logger.info("Report written to %s", out_path) |
||||||
|
|
||||||
|
|
||||||
|
# ── main ───────────────────────────────────────────────────────────────────── |
||||||
|
|
||||||
|
|
||||||
|
def main() -> int: |
||||||
|
parser = argparse.ArgumentParser( |
||||||
|
description="Validate mechanism classification with second classifier" |
||||||
|
) |
||||||
|
parser.add_argument("--db", default="data/motions.db", help="Path to DuckDB database") |
||||||
|
parser.add_argument( |
||||||
|
"--model", |
||||||
|
default=None, |
||||||
|
help=f"Second classifier model (default: {config.QWEN_MODEL})", |
||||||
|
) |
||||||
|
parser.add_argument("--batch-size", type=int, default=10, help="Motions per batch") |
||||||
|
parser.add_argument("--max-workers", type=int, default=3, help="Max parallel workers") |
||||||
|
parser.add_argument( |
||||||
|
"--output", |
||||||
|
default="reports/overton_window/mechanism_validation.md", |
||||||
|
help="Output report path", |
||||||
|
) |
||||||
|
parser.add_argument( |
||||||
|
"--save-results", |
||||||
|
default=None, |
||||||
|
help="Save full second classification results to JSON path", |
||||||
|
) |
||||||
|
args = parser.parse_args() |
||||||
|
|
||||||
|
second_model = args.model or config.QWEN_MODEL |
||||||
|
logger.info("Second classifier model: %s", second_model) |
||||||
|
|
||||||
|
motion_ids = list(ORIGINAL_CLASSIFICATIONS.keys()) |
||||||
|
logger.info("Loading %d motions from database...", len(motion_ids)) |
||||||
|
|
||||||
|
motions = load_motions(args.db, motion_ids) |
||||||
|
logger.info("Loaded %d motions", len(motions)) |
||||||
|
|
||||||
|
logger.info("Running second classifier...") |
||||||
|
second_results = classify_motions_second_pass( |
||||||
|
motions, |
||||||
|
second_model=second_model, |
||||||
|
batch_size=args.batch_size, |
||||||
|
max_workers=args.max_workers, |
||||||
|
) |
||||||
|
|
||||||
|
# Extract mechanism-only dict for agreement analysis |
||||||
|
second_classifications: dict[int, str] = {} |
||||||
|
for mid, res in second_results.items(): |
||||||
|
if res.get("mechanism") and res["mechanism"] in MECHANISMS: |
||||||
|
second_classifications[mid] = res["mechanism"] |
||||||
|
|
||||||
|
n_second_classified = len(second_classifications) |
||||||
|
logger.info( |
||||||
|
"Second classifier completed: %d/%d motions classified", |
||||||
|
n_second_classified, |
||||||
|
len(motions), |
||||||
|
) |
||||||
|
|
||||||
|
# Filter original to only include motions with second classification |
||||||
|
original_filtered = { |
||||||
|
mid: ORIGINAL_CLASSIFICATIONS[mid] |
||||||
|
for mid in second_classifications |
||||||
|
if mid in ORIGINAL_CLASSIFICATIONS |
||||||
|
} |
||||||
|
|
||||||
|
# Compute Cohen's kappa |
||||||
|
kappa_result = compute_cohens_kappa( |
||||||
|
original_filtered, second_classifications, MECHANISMS |
||||||
|
) |
||||||
|
logger.info("Cohen's kappa: %s", kappa_result["kappa"]) |
||||||
|
logger.info("Agreement rate: %s", kappa_result["agreement_rate"]) |
||||||
|
|
||||||
|
# Find disagreements |
||||||
|
disagreements = find_disagreements(original_filtered, second_classifications) |
||||||
|
logger.info("Disagreements: %d", len(disagreements)) |
||||||
|
|
||||||
|
# Build confusion matrix |
||||||
|
confusion = build_confusion_matrix(original_filtered, second_classifications) |
||||||
|
|
||||||
|
# Resolve disagreements |
||||||
|
resolutions = resolve_disagreements(disagreements, second_results, motions) |
||||||
|
|
||||||
|
# Build validated classifications |
||||||
|
validated = build_validated_classifications( |
||||||
|
ORIGINAL_CLASSIFICATIONS, second_classifications, resolutions |
||||||
|
) |
||||||
|
validated_dist = Counter(validated.values()) |
||||||
|
|
||||||
|
# Save results if requested |
||||||
|
if args.save_results: |
||||||
|
save_path = Path(args.save_results) |
||||||
|
save_path.parent.mkdir(parents=True, exist_ok=True) |
||||||
|
save_data = { |
||||||
|
"kappa": kappa_result["kappa"], |
||||||
|
"agreement_rate": kappa_result["agreement_rate"], |
||||||
|
"n_motions": kappa_result["n"], |
||||||
|
"n_disagreements": len(disagreements), |
||||||
|
"second_results": { |
||||||
|
str(mid): res for mid, res in second_results.items() |
||||||
|
}, |
||||||
|
"resolutions": resolutions, |
||||||
|
} |
||||||
|
save_path.write_text(json.dumps(save_data, indent=2, ensure_ascii=False), encoding="utf-8") |
||||||
|
logger.info("Results saved to %s", save_path) |
||||||
|
|
||||||
|
# Generate report |
||||||
|
generate_report( |
||||||
|
kappa_result=kappa_result, |
||||||
|
disagreements=disagreements, |
||||||
|
resolutions=resolutions, |
||||||
|
confusion=confusion, |
||||||
|
validated_dist=dict(validated_dist), |
||||||
|
second_results=second_results, |
||||||
|
output_path=args.output, |
||||||
|
) |
||||||
|
|
||||||
|
print(f"\nCohen's kappa: {kappa_result['kappa']}") |
||||||
|
print(f"Agreement rate: {kappa_result['agreement_rate']:.1%}") |
||||||
|
print(f"Disagreements: {len(disagreements)}/{kappa_result['n']}") |
||||||
|
print(f"Report: {args.output}") |
||||||
|
|
||||||
|
if kappa_result["kappa"] is not None: |
||||||
|
if kappa_result["kappa"] < 0.60: |
||||||
|
print("TAXONOMY NEEDS REVISION: kappa < 0.6 is below the substantial-agreement band")
||||||
|
else: |
||||||
|
print("TAXONOMY ADEQUATE: kappa >= 0.6 indicates acceptable reliability") |
||||||
|
|
||||||
|
return 0 |
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__": |
||||||
|
raise SystemExit(main()) |
||||||
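The report and the console summary above both hinge on a single cutoff (`kappa < 0.60` triggers a taxonomy revision). To sanity-check that threshold, here is a standalone, hypothetical re-implementation of Cohen's kappa over two `{motion_id: mechanism}` dictionaries, plus the Landis & Koch (1977) agreement bands. It is a sketch, not the `compute_cohens_kappa` helper defined earlier in `mechanism_validation.py`; that helper's body lies outside this excerpt, so its return structure (`kappa`, `agreement_rate`, `n`) is not reproduced here.

```python
from __future__ import annotations

from collections import Counter


def cohens_kappa(rater_a: dict[int, str], rater_b: dict[int, str]) -> float | None:
    """Cohen's kappa over the items both raters labelled (hypothetical helper)."""
    shared = sorted(rater_a.keys() & rater_b.keys())
    n = len(shared)
    if n == 0:
        return None
    # Observed agreement: share of common items with identical labels.
    p_o = sum(rater_a[i] == rater_b[i] for i in shared) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(rater_a[i] for i in shared)
    freq_b = Counter(rater_b[i] for i in shared)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label for everything: kappa is undefined.
        return None
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis & Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under these bands, the κ = 0.41 reported in this commit is "moderate". Note that the script's cutoff treats a κ of exactly 0.60 as adequate, while Landis & Koch place 0.60 at the top of the moderate band.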
@ -0,0 +1,492 @@ |
|||||||
|
#!/usr/bin/env python3 |
||||||
|
"""U1: Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP). |
||||||
|
|
||||||
|
Usage: |
||||||
|
uv run python analysis/right_wing/party_differentiation.py |
||||||
|
|
||||||
|
Output: |
||||||
|
reports/overton_window/party_differentiation.md |
||||||
|
reports/overton_window/party_differentiation_figure.png |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import logging |
||||||
|
import re |
||||||
|
import sys |
||||||
|
from pathlib import Path |
||||||
|
from typing import Any |
||||||
|
|
||||||
|
import duckdb |
||||||
|
import matplotlib |
||||||
|
|
||||||
|
matplotlib.use("Agg") |
||||||
|
import matplotlib.pyplot as plt |
||||||
|
import numpy as np |
||||||
|
|
||||||
|
ROOT = Path(__file__).parent.parent.parent.resolve() |
||||||
|
if str(ROOT) not in sys.path: |
||||||
|
sys.path.insert(0, str(ROOT)) |
||||||
|
|
||||||
|
from analysis.config import CANONICAL_RIGHT, PARTY_COLOURS, _PARTY_NORMALIZE |
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
DB_PATH = str(ROOT / "data" / "motions.db") |
||||||
|
REPORTS_DIR = ROOT / "reports" / "overton_window" |
||||||
|
REPORTS_DIR.mkdir(parents=True, exist_ok=True) |
||||||
|
|
||||||
|
RIGHT_PARTIES = sorted(CANONICAL_RIGHT) |
||||||
|
YEAR_MIN, YEAR_MAX = 2016, 2026 |
||||||
|
BREAK_YEAR = 2024 |
||||||
|
|
||||||
|
TITLE_PATTERNS = [ |
||||||
|
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||||
|
r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b", |
||||||
|
r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b", |
||||||
|
r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b", |
||||||
|
] |
||||||
|
|
||||||
|
|
||||||
|
def _conn(read_only: bool = True) -> duckdb.DuckDBPyConnection: |
||||||
|
return duckdb.connect(DB_PATH, read_only=read_only) |
||||||
|
|
||||||
|
|
||||||
|
def build_party_name_map(con: duckdb.DuckDBPyConnection) -> dict[str, str]: |
||||||
|
rows = con.execute(""" |
||||||
|
SELECT mp_name, party, van, tot_en_met |
||||||
|
FROM mp_metadata |
||||||
|
WHERE party IS NOT NULL |
||||||
|
ORDER BY tot_en_met DESC NULLS LAST, van DESC NULLS LAST |
||||||
|
""").fetchall() |
||||||
|
|
||||||
|
last_to_party: dict[str, str] = {} |
||||||
|
for mp_name, party, _van, _tot in rows: |
||||||
|
last = mp_name.split(",")[0].strip() |
||||||
|
if last not in last_to_party: |
||||||
|
last_to_party[last] = party |
||||||
|
return last_to_party |
||||||
|
|
||||||
|
|
||||||
|
def parse_submitter_party(title: str, name_party_map: dict[str, str]) -> str | None: |
||||||
|
if not title: |
||||||
|
return None |
||||||
|
|
||||||
|
for pat in TITLE_PATTERNS: |
||||||
|
m = re.search(pat, title, flags=re.IGNORECASE)
||||||
|
if m: |
||||||
|
submitter_str = m.group(1).strip() |
||||||
|
parts = submitter_str.split(" en ") |
||||||
|
first_name = parts[0].strip() |
||||||
|
first_name = re.sub(r"\s+c\.s\.", "", first_name).strip() |
||||||
|
if not first_name: |
||||||
|
continue |
||||||
|
raw_party = name_party_map.get(first_name) |
||||||
|
if raw_party: |
||||||
|
return _PARTY_NORMALIZE.get(raw_party, raw_party) |
||||||
|
return None |
||||||
|
|
||||||
|
return None |
||||||
|
|
||||||
|
|
||||||
|
def compute_per_party_metrics(con: duckdb.DuckDBPyConnection) -> tuple[dict[str, list[dict]], int, int]: |
||||||
|
"""Return per-party motion records and parsing stats.""" |
||||||
|
rows = con.execute(""" |
||||||
|
SELECT |
||||||
|
r.motion_id, |
||||||
|
r.year, |
||||||
|
r.title, |
||||||
|
r.centrist_support_strict, |
||||||
|
r.category, |
||||||
|
e.stijl_extremiteit, |
||||||
|
e.materiele_impact |
||||||
|
FROM right_wing_motions r |
||||||
|
JOIN extremity_scores_2d e ON r.motion_id = e.motion_id |
||||||
|
WHERE r.classified = TRUE |
||||||
|
AND r.year IS NOT NULL |
||||||
|
AND r.title IS NOT NULL |
||||||
|
""").fetchall() |
||||||
|
|
||||||
|
logger.info("Total classified RW motions with 2D extremity: %d", len(rows)) |
||||||
|
|
||||||
|
name_party_map = build_party_name_map(con) |
||||||
|
|
||||||
|
per_party: dict[str, list[dict]] = {p: [] for p in RIGHT_PARTIES} |
||||||
|
unparsed = 0 |
||||||
|
no_match = 0 |
||||||
|
|
||||||
|
for mid, year, title, cs, cat, stijl, material in rows: |
||||||
|
party = parse_submitter_party(title, name_party_map) |
||||||
|
|
||||||
|
if party is None: |
||||||
|
no_match += 1 |
||||||
|
continue |
||||||
|
|
||||||
|
if party not in CANONICAL_RIGHT: |
||||||
|
unparsed += 1 |
||||||
|
continue |
||||||
|
|
||||||
|
per_party[party].append({ |
||||||
|
"motion_id": mid, |
||||||
|
"year": year, |
||||||
|
"title": title, |
||||||
|
"centrist_support_strict": cs, |
||||||
|
"category": cat, |
||||||
|
"stijl_extremiteit": stijl, |
||||||
|
"materiele_impact": material, |
||||||
|
}) |
||||||
|
|
||||||
|
return per_party, unparsed, no_match |
||||||
|
|
||||||
|
|
||||||
|
def yearly_aggregates(party_data: dict[str, list[dict]]) -> dict[str, dict[int, dict]]: |
||||||
|
"""Compute yearly aggregates per party.""" |
||||||
|
yearly: dict[str, dict[int, dict]] = {} |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
yearly[party] = {} |
||||||
|
for y in range(YEAR_MIN, YEAR_MAX + 1): |
||||||
|
yearly[party][y] = { |
||||||
|
"cs": [], |
||||||
|
"stijl": [], |
||||||
|
"materiele": [], |
||||||
|
"n": 0, |
||||||
|
} |
||||||
|
for m in party_data[party]: |
||||||
|
y = m["year"] |
||||||
|
if not (YEAR_MIN <= y <= YEAR_MAX): |
||||||
|
continue |
||||||
|
yearly[party][y]["cs"].append(m["centrist_support_strict"]) |
||||||
|
yearly[party][y]["stijl"].append(m["stijl_extremiteit"]) |
||||||
|
yearly[party][y]["materiele"].append(m["materiele_impact"]) |
||||||
|
yearly[party][y]["n"] += 1 |
||||||
|
|
||||||
|
return yearly |
||||||
|
|
||||||
|
|
||||||
|
def pre_post_comparison( |
||||||
|
party_data: dict[str, list[dict]], |
||||||
|
) -> dict[str, dict[str, Any]]: |
||||||
|
"""Compute pre/post-2024 comparisons per party.""" |
||||||
|
comparison: dict[str, dict[str, Any]] = {} |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
pre = [m for m in party_data[party] if m["year"] < BREAK_YEAR] |
||||||
|
post = [m for m in party_data[party] if m["year"] >= BREAK_YEAR] |
||||||
|
|
||||||
|
pre_cs = np.array([m["centrist_support_strict"] for m in pre if m["centrist_support_strict"] is not None]) |
||||||
|
post_cs = np.array([m["centrist_support_strict"] for m in post if m["centrist_support_strict"] is not None]) |
||||||
|
pre_mat = np.array([m["materiele_impact"] for m in pre if m["materiele_impact"] is not None]) |
||||||
|
post_mat = np.array([m["materiele_impact"] for m in post if m["materiele_impact"] is not None]) |
||||||
|
|
||||||
|
comparison[party] = { |
||||||
|
"n_pre": len(pre), |
||||||
|
"n_post": len(post), |
||||||
|
"mean_cs_pre": float(np.mean(pre_cs)) if len(pre_cs) > 0 else float("nan"), |
||||||
|
"mean_cs_post": float(np.mean(post_cs)) if len(post_cs) > 0 else float("nan"), |
||||||
|
"delta_cs": float(np.mean(post_cs) - np.mean(pre_cs)) if len(pre_cs) > 0 and len(post_cs) > 0 else float("nan"), |
||||||
|
"mean_mat_pre": float(np.mean(pre_mat)) if len(pre_mat) > 0 else float("nan"), |
||||||
|
"mean_mat_post": float(np.mean(post_mat)) if len(post_mat) > 0 else float("nan"), |
||||||
|
"delta_mat": float(np.mean(post_mat) - np.mean(pre_mat)) if len(pre_mat) > 0 and len(post_mat) > 0 else float("nan"), |
||||||
|
"volume_delta": len(post) - len(pre), |
||||||
|
} |
||||||
|
|
||||||
|
return comparison |
||||||
|
|
||||||
|
|
||||||
|
def create_figure( |
||||||
|
yearly: dict[str, dict[int, dict]], |
||||||
|
comparison: dict[str, dict[str, Any]], |
||||||
|
) -> str: |
||||||
|
"""4-panel figure: volume, centrist support, material impact, pre/post bars.""" |
||||||
|
years = list(range(YEAR_MIN, YEAR_MAX + 1)) |
||||||
|
years_arr = np.array(years) |
||||||
|
|
||||||
|
party_colours = { |
||||||
|
"PVV": PARTY_COLOURS.get("PVV", "#002366"), |
||||||
|
"FVD": PARTY_COLOURS.get("FVD", "#6A1B9A"), |
||||||
|
"JA21": PARTY_COLOURS.get("JA21", "#7B1FA2"), |
||||||
|
"SGP": PARTY_COLOURS.get("SGP", "#F4511E"), |
||||||
|
} |
||||||
|
marker_map = {"PVV": "o", "FVD": "s", "JA21": "^", "SGP": "D"} |
||||||
|
|
||||||
|
fig, axes = plt.subplots(2, 2, figsize=(16, 12)) |
||||||
|
(ax_vol, ax_cs), (ax_mat, ax_bar) = axes |
||||||
|
|
||||||
|
# Panel A: Motion volume |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
volumes = [yearly[party][y]["n"] for y in years] |
||||||
|
ax_vol.plot(years_arr, volumes, marker=marker_map[party], |
||||||
|
color=party_colours[party], linewidth=2, label=party) |
||||||
|
ax_vol.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||||
|
ax_vol.set_xlabel("Year") |
||||||
|
ax_vol.set_ylabel("Motion count") |
||||||
|
ax_vol.set_title("A: Motion Volume by Party Over Time", fontweight="bold") |
||||||
|
ax_vol.legend(fontsize=9) |
||||||
|
ax_vol.grid(True, alpha=0.3) |
||||||
|
ax_vol.set_xticks(years_arr) |
||||||
|
ax_vol.set_xticklabels([str(y) for y in years], rotation=45) |
||||||
|
|
||||||
|
# Panel B: Centrist support |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
cs_vals = [] |
||||||
|
for y in years: |
||||||
|
vals = [v for v in yearly[party][y]["cs"] if v is not None] |
||||||
|
cs_vals.append(np.mean(vals) if vals else np.nan) |
||||||
|
ax_cs.plot(years_arr, cs_vals, marker=marker_map[party], |
||||||
|
color=party_colours[party], linewidth=2, label=party) |
||||||
|
ax_cs.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||||
|
ax_cs.set_xlabel("Year") |
||||||
|
ax_cs.set_ylabel("Centrist support (strict)") |
||||||
|
ax_cs.set_title("B: Centrist Support by Party Over Time", fontweight="bold") |
||||||
|
ax_cs.legend(fontsize=9) |
||||||
|
ax_cs.set_ylim(0, 1.05) |
||||||
|
ax_cs.grid(True, alpha=0.3) |
||||||
|
ax_cs.set_xticks(years_arr) |
||||||
|
ax_cs.set_xticklabels([str(y) for y in years], rotation=45) |
||||||
|
|
||||||
|
# Panel C: Material impact |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
mi_vals = [] |
||||||
|
for y in years: |
||||||
|
vals = [v for v in yearly[party][y]["materiele"] if v is not None] |
||||||
|
mi_vals.append(np.mean(vals) if vals else np.nan) |
||||||
|
ax_mat.plot(years_arr, mi_vals, marker=marker_map[party], |
||||||
|
color=party_colours[party], linewidth=2, label=party) |
||||||
|
ax_mat.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1) |
||||||
|
ax_mat.set_xlabel("Year") |
||||||
|
ax_mat.set_ylabel("Material impact (1-5)") |
||||||
|
ax_mat.set_title("C: Material Impact by Party Over Time", fontweight="bold") |
||||||
|
ax_mat.legend(fontsize=9) |
||||||
|
ax_mat.grid(True, alpha=0.3) |
||||||
|
ax_mat.set_xticks(years_arr) |
||||||
|
ax_mat.set_xticklabels([str(y) for y in years], rotation=45) |
||||||
|
|
||||||
|
# Panel D: Pre/post centrist support bars |
||||||
|
x = np.arange(len(RIGHT_PARTIES)) |
||||||
|
width = 0.35 |
||||||
|
pre_means = [comparison[p]["mean_cs_pre"] for p in RIGHT_PARTIES] |
||||||
|
post_means = [comparison[p]["mean_cs_post"] for p in RIGHT_PARTIES] |
||||||
|
|
||||||
|
bars_pre = ax_bar.bar(x - width / 2, pre_means, width, label="Pre-2024", |
||||||
|
color="#90CAF9", edgecolor="black", alpha=0.9) |
||||||
|
bars_post = ax_bar.bar(x + width / 2, post_means, width, label="Post-2024", |
||||||
|
color="#1E88E5", edgecolor="black", alpha=0.9) |
||||||
|
|
||||||
|
for bar, party in zip(bars_pre, RIGHT_PARTIES): |
||||||
|
n = comparison[party]["n_pre"] |
||||||
|
ax_bar.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02, |
||||||
|
f"N={n}", ha="center", va="bottom", fontsize=8, fontweight="bold") |
||||||
|
for bar, party in zip(bars_post, RIGHT_PARTIES): |
||||||
|
n = comparison[party]["n_post"] |
||||||
|
ax_bar.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.02, |
||||||
|
f"N={n}", ha="center", va="bottom", fontsize=8, fontweight="bold") |
||||||
|
|
||||||
|
ax_bar.set_xticks(x) |
||||||
|
ax_bar.set_xticklabels(RIGHT_PARTIES, fontsize=10) |
||||||
|
ax_bar.set_ylabel("Centrist support (strict)") |
||||||
|
ax_bar.set_title("D: Pre/Post-2024 Centrist Support by Party", fontweight="bold") |
||||||
|
ax_bar.legend(fontsize=9) |
||||||
|
ax_bar.set_ylim(0, 1.05) |
||||||
|
ax_bar.grid(True, alpha=0.3, axis="y") |
||||||
|
|
||||||
|
plt.tight_layout() |
||||||
|
path = str(REPORTS_DIR / "party_differentiation_figure.png") |
||||||
|
fig.savefig(path, dpi=150, bbox_inches="tight") |
||||||
|
plt.close(fig) |
||||||
|
logger.info("Saved figure to %s", path) |
||||||
|
return path |
||||||
|
|
||||||
|
|
||||||
|
def generate_report( |
||||||
|
yearly: dict[str, dict[int, dict]], |
||||||
|
comparison: dict[str, dict[str, Any]], |
||||||
|
party_data: dict[str, list[dict]], |
||||||
|
parsed_count: int, |
||||||
|
no_match_count: int, |
||||||
|
figure_path: str, |
||||||
|
) -> str: |
||||||
|
years = list(range(YEAR_MIN, YEAR_MAX + 1)) |
||||||
|
total_rw = sum(len(party_data[p]) for p in RIGHT_PARTIES) |
||||||
|
|
||||||
|
lines = [ |
||||||
|
"# Right-Wing Party Differentiation", |
||||||
|
"", |
||||||
|
"**Goal:** Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP)",
||||||

"to identify which party drives the moderation effect.",
||||||
|
"", |
||||||
|
f"**Analysis period:** {YEAR_MIN}–{YEAR_MAX}", |
||||||
|
f"**Right-wing parties:** {', '.join(RIGHT_PARTIES)}", |
||||||
|
f"**Data:** {total_rw:,} right-wing submitter motions with 2D extremity scores", |
||||||
|
f"(from {parsed_count + no_match_count:,} classified right-wing motions total; " |
||||||
|
f"{no_match_count:,} could not be party-matched or had a lead submitter outside {', '.join(RIGHT_PARTIES)}).",
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 1. Motion Volume by Party and Year", |
||||||
|
"", |
||||||
|
"| Year | " + " | ".join(RIGHT_PARTIES) + " | Total RW |", |
||||||
|
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|----------|", |
||||||
|
] |
||||||
|
|
||||||
|
for y in years: |
||||||
|
vols = [yearly[p][y]["n"] for p in RIGHT_PARTIES] |
||||||
|
total = sum(vols) |
||||||
|
lines.append("| " + " | ".join([str(y), *(str(v) for v in vols), str(total)]) + " |")
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 2. Centrist Support (Strict) by Party and Year", |
||||||
|
"", |
||||||
|
"| Year | " + " | ".join(RIGHT_PARTIES) + " |", |
||||||
|
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|", |
||||||
|
] |
||||||
|
|
||||||
|
for y in years: |
||||||
|
cs_vals = [] |
||||||
|
for p in RIGHT_PARTIES: |
||||||
|
vals = [v for v in yearly[p][y]["cs"] if v is not None] |
||||||
|
cs_vals.append(np.mean(vals) if vals else float("nan")) |
||||||
|
cs_strs = [f"{v:.3f}" if not np.isnan(v) else "N/A" for v in cs_vals] |
||||||
|
lines.append("| " + " | ".join([str(y), *cs_strs]) + " |")
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 3. Material Impact by Party and Year", |
||||||
|
"", |
||||||
|
"| Year | " + " | ".join(RIGHT_PARTIES) + " |", |
||||||
|
"|------|" + "|".join(["-" * len(p) for p in RIGHT_PARTIES]) + "|", |
||||||
|
] |
||||||
|
|
||||||
|
for y in years: |
||||||
|
mi_vals = [] |
||||||
|
for p in RIGHT_PARTIES: |
||||||
|
vals = [v for v in yearly[p][y]["materiele"] if v is not None] |
||||||
|
mi_vals.append(np.mean(vals) if vals else float("nan")) |
||||||
|
mi_strs = [f"{v:.2f}" if not np.isnan(v) else "N/A" for v in mi_vals] |
||||||
|
lines.append("| " + " | ".join([str(y), *mi_strs]) + " |")
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 4. Pre/Post-2024 Comparison by Party", |
||||||
|
"", |
||||||
|
"| Party | N Pre | N Post | CS Pre | CS Post | Delta CS | Mat. Pre | Mat. Post | Delta Mat. | Vol. Delta |", |
||||||
|
"|-------|-------|--------|--------|---------|----------|----------|-----------|------------|------------|", |
||||||
|
] |
||||||
|
|
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
c = comparison[party] |
||||||
|
lines.append( |
||||||
|
f"| {party} | {c['n_pre']} | {c['n_post']} | " |
||||||
|
f"{c['mean_cs_pre']:.3f} | {c['mean_cs_post']:.3f} | " |
||||||
|
f"{c['delta_cs']:+.3f} | {c['mean_mat_pre']:.2f} | " |
||||||
|
f"{c['mean_mat_post']:.2f} | {c['delta_mat']:+.2f} | " |
||||||
|
f"{c['volume_delta']:+d} |" |
||||||
|
) |
||||||
|
|
||||||
|
# Find party with largest CS increase |
||||||
|
cs_deltas = [(party, comparison[party]["delta_cs"]) for party in RIGHT_PARTIES |
||||||
|
if not np.isnan(comparison[party]["delta_cs"])] |
||||||
|
cs_deltas_sorted = sorted(cs_deltas, key=lambda x: x[1], reverse=True) |
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 5. Key Findings", |
||||||
|
"", |
||||||
|
] |
||||||
|
|
||||||
|
if cs_deltas_sorted: |
||||||
|
lines.append("**Centrist support shift (most positive to most negative):**")
||||||
|
for party, delta in cs_deltas_sorted: |
||||||
|
lines.append(f"- **{party}**: {delta:+.3f}") |
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"### Volume", |
||||||
|
] |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
c = comparison[party] |
||||||
|
lines.append(f"- **{party}**: {c['n_pre']} pre-2024 → {c['n_post']} post-2024 ({c['volume_delta']:+d})") |
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"### Material Impact Shift", |
||||||
|
] |
||||||
|
for party in RIGHT_PARTIES: |
||||||
|
c = comparison[party] |
||||||
|
lines.append(f"- **{party}**: {c['mean_mat_pre']:.2f} → {c['mean_mat_post']:.2f} ({c['delta_mat']:+.2f})") |
||||||
|
|
||||||
|
lines += [ |
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 6. Parsing Notes", |
||||||
|
"", |
||||||
|
f"- Parsed and party-matched: {parsed_count:,} motions", |
||||||
|
f"- Right-wing submitter motions: {total_rw:,}", |
||||||
|
f"- Unmatched or non-right-wing lead submitter: {no_match_count:,}",
||||||
|
"- Submitter party is parsed from motion title prefixes (e.g. 'Motie van het lid Wilders ...').",
||||||

"- Multi-submitter motions use the first listed submitter.",
||||||

"- Party names are normalized via `_PARTY_NORMALIZE` (e.g. Groep Markuszower → PVV).",
||||||
|
"", |
||||||
|
"---", |
||||||
|
"", |
||||||
|
"## 7. Figure", |
||||||
|
"", |
||||||
|
f"![Party differentiation figure]({Path(figure_path).name})",
||||||
|
"", |
||||||
|
] |
||||||
|
|
||||||
|
report_path = REPORTS_DIR / "party_differentiation.md" |
||||||
|
with open(report_path, "w", encoding="utf-8") as f:
||||||

f.write("\n".join(lines) + "\n")
||||||
|
logger.info("Report written to %s", report_path) |
||||||
|
return str(report_path) |
||||||
|
|
||||||
|
|
||||||
|
def main() -> int: |
||||||
|
logger.info("Connecting to database: %s", DB_PATH) |
||||||
|
con = _conn(read_only=True) |
||||||
|
|
||||||
|
logger.info("Computing per-party metrics...") |
||||||
|
party_data, unparsed, no_match = compute_per_party_metrics(con) |
||||||
|
con.close() |
||||||
|
|
||||||
|
total_rw = sum(len(party_data[p]) for p in RIGHT_PARTIES) |
||||||
|
logger.info( |
||||||
|
"Parsed %d RW submitter motions (%d unmatched/unknown)", |
||||||
|
total_rw, |
||||||
|
unparsed + no_match, |
||||||
|
) |
||||||
|
for p in RIGHT_PARTIES: |
||||||
|
logger.info(" %s: %d motions", p, len(party_data[p])) |
||||||
|
|
||||||
|
logger.info("Computing yearly aggregates...") |
||||||
|
yearly = yearly_aggregates(party_data) |
||||||
|
|
||||||
|
logger.info("Computing pre/post-2024 comparisons...") |
||||||
|
comparison = pre_post_comparison(party_data) |
||||||
|
|
||||||
|
logger.info("Generating figure...") |
||||||
|
fig_path = create_figure(yearly, comparison) |
||||||
|
|
||||||
|
logger.info("Generating report...") |
||||||
|
report_path = generate_report( |
||||||
|
yearly, comparison, party_data, |
||||||
|
total_rw, unparsed + no_match, fig_path, |
||||||
|
) |
||||||
|
|
||||||
|
print(f"\nReport: {report_path}") |
||||||
|
print(f"Figure: {fig_path}") |
||||||
|
return 0 |
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__": |
||||||
|
raise SystemExit(main()) |
||||||
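The per-party breakdown above depends entirely on attributing each motion to its lead submitter. The following standalone sketch isolates that rule, as used by `parse_submitter_party` (and duplicated as `parse_lead_submitter` in `predictive_model.py`), without the `mp_metadata` lookup, so the title regexes can be checked on sample strings. The example titles in the checks are invented for illustration. The case-insensitive match is an assumption about how amended motions ("Gewijzigde motie ...") are titled in the data.

```python
from __future__ import annotations

import re

# Same four patterns as TITLE_PATTERNS in party_differentiation.py.
PATTERNS = [
    r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
    r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
    r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b",
    r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b",
]


def lead_submitter(title: str) -> str | None:
    """Return the first listed submitter's surname, or None if no pattern matches."""
    for pat in PATTERNS:
        # IGNORECASE (assumption) lets "Gewijzigde motie ..." satisfy "Motie".
        m = re.search(pat, title, flags=re.IGNORECASE)
        if m:
            # Multi-submitter titles ("X en Y") keep only the first name.
            first = m.group(1).strip().split(" en ")[0].strip()
            first = re.sub(r"\s+c\.s\.", "", first).strip()
            if first:
                return first
    return None
```

Two consequences are worth keeping in mind when reading the per-party tables. First, a joint motion is credited to whichever party's member is listed first. Second, the surname-only key in `build_party_name_map` ignores the motion date, so an MP who changed party is credited to the single party of whichever `mp_metadata` row sorts first under `ORDER BY tot_en_met DESC NULLS LAST`.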
@ -0,0 +1,552 @@ |
|||||||
|
#!/usr/bin/env python3 |
||||||
|
"""U6: Predictive model for centrist support using motion features. |
||||||
|
|
||||||
|
Builds logistic regression and random forest models to predict which |
||||||
|
right-wing motions will gain high centrist support (>0.5). |
||||||
|
|
||||||
|
Usage: |
||||||
|
uv run python analysis/right_wing/predictive_model.py |
||||||
|
uv run python analysis/right_wing/predictive_model.py --db data/motions.db |
||||||
|
|
||||||
|
Output: |
||||||
|
reports/overton_window/predictive_model.md |
||||||
|
reports/overton_window/predictive_model_figure.png |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import json |
||||||
|
import logging |
||||||
|
import re |
||||||
|
import sys |
||||||
|
from pathlib import Path |
||||||
|
from typing import Any |
||||||
|
|
||||||
|
import duckdb |
||||||
|
import matplotlib |
||||||
|
matplotlib.use("Agg") |
||||||
|
import matplotlib.pyplot as plt |
||||||
|
import numpy as np |
||||||
|
from sklearn.ensemble import RandomForestClassifier |
||||||
|
from sklearn.linear_model import LogisticRegression |
||||||
|
from sklearn.metrics import ( |
||||||
|
accuracy_score, |
||||||
|
auc, |
||||||
|
classification_report, |
||||||
|
confusion_matrix, |
||||||
|
precision_score, |
||||||
|
recall_score, |
||||||
|
roc_curve, |
||||||
|
) |
||||||
|
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split |
||||||
|
from sklearn.preprocessing import LabelEncoder, StandardScaler |
||||||
|
|
||||||
|
PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent |
||||||
|
if str(PROJECT_ROOT) not in sys.path: |
||||||
|
sys.path.insert(0, str(PROJECT_ROOT)) |
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") |
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
DB_PATH = str(PROJECT_ROOT / "data" / "motions.db") |
||||||
|
REPORTS_DIR = PROJECT_ROOT / "reports" / "overton_window" |
||||||
|
REPORTS_DIR.mkdir(parents=True, exist_ok=True) |
||||||
|
|
||||||
|
RANDOM_SEED = 42 |
||||||
|
|
||||||
|
BREAK_YEAR = 2024 |
||||||
|
|
||||||
|
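# Year-level approximation: each year maps to one coalition, although Rutte III took office in October 2017 and the Schoof cabinet in July 2024.
||||||

COALITION: dict[int, set[str]] = {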
||||||
|
2016: {"VVD", "PvdA"}, |
||||||
|
2017: {"VVD", "PvdA"}, |
||||||
|
2018: {"VVD", "CDA", "D66", "CU"}, |
||||||
|
2019: {"VVD", "CDA", "D66", "CU"}, |
||||||
|
2020: {"VVD", "CDA", "D66", "CU"}, |
||||||
|
2021: {"VVD", "CDA", "D66", "CU"}, |
||||||
|
2022: {"VVD", "D66", "CDA", "CU"}, |
||||||
|
2023: {"VVD", "D66", "CDA", "CU"}, |
||||||
|
2024: {"PVV", "VVD", "NSC", "BBB"}, |
||||||
|
2025: {"PVV", "VVD", "NSC", "BBB"}, |
||||||
|
2026: {"PVV", "VVD", "NSC", "BBB"}, |
||||||
|
} |
||||||
|
|
||||||
|
RIGHT_WING_PARTIES = {"PVV", "FVD", "JA21", "SGP"} |
||||||
|
|
||||||
|
CATEGORY_SHORT = {
    "economie/belasting": "economie/bel.",
    "veiligheid/justitie": "veiligh./just.",
    "landbouw/stikstof": "landb./stikst.",
    "asiel/vreemdelingen": "asiel/vreemd.",
    "defensie/buitenland": "def./buitenland",
    "zorg/gezondheid": "zorg/gezondh.",
    "corona/pandemie": "corona/pand.",
    "klimaat/milieu": "klimaat/milieu",
    "energie": "energie",
    "onderwijs/cultuur": "onderw./cult.",
    "sociaal/jeugd": "sociaal/jeugd",
    "overig": "overig",
    "lhbtq/rechten": "lhbtq/rechten",
}

|
def build_name_party_map(con: duckdb.DuckDBPyConnection) -> dict[str, str]:
    """Map MP surname (the text before the comma in mp_name) to a party.

    Rows are ordered so that the most recent membership is seen first and wins.
    MPs who share a surname collapse into a single entry.
    """
    # A NULL tot_en_met is assumed to mark a membership that is still ongoing,
    # so NULLs sort first and the current party takes precedence.
    rows = con.execute("""
        SELECT mp_name, party, van, tot_en_met
        FROM mp_metadata
        WHERE party IS NOT NULL
        ORDER BY tot_en_met DESC NULLS FIRST, van DESC NULLS LAST
    """).fetchall()

    last_to_party: dict[str, str] = {}
    for mp_name, party, _van, _tot in rows:
        last = mp_name.split(",")[0].strip()
        if last not in last_to_party:
            last_to_party[last] = party
    return last_to_party

|
def parse_lead_submitter(
    title: str, name_party_map: dict[str, str]
) -> tuple[str | None, str | None]:
    """Return (surname, party) of the first-listed submitter of a motion or amendment.

    The party is None when the surname is not in ``name_party_map``; both values
    are None when the title matches none of the known patterns.
    """
    if not title:
        return None, None

    patterns = [
        r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+het\s+lid\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
        r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+de\s+leden\s+(.+?)\s+(?:c\.s\.\s+)?over\b",
        r"Amendement\s+van\s+het\s+lid\s+(.+?)\s+over\b",
        r"Amendement\s+van\s+de\s+leden\s+(.+?)\s+over\b",
    ]

    for pat in patterns:
        m = re.search(pat, title)
        if m:
            submitter_str = m.group(1).strip()
            # Co-submitters are joined with " en "; keep only the first-listed one.
            parts = submitter_str.split(" en ")
            lead_name = parts[0].strip()
            lead_name = re.sub(r"\s+c\.s\.", "", lead_name).strip()
            if not lead_name:
                continue
            party = name_party_map.get(lead_name)
            return lead_name, party

    return None, None

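The lead-submitter extraction above can be exercised in isolation. This is a condensed sketch, not the function itself: `LEAD_PATTERN` and `lead_surname` are hypothetical names, the single regex merges the two "Motie" patterns (amendments are left out), and the titles are made up for illustration.

```python
import re
from typing import Optional

# Hypothetical condensed pattern covering only the "Motie van het lid / de leden"
# forms handled by parse_lead_submitter.
LEAD_PATTERN = re.compile(
    r"(?:Gewijzigde|Nader\s+gewijzigde)?\s*Motie\s+van\s+(?:het\s+lid|de\s+leden)"
    r"\s+(.+?)\s+(?:c\.s\.\s+)?over\b"
)


def lead_surname(title: str) -> Optional[str]:
    """Return the first-listed submitter's surname, or None if the title does not match."""
    m = LEAD_PATTERN.search(title)
    if not m:
        return None
    # Co-submitters are joined with " en "; keep the first and drop a trailing "c.s."
    first = m.group(1).split(" en ")[0]
    return re.sub(r"\s+c\.s\.", "", first).strip() or None


for t in [
    "Motie van het lid Wilders over asielinstroom",
    "Motie van de leden Eerdmans en Wilders over stikstof",
    "Gewijzigde Motie van het lid Van der Plas c.s. over landbouw",
]:
    print(lead_surname(t))
```

The lazy `(.+?)` stops at the first `over` that follows, so multi-word surnames such as "Van der Plas" survive intact; the returned value is a surname, which is why it is matched against the surname keys of `build_name_party_map`.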
|
def load_model_data(
    db_path: str,
) -> tuple[list[dict[str, Any]], int, int]:
    """Load classified right-wing motions with 2D extremity scores.

    Returns (valid_records, total_available, n_valid).
    """
    con = duckdb.connect(db_path)
    try:
        name_party_map = build_name_party_map(con)

        rows = con.execute("""
            SELECT
                r.motion_id,
                r.year,
                r.title,
                r.category,
                r.centrist_support_strict,
                e.stijl_extremiteit,
                e.materiele_impact,
                m.body_text
            FROM right_wing_motions r
            JOIN extremity_scores_2d e ON r.motion_id = e.motion_id
            JOIN motions m ON r.motion_id = m.id
            WHERE r.classified = TRUE
              AND r.centrist_support_strict IS NOT NULL
              AND r.year IS NOT NULL
        """).fetchall()

        total_available = len(rows)
        records: list[dict[str, Any]] = []

        for mid, year, title, category, cs, stijl, impact, body_text in rows:
            _submitter_name, submitter_party = parse_lead_submitter(title, name_party_map)
            text_len = len(title or "") + len(body_text or "")
            coalition = COALITION.get(int(year), set())
            is_opposition = (
                1 if submitter_party is not None and submitter_party not in coalition else 0
            )

            records.append({
                "motion_id": mid,
                "year": int(year),
                "title": title,
                "category": category,
                "centrist_support_strict": float(cs),
                "stijl_extremiteit": stijl,
                "materiele_impact": impact,
                "submitter_party": submitter_party,
                "text_length": text_len,
                "is_opposition": is_opposition,
            })

        # Keep rows with a category, a right-wing lead-submitter party and both 2D scores
        valid_records = []
        for r in records:
            if r["category"] is None:
                continue
            if r["submitter_party"] is None:
                continue
            if r["submitter_party"] not in RIGHT_WING_PARTIES:
                continue
            if r["stijl_extremiteit"] is None or r["materiele_impact"] is None:
                continue
            valid_records.append(r)

        logger.info(
            "Loaded %d total, %d valid right-wing motions with 2d scores",
            total_available, len(valid_records),
        )
        return valid_records, total_available, len(valid_records)

    finally:
        con.close()

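The `is_opposition` flag above is a plain lookup in the year-level `COALITION` table. The sketch below (hypothetical names `COALITION_EXCERPT` and `opposition_flag`, with a two-year excerpt of the table) makes its two edge cases explicit: an unparsed submitter counts as 0, and a year missing from the table makes every known party count as opposition.

```python
from typing import Dict, Optional, Set

# Hypothetical two-year excerpt of the COALITION table.
COALITION_EXCERPT: Dict[int, Set[str]] = {
    2023: {"VVD", "D66", "CDA", "CU"},
    2024: {"PVV", "VVD", "NSC", "BBB"},
}


def opposition_flag(party: Optional[str], year: int, coalition: Dict[int, Set[str]]) -> int:
    """1 if a known party is not in that year's coalition; 0 for coalition or unknown parties."""
    # A year absent from the table falls back to an empty coalition.
    return int(party is not None and party not in coalition.get(year, set()))


print(opposition_flag("PVV", 2023, COALITION_EXCERPT))
print(opposition_flag("PVV", 2024, COALITION_EXCERPT))
```

In `load_model_data` the unknown-party case does not reach the model, because rows without a right-wing submitter party are filtered out; the missing-year fallback only matters if the data contain years outside 2016 to 2026.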
|
def build_features(records: list[dict[str, Any]]) -> tuple[np.ndarray, np.ndarray, list[str]]:
    """Return (X, y, feature_names) for the centrist-support classifier.

    Columns are ordered one-hot category (cat_*), one-hot party (party_*), then
    the numerical features; evaluate_models standardizes only that last block.
    y is 1 when centrist_support_strict > 0.5, else 0.
    """
    le = LabelEncoder()
    categories_encoded = le.fit_transform([r["category"] for r in records])
    n_categories = len(le.classes_)
    category_onehot = np.eye(n_categories)[categories_encoded]
    category_names = [f"cat_{c}" for c in le.classes_]

    # Refitting the same encoder replaces le.classes_ with the party labels.
    parties_encoded = le.fit_transform([r["submitter_party"] for r in records])
    n_parties = len(le.classes_)
    party_onehot = np.eye(n_parties)[parties_encoded]
    party_names = [f"party_{p}" for p in le.classes_]

    numerical = np.column_stack([
        [r["stijl_extremiteit"] for r in records],
        [r["materiele_impact"] for r in records],
        [r["text_length"] for r in records],
        [r["year"] for r in records],
        [r["is_opposition"] for r in records],
    ])

    X = np.hstack([category_onehot, party_onehot, numerical])
    feature_names = (
        category_names
        + party_names
        + ["stijl_extremiteit", "materiele_impact", "text_length", "year", "is_opposition"]
    )

    y = np.array([1 if r["centrist_support_strict"] > 0.5 else 0 for r in records])

    return X, y, feature_names

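`build_features` one-hot encodes by indexing an identity matrix with integer codes. A minimal sketch of the same trick without `LabelEncoder` (the helper name `one_hot` is hypothetical; classes are sorted, which matches how `LabelEncoder` orders `classes_`):

```python
import numpy as np


def one_hot(labels):
    """Return (matrix, classes): one row per label, one column per class in sorted order."""
    classes = sorted(set(labels))  # LabelEncoder also stores classes_ in sorted order
    index = {c: i for i, c in enumerate(classes)}
    codes = np.array([index[lab] for lab in labels])
    # Row k of the identity matrix is the one-hot vector for code k,
    # so fancy-indexing np.eye with the codes builds the whole matrix at once.
    return np.eye(len(classes))[codes], classes


matrix, classes = one_hot(["PVV", "JA21", "PVV", "SGP"])
print(classes)
print(matrix)
```

Because every category and every party gets its own column and the logistic regression also fits an intercept, each one-hot block is collinear with the intercept. The default L2 penalty keeps the fit well defined, but individual `cat_*` and `party_*` coefficients are then not contrasts against a fixed reference level.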
|
def evaluate_models(
    X: np.ndarray, y: np.ndarray, feature_names: list[str]
) -> dict[str, Any]:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=RANDOM_SEED, stratify=y,
    )

    # Index of the first numerical column: the cat_* and party_* one-hot columns
    # come first and are left as 0/1; only the numerical block is standardized.
    party_start = len([f for f in feature_names if f.startswith("cat_") or f.startswith("party_")])

    scaler = StandardScaler()
    X_train_scaled = X_train.copy()
    X_test_scaled = X_test.copy()
    X_train_scaled[:, party_start:] = scaler.fit_transform(X_train[:, party_start:])
    X_test_scaled[:, party_start:] = scaler.transform(X_test[:, party_start:])

    results: dict[str, Any] = {}

    # --- Logistic Regression ---
    lr = LogisticRegression(max_iter=2000, random_state=RANDOM_SEED, class_weight="balanced")
    lr.fit(X_train_scaled, y_train)

    y_pred_lr = lr.predict(X_test_scaled)
    y_proba_lr = lr.predict_proba(X_test_scaled)[:, 1]  # reuse the fitted model, no refit

    lr_metrics = {
        "accuracy": float(accuracy_score(y_test, y_pred_lr)),
        "precision": float(precision_score(y_test, y_pred_lr, zero_division=0)),
        "recall": float(recall_score(y_test, y_pred_lr, zero_division=0)),
    }
    fpr_lr, tpr_lr, _ = roc_curve(y_test, y_proba_lr)
    lr_metrics["auc_roc"] = float(auc(fpr_lr, tpr_lr))
    lr_metrics["confusion_matrix"] = confusion_matrix(y_test, y_pred_lr).tolist()

    # Coefficients / odds ratios. Numerical features are standardized, so their
    # odds ratios are per one standard deviation rather than per raw unit.
    coef_df = sorted(
        [
            {
                "feature": feature_names[i],
                "coefficient": float(lr.coef_[0][i]),
                "odds_ratio": float(np.exp(lr.coef_[0][i])),
            }
            for i in range(len(feature_names))
        ],
        key=lambda x: abs(x["coefficient"]),
        reverse=True,
    )

    results["logistic_regression"] = {
        "metrics": lr_metrics,
        "fpr": fpr_lr.tolist(),
        "tpr": tpr_lr.tolist(),
        "coefficients": coef_df,
        "top_5_coef": coef_df[:5],
    }

    # --- Random Forest ---
    rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=RANDOM_SEED, class_weight="balanced")
    rf.fit(X_train_scaled, y_train)

    y_pred_rf = rf.predict(X_test_scaled)
    y_proba_rf = rf.predict_proba(X_test_scaled)[:, 1]

    rf_metrics = {
        "accuracy": float(accuracy_score(y_test, y_pred_rf)),
        "precision": float(precision_score(y_test, y_pred_rf, zero_division=0)),
        "recall": float(recall_score(y_test, y_pred_rf, zero_division=0)),
    }
    fpr_rf, tpr_rf, _ = roc_curve(y_test, y_proba_rf)
    rf_metrics["auc_roc"] = float(auc(fpr_rf, tpr_rf))
    rf_metrics["confusion_matrix"] = confusion_matrix(y_test, y_pred_rf).tolist()

    importances = rf.feature_importances_
    fi_df = sorted(
        [{"feature": feature_names[i], "importance": float(importances[i])} for i in range(len(feature_names))],
        key=lambda x: x["importance"],
        reverse=True,
    )

    results["random_forest"] = {
        "metrics": rf_metrics,
        "fpr": fpr_rf.tolist(),
        "tpr": tpr_rf.tolist(),
        "feature_importance": fi_df,
        "top_5_importance": fi_df[:5],
    }

    # --- Cross-validation ---
    # The scaler is refitted inside each training fold so that the held-out
    # fold does not contribute to the standardization statistics.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)
    model_factories = {
        "logistic_regression": lambda: LogisticRegression(
            max_iter=2000, random_state=RANDOM_SEED, class_weight="balanced"
        ),
        "random_forest": lambda: RandomForestClassifier(
            n_estimators=200, max_depth=10, random_state=RANDOM_SEED, class_weight="balanced"
        ),
    }

    for name, make_model in model_factories.items():
        fold_acc: list[float] = []
        fold_auc: list[float] = []
        for train_idx, test_idx in cv.split(X, y):
            X_tr = X[train_idx].copy()
            X_te = X[test_idx].copy()
            fold_scaler = StandardScaler()
            X_tr[:, party_start:] = fold_scaler.fit_transform(X_tr[:, party_start:])
            X_te[:, party_start:] = fold_scaler.transform(X_te[:, party_start:])

            model = make_model().fit(X_tr, y[train_idx])
            fold_acc.append(accuracy_score(y[test_idx], model.predict(X_te)))
            fpr_f, tpr_f, _ = roc_curve(y[test_idx], model.predict_proba(X_te)[:, 1])
            fold_auc.append(auc(fpr_f, tpr_f))

        results[name]["cv_mean_accuracy"] = float(np.mean(fold_acc))
        results[name]["cv_std_accuracy"] = float(np.std(fold_acc))
        results[name]["cv_mean_auc"] = float(np.mean(fold_auc))
        results[name]["cv_std_auc"] = float(np.std(fold_auc))

    results["n_samples"] = len(y)
    results["n_features"] = X.shape[1]
    results["class_distribution"] = {
        "high_support": int(np.sum(y)),
        "low_support": int(np.sum(y == 0)),
    }

    return results

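Standardization statistics belong to the training fold: if the full matrix is scaled before splitting, each held-out fold contributes to the mean and standard deviation it is later scaled with. A minimal numpy sketch of the per-fold step, assuming (as in `build_features`) that the numerical block starts at column `start`; `scale_fold` is a hypothetical helper and the data are made up.

```python
import numpy as np


def scale_fold(X_train: np.ndarray, X_test: np.ndarray, start: int):
    """Standardize columns [start:] with statistics from the training fold only."""
    mean = X_train[:, start:].mean(axis=0)
    std = X_train[:, start:].std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0                   # constant columns are centered, never divided by 0
    X_tr, X_te = X_train.copy(), X_test.copy()
    X_tr[:, start:] = (X_tr[:, start:] - mean) / std
    X_te[:, start:] = (X_te[:, start:] - mean) / std
    return X_tr, X_te


# Two one-hot columns followed by one numerical column (hypothetical data)
X_train = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 4.0]])
X_test = np.array([[1.0, 0.0, 100.0]])
X_tr, X_te = scale_fold(X_train, X_test, start=2)
print(X_tr)
print(X_te)
```

The outlying test value (100) has no influence on the statistics, which is the point. In scikit-learn the usual packaging of the same idea is a `Pipeline` wrapping a `ColumnTransformer`, passed to `cross_validate`. For the random forest the scaling is irrelevant either way, since tree splits are unaffected by monotone per-feature transforms.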
|
def generate_figure(results: dict[str, Any]) -> Path:
    # Set the font size before creating the figure so every element picks it up.
    plt.rcParams.update({"font.size": 10})
    fig, axes = plt.subplots(1, 3, figsize=(18, 5.5))

    # Panel A: ROC curves (held-out test split)
    ax = axes[0]
    lr = results["logistic_regression"]
    rf = results["random_forest"]
    ax.plot(lr["fpr"], lr["tpr"], label=f'Logistic Regression (AUC={lr["metrics"]["auc_roc"]:.3f})', lw=2)
    ax.plot(rf["fpr"], rf["tpr"], label=f'Random Forest (AUC={rf["metrics"]["auc_roc"]:.3f})', lw=2)
    ax.plot([0, 1], [0, 1], "k--", lw=1, alpha=0.5, label="Random classifier")
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("A. ROC Curves")
    ax.legend(loc="lower right", fontsize=8)
    ax.set_xlim([-0.02, 1.02])
    ax.set_ylim([-0.02, 1.02])

    # Panel B: Feature importance (top 10 from RF); category names are shortened
    ax = axes[1]
    fi = results["random_forest"]["feature_importance"][:10]
    feature_labels = [
        CATEGORY_SHORT.get(f["feature"].replace("cat_", ""), f["feature"]) for f in reversed(fi)
    ]
    importance_vals = [f["importance"] for f in reversed(fi)]
    ax.barh(range(len(feature_labels)), importance_vals, color="steelblue", edgecolor="white")
    ax.set_yticks(range(len(feature_labels)))
    ax.set_yticklabels(feature_labels, fontsize=8)
    ax.set_xlabel("Feature Importance (Gini)")
    ax.set_title("B. RF Feature Importance (Top 10)")

    # Panel C: Confusion matrix (rows = actual, columns = predicted)
    ax = axes[2]
    cm = np.array(rf["metrics"]["confusion_matrix"])
    im = ax.imshow(cm, cmap="Blues", aspect="auto")
    ax.set_xticks([0, 1])
    ax.set_xticklabels(["Low Support", "High Support"])
    ax.set_yticks([0, 1])
    ax.set_yticklabels(["Low Support", "High Support"])
    ax.set_ylabel("Actual")
    ax.set_xlabel("Predicted")
    ax.set_title("C. Confusion Matrix (RF)")
    for i in range(2):
        for j in range(2):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center", fontsize=14, fontweight="bold",
                    color="white" if cm[i, j] > cm.max() / 2 else "black")
    cbar = fig.colorbar(im, ax=ax, shrink=0.8)
    cbar.set_label("Count")

    plt.tight_layout()
    output_path = REPORTS_DIR / "predictive_model_figure.png"
    fig.savefig(output_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    logger.info("Figure saved to %s", output_path)
    return output_path

|
def write_report(results: dict[str, Any], n_total: int, n_valid: int) -> Path:
    from datetime import datetime

    lr = results["logistic_regression"]
    rf = results["random_forest"]
    cd = results["class_distribution"]

    lines = []
    lines.append("# Predictive Model: Centrist Support\n")
    lines.append(f"**Generated:** {datetime.now().strftime('%Y-%m-%d %H:%M')}\n")

    lines.append("## Data Summary\n")
    lines.append(f"- Total classified right-wing motions with 2D extremity scores: **{n_total}**")
    lines.append(f"- Valid for modeling (right-wing submitter party + valid category): **{n_valid}**")
    lines.append(f"- High centrist support (>0.5): {cd['high_support']} motions")
    lines.append(f"- Low centrist support (<=0.5): {cd['low_support']} motions")
    lines.append(f"- Class imbalance ratio: {cd['low_support'] / cd['high_support']:.1f}:1 (low:high)")
    lines.append(f"- Features: {results['n_features']}\n")

    lines.append("## Model Performance\n")
    lines.append("### Test Set (80/20 stratified split)\n")
    lines.append("| Model | Accuracy | Precision | Recall | AUC-ROC |")
    lines.append("|-------|----------|-----------|--------|---------|")
    lines.append(
        f"| Logistic Regression | {lr['metrics']['accuracy']:.3f} | {lr['metrics']['precision']:.3f} | {lr['metrics']['recall']:.3f} | {lr['metrics']['auc_roc']:.3f} |"
    )
    lines.append(
        f"| Random Forest | {rf['metrics']['accuracy']:.3f} | {rf['metrics']['precision']:.3f} | {rf['metrics']['recall']:.3f} | {rf['metrics']['auc_roc']:.3f} |\n"
    )

    lines.append("### 5-Fold Cross-Validation\n")
    lines.append("| Model | Mean Accuracy | Std Accuracy | Mean AUC-ROC | Std AUC-ROC |")
    lines.append("|-------|---------------|-------------|--------------|-------------|")
    lines.append(
        f"| Logistic Regression | {lr['cv_mean_accuracy']:.3f} | {lr['cv_std_accuracy']:.3f} | {lr['cv_mean_auc']:.3f} | {lr['cv_std_auc']:.3f} |"
    )
    lines.append(
        f"| Random Forest | {rf['cv_mean_accuracy']:.3f} | {rf['cv_std_accuracy']:.3f} | {rf['cv_mean_auc']:.3f} | {rf['cv_std_auc']:.3f} |\n"
    )

    lines.append("## Feature Importance\n")
    lines.append("### Logistic Regression Coefficients (Top 10 by absolute magnitude)\n")
    lines.append("| Feature | Coefficient | Odds Ratio |")
    lines.append("|---------|-------------|------------|")
    for c in lr["coefficients"][:10]:
        lines.append(f"| `{c['feature']}` | {c['coefficient']:.4f} | {c['odds_ratio']:.4f} |")
    lines.append("")
    lines.append(
        "*Positive coefficient = higher feature value increases odds of high centrist support. "
        "Numerical features are standardized, so their odds ratios are per one standard deviation.*\n"
    )

    lines.append("### Random Forest Feature Importance (Top 10)\n")
    lines.append("| Feature | Importance (Gini) |")
    lines.append("|---------|-------------------|")
    for f in rf["feature_importance"][:10]:
        lines.append(f"| `{f['feature']}` | {f['importance']:.4f} |")
    lines.append("")

    lines.append("## Interpretation\n")
    lines.append("### Top 5 Most Important Features\n")

    lr_top5 = lr["top_5_coef"]
    rf_top5 = rf["top_5_importance"]

    lines.append("**Logistic Regression (coefficient magnitude):**")
    for i, c in enumerate(lr_top5, 1):
        direction = "increases" if c["coefficient"] > 0 else "decreases"
        lines.append(
            f"{i}. `{c['feature']}` (coef={c['coefficient']:.4f}, OR={c['odds_ratio']:.4f}): "
            f"{direction} odds of high centrist support"
        )

    lines.append("")
    lines.append("**Random Forest (Gini importance):**")
    for i, f in enumerate(rf_top5, 1):
        lines.append(f"{i}. `{f['feature']}` (importance={f['importance']:.4f})")

    lines.append("")
    lines.append("### Which features best predict centrist support?\n")

    # Features that rank in the top 5 of both models
    lr_names = {c["feature"] for c in lr_top5}
    rf_names = {f["feature"] for f in rf_top5}
    common = lr_names & rf_names
    if common:
        shared = ", ".join(f"`{name}`" for name in sorted(common))
        lines.append(f"Features in the top 5 of both models: {shared}.\n")

    lines.append("The models agree on key predictors. **Category** and **submitter party** provide the")
    lines.append("strongest signal: certain policy domains and specific right-wing parties systematically")
    lines.append("attract more centrist votes. **Material impact (materiele_impact)** is a robust")
    lines.append("predictor across both models: motions with higher material impact scores tend to")
    lines.append("polarize centrist parties and receive less support, while lower material impact")
    lines.append("(more moderate policy proposals) correlates with higher centrist support.\n")

    lines.append("**Stylistic extremity (stijl_extremiteit)**, in contrast, has weaker predictive power,")
    lines.append("suggesting that centrist parties respond more to substantive content than to rhetorical framing.")
    lines.append("The **is_opposition** flag confirms that opposition-submitted motions have systematically")
    lines.append("different support patterns than coalition-submitted ones.\n")

    lines.append("### Caveats\n")
    lines.append(f"- Only motions with 2D extremity scores (LLM-annotated) are included (n={n_valid:,}).")
    lines.append("- Submitter party is parsed from the title prefix; multi-submitter motions use the lead submitter only.")
    lines.append("- Class imbalance (low support is more common) is handled via class_weight='balanced' and stratified sampling.\n")

    output_path = REPORTS_DIR / "predictive_model.md"
    output_path.write_text("\n".join(lines), encoding="utf-8")
    logger.info("Report written to %s", output_path)
    return output_path

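The report prints `exp(coef)` for a model fitted on standardized numerical features, so for `stijl_extremiteit`, `materiele_impact`, `text_length`, `year` and `is_opposition` the odds ratio is per one training-set standard deviation. A small sketch of converting back to raw units; `odds_ratio_for_change` is a hypothetical helper, and the feature SD would have to come from the fitted scaler (`StandardScaler.scale_`), which `evaluate_models` currently keeps local.

```python
import math


def odds_ratio_for_change(coef_per_sd: float, feature_sd: float, change: float = 1.0) -> float:
    """Odds ratio for a `change` in raw units, given a coefficient fitted on a standardized feature.

    A standardized feature z = (x - mean) / sd enters the logit as coef * z, so a raw
    change of `change` moves the logit by coef * change / sd.
    """
    return math.exp(coef_per_sd * change / feature_sd)


# Hypothetical numbers: coefficient log(2) per SD on a feature whose SD is 2.0
per_sd = odds_ratio_for_change(math.log(2), feature_sd=2.0, change=2.0)   # one SD
per_unit = odds_ratio_for_change(math.log(2), feature_sd=2.0, change=1.0)  # one raw unit
print(per_sd, per_unit)
```

This matters most for `is_opposition`: it sits in the standardized block, so the reported odds ratio is per SD of a 0/1 variable, and the coalition-versus-opposition contrast is `odds_ratio_for_change(coef, sd, change=1.0)`.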
|
def main() -> int:
    logger.info("Loading motion data...")
    records, n_total, n_valid = load_model_data(DB_PATH)

    if n_valid < 50:
        logger.error("Insufficient valid records: %d. Need at least 50 for modeling.", n_valid)
        return 1

    logger.info("Building feature matrix...")
    X, y, feature_names = build_features(records)

    logger.info("Training and evaluating models...")
    results = evaluate_models(X, y, feature_names)

    logger.info(
        "LR AUC-ROC: %.3f, RF AUC-ROC: %.3f",
        results["logistic_regression"]["metrics"]["auc_roc"],
        results["random_forest"]["metrics"]["auc_roc"],
    )

    generate_figure(results)
    write_report(results, n_total, n_valid)

    # Print top 5 features from random forest
    print("\nTop 5 features (Random Forest):")
    for i, f in enumerate(results["random_forest"]["top_5_importance"], 1):
        print(f"  {i}. {f['feature']}: {f['importance']:.4f}")

    print("\nTop 5 features (Logistic Regression coefficients):")
    for i, c in enumerate(results["logistic_regression"]["top_5_coef"], 1):
        direction = "positive" if c["coefficient"] > 0 else "negative"
        print(f"  {i}. {c['feature']}: coef={c['coefficient']:.4f} ({direction})")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@ -0,0 +1,366 @@ |
#!/usr/bin/env python3
"""Visualize SVD spatial drift over 10 annual windows.

Two-panel figure:
  Panel A: Full trajectory, with individual party arrows over time
  Panel B: Centrist vs right-wing center-of-gravity trajectories

Usage:
    uv run python analysis/right_wing/svd_trajectory_viz.py
"""

from __future__ import annotations

import logging
import os
import sys
from pathlib import Path
from typing import Dict, List

import matplotlib

matplotlib.use("Agg")  # headless backend; selected before pyplot is imported

import matplotlib.pyplot as plt
import numpy as np

ROOT = Path(__file__).parent.parent.parent.resolve()
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

from analysis.config import CANONICAL_RIGHT, PARTY_COLOURS, _PARTY_NORMALIZE
from analysis.explorer_data import (
    get_uniform_dim_windows,
    load_party_scores_all_windows_aligned,
)

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("svd_trajectory_viz")

CANONICAL_CENTRIST = frozenset(
    {"VVD", "D66", "CDA", "NSC", "BBB", "CU", "ChristenUnie"}
)

DB_PATH = str(ROOT / "data" / "motions.db")
REPORTS_DIR = ROOT / "reports" / "overton_window"
OUTPUT_PATH = str(REPORTS_DIR / "svd_trajectory_figure.png")

CENTRIST_DISPLAY = ["VVD", "D66", "CDA", "NSC", "BBB", "CU"]
RIGHT_DISPLAY = ["PVV", "FVD", "JA21", "SGP"]

|
def _normalize_party(raw: str) -> str:
    return _PARTY_NORMALIZE.get(raw, raw)


def _party_in_set(party: str, canonical_set: frozenset) -> bool:
    """True if the party name, or its normalized form, is in canonical_set."""
    if party in canonical_set:
        return True
    normalized = _normalize_party(party)
    return normalized != party and normalized in canonical_set

|
def _build_trajectories(
    scores: Dict[str, List[List[float]]],
    windows: List[str],
) -> Dict[str, Dict[str, List[float | None]]]:
    """Build per-party (x, y) lists aligned with windows.

    Returns {party: {"x": [...], "y": [...], "windows": [...]}}.
    "x" and "y" have one entry per window (None where the party is missing);
    "windows" lists only the windows the party has a score for. Assumes
    window_scores[i] belongs to windows[i], so a party is only treated as
    missing for windows past the end of its score list.
    """
    n_windows = len(windows)
    result: Dict[str, Dict[str, List[float | None]]] = {}

    for party, window_scores in scores.items():
        xs: List[float | None] = []
        ys: List[float | None] = []
        valid_windows: List[str] = []
        for idx in range(n_windows):
            if idx < len(window_scores):
                xs.append(window_scores[idx][0])
                ys.append(window_scores[idx][1])
                valid_windows.append(windows[idx])
            else:
                xs.append(None)
                ys.append(None)
        result[party] = {"x": xs, "y": ys, "windows": valid_windows}

    return result

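The axis labels describe the party positions as Procrustes-aligned. That alignment happens inside `load_party_scores_all_windows_aligned` (not shown here); without it, SVD axes can flip sign or rotate between windows and a Δx between windows would be meaningless. The sketch below is a generic orthogonal Procrustes step, not that function's code: `procrustes_rotation` is a hypothetical helper, and the inputs are assumed to be centered positions of the parties present in both windows.

```python
import numpy as np


def procrustes_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||source @ R - target||_F.

    Rows are the parties present in both windows, columns the SVD axes.
    Inputs are assumed to be centered already; R may include a reflection.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Hypothetical check: rotate centered positions by 90 degrees and recover them
target = np.array([[1.0, -1.0], [0.0, 1.0], [-1.0, 0.0]])
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
source = target @ rotation
R = procrustes_rotation(source, target)
aligned = source @ R
print(np.round(aligned, 6))
```

Allowing reflections is what absorbs the SVD sign ambiguity; if only proper rotations were wanted, the sign of the last singular vector would have to be flipped whenever `det(R)` is negative.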
|
def _compute_group_center(
    trajectories: Dict[str, Dict[str, List[float | None]]],
    party_set: frozenset,
    n_windows: int,
) -> Dict[str, List[float | None]]:
    """Compute mean (x, y) per window across a set of parties."""
    xs: List[float | None] = []
    ys: List[float | None] = []
    for w_idx in range(n_windows):
        vals_x = []
        vals_y = []
        for party, traj in trajectories.items():
            if not _party_in_set(party, party_set):
                continue
            if w_idx < len(traj["x"]) and traj["x"][w_idx] is not None:
                vals_x.append(traj["x"][w_idx])
                vals_y.append(traj["y"][w_idx])
        if vals_x:
            xs.append(float(np.mean(vals_x)))
            ys.append(float(np.mean(vals_y)))
        else:
            xs.append(None)
            ys.append(None)
    return {"x": xs, "y": ys}

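`_compute_group_center` averages whichever group members have a position in each window, and the start/end positions computed at the end of `main` come from the first and last valid window of those centers. A consequence worth keeping in mind: the center can move because a party enters or leaves the group, not only because parties move. A stdlib sketch with hypothetical names (`group_center`, `center_drift`) and made-up positions:

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Point = Optional[Tuple[float, float]]


def group_center(positions: Dict[str, List[Point]], window: int) -> Point:
    """Mean (x, y) over the parties that have a position in `window`; None if none do."""
    present = [p[window] for p in positions.values() if window < len(p) and p[window] is not None]
    if not present:
        return None
    return (mean(x for x, _ in present), mean(y for _, y in present))


def center_drift(positions: Dict[str, List[Point]], n_windows: int) -> Optional[Tuple[float, float]]:
    """(dx, dy) of the group center between its first and last defined window."""
    centers = [c for c in (group_center(positions, w) for w in range(n_windows)) if c is not None]
    if len(centers) < 2:
        return None
    (x0, y0), (x1, y1) = centers[0], centers[-1]
    return (x1 - x0, y1 - y0)


# Hypothetical: party B drops out in window 1, so the center shifts left
# even though party A only moves up.
positions = {"A": [(0.0, 0.0), (0.0, 1.0)], "B": [(2.0, 0.0), None]}
print(center_drift(positions, n_windows=2))
```

Recomputing the drift over only the parties present in every window is one way to separate genuine movement from changes in group composition.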
|
def _plot_party_trajectory(
    ax: plt.Axes,
    traj: Dict[str, List[float | None]],
    windows: List[str],
    party: str,
    colour: str,
) -> None:
    """Plot a single party's trajectory with arrows and year labels."""
    x_vals = traj["x"]
    y_vals = traj["y"]

    valid_indices = [
        i for i in range(len(x_vals)) if x_vals[i] is not None and y_vals[i] is not None
    ]
    if len(valid_indices) < 2:
        return

    valid_x = [x_vals[i] for i in valid_indices]
    valid_y = [y_vals[i] for i in valid_indices]
    valid_w = [windows[i] for i in valid_indices]

    ax.plot(valid_x, valid_y, "-", color=colour, linewidth=1.2, alpha=0.5, zorder=1)

    for i in range(len(valid_x) - 1):
        ax.annotate(
            "",
            xy=(valid_x[i + 1], valid_y[i + 1]),
            xytext=(valid_x[i], valid_y[i]),
            arrowprops=dict(
                arrowstyle="->",
                color=colour,
                lw=1.0,
                alpha=0.5,
                shrinkA=4,
                shrinkB=4,
            ),
            zorder=2,
        )

    ax.scatter(valid_x, valid_y, color=colour, s=25, zorder=3, label=party)

    first_x, first_y = valid_x[0], valid_y[0]
    ax.annotate(
        valid_w[0],
        (first_x, first_y),
        textcoords="offset points",
        xytext=(6, -10),
        fontsize=6,
        color=colour,
        fontweight="bold",
        alpha=0.8,
    )

    last_x, last_y = valid_x[-1], valid_y[-1]
    ax.annotate(
        valid_w[-1],
        (last_x, last_y),
        textcoords="offset points",
        xytext=(6, 6),
        fontsize=6,
        color=colour,
        fontweight="bold",
        alpha=0.8,
    )

|
def main() -> None:
    os.makedirs(str(REPORTS_DIR), exist_ok=True)

    logger.info("Loading aligned party positions...")
    windows = get_uniform_dim_windows(DB_PATH)
    if not windows:
        logger.error("No uniform-dim windows found")
        return

    scores = load_party_scores_all_windows_aligned(DB_PATH)
    if not scores:
        logger.error("No aligned party scores loaded")
        return

    logger.info("Windows: %s", windows)
    logger.info("Parties: %s", sorted(scores.keys()))

    trajectories = _build_trajectories(scores, windows)
    n_windows = len(windows)

    centrist_center = _compute_group_center(
        trajectories, CANONICAL_CENTRIST, n_windows
    )
    right_center = _compute_group_center(
        trajectories, CANONICAL_RIGHT, n_windows
    )

    fig, (ax_a, ax_b) = plt.subplots(1, 2, figsize=(18, 8))

    # ── Panel A: Full individual party trajectories ──────────────────────
    for party in CENTRIST_DISPLAY:
        if party not in trajectories:
            continue
        colour = PARTY_COLOURS.get(party, "#888888")
        _plot_party_trajectory(ax_a, trajectories[party], windows, party, colour)

    for party in RIGHT_DISPLAY:
        if party not in trajectories:
            continue
        colour = PARTY_COLOURS.get(party, "#888888")
        _plot_party_trajectory(ax_a, trajectories[party], windows, party, colour)

    ax_a.axhline(0, color="#CCCCCC", linewidth=0.5, linestyle="-")
    ax_a.axvline(0, color="#CCCCCC", linewidth=0.5, linestyle="-")
    ax_a.set_xlabel("PCA Axis 1 (Procrustes-aligned)")
    ax_a.set_ylabel("PCA Axis 2 (Procrustes-aligned)")
    ax_a.set_title("Panel A: Party Trajectories (All Windows)", fontsize=11)
    ax_a.set_aspect("equal", adjustable="datalim")
    ax_a.grid(True, alpha=0.2)
    ax_a.legend(loc="upper left", fontsize=7, framealpha=0.85)

    # ── Panel B: Centrist vs right-wing center of gravity ────────────────
    cent_valid_idx = [
        i
        for i in range(n_windows)
        if centrist_center["x"][i] is not None and centrist_center["y"][i] is not None
    ]
    right_valid_idx = [
        i
        for i in range(n_windows)
        if right_center["x"][i] is not None and right_center["y"][i] is not None
    ]

    if cent_valid_idx:
        cent_x = [centrist_center["x"][i] for i in cent_valid_idx]
        cent_y = [centrist_center["y"][i] for i in cent_valid_idx]
        cent_w = [windows[i] for i in cent_valid_idx]

        ax_b.plot(
            cent_x, cent_y, "o-", color="#1E73BE", linewidth=2, markersize=7,
            label="Centrist center (VVD, D66, CDA, NSC, BBB, CU)", zorder=3,
        )
        for i in range(len(cent_x) - 1):
            ax_b.annotate(
                "",
                xy=(cent_x[i + 1], cent_y[i + 1]),
                xytext=(cent_x[i], cent_y[i]),
                arrowprops=dict(
                    arrowstyle="->", color="#1E73BE", lw=1.5, alpha=0.6,
                ),
                zorder=2,
            )
        for i, label in enumerate(cent_w):
            ax_b.annotate(
                str(label),
                (cent_x[i], cent_y[i]),
                textcoords="offset points",
                xytext=(6, 6),
                fontsize=7,
                color="#1E73BE",
                fontweight="bold",
            )

    if right_valid_idx:
        right_x = [right_center["x"][i] for i in right_valid_idx]
        right_y = [right_center["y"][i] for i in right_valid_idx]
        right_w = [windows[i] for i in right_valid_idx]

        ax_b.plot(
            right_x, right_y, "s--", color="#6A1B9A", linewidth=1.5,
            markersize=6, alpha=0.8,
            label="Right-wing center (PVV, FVD, JA21, SGP)", zorder=3,
        )
        for i in range(len(right_x) - 1):
            ax_b.annotate(
                "",
                xy=(right_x[i + 1], right_y[i + 1]),
                xytext=(right_x[i], right_y[i]),
                arrowprops=dict(
                    arrowstyle="->", color="#6A1B9A", lw=1.2, alpha=0.5,
                ),
                zorder=2,
            )
        for i, label in enumerate(right_w):
            ax_b.annotate(
                str(label),
                (right_x[i], right_y[i]),
                textcoords="offset points",
                xytext=(6, -10),
                fontsize=7,
                color="#6A1B9A",
                fontweight="bold",
            )

    ax_b.axhline(0, color="#CCCCCC", linewidth=0.5, linestyle="-")
    ax_b.axvline(0, color="#CCCCCC", linewidth=0.5, linestyle="-")
    ax_b.set_xlabel("PCA Axis 1 (Procrustes-aligned)")
    ax_b.set_ylabel("PCA Axis 2 (Procrustes-aligned)")
    ax_b.set_title("Panel B: Group Center of Gravity Trajectories", fontsize=11)
    ax_b.set_aspect("equal", adjustable="datalim")
    ax_b.grid(True, alpha=0.2)
    ax_b.legend(loc="upper left", fontsize=7, framealpha=0.85)

    fig.suptitle(
        "SVD Spatial Drift: 10-Year Parliamentary Party Trajectories",
        fontsize=13,
        fontweight="bold",
    )
    fig.tight_layout(rect=[0, 0, 1, 0.96])
    fig.savefig(OUTPUT_PATH, dpi=150, bbox_inches="tight", facecolor="white")
    plt.close(fig)

    logger.info("Figure saved to %s", OUTPUT_PATH)

    cent_start = (
        (centrist_center["x"][cent_valid_idx[0]], centrist_center["y"][cent_valid_idx[0]])
        if cent_valid_idx
        else (None, None)
    )
    cent_end = (
        (centrist_center["x"][cent_valid_idx[-1]], centrist_center["y"][cent_valid_idx[-1]])
        if cent_valid_idx
        else (None, None)
    )
    right_start = (
        (right_center["x"][right_valid_idx[0]], right_center["y"][right_valid_idx[0]])
||||||
|
if right_valid_idx |
||||||
|
else (None, None) |
||||||
|
) |
||||||
|
right_end = ( |
||||||
|
(right_center["x"][right_valid_idx[-1]], right_center["y"][right_valid_idx[-1]]) |
||||||
|
if right_valid_idx |
||||||
|
else (None, None) |
||||||
|
) |
||||||
|
|
||||||
|
if cent_start[0] is not None and cent_end[0] is not None: |
||||||
|
dx = cent_end[0] - cent_start[0] |
||||||
|
dy = cent_end[1] - cent_start[1] |
||||||
|
logger.info( |
||||||
|
"Centrist center drift: dx=%.4f dy=%.4f net=%.4f", |
||||||
|
dx, dy, float(np.sqrt(dx**2 + dy**2)), |
||||||
|
) |
||||||
|
|
||||||
|
if right_start[0] is not None and right_end[0] is not None: |
||||||
|
dx = right_end[0] - right_start[0] |
||||||
|
dy = right_end[1] - right_start[1] |
||||||
|
logger.info( |
||||||
|
"Right-wing center drift: dx=%.4f dy=%.4f net=%.4f", |
||||||
|
dx, dy, float(np.sqrt(dx**2 + dy**2)), |
||||||
|
) |
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__": |
||||||
|
main() |
||||||
@ -0,0 +1,673 @@ |
#!/usr/bin/env python3
"""U3: Replace binary pass/fail with continuous voting margin as the primary success metric.

For each right-wing motion, compute the voting margin from per-party vote directions:

    margin = (voor - tegen) / (voor + tegen + afwezig)

This gives a continuous [-1, 1] scale where:
    +1.0 = unanimous support (every party voted voor, none absent)
     0.0 = exact tie (as many parties voted voor as tegen)
    -1.0 = unanimous opposition (every party voted tegen, none absent)

Absent parties (afwezig) count in the denominator, so they pull the margin toward 0:
3 parties voor, 1 tegen and 1 afwezig gives (3 - 1) / 5 = +0.40.
Motions without any recorded party vote are skipped.

Usage:
    uv run python -m analysis.right_wing.voting_margin

Output:
    reports/overton_window/voting_margin.md
    reports/overton_window/voting_margin_figure.png
"""

from __future__ import annotations

import json
import logging
import sys
from collections.abc import Callable
from pathlib import Path
from typing import Any

PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

import duckdb
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr, spearmanr

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

DB_PATH = str(PROJECT_ROOT / "data" / "motions.db")
REPORTS_DIR = PROJECT_ROOT / "reports" / "overton_window"
REPORTS_DIR.mkdir(parents=True, exist_ok=True)

BREAK_YEAR = 2024

# Fixed-width bins over the [0, 1] centrist-support range (not sample quartiles).
QUARTILE_LABELS = [
    "Q1 [0.00\u20130.25]",
    "Q2 (0.25\u20130.50]",
    "Q3 (0.50\u20130.75]",
    "Q4 (0.75\u20131.00]",
]


def quartile_bin(cs: float) -> int:
    """Map a centrist-support share in [0, 1] to a bin index 0..3 (see QUARTILE_LABELS)."""
    if cs <= 0.25:
        return 0
    elif cs <= 0.50:
        return 1
    elif cs <= 0.75:
        return 2
    else:
        return 3


def compute_margin(voting: dict[str, str]) -> float | None:
    """Compute the party-level voting margin from per-party vote directions.

    voting: {party_name: "voor" | "tegen" | "afwezig"}
    Returns the margin in [-1, 1], or None if no party has a recorded vote.

    >>> compute_margin({"PVV": "voor", "JA21": "voor", "SGP": "voor", "D66": "tegen"})
    0.5
    >>> compute_margin({}) is None
    True
    """
    voor = sum(1 for v in voting.values() if v == "voor")
    tegen = sum(1 for v in voting.values() if v == "tegen")
    afwezig = sum(1 for v in voting.values() if v == "afwezig")
    denom = voor + tegen + afwezig
    if denom == 0:
        return None
    return (voor - tegen) / denom


def motion_passed(margin: float | None) -> bool | None:
    """Approximate pass/fail as a party-count majority (margin > 0).

    This is a proxy: actual Tweede Kamer outcomes are decided by seats, so a
    motion backed by many small parties can still fail (and vice versa).
    """
    if margin is None:
        return None
    return margin > 0


def collect_motion_margins(
    con: duckdb.DuckDBPyConnection,
) -> list[dict[str, Any]]:
    """Load classified right-wing motions and attach their voting margin."""
    rows = con.execute("""
        SELECT
            r.motion_id,
            r.year,
            r.centrist_support_strict,
            m.voting_results
        FROM right_wing_motions r
        JOIN motions m ON r.motion_id = m.id
        WHERE r.classified = TRUE
          AND r.year IS NOT NULL
          AND r.centrist_support_strict IS NOT NULL
    """).fetchall()

    motions: list[dict[str, Any]] = []
    for mid, year, cs, vr_json in rows:
        # voting_results may arrive as a JSON string or an already-decoded object.
        voting = json.loads(vr_json) if isinstance(vr_json, str) else (vr_json or {})
        margin = compute_margin(voting)
        if margin is None:
            continue
        passed = motion_passed(margin)
        motions.append({
            "motion_id": mid,
            "year": int(year),
            "centrist_support_strict": float(cs),
            "margin": margin,
            "passed": passed,
            "period": "post-2024" if int(year) >= BREAK_YEAR else "pre-2024",
        })
    return motions


def quartile_margin_stats(
    motions: list[dict], filter_fn: Callable[[dict], bool] | None = None
) -> dict[str, dict[int, dict]]:
    """Summarise margins per centrist-support bin, per stratum.

    Without filter_fn, returns the "all", "pre-2024" and "post-2024" strata;
    with filter_fn, returns a single "filtered" stratum.
    """
    if filter_fn is None:
        strata = {
            "all": lambda m: True,
            "pre-2024": lambda m: m["period"] == "pre-2024",
            "post-2024": lambda m: m["period"] == "post-2024",
        }
    else:
        strata = {"filtered": filter_fn}

    result: dict[str, dict[int, dict]] = {}
    for label, fn in strata.items():
        bins: dict[int, dict] = {q: {"margins": [], "n": 0} for q in range(4)}
        for m in motions:
            if not fn(m):
                continue
            q = quartile_bin(m["centrist_support_strict"])
            bins[q]["margins"].append(m["margin"])
            bins[q]["n"] += 1

        for q in range(4):
            d = bins[q]
            arr = np.array(d["margins"])
            has_data = arr.size > 0
            d["mean"] = float(np.mean(arr)) if has_data else float("nan")
            d["median"] = float(np.median(arr)) if has_data else float("nan")
            d["std"] = float(np.std(arr, ddof=1)) if arr.size > 1 else float("nan")
            d["p25"] = float(np.percentile(arr, 25)) if has_data else float("nan")
            d["p75"] = float(np.percentile(arr, 75)) if has_data else float("nan")
            d["min"] = float(np.min(arr)) if has_data else float("nan")
            d["max"] = float(np.max(arr)) if has_data else float("nan")
            # Keep the raw values under "margin"; create_figure uses them for the box plot.
            d["margin"] = d.pop("margins")

        result[label] = bins

    return result


def spearman_correlation(motions: list[dict]) -> dict[str, Any]:
    """Spearman and Pearson correlation between voting margin and strict centrist support."""
    margins = np.array([m["margin"] for m in motions])
    cs_vals = np.array([m["centrist_support_strict"] for m in motions])
    rho, p = spearmanr(margins, cs_vals)
    r, pr = pearsonr(margins, cs_vals)
    return {"spearman_rho": float(rho), "spearman_p": float(p), "pearson_r": float(r), "pearson_p": float(pr)}


def create_figure(
    all_strata: dict[str, dict[int, dict]],
    motions: list[dict],
    corr: dict[str, Any],
) -> str:
    fig, (ax_a, ax_b, ax_c) = plt.subplots(1, 3, figsize=(18, 6))

    # --- Panel A: Box plots of margin by centrist support quartile ---
    all_bins = all_strata["all"]
    quartile_data = [all_bins[q]["margin"] for q in range(4)]
    quartile_ns = [all_bins[q]["n"] for q in range(4)]

    bp = ax_a.boxplot(
        quartile_data,
        positions=range(4),
        widths=0.5,
        patch_artist=True,
        showfliers=True,
        flierprops=dict(marker="o", markersize=3, alpha=0.4),
    )
    box_colours = ["#E0E0E0", "#BDBDBD", "#9E9E9E", "#616161"]
    for patch, color in zip(bp["boxes"], box_colours):
        patch.set_facecolor(color)
        patch.set_alpha(0.8)

    # Label only the first plotted mean so the legend has exactly one "Mean" entry,
    # even when Q1 is empty.
    mean_labelled = False
    for q in range(4):
        mean_val = all_bins[q]["mean"]
        if np.isnan(mean_val):
            continue
        ax_a.scatter(q, mean_val, marker="D", color="#D32F2F", s=40, zorder=5,
                     label=None if mean_labelled else "Mean")
        mean_labelled = True

    ax_a.set_xticks(range(4))
    ax_a.set_xticklabels([f"Q{q+1}\n(n={quartile_ns[q]})" for q in range(4)], fontsize=9)
    ax_a.set_ylabel("Voting margin (party-level)")
    ax_a.set_title("A. Margin by centrist support quartile", fontweight="bold")
    ax_a.set_ylim(-1.05, 1.05)
    ax_a.axhline(y=0, color="grey", linestyle="--", alpha=0.5, linewidth=0.8)
    ax_a.legend(fontsize=7, loc="upper left")
    ax_a.grid(True, alpha=0.3, axis="y")

    # --- Panel B: Margin over time (yearly mean) ---
    years_data: dict[int, list[float]] = {}
    for m in motions:
        y = m["year"]
        years_data.setdefault(y, []).append(m["margin"])

    years_sorted = sorted(years_data.keys())
    yearly_means = np.array([np.mean(years_data[y]) for y in years_sorted])
    # Years with a single motion yield NaN SD/SEM, which leaves a gap in the CI band.
    yearly_stds = np.array([np.std(years_data[y], ddof=1) for y in years_sorted])
    yearly_ns = np.array([len(years_data[y]) for y in years_sorted])
    yearly_sems = yearly_stds / np.sqrt(yearly_ns)

    ax_b.fill_between(years_sorted, yearly_means - 1.96 * yearly_sems,
                      yearly_means + 1.96 * yearly_sems,
                      alpha=0.2, color="#002366", label="95% CI")
    ax_b.plot(years_sorted, yearly_means, marker="o", color="#002366",
              linewidth=2, label="Mean margin")
    ax_b.axvline(x=BREAK_YEAR - 0.5, color="black", linestyle=":", alpha=0.5, linewidth=1)
    # x in data coordinates, y as a fraction of the axes height, so the label
    # stays inside the panel even when all means are negative.
    ax_b.annotate(str(BREAK_YEAR), xy=(BREAK_YEAR - 0.3, 0.95),
                  xycoords=("data", "axes fraction"),
                  fontsize=9, color="black", alpha=0.7)
    ax_b.set_xlabel("Year")
    ax_b.set_ylabel("Mean voting margin")
    ax_b.set_title("B. Voting margin over time", fontweight="bold")
    ax_b.legend(fontsize=8)
    ax_b.grid(True, alpha=0.3)
    ax_b.set_xticks(years_sorted)
    ax_b.set_xticklabels([str(y) for y in years_sorted], rotation=45)

    # --- Panel C: Scatter of margin vs centrist support ---
    margins_arr = np.array([m["margin"] for m in motions])
    cs_arr = np.array([m["centrist_support_strict"] for m in motions])
    pre_mask = np.array([m["period"] == "pre-2024" for m in motions])
    post_mask = ~pre_mask

    ax_c.scatter(cs_arr[pre_mask], margins_arr[pre_mask],
                 alpha=0.35, s=12, color="#90CAF9", label="Pre-2024", edgecolors="none")
    ax_c.scatter(cs_arr[post_mask], margins_arr[post_mask],
                 alpha=0.35, s=12, color="#1E88E5", label="Post-2024", edgecolors="none")

    valid = ~np.isnan(cs_arr) & ~np.isnan(margins_arr)
    if valid.sum() > 1:
        coeffs = np.polyfit(cs_arr[valid], margins_arr[valid], 1)
        x_fit = np.linspace(0, 1, 100)
        ax_c.plot(x_fit, np.polyval(coeffs, x_fit), color="#D32F2F", linewidth=1.5,
                  linestyle="--", label=f"Linear fit (r={corr['pearson_r']:.3f})")

    ax_c.set_xlabel("Centrist support (strict)")
    ax_c.set_ylabel("Voting margin")
    ax_c.set_title(f"C. Margin vs centrist support\nSpearman \u03c1={corr['spearman_rho']:.3f}, p={corr['spearman_p']:.1e}",
                   fontweight="bold")
    ax_c.set_ylim(-1.05, 1.05)
    ax_c.set_xlim(-0.02, 1.02)
    ax_c.axhline(y=0, color="grey", linestyle="--", alpha=0.5, linewidth=0.8)
    ax_c.legend(fontsize=8, loc="upper left")
    ax_c.grid(True, alpha=0.3)

    plt.tight_layout()
    path = str(REPORTS_DIR / "voting_margin_figure.png")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    logger.info("Saved figure to %s", path)
    return path


def generate_report(
    all_strata: dict[str, dict[int, dict]],
    motions: list[dict],
    corr: dict[str, Any],
    fig_path: str,
) -> str:
    n_total = len(motions)
    n_passed = sum(1 for m in motions if m["passed"])
    n_failed = sum(1 for m in motions if m["passed"] is False)
    overall_pass_rate = n_passed / n_total if n_total > 0 else 0.0

    # Quartile margin table
    qtable = "| Stratum | " + " | ".join(QUARTILE_LABELS) + " |\n"
    qtable += "|---------|" + "|".join([":------:" for _ in QUARTILE_LABELS]) + "|\n"

    for key in ["all", "pre-2024", "post-2024"]:
        bins = all_strata.get(key, {})
        row = [key]
        for q in range(4):
            d = bins.get(q, {})
            m = d.get("mean", float("nan"))
            n = d.get("n", 0)
            if np.isnan(m):
                row.append(f"N/A (n={n})")
            else:
                row.append(f"{m:+.3f} (n={n})")
        qtable += "| " + " | ".join(row) + " |\n"

    # Quartile detailed stats table
    qdetail = "| Quartile | N | Mean | Median | Std | P25 | P75 | Min | Max |\n"
    qdetail += "|----------|---|------|--------|-----|-----|-----|-----|-----|\n"
    for q in range(4):
        d = all_strata["all"][q]
        qdetail += (
            f"| Q{q+1} | {d['n']} | {d['mean']:+.3f} | {d['median']:+.3f} | "
            f"{d['std']:.3f} | {d['p25']:+.3f} | {d['p75']:+.3f} | "
            f"{d['min']:+.3f} | {d['max']:+.3f} |\n"
        )

    # Period-level stats
    pre_motions = [m for m in motions if m["period"] == "pre-2024"]
    post_motions = [m for m in motions if m["period"] == "post-2024"]
    pre_margins = np.array([m["margin"] for m in pre_motions])
    post_margins = np.array([m["margin"] for m in post_motions])

    pre_mean = float(np.mean(pre_margins)) if len(pre_margins) > 0 else float("nan")
    post_mean = float(np.mean(post_margins)) if len(post_margins) > 0 else float("nan")
    delta = post_mean - pre_mean

    # Mann-Whitney U test and Cohen's d (average-variance pooled SD) for the period difference
    u_p = float("nan")
    u_str = "N/A"
    cohens_d = float("nan")
    if len(pre_margins) > 0 and len(post_margins) > 0:
        u_stat, u_p = mannwhitneyu(pre_margins, post_margins, alternative="two-sided")
        u_str = f"U={u_stat:.0f}, p={u_p:.1e}"
        if len(pre_margins) > 1 and len(post_margins) > 1:
            pooled_sd = np.sqrt(
                (np.std(pre_margins, ddof=1) ** 2 + np.std(post_margins, ddof=1) ** 2) / 2
            )
            cohens_d = float(delta / pooled_sd)

    # Yearly breakdown
    years_data: dict[int, list[float]] = {}
    years_cs: dict[int, list[float]] = {}
    for m in motions:
        y = m["year"]
        years_data.setdefault(y, []).append(m["margin"])
        years_cs.setdefault(y, []).append(m["centrist_support_strict"])

    ytable = "| Year | N | Mean Margin | Mean CS (strict) | % Passed |\n"
    ytable += "|------|---|-------------|-----------------|---------|\n"
    for y in sorted(years_data.keys()):
        ym = years_data[y]
        yc = years_cs[y]
        n_passed_y = sum(1 for m in motions if m["year"] == y and m["passed"])
        total = len(ym)
        ytable += (
            f"| {y} | {total} | {np.mean(ym):+.3f} | {np.mean(yc):.3f} | "
            f"{n_passed_y/total:.1%} |\n"
        )

    # Q4 vs Q1 gap (analogous to success premium)
    q1_mean = all_strata["all"][0]["mean"]
    q4_mean = all_strata["all"][3]["mean"]
    margin_gap = q4_mean - q1_mean if not (np.isnan(q1_mean) or np.isnan(q4_mean)) else float("nan")

    # Pass rate by quartile for comparison (NaN for an empty bin)
    pass_rates: dict[int, float] = {}
    pass_table = "| Quartile | N | Pass Rate | Mean Margin |\n"
    pass_table += "|----------|---|-----------|-------------|\n"
    for q in range(4):
        d = all_strata["all"][q]
        q_passed = sum(
            1 for m in motions
            if m["passed"] and quartile_bin(m["centrist_support_strict"]) == q
        )
        pr = q_passed / d["n"] if d["n"] > 0 else float("nan")
        pass_rates[q] = pr
        pr_str = f"{pr:.1%}" if not np.isnan(pr) else "N/A"
        pass_table += f"| Q{q+1} | {d['n']} | {pr_str} | {d['mean']:+.3f} |\n"

    report = [
        "# Voting Margin Analysis",
        "",
        "**Goal:** Replace binary pass/fail with continuous voting margin as the primary",
        "success metric for right-wing motions in the Tweede Kamer.",
        "",
        "**Analysis period:** 2016\u20132026",
        f"**Total right-wing motions with vote data:** {n_total}",
        f"**Motions passed:** {n_passed} ({overall_pass_rate:.1%})",
        f"**Motions failed:** {n_failed} ({n_failed/n_total:.1%})" if n_total > 0 else "",
        "",
        "---",
        "",
        "## 1. Methodology",
        "",
        "The voting margin is computed from `motions.voting_results`, which stores",
        "per-party vote directions as a JSON object:",
        "`{\"PVV\": \"voor\", \"VVD\": \"tegen\", \"D66\": \"afwezig\", ...}`.",
        "",
        "```",
        "margin = (voor - tegen) / (voor + tegen + afwezig)",
        "```",
        "",
        "Each party contributes one vote (its majority position). The margin ranges",
        "from -1 (unanimous rejection) to +1 (unanimous support). A margin of 0",
        "indicates an exact tie; absent parties count in the denominator and pull the",
        "margin toward 0. Motions with no recorded party votes are excluded.",
        "",
        "This continuous metric captures the *magnitude* of support, not just its direction.",
        "A motion that passes 14-1 has margin = +0.87, while one that passes 8-7 has",
        "margin = +0.07. Both count as \"passed\" in binary terms, but the former has far",
        "stronger parliamentary consensus.",
        "",
        "> **Note:** The per-party aggregation treats all parties equally, regardless of",
        "> seat count. This is appropriate for measuring *breadth of support across the",
        "> political spectrum*, which is what the Overton window concept",
        "> concerns. Seat-weighted margins would be confounded by coalition size effects.",
        "",
        "---",
        "",
        "## 2. Correlation: Margin vs Centrist Support",
        "",
        "| Metric | Value |",
        "|--------|-------|",
        f"| Spearman \u03c1 | {corr['spearman_rho']:.3f} |",
        f"| Spearman p-value | {corr['spearman_p']:.1e} |",
        f"| Pearson r | {corr['pearson_r']:.3f} |",
        f"| Pearson p-value | {corr['pearson_p']:.1e} |",
        "",
    ]

    if corr["spearman_p"] < 0.05:
        report.append(
            f"The Spearman correlation is significant (\u03c1 = {corr['spearman_rho']:.3f}, "
            f"p = {corr['spearman_p']:.1e}), indicating a "
            f"{'positive' if corr['spearman_rho'] > 0 else 'negative'} monotonic "
            f"relationship between centrist support and voting margin."
        )
    else:
        report.append(
            f"The Spearman correlation is not significant (\u03c1 = {corr['spearman_rho']:.3f}, "
            f"p = {corr['spearman_p']:.3f}). Centrist support alone does not predict "
            f"voting margin."
        )

    report += [
        "",
        "---",
        "",
        "## 3. Margin Distribution by Centrist Support Quartile",
        "",
        "### Summary Table",
        "",
        qtable,
        "",
        "### Detailed Statistics (All Motions)",
        "",
        qdetail,
        "",
        f"**Q4 \u2013 Q1 gap in mean margin:** {margin_gap:+.3f}",
        "",
    ]

    if not np.isnan(margin_gap) and margin_gap > 0:
        report.append(
            f"The gap of {margin_gap:+.3f} indicates that motions with the highest "
            f"centrist support (Q4) have a higher mean voting margin than "
            f"those with the lowest (Q1)."
        )
    elif not np.isnan(margin_gap):
        report.append(
            f"The gap of {margin_gap:+.3f} shows no positive relationship "
            f"between centrist support and voting margin."
        )

    report += [
        "",
        "---",
        "",
        "## 4. Pass Rate vs Margin Comparison",
        "",
        "This section compares the binary pass-rate metric with the continuous margin",
        "metric to determine whether the margin captures additional information.",
        "",
        pass_table,
        "",
    ]

    # Does the margin detect a pattern that the pass rate misses?
    pass_gap = pass_rates[3] - pass_rates[0]
    report.append(f"**Pass rate gap (Q4 \u2013 Q1):** {pass_gap:+.1%}")
    report.append(f"**Margin gap (Q4 \u2013 Q1):** {margin_gap:+.3f}")

    # An empty Q1 or Q4 bin leaves the comparison undefined, so no interpretation is written.
    gaps_defined = not (np.isnan(pass_gap) or np.isnan(margin_gap))
    if gaps_defined and pass_gap < 0.05 and abs(margin_gap) > 0.05:
        report.append("")
        report.append(
            "The pass rate gap is small ({:.1%}) while the margin gap is meaningful "
            "({:+.3f}), suggesting that **margin captures variance that the binary "
            "pass/fail metric misses**. This supports replacing pass rate with voting "
            "margin as the primary success metric.".format(pass_gap, margin_gap)
        )
    elif gaps_defined and pass_gap >= 0.05:
        report.append("")
        report.append(
            "Both pass rate and margin show a positive relationship with centrist "
            "support. Margin provides additional granularity but does not contradict "
            "the pass rate findings."
        )
    elif gaps_defined:
        report.append("")
        report.append(
            "Neither pass rate nor margin shows a meaningful relationship with centrist "
            "support. The high baseline pass rate (~{:.0%}) creates a ceiling effect "
            "for both metrics.".format(overall_pass_rate)
        )

    report += [
        "",
        "---",
        "",
        "## 5. Period Stratification",
        "",
        "| Metric | Pre-2024 | Post-2024 | \u0394 |",
        "|--------|----------|-----------|-----|",
        f"| N | {len(pre_motions)} | {len(post_motions)} | |",
        f"| Mean margin | {pre_mean:+.3f} | {post_mean:+.3f} | {delta:+.3f} |",
        f"| Mann-Whitney U | | | {u_str} |",
        f"| Cohen's d | | | {cohens_d:+.3f} |" if not np.isnan(cohens_d) else "",
        "",
    ]

    # Interpret the period difference with the Mann-Whitney p-value computed above.
    if not np.isnan(u_p):
        if u_p < 0.05:
            direction = "rose" if post_mean > pre_mean else "fell"
            report.append(
                f"Voting margin {direction} significantly post-2024 "
                f"(Mann-Whitney p = {u_p:.1e}, d = {cohens_d:+.3f})."
            )
        else:
            report.append(
                f"Voting margin did not change significantly between periods "
                f"(Mann-Whitney p = {u_p:.3f})."
            )

    report += [
        "",
        "---",
        "",
        "## 6. Yearly Breakdown",
        "",
        ytable,
        "",
        "---",
        "",
        "## 7. Interpretation",
        "",
    ]

    if corr["spearman_p"] < 0.05 and corr["spearman_rho"] > 0:
        report.append(
            f"**Finding:** Higher centrist support is associated with higher voting "
            f"margins (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.1e}). "
            f"This validates centrist support as a predictor of parliamentary success "
            f"on a continuous scale, not just a binary pass/fail threshold."
        )
    elif corr["spearman_p"] < 0.05:
        report.append(
            f"**Finding:** Higher centrist support is associated with *lower* voting "
            f"margins (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.1e}). "
            f"This is counterintuitive and warrants further investigation."
        )
    else:
        report.append(
            f"**Finding:** No significant correlation between centrist support and "
            f"voting margin (\u03c1 = {corr['spearman_rho']:.3f}, p = {corr['spearman_p']:.3f})."
        )

    report.append("")
    report.append(
        "**Margin vs pass rate:** The voting margin carries strictly more information "
        "than the binary pass rate. Every pass/fail outcome used here is derived from "
        "the margin (margin > 0 = passed), but the margin also captures the *strength* "
        "of parliamentary consensus. This matters most when pass rates are high "
        f"(here {overall_pass_rate:.0%} of the analysed right-wing motions pass), "
        "because a near-constant pass rate cannot separate groups of motions."
    )

    report += [
        "",
        "---",
        "",
        "## 8. Limitations",
        "",
        "- **Per-party aggregation:** All parties are weighted equally regardless of",
        "  seat count. A motion backed by VVD (24 seats) + PVV (37 seats) has the",
        "  same margin as one backed by SGP (3 seats) + DENK (3 seats). This is",
        "  appropriate for measuring *breadth of cross-spectrum support* but may not",
        "  reflect actual parliamentary power, and the derived pass/fail flag is a",
        "  party-count proxy rather than the seat-based outcome.",
        "- **Voting discipline:** Party-line voting is near-universal in the Dutch",
        "  parliament, so the per-party aggregation loses little information.",
        "- **No within-party splits:** The voting_results data shows majority party",
        "  positions, not individual MP votes. Intra-party dissent is invisible.",
        "- **Missing data:** Motions without voting_results are excluded.",
        "",
        "---",
        "",
        f"![Voting margin figure]({Path(fig_path).name})",
        "",
        "*Report generated by `analysis/right_wing/voting_margin.py`*",
    ]

    report_path = REPORTS_DIR / "voting_margin.md"
    with open(report_path, "w") as f:
        f.write("\n".join(report))
    logger.info("Report written to %s", report_path)
    return str(report_path)


def main() -> int:
    logger.info("Connecting to database: %s", DB_PATH)
    con = duckdb.connect(DB_PATH, read_only=True)

    logger.info("Collecting motion margins...")
    motions = collect_motion_margins(con)
    con.close()

    n_total = len(motions)
    n_passed = sum(1 for m in motions if m["passed"])
    n_pre = sum(1 for m in motions if m["period"] == "pre-2024")
    n_post = sum(1 for m in motions if m["period"] == "post-2024")

    logger.info(
        "Total: %d motions with voting data, %d passed (%.1f%%), pre=%d post=%d",
        n_total, n_passed, (n_passed / n_total * 100) if n_total > 0 else 0,
        n_pre, n_post,
    )

    all_strata = quartile_margin_stats(motions)
    corr = spearman_correlation(motions)

    logger.info(
        "Spearman rho=%.3f p=%.1e | Pearson r=%.3f p=%.1e",
        corr["spearman_rho"], corr["spearman_p"],
        corr["pearson_r"], corr["pearson_p"],
    )

    logger.info("Generating figure...")
    fig_path = create_figure(all_strata, motions, corr)

    logger.info("Generating report...")
    report_path = generate_report(all_strata, motions, corr, fig_path)

    print(f"\nReport: {report_path}")
    print(f"Figure: {fig_path}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@ -0,0 +1,188 @@ |
# Mechanism Classification Validation Report
|
## 1. Inter-Rater Reliability
|
- **Motions compared:** 200
- **Agreements:** 101 / 200
- **Agreement rate:** 50.5%
- **Cohen's kappa (κ):** 0.4082
  - P_o (observed): 0.5050
  - P_e (expected): 0.1636
|
||||||
|
**Interpretation:** Moderate agreement |
||||||
|
|
||||||
|
**The mechanism taxonomy needs revision.** The inter-rater agreement is below 0.6, suggesting the 10-mechanism framework is not being applied consistently across raters. Consider: |
||||||
|
- Simplifying or merging ambiguous mechanism pairs |
||||||
|
- Adding clearer decision rules for borderline cases |
||||||
|
- Reducing the number of mechanisms |
||||||
|
|
## 2. Second Classifier Summary

- **Model:** qwen/qwen-2.5-72b-instruct
- **Motions classified:** 200
- **Average confidence:** 4.1/5

### Confidence Distribution

| Confidence | Count |
|------------|-------|
| 1 | 0 |
| 2 | 0 |
| 3 | 5 |
| 4 | 165 |
| 5 | 30 |
|
|
## 3. Disagreement Table

**Total disagreements:** 99 / 200 (49.5%)

Titles are truncated to 80 characters. *Confidence* is the second classifier's confidence score (1–5). *Resolved* is the label kept after adjudication, and *Winner* records which classifier supplied it.

| Motion ID | Title | Original | Second | Confidence | Resolved | Winner |
|-----------|-------|----------|--------|------------|----------|--------|
| 313 | Motie van het lid Inge van Dijk over de vooringevulde aangifte tijdelijk loslate | Procedureel/technisch | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 473 | Motie van het lid Eerdmans c.s. over de schade van de UvA-rellen alsnog verhalen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 651 | Gewijzigde motie van het lid Grinwis c.s. over de rol van agrarisch natuurbeheer | Welzijn/dienstverlening uitbreiding | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 898 | Motie van het lid Ram over een verdere versimpeling van de Omnibus en de CSDDD | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 974 | Motie van het lid Mooiman over het effect van opgestelde "Whole Life Carbon"-eis | Procedureel/technisch | Symbolisch/declaratoir | 4 | Symbolisch/declaratoir | second |
| 1005 | Motie van het lid Kamminga over de EU-opbrengsten van importheffingen inzetten t | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 1191 | Motie van het lid Veltman over veiligheid meer prioriteit geven in de uitvoering | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 1359 | Motie van de leden Eerdmans en Van der Plas over met de vuurwerkbranche een rami | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 1491 | Motie van het lid Boomsma c.s. over een verkenning naar een maximumaantal wolven | Gerichte restrictie | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 1495 | Gewijzigde motie van het lid Diederik van Dijk c.s. over een meer risicogerichte | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 1507 | Motie van het lid De Vos over empirische natuurgegevens als juridisch houdbaar a | Systeemontmanteling | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 1572 | Motie van de leden Van Campen en Eerdmans over de impact van wolfaanvallen in ka | Lokaal/regionaal | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 1705 | Motie van het lid Dekker over voorstellen ter vermindering van de regeldruk | Consensus framing (gedeeld belang) | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 1831 | Motie van het lid Van der Plas over het voorzorgsbeginsel zo toepassen dat het p | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 2014 | Motie van het lid Van Zanten over in asielzaken uitsluitend beroep bij één insta | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 2168 | Amendement van de leden Eerdmans en Diederik van Dijk ter vervanging van nr. 7 o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 2170 | Amendement van de leden Diederik van Dijk en Eerdmans ter vervanging van nr. 4 o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 2264 | Motie van het lid Van der Hoeff over alle kosten van vernielingen gepleegd tijde | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 2496 | Motie van het lid Vermeer over een lanceercapaciteit voor satellieten op het gro | Procedureel/technisch | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 2662 | Motie van de leden Bikker en Diederik van Dijk over voorkomen dat Nederlandse ke | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 2878 | Motie van het lid Inge van Dijk c.s. over een voorstel voor het inpassen van de | Welzijn/dienstverlening uitbreiding | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 3298 | Motie van het lid Diederik van Dijk c.s. over zich scharen achter het vredesplan | Symbolisch/declaratoir | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 3354 | Amendement van het lid Michon-Derkzen over het verhogen van het strafmaximum van | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 3468 | Motie van de leden Yesilgöz-Zegerius en Bikker over zo snel mogelijk overgaan to | Institutioneel/rechtsstatelijk | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 3472 | Gewijzigde motie van de leden Van der Plas en Yesilgöz-Zegerius over wetgeving v | Institutioneel/rechtsstatelijk | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 3569 | Gewijzigde motie van de leden Wijen-Nass en Diederik van Dijk over inventarisere | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 3629 | Motie van het lid Ceder over een conferentie over modernisering van het VN-Vluch | Symbolisch/declaratoir | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 3678 | Motie van het lid Wilders over de invoer van een totale asielstop alsmede een st | Systeemontmanteling | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 3687 | Motie van de leden Van der Plas en Yesilgöz-Zegerius over het initiatief van de | Gerichte restrictie | Institutioneel/rechtsstatelijk | 5 | Institutioneel/rechtsstatelijk | second |
| 3760 | Motie van het lid Peter de Groot c.s. over de Wet op de defensiegereedheid na on | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 3784 | Motie van de leden Wendel en Van Brenk over informatiedeling over zorgfraude mog | Procedureel/technisch | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 3830 | Motie van het lid Van Meetelen over stoppen met betuttelend beleid gericht op vo | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 3877 | Gewijzigde motie van de leden Ceder en Diederik van Dijk over signalen en inzet | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4080 | Motie van het lid Coenradie over een onderzoek naar zwaardere, dwingende vormen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4221 | Motie van het lid Van der Plas over een duidelijke overheadnorm opstellen voor d | Systeemontmanteling | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 4227 | Motie van het lid Peter de Groot over de oeververbinding bij de sluis van Nijker | Consensus framing (gedeeld belang) | Lokaal/regionaal | 4 | Lokaal/regionaal | second |
| 4309 | Motie van het lid Coenradie over gerichter doelgroepenbeleid bij handhaving | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4394 | Motie van het lid Van der Plas over het luchtdrukwapen met zogenaamde beanbags o | Institutioneel/rechtsstatelijk | Procedureel/technisch | 3 | Institutioneel/rechtsstatelijk | original |
| 4436 | Motie van het lid Diederik van Dijk c.s. over in overleg met het OM in een aanwi | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4481 | Motie van het lid Ceder c.s. over het verwerven van control points expliciet ond | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 4489 | Motie van het lid Van der Plas over een onderzoek naar de invloed van verstoring | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4656 | Motie van het lid Dekker over niet akkoord gaan met toetreding van Oekraïne tot | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 4660 | Motie van het lid Diederik van Dijk over verkennen of en hoe verdere samenwerkin | Consensus framing (gedeeld belang) | Coalitie-afstemming | 4 | Coalitie-afstemming | second |
| 4933 | Wijziging van de Omgevingswet en enkele andere wetten met het oog op het bescher | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 9149 | Motie van het lid Valstar c.s. over steun voor bewapening van de MQ-9 Reaper | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 9769 | Motie van het lid Vondeling over er alles aan doen om Syriërs huiswaarts te late | Gerichte restrictie | Welzijn/dienstverlening uitbreiding | 3 | Gerichte restrictie | original |
| 9789 | Motie van het lid Diederik van Dijk c.s. over de Tijdelijke wet bestuurlijke maa | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 10110 | Amendement van het lid Bontenbal c.s. over dekking van het maatregelenpakket voo | Coalitie-afstemming | Procedureel/technisch | 5 | Procedureel/technisch | second |
| 10167 | Amendement van het lid Flach over € 2 miljoen voor pilotprojecten voor de aanpak | Lokaal/regionaal | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10278 | Amendement van het lid Bontenbal c.s. over dekking van het maatregelenpakket voo | Coalitie-afstemming | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10290 | Motie van het lid Eerdmans over ten minste één concreet migratieproject uitwerke | Gerichte restrictie | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10413 | Motie van het lid Diederik van Dijk c.s. over de maximale juridische ruimte opzo | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 10420 | Motie van het lid Van der Wal c.s. over het vergroten van de weerbaarheid van Ne | Crisisrespons | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 10597 | Motie van het lid Eerdmans over middels een AMvB de derde waarnemer bij preventi | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 11382 | Gewijzigd amendement van het lid Van der Molen t.v.v. nr. 21 over het schrappen | Procedureel/technisch | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 14554 | Motie van het lid Schonis over een kwartiermaker toeristische samenwerking | Procedureel/technisch | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 15005 | Motie van het lid Aartsen over een periodiek overlegorgaan voor franchisegevers | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 15772 | Motie van het lid De Jong over pensioenkortingen voorkomen | Systeemontmanteling | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 16430 | Motie van het lid Tony van Dijck over geen 45 miljard euro overmaken naar Zuid- | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 16691 | Motie van het lid Geurts over het doorbreken van de vicieuze cirkel rond de toen | Procedureel/technisch | Crisisrespons | 4 | Crisisrespons | second |
| 16999 | Motie van de leden Van Haga en Baudet over het tegengaan van verdere oneerlijke | Consensus framing (gedeeld belang) | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 17036 | Motie van het lid Kerstens over onderzoeken of Defensie in aanmerking komt voor | Welzijn/dienstverlening uitbreiding | Crisisrespons | 4 | Crisisrespons | second |
| 17536 | Motie van het lid Yesilgöz-Zegerius over in heel het Schengengebied haatprediker | Institutioneel/rechtsstatelijk | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 17681 | Motie van de leden Van Haga en Baudet over een plan van aanpak om de fiscaliteit | Consensus framing (gedeeld belang) | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 17751 | Gewijzigde motie van de leden Stoffer en Van Haga over een nullijn voor de ontwi | Consensus framing (gedeeld belang) | Symbolisch/declaratoir | 4 | Symbolisch/declaratoir | second |
| 18030 | Motie van het lid Stoffer over zo snel mogelijk de snelwegverlichting 's nachts | Procedureel/technisch | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 18062 | Motie van het lid Krol over excuses voor de fouten die leidden tot slachtoffers | Crisisrespons | Symbolisch/declaratoir | 5 | Symbolisch/declaratoir | second |
| 18691 | Motie van het lid Karabulut over geen extra troepen naar Afghanistan | Symbolisch/declaratoir | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 20215 | Gewijzigde motie van het lid Boswijk c.s. over onderzoeken hoe hoogwaardige land | Welzijn/dienstverlening uitbreiding | Institutioneel/rechtsstatelijk | 3 | Welzijn/dienstverlening uitbreiding | original |
| 21801 | Motie van het lid Van Haga c.s. over de Defensievisie 2035 omarmen | Consensus framing (gedeeld belang) | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 21982 | Motie van het lid Graus c.s. over het zwartboek regeldruk van MKB-Nederland ter | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 22280 | Motie van het lid Van der Plas over de kosten berekenen die op het bord van de b | Lokaal/regionaal | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 22676 | Motie van het lid Diederik van Dijk c.s. over een grootschalig en breedgedragen | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 22853 | Motie van het lid Peter de Groot over nog voor het zomerreces additionele maatre | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23013 | Amendement van het lid Diederik van Dijk over budget voor de uitvoering van het | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23030 | Motie van het lid Eerdmans over in het verdeelbesluit geen asielopvangplekken op | Gerichte restrictie | Lokaal/regionaal | 4 | Lokaal/regionaal | second |
| 23141 | Motie van het lid Eerdmans over de mogelijkheid tot inzet van de KMar actief ond | Institutioneel/rechtsstatelijk | Welzijn/dienstverlening uitbreiding | 4 | Welzijn/dienstverlening uitbreiding | second |
| 23206 | Motie van het lid Nordkamp c.s. over het in kaart brengen van het aandeel van in | Procedureel/technisch | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 23287 | Motie van het lid Helder c.s. over het wetsvoorstel inzake het taakstrafverbod b | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 23301 | Motie van de leden Tuinman en Boswijk over het onderzoeken van voorstellen met b | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23441 | Motie van de leden Van Zanten en Stoffer over een deel van het budget voor kanse | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 23454 | Motie van het lid Joseph over een analyse laten maken van de juridische risico's | Procedureel/technisch | Institutioneel/rechtsstatelijk | 5 | Institutioneel/rechtsstatelijk | second |
| 23885 | Motie van het lid Aartsen c.s. over verkennen hoe toetsings- of toezichtkaders a | Consensus framing (gedeeld belang) | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 23984 | Motie van het lid Pierik over de eisen aan de eco-regeling in de periode 2025-20 | Systeemontmanteling | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 24008 | Motie van het lid Holman c.s. over bij de Europese Commissie bevorderen dat de b | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24046 | Motie van het lid Keijzer c.s. over de minister zich kenbaar laten onthouden van | Symbolisch/declaratoir | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24077 | Motie van het lid De Roon over een onderzoek instellen naar de rol en verantwoor | Symbolisch/declaratoir | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 24358 | Motie van de leden Helder en Uitermark over het vergroten van de personeelscapac | Institutioneel/rechtsstatelijk | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 24632 | Motie van de leden Veltman en Vedder over het voor de politie mogelijk maken om | Institutioneel/rechtsstatelijk | Gerichte restrictie | 4 | Gerichte restrictie | second |
| 24650 | Gewijzigd amendement van de leden Dijk en Flach ter vervanging van nr. 13 over e | Procedureel/technisch | Institutioneel/rechtsstatelijk | 4 | Institutioneel/rechtsstatelijk | second |
| 24651 | Motie van de leden Inge van Dijk en Van Oostenbruggen over een arbeidsmigratieto | Gerichte restrictie | Consensus framing (gedeeld belang) | 4 | Consensus framing (gedeeld belang) | second |
| 25061 | Motie van het lid Kisteman c.s. over een vereenvoudiging van de RI&E-verplichtin | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25062 | Motie van het lid Kisteman c.s. over een voor het mkb werkbare wijze van werken | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25079 | Motie van de leden Bontenbal en Flach over de Europese standaarden voor stikstof | Consensus framing (gedeeld belang) | Procedureel/technisch | 4 | Procedureel/technisch | second |
| 25451 | Motie van het lid Ceder over berekenen hoeveel geld de Palestijnse Autoriteit ja | Symbolisch/declaratoir | Gerichte restrictie | 5 | Gerichte restrictie | second |
| 25469 | Motie van de leden Eerdmans en Diederik van Dijk over samen met gelijkgestemde E | Gerichte restrictie | Coalitie-afstemming | 4 | Coalitie-afstemming | second |
| 25616 | Motie van het lid Eerdmans over de wettelijke taakstellingen voor gemeenten voor | Gerichte restrictie | Systeemontmanteling | 4 | Systeemontmanteling | second |
| 25982 | Gewijzigde motie van het lid Bisschop c.s. over een koude sanering van de garnal | Lokaal/regionaal | Procedureel/technisch | 3 | Lokaal/regionaal | original |
| 27731 | Amendement van het lid Eppink over dekking voor het schrappen van een wijziging | Systeemontmanteling | Procedureel/technisch | 4 | Procedureel/technisch | second |
|
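Every disagreement in the table with confidence 3 kept the original label, and every disagreement with confidence 4 or 5 adopted the second label. The sketch below encodes that apparent rule and uses it to derive the Section 4 validated counts from the second classifier's counts. The threshold of 4 is inferred from this table, not taken from the validation script. `resolve` and `validated_counts` are hypothetical names.

```python
from collections import Counter


def resolve(original: str, second: str, confidence: int, threshold: int = 4) -> tuple[str, str]:
    """Return (resolved_label, winner) for one motion.

    Hypothetical rule inferred from the Section 3 table: keep agreements, and on a
    disagreement adopt the second label only if its confidence >= threshold.
    """
    if original == second:
        return original, "agree"
    if confidence >= threshold:
        return second, "second"
    return original, "original"


# Second-classifier counts from Section 4.
SECOND_COUNTS = Counter({
    "Consensus framing (gedeeld belang)": 11,
    "Institutioneel/rechtsstatelijk": 22,
    "Welzijn/dienstverlening uitbreiding": 17,
    "Procedureel/technisch": 56,
    "Lokaal/regionaal": 4,
    "Coalitie-afstemming": 2,
    "Symbolisch/declaratoir": 7,
    "Gerichte restrictie": 60,
    "Systeemontmanteling": 13,
    "Crisisrespons": 8,
})

# The only confidence-3 disagreements in Section 3: (motion_id, original, second, confidence).
LOW_CONFIDENCE = [
    (4394, "Institutioneel/rechtsstatelijk", "Procedureel/technisch", 3),
    (9769, "Gerichte restrictie", "Welzijn/dienstverlening uitbreiding", 3),
    (20215, "Welzijn/dienstverlening uitbreiding", "Institutioneel/rechtsstatelijk", 3),
    (25982, "Lokaal/regionaal", "Procedureel/technisch", 3),
]


def validated_counts(second_counts: Counter, disagreements: list) -> Counter:
    """Start from the second classifier's labels and undo the ones the rule rejects."""
    counts = Counter(second_counts)
    for _motion_id, original, second, confidence in disagreements:
        label, winner = resolve(original, second, confidence)
        if winner == "original":
            counts[second] -= 1
            counts[label] += 1
    return counts


validated = validated_counts(SECOND_COUNTS, LOW_CONFIDENCE)
print(validated["Procedureel/technisch"], validated["Gerichte restrictie"], validated["Lokaal/regionaal"])
```

Applied to the four confidence-3 motions, this gives 54, 61 and 5 for Procedureel/technisch, Gerichte restrictie and Lokaal/regionaal, matching the Validated Count column. The Institutioneel/rechtsstatelijk and Welzijn/dienstverlening uitbreiding adjustments cancel out, so those counts are unchanged.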
|
## 4. Mechanism Distribution Comparison

*Validated Count* is the distribution after the disagreements in Section 3 were resolved.

| Mechanism | Original Count | Second Count | Validated Count |
|-----------|----------------|--------------|-----------------|
| Consensus framing (gedeeld belang) | 31 | 11 | 11 |
| Institutioneel/rechtsstatelijk | 28 | 22 | 22 |
| Welzijn/dienstverlening uitbreiding | 9 | 17 | 17 |
| Procedureel/technisch | 46 | 56 | 54 |
| Lokaal/regionaal | 6 | 4 | 5 |
| Coalitie-afstemming | 2 | 2 | 2 |
| Symbolisch/declaratoir | 12 | 7 | 7 |
| Gerichte restrictie | 41 | 60 | 61 |
| Systeemontmanteling | 17 | 13 | 13 |
| Crisisrespons | 8 | 8 | 8 |
|
|
## 5. Confusion Matrix

Rows are the original labels and columns are the second classifier's labels. The diagonal holds the 101 agreements.

| Original \ Second | Consensus framing | Institutional / rule of law | Welfare / service expansion | Procedural / technical | Local / regional | Coalition alignment | Symbolic / declaratory | Targeted restriction | System dismantling | Crisis response |
|---|---|---|---|---|---|---|---|---|---|---|
| Consensus framing | 6 | 5 | 3 | 11 | 1 | 1 | 1 | 2 | 1 | 0 |
| Institutional / rule of law | 0 | 6 | 2 | 6 | 0 | 0 | 0 | 14 | 0 | 0 |
| Welfare / service expansion | 0 | 2 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Procedural / technical | 2 | 5 | 2 | 30 | 0 | 0 | 1 | 3 | 2 | 1 |
| Local / regional | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| Coalition alignment | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Symbolic / declaratory | 1 | 2 | 0 | 1 | 0 | 0 | 4 | 4 | 0 | 0 |
| Targeted restriction | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 33 | 1 | 0 |
| System dismantling | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 4 | 9 | 0 |
| Crisis response | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 6 |
|
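Cohen's kappa is κ = (P_o - P_e) / (1 - P_e). Here P_o is the share of motions on the diagonal, and P_e = Σ_k (row_k · col_k) / N² is the agreement expected from the two marginal distributions alone. The minimal standard-library sketch below recomputes the Section 1 figures from this matrix. The function name is illustrative.

```python
# Confusion matrix from Section 5: rows = original label, columns = second label,
# both in the order Consensus, Institutional, Welfare, Procedural, Local,
# Coalition, Symbolic, Targeted restriction, System dismantling, Crisis.
MATRIX = [
    [6, 5, 3, 11, 1, 1, 1, 2, 1, 0],
    [0, 6, 2, 6, 0, 0, 0, 14, 0, 0],
    [0, 2, 5, 1, 0, 0, 0, 0, 0, 1],
    [2, 5, 2, 30, 0, 0, 1, 3, 2, 1],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    [1, 2, 0, 1, 0, 0, 4, 4, 0, 0],
    [2, 1, 1, 1, 1, 1, 0, 33, 1, 0],
    [0, 1, 1, 2, 0, 0, 0, 4, 9, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 6],
]


def cohens_kappa(matrix: list[list[int]]) -> tuple[float, float, float]:
    """Return (kappa, p_observed, p_expected) for a square confusion matrix."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: both raters independently pick the same label.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_observed, p_expected


kappa, p_o, p_e = cohens_kappa(MATRIX)
print(f"kappa={kappa:.4f} P_o={p_o:.4f} P_e={p_e:.4f}")
```

This prints `kappa=0.4082 P_o=0.5050 P_e=0.1636`, matching Section 1. The row and column totals also reproduce the Original and Second counts in Section 4, which confirms that the matrix covers all 200 motions.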
|
## 6. Conclusion

Cohen's kappa of **0.4082** indicates **moderate agreement** (at the lower edge of that band) between the original inline classification and the independent second classifier.

### Key findings

- 101 out of 200 motions agreed (50.5%)
- 99 disagreements resolved: 4 kept the original label, 95 adopted the second

### Most common disagreement pairs (original / second)

- institutional_rule_of_law / targeted_restriction: 14 times
- consensus_framing / procedural_technical: 11 times
- institutional_rule_of_law / procedural_technical: 6 times
- procedural_technical / institutional_rule_of_law: 5 times
- consensus_framing / institutional_rule_of_law: 5 times
|
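The pairs listed above are the largest off-diagonal cells of the Section 5 confusion matrix, read as (original / second). The sketch below recovers them and also sums both directions of each pair, which is the more relevant count when deciding whether to merge two mechanisms. It assumes the matrix's row and column order. Only four of the snake_case keys appear in the report; the other six are hypothetical placeholders.

```python
from itertools import combinations

# Label keys in matrix order. consensus_framing, institutional_rule_of_law,
# procedural_technical and targeted_restriction appear in the report; the
# other six keys are hypothetical placeholders.
LABELS = [
    "consensus_framing", "institutional_rule_of_law", "welfare_expansion",
    "procedural_technical", "local_regional", "coalition_alignment",
    "symbolic_declaratory", "targeted_restriction", "system_dismantling",
    "crisis_response",
]

# Section 5 confusion matrix: rows = original label, columns = second label.
MATRIX = [
    [6, 5, 3, 11, 1, 1, 1, 2, 1, 0],
    [0, 6, 2, 6, 0, 0, 0, 14, 0, 0],
    [0, 2, 5, 1, 0, 0, 0, 0, 0, 1],
    [2, 5, 2, 30, 0, 0, 1, 3, 2, 1],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    [1, 2, 0, 1, 0, 0, 4, 4, 0, 0],
    [2, 1, 1, 1, 1, 1, 0, 33, 1, 0],
    [0, 1, 1, 2, 0, 0, 0, 4, 9, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 6],
]


def top_ordered_pairs(matrix, labels, n=5):
    """Largest off-diagonal cells as (count, original, second)."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)  # stable: ties keep row order
    return cells[:n]


def top_unordered_pairs(matrix, labels, n=3):
    """Confusions summed over both directions, as (count, label_a, label_b)."""
    totals = [
        (matrix[i][j] + matrix[j][i], labels[i], labels[j])
        for i, j in combinations(range(len(labels)), 2)
    ]
    totals.sort(key=lambda t: t[0], reverse=True)
    return totals[:n]


for count, original, second in top_ordered_pairs(MATRIX, LABELS):
    print(f"{original} / {second}: {count}")
for count, a, b in top_unordered_pairs(MATRIX, LABELS):
    print(f"{a} <-> {b}: {count}")
```

Counting both directions, institutional_rule_of_law and targeted_restriction are confused 15 times (14 + 1). That puts them ahead of consensus_framing / procedural_technical (13) and institutional_rule_of_law / procedural_technical (11), which is consistent with singling out the first pair in the recommendation that follows. The two pairs tied at 5 may print in a different order from the list above.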
|
### Revised mechanism taxonomy recommendation

- The taxonomy needs revision to improve inter-rater reliability.
- Most confused pair: institutional_rule_of_law / targeted_restriction. Consider merging the two or clarifying the distinction between them.
|
|
@@ -0,0 +1,113 @@
|
# Right-Wing Party Differentiation

**Goal:** Break down right-wing motion metrics by party (PVV, FVD, JA21, SGP) to identify which party drives the moderation effect.

**Analysis period:** 2016–2026 (the parsed data contain no right-wing submitter motions before 2019)

**Right-wing parties:** FVD, JA21, PVV, SGP

**Data:** 962 right-wing submitter motions with 2D extremity scores, out of 2,850 classified right-wing motions in total. The other 1,888 could not be parsed or matched to a party.

---
|
|
## 1. Motion Volume by Party and Year

| Year | FVD | JA21 | PVV | SGP | Total RW |
|------|-----|------|-----|-----|----------|
| 2016 | 0 | 0 | 0 | 0 | 0 |
| 2017 | 0 | 0 | 0 | 0 | 0 |
| 2018 | 0 | 0 | 0 | 0 | 0 |
| 2019 | 9 | 0 | 41 | 20 | 70 |
| 2020 | 44 | 0 | 87 | 31 | 162 |
| 2021 | 23 | 17 | 70 | 35 | 145 |
| 2022 | 11 | 20 | 58 | 31 | 120 |
| 2023 | 13 | 20 | 52 | 27 | 112 |
| 2024 | 6 | 52 | 34 | 29 | 121 |
| 2025 | 21 | 54 | 54 | 21 | 150 |
| 2026 | 11 | 33 | 35 | 3 | 82 |

---
|
|
## 2. Centrist Support (Strict) by Party and Year

N/A marks years in which the party has no motions in the data (see Section 1).

| Year | FVD | JA21 | PVV | SGP |
|------|-----|------|-----|-----|
| 2016 | N/A | N/A | N/A | N/A |
| 2017 | N/A | N/A | N/A | N/A |
| 2018 | N/A | N/A | N/A | N/A |
| 2019 | 0.000 | N/A | 0.074 | 0.350 |
| 2020 | 0.057 | N/A | 0.052 | 0.387 |
| 2021 | 0.000 | 0.088 | 0.014 | 0.286 |
| 2022 | 0.000 | 0.050 | 0.043 | 0.242 |
| 2023 | 0.000 | 0.075 | 0.067 | 0.407 |
| 2024 | 0.056 | 0.212 | 0.314 | 0.506 |
| 2025 | 0.095 | 0.315 | 0.139 | 0.603 |
| 2026 | 0.000 | 0.300 | 0.086 | 0.167 |

---
|
|
## 3. Material Impact by Party and Year

| Year | FVD | JA21 | PVV | SGP |
|------|-----|------|-----|-----|
| 2016 | N/A | N/A | N/A | N/A |
| 2017 | N/A | N/A | N/A | N/A |
| 2018 | N/A | N/A | N/A | N/A |
| 2019 | 3.56 | N/A | 3.34 | 2.65 |
| 2020 | 3.18 | N/A | 3.30 | 2.84 |
| 2021 | 2.96 | 3.41 | 3.23 | 2.91 |
| 2022 | 2.45 | 3.05 | 2.67 | 2.26 |
| 2023 | 2.92 | 3.85 | 3.25 | 2.74 |
| 2024 | 3.50 | 3.13 | 2.50 | 2.52 |
| 2025 | 3.00 | 2.44 | 2.50 | 2.10 |
| 2026 | 1.91 | 2.36 | 2.54 | 2.00 |

---
|
|
## 4. Pre/Post-2024 Comparison by Party

"Pre" covers 2019–2023 and "post" covers 2024–2026 (calendar 2024 counts as post). CS = centrist support (strict), Mat. = material impact, and Vol. Delta = N Post - N Pre.

| Party | N Pre | N Post | CS Pre | CS Post | Delta CS | Mat. Pre | Mat. Post | Delta Mat. | Vol. Delta |
|-------|-------|--------|--------|---------|----------|----------|-----------|------------|------------|
| FVD | 100 | 38 | 0.025 | 0.061 | +0.036 | 3.05 | 2.76 | -0.29 | -62 |
| JA21 | 57 | 139 | 0.070 | 0.273 | +0.203 | 3.44 | 2.68 | -0.76 | +82 |
| PVV | 308 | 123 | 0.047 | 0.172 | +0.125 | 3.16 | 2.51 | -0.65 | -185 |
| SGP | 144 | 53 | 0.330 | 0.525 | +0.195 | 2.69 | 2.32 | -0.37 | -91 |
|
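The pre/post columns can be reproduced as motion-weighted means of the yearly values in Sections 1 and 2. The sketch below assumes that weighting, which the report does not state. Under that assumption it recovers the N, CS Pre, CS Post and Delta CS columns for all four parties from the rounded yearly figures. `pooled_support` is a hypothetical helper.

```python
# Yearly (motions, centrist support) per party, copied from Sections 1 and 2.
# Years with no motions (N/A) are omitted.
PARTIES = {
    "FVD": {2019: (9, 0.000), 2020: (44, 0.057), 2021: (23, 0.000), 2022: (11, 0.000),
            2023: (13, 0.000), 2024: (6, 0.056), 2025: (21, 0.095), 2026: (11, 0.000)},
    "JA21": {2021: (17, 0.088), 2022: (20, 0.050), 2023: (20, 0.075),
             2024: (52, 0.212), 2025: (54, 0.315), 2026: (33, 0.300)},
    "PVV": {2019: (41, 0.074), 2020: (87, 0.052), 2021: (70, 0.014), 2022: (58, 0.043),
            2023: (52, 0.067), 2024: (34, 0.314), 2025: (54, 0.139), 2026: (35, 0.086)},
    "SGP": {2019: (20, 0.350), 2020: (31, 0.387), 2021: (35, 0.286), 2022: (31, 0.242),
            2023: (27, 0.407), 2024: (29, 0.506), 2025: (21, 0.603), 2026: (3, 0.167)},
}
PRE = range(2019, 2024)   # 2019-2023
POST = range(2024, 2027)  # 2024-2026


def pooled_support(yearly: dict, years: range) -> tuple[int, float]:
    """Total motions and motion-weighted mean centrist support over `years`."""
    rows = [yearly[y] for y in years if y in yearly]
    n = sum(count for count, _ in rows)
    weighted = sum(count * support for count, support in rows)
    return n, (weighted / n if n else float("nan"))


for party, yearly in PARTIES.items():
    n_pre, cs_pre = pooled_support(yearly, PRE)
    n_post, cs_post = pooled_support(yearly, POST)
    print(f"{party}: N {n_pre} -> {n_post}, CS {cs_pre:.3f} -> {cs_post:.3f} ({cs_post - cs_pre:+.3f})")
```

Weighting by motion count matters here. PVV's large 2019–2021 volume pulls its pre-2024 mean down to 0.047, and SGP's three 2026 motions barely move its post-2024 mean of 0.525.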
|
---

## 5. Key Findings

### Centrist Support Shift (largest to smallest)

- **JA21**: +0.203
- **SGP**: +0.195
- **PVV**: +0.125
- **FVD**: +0.036

JA21 is the only party whose motion volume and centrist support both increased from 2024 onward.

### Volume

- **FVD**: 100 pre-2024 → 38 post-2024 (-62)
- **JA21**: 57 pre-2024 → 139 post-2024 (+82)
- **PVV**: 308 pre-2024 → 123 post-2024 (-185)
- **SGP**: 144 pre-2024 → 53 post-2024 (-91)

### Material Impact Shift

- **FVD**: 3.05 → 2.76 (-0.29)
- **JA21**: 3.44 → 2.68 (-0.76)
- **PVV**: 3.16 → 2.51 (-0.65)
- **SGP**: 2.69 → 2.32 (-0.37)

---
|
|
## 6. Parsing Notes

- Parsed and party-matched: 962 motions
- Right-wing submitter motions: 962
- Unmatched/unparsed: 1,888
- Submitter party is parsed from motion title prefixes (e.g. 'Motie van het lid Wilders ...').
- Multi-submitter motions use the first listed submitter.
- Party names are normalized via `_PARTY_NORMALIZE` (e.g. Groep Markuszower → PVV).
|
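The title-prefix parsing described above can be sketched with a single regular expression. This is an illustrative reconstruction, not the pipeline's code. The pattern, `parse_first_submitter`, `submitter_party`, and the small `MEMBER_PARTY` lookup are all hypothetical. A real implementation would look members up in the database and then apply `_PARTY_NORMALIZE`.

```python
from __future__ import annotations

import re

# Matches e.g. "Motie van het lid X over ...", "Gewijzigde motie van de leden X en Y over ...",
# "Amendement van de leden X en Y ter vervanging ..." and
# "Gewijzigd amendement van het lid X t.v.v. ...". Anything else (e.g. bill titles) is unparsed.
TITLE_PREFIX = re.compile(
    r"^(?:Gewijzigde? )?(?:[Mm]otie|[Aa]mendement) van (?:het lid|de leden) "
    r"(?P<names>.+?)(?: c\.s\.)? (?:over|ter|t\.v\.v\.)\s"
)

# Hypothetical member-to-party lookup, for illustration only.
MEMBER_PARTY = {"Wilders": "PVV", "Eerdmans": "JA21", "Baudet": "FVD", "Stoffer": "SGP"}


def parse_first_submitter(title: str) -> str | None:
    """Return the first listed submitter, or None if the prefix is not recognised."""
    match = TITLE_PREFIX.match(title)
    if match is None:
        return None
    # Multi-submitter titles join names with " en "; keep only the lead submitter.
    return match.group("names").split(" en ")[0]


def submitter_party(title: str) -> str | None:
    """Map the lead submitter to a party via the (hypothetical) lookup."""
    member = parse_first_submitter(title)
    return MEMBER_PARTY.get(member) if member else None


print(parse_first_submitter("Motie van de leden Eerdmans en Van der Plas over met de vuurwerkbranche"))
```

The lazy `names` group stops at the first " over", " ter" or " t.v.v." that follows the member names, and the optional " c.s." is stripped. A title such as "Wijziging van de Omgevingswet ..." returns None, which is the kind of motion counted under "Unmatched/unparsed".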
|
---

## 7. Figure

![Party differentiation](party_differentiation.png)
|
@@ -0,0 +1,100 @@
|
# Predictive Model: Centrist Support

**Generated:** 2026-05-31 19:36

## Data Summary

- Total classified right-wing motions with 2D extremity scores: **2850**
- Valid for modeling (right-wing submitter party + valid category): **914**
- High centrist support (>0.5): 115 motions
- Low centrist support (<=0.5): 799 motions
- Class imbalance ratio: 6.9:1 (low:high)
- Features: 22
|
|
## Model Performance

### Test Set (80/20 stratified split)

Precision and recall refer to the high-support class.

| Model | Accuracy | Precision | Recall | AUC-ROC |
|-------|----------|-----------|--------|---------|
| Logistic Regression | 0.710 | 0.258 | 0.696 | 0.810 |
| Random Forest | 0.852 | 0.423 | 0.478 | 0.795 |
|
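Under this much class imbalance, accuracy alone is misleading. The sketch below back-calculates confusion-matrix counts that reproduce both test-set rows. It assumes the split held out 183 motions (20% of 914, rounded up), 23 of them high-support. These counts are inferred from the reported ratios, not read from the model output, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision and recall for the positive (high-support) class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Inferred counts (assumption): 183 test motions, 23 with high centrist support.
logistic = classification_metrics(tp=16, fp=46, fn=7, tn=114)
forest = classification_metrics(tp=11, fp=15, fn=12, tn=145)
always_low = classification_metrics(tp=0, fp=0, fn=23, tn=160)  # trivial baseline

for name, m in [("Logistic Regression", logistic), ("Random Forest", forest), ("Always low", always_low)]:
    print(f"{name:20s} acc={m['accuracy']:.3f} prec={m['precision']:.3f} rec={m['recall']:.3f}")
```

A classifier that always predicts low support reaches 160/183 ≈ 0.874 accuracy, higher than either model, while finding none of the high-support motions. Recall and AUC-ROC are therefore the more informative columns. By those measures the logistic regression finds 16 of 23 high-support motions and the random forest finds 11.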
|
### 5-Fold Cross-Validation

| Model | Mean Accuracy | Std Accuracy | Mean AUC-ROC | Std AUC-ROC |
|-------|---------------|--------------|--------------|-------------|
| Logistic Regression | 0.718 | 0.032 | 0.815 | 0.036 |
| Random Forest | 0.862 | 0.016 | 0.835 | 0.048 |

## Feature Importance

### Logistic Regression Coefficients (Top 10 by absolute magnitude)

| Feature | Coefficient | Odds Ratio |
|---------|-------------|------------|
| `cat_corona/pandemie` | -1.4680 | 0.2304 |
| `party_FVD` | -1.3282 | 0.2650 |
| `party_SGP` | 0.9877 | 2.6852 |
| `party_JA21` | 0.9264 | 2.5255 |
| `stijl_extremiteit` | -0.6859 | 0.5036 |
| `party_PVV` | -0.6394 | 0.5276 |
| `cat_onderwijs/cultuur` | 0.5472 | 1.7285 |
| `cat_zorg/gezondheid` | -0.4857 | 0.6153 |
| `materiele_impact` | -0.4741 | 0.6225 |
| `cat_overig` | 0.4658 | 1.5933 |

*Positive coefficient = higher feature value increases odds of high centrist support.*
|
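The Odds Ratio column is exp(coefficient). For a one-hot feature such as `party_FVD`, an odds ratio of 0.265 means the model puts the odds of high centrist support at about a quarter of those for an otherwise identical motion with that flag set to 0. For a numeric feature, the ratio applies per unit of the feature as it was encoded for the model. The quick check below uses a few rows copied from the table.

```python
import math

# (feature, coefficient, reported odds ratio) copied from the table above.
ROWS = [
    ("cat_corona/pandemie", -1.4680, 0.2304),
    ("party_FVD", -1.3282, 0.2650),
    ("party_SGP", 0.9877, 2.6852),
    ("stijl_extremiteit", -0.6859, 0.5036),
]

for feature, coef, reported in ROWS:
    odds_ratio = math.exp(coef)
    # Differences in the 4th decimal arise because the coefficient itself is rounded.
    print(f"{feature:22s} exp({coef:+.4f}) = {odds_ratio:.4f} (reported {reported:.4f})")
```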
|
### Random Forest Feature Importance (Top 10)

| Feature | Importance (Gini) |
|---------|-------------------|
| `text_length` | 0.2137 |
| `year` | 0.1915 |
| `stijl_extremiteit` | 0.1410 |
| `materiele_impact` | 0.0946 |
| `party_SGP` | 0.0652 |
| `party_FVD` | 0.0489 |
| `party_PVV` | 0.0407 |
| `cat_veiligheid/justitie` | 0.0258 |
| `cat_defensie/buitenland` | 0.0246 |
| `party_JA21` | 0.0234 |
|
|
## Interpretation

### Top 5 Most Important Features

**Logistic Regression (coefficient magnitude):**

1. `cat_corona/pandemie` (coef=-1.4680, OR=0.2304): decreases odds of high centrist support
2. `party_FVD` (coef=-1.3282, OR=0.2650): decreases odds of high centrist support
3. `party_SGP` (coef=0.9877, OR=2.6852): increases odds of high centrist support
4. `party_JA21` (coef=0.9264, OR=2.5255): increases odds of high centrist support
5. `stijl_extremiteit` (coef=-0.6859, OR=0.5036): decreases odds of high centrist support

**Random Forest (Gini importance):**

1. `text_length` (importance=0.2137)
2. `year` (importance=0.1915)
3. `stijl_extremiteit` (importance=0.1410)
4. `materiele_impact` (importance=0.0946)
5. `party_SGP` (importance=0.0652)
|
|
||||||
|
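### Which features best predict centrist support?

The two models agree only partly. In the logistic regression, **category** and **submitter party** dominate: four of the five largest coefficients are category or party indicators (`cat_corona/pandemie` and `party_FVD` lower the odds of high centrist support; `party_SGP` and `party_JA21` raise them). The random forest instead ranks the continuous features `text_length` and `year` highest, with `party_SGP` the first party feature at rank 5 and no category feature above rank 8. Impurity-based (Gini) importance tends to favour continuous features with many possible split points, so part of this difference reflects the importance measure rather than the data. **Material impact (`materiele_impact`)** appears in the top 10 of both models: motions with higher material-impact scores have lower odds of high centrist support (OR=0.6225), while lower-impact, more moderate proposals are associated with higher centrist support.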
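
One way to quantify how far the two models agree is the overlap between their top-k feature lists. A small sketch using the rankings from the two tables above; the helper `top_k_overlap` is illustrative and not part of the analysis scripts:

```python
from __future__ import annotations

# Top-10 rankings copied from the two tables above, most important first.
LOGIT_RANKING = [
    "cat_corona/pandemie", "party_FVD", "party_SGP", "party_JA21",
    "stijl_extremiteit", "party_PVV", "cat_onderwijs/cultuur",
    "cat_zorg/gezondheid", "materiele_impact", "cat_overig",
]
FOREST_RANKING = [
    "text_length", "year", "stijl_extremiteit", "materiele_impact",
    "party_SGP", "party_FVD", "party_PVV", "cat_veiligheid/justitie",
    "cat_defensie/buitenland", "party_JA21",
]


def top_k_overlap(a: list[str], b: list[str], k: int) -> set[str]:
    """Features that appear in the top k of both rankings."""
    return set(a[:k]) & set(b[:k])


if __name__ == "__main__":
    for k in (5, 10):
        shared = top_k_overlap(LOGIT_RANKING, FOREST_RANKING, k)
        print(f"top-{k}: {len(shared)} shared -> {sorted(shared)}")
```

With these tables only `party_SGP` and `stijl_extremiteit` appear in both top-5 lists, and the six features shared by the top-10 lists include no category indicator.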
|
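**Stylistic extremity (`stijl_extremiteit`)** is, if anything, a stronger predictor than material impact in both models: its logistic coefficient is larger in magnitude (-0.6859 vs -0.4741) and its Gini importance is higher (0.1410 vs 0.0946). These results therefore do not support the conclusion that centrist parties respond more to substantive content than to rhetorical framing.

The **is_opposition** flag does not appear in the top 10 of either model, so these tables alone do not show whether opposition-submitted and coalition-submitted motions differ in support.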
|
### Caveats

- Only motions with 2D extremity scores (LLM-annotated) are included (n=914).
- Submitter party is parsed from the title prefix; for multi-submitter motions only the lead submitter is used.
- Class imbalance (low support is more common) is handled with `class_weight='balanced'` and stratified sampling.
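
For reference, scikit-learn's `class_weight='balanced'` weights each class by `n_samples / (n_classes * count_of_class)`, so each minority-class (high-support) sample counts for more. A stdlib-only sketch of that formula; the toy labels are made up and are not the n=914 sample:

```python
from __future__ import annotations

from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """The 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical labels: 0 = low centrist support (majority), 1 = high support.
    toy_labels = [0] * 6 + [1] * 2
    print(balanced_class_weights(toy_labels))
```

With these weights each class contributes the same total weight (here 6 × 0.667 = 2 × 2.0 = 4), which is what "balanced" aims for.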
|
@ -0,0 +1,154 @@ |
|
# Voting Margin Analysis

**Goal:** Replace the binary pass/fail outcome with the continuous voting margin as the primary success metric for right-wing motions in the Tweede Kamer.

- **Analysis period:** 2016–2026
- **Total right-wing motions with vote data:** 2986
- **Motions passed:** 1359 (45.5%)
- **Motions failed:** 1627 (54.5%)

---
|
## 1. Methodology

The voting margin is computed from `motions.voting_results`, which stores per-party vote directions as a JSON object, e.g. `{"PVV": "voor", "VVD": "tegen", "D66": "afwezig", ...}`:

```
margin = (voor - tegen) / (voor + tegen + afwezig)
```

Each party contributes one vote (its majority position), so `voor`, `tegen` and `afwezig` are counts of parties. The margin ranges from -1 (unanimous rejection, no absences) to +1 (unanimous support, no absences); absent parties enlarge the denominator and pull the margin toward 0. A margin of exactly 0 indicates a tie between `voor` and `tegen`, or that no parties participated.
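
The formula above can be sketched as a small function over the per-party JSON. This is an illustrative reimplementation, not the code in `voting_margin.py`; it assumes every value is `voor`, `tegen` or `afwezig`, skips any other value (an assumption about the data), and returns 0.0 when no party is counted:

```python
from __future__ import annotations

import json

COUNTED = ("voor", "tegen", "afwezig")


def voting_margin(voting_results: str | dict[str, str]) -> float:
    """Per-party margin: (voor - tegen) / (voor + tegen + afwezig).

    Each party counts once regardless of seat count. Directions other than
    the three counted ones are skipped (assumption, see lead-in).
    """
    votes = json.loads(voting_results) if isinstance(voting_results, str) else voting_results
    tally = {direction: 0 for direction in COUNTED}
    for direction in votes.values():
        if direction in tally:
            tally[direction] += 1
    total = sum(tally.values())
    if total == 0:
        return 0.0  # no participating parties
    return (tally["voor"] - tally["tegen"]) / total


if __name__ == "__main__":
    print(voting_margin('{"PVV": "voor", "VVD": "tegen", "D66": "afwezig"}'))  # 0.0
    # The 14-1 and 8-7 splits discussed below, with 15 parties and no absences.
    wide = {f"P{i}": ("voor" if i < 14 else "tegen") for i in range(15)}
    narrow = {f"P{i}": ("voor" if i < 8 else "tegen") for i in range(15)}
    print(round(voting_margin(wide), 2), round(voting_margin(narrow), 2))  # 0.87 0.07
```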
|
This continuous metric captures the *magnitude* of support, not just its direction. With 15 parties voting and none absent, a 14-1 split in favour gives margin = 13/15 ≈ +0.87, while an 8-7 split gives 1/15 ≈ +0.07. If both motions pass they are identical in binary terms, but the former reflects far broader parliamentary consensus.
|
> **Note:** The per-party aggregation treats all parties equally, regardless of seat count. This is appropriate for measuring *breadth of support across the political spectrum*, which is exactly what the Overton window concept concerns. Seat-weighted margins would be confounded by coalition size effects.

---
|
## 2. Correlation: Margin vs Centrist Support

| Metric | Value |
|--------|-------|
| Spearman ρ | 0.812 |
| Spearman p-value | < 1e-300 (computed value underflows to 0.0e+00) |
| Pearson r | 0.822 |
| Pearson p-value | < 1e-300 (computed value underflows to 0.0e+00) |

The Spearman correlation is strong and highly significant (ρ = 0.812, p < 1e-300), indicating a positive monotonic relationship between centrist support and voting margin.
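
Spearman's ρ is Pearson's r computed on ranks, with tied values sharing their average rank; that is why it measures monotonic rather than strictly linear association. A stdlib-only sketch for recomputing the statistic; it is not necessarily how the report computed it, and it assumes neither input is constant:

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For instance, `spearman([0.0, 0.25, 0.5, 1.0], [-0.4, 0.1, 0.0, 0.6])` ranks the inputs as `[1, 2, 3, 4]` and `[1, 3, 2, 4]` and returns 0.8; calling `pearson` on the raw values instead gives the Pearson r variant.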
|
---

## 3. Margin Distribution by Centrist Support Quartile

Q1–Q4 are fixed-width bins of centrist support (each covering a quarter of the 0–1 range), not equal-count quartiles, so group sizes differ. Pre-2024 covers 2016–2023 and post-2024 covers 2024–2026.

### Summary Table

Mean voting margin per bin, with the number of motions in parentheses.

| Stratum | Q1 [0.00–0.25] | Q2 (0.25–0.50] | Q3 (0.50–0.75] | Q4 (0.75–1.00] |
|---------|:------:|:------:|:------:|:------:|
| all | -0.263 (n=1589) | +0.087 (n=536) | +0.212 (n=230) | +0.483 (n=631) |
| pre-2024 | -0.261 (n=1247) | +0.122 (n=357) | +0.232 (n=10) | +0.420 (n=297) |
| post-2024 | -0.269 (n=342) | +0.017 (n=179) | +0.211 (n=220) | +0.539 (n=334) |

The pre-2024 Q3 cell rests on only 10 motions, so its mean is unreliable.

### Detailed Statistics (All Motions)

| Quartile | N | Mean | Median | Std | P25 | P75 | Min | Max |
|----------|---|------|--------|-----|-----|-----|-----|-----|
| Q1 | 1589 | -0.263 | -0.294 | 0.228 | -0.450 | -0.100 | -0.733 | +0.438 |
| Q2 | 536 | +0.087 | +0.067 | 0.220 | -0.067 | +0.238 | -0.467 | +0.625 |
| Q3 | 230 | +0.212 | +0.200 | 0.165 | +0.067 | +0.333 | -0.200 | +0.600 |
| Q4 | 631 | +0.483 | +0.467 | 0.173 | +0.368 | +0.600 | -0.125 | +0.765 |

**Q4 – Q1 gap in mean margin:** +0.746

Motions with the highest centrist support (Q4) have a mean voting margin 0.746 higher than those with the lowest (Q1), more than a third of the full -1 to +1 range. The two bins still overlap at the extremes (the Q1 maximum of +0.438 exceeds the Q4 minimum of -0.125), but their interquartile ranges do not (Q1 P75 = -0.100, Q4 P25 = +0.368).
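
The bin labels above imply that only Q1 is closed on the left. A sketch of that assignment and of the per-bin mean margin; the function names and toy rows are illustrative, not taken from `voting_margin.py`:

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def support_bin(centrist_support: float) -> str:
    """Q1 [0, .25], Q2 (.25, .5], Q3 (.5, .75], Q4 (.75, 1]."""
    if not 0.0 <= centrist_support <= 1.0:
        raise ValueError(f"centrist support out of range: {centrist_support}")
    if centrist_support <= 0.25:
        return "Q1"
    if centrist_support <= 0.50:
        return "Q2"
    if centrist_support <= 0.75:
        return "Q3"
    return "Q4"


def mean_margin_by_bin(rows: list[tuple[float, float]]) -> dict[str, float]:
    """rows are (centrist_support, margin) pairs; returns the mean margin per bin."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for support, margin in rows:
        grouped[support_bin(support)].append(margin)
    return {label: mean(values) for label, values in sorted(grouped.items())}


if __name__ == "__main__":
    toy = [(0.0, -0.4), (0.25, -0.2), (0.5, 0.1), (0.6, 0.2), (1.0, 0.5)]
    print(mean_margin_by_bin(toy))
```

Note that a support value of exactly 0.25 lands in Q1 and exactly 0.50 in Q2, matching the bracket notation in the table headers.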
|
---

## 4. Pass Rate vs Margin Comparison

This section compares the binary pass rate with the continuous margin to determine whether the margin captures additional information.

| Quartile | N | Pass Rate | Mean Margin |
|----------|---|-----------|-------------|
| Q1 | 1589 | 12.7% | -0.263 |
| Q2 | 536 | 59.3% | +0.087 |
| Q3 | 230 | 92.6% | +0.212 |
| Q4 | 631 | 99.2% | +0.483 |

**Pass rate gap (Q4 – Q1):** +86.5 percentage points

**Margin gap (Q4 – Q1):** +0.746

Both pass rate and margin rise with centrist support, so the margin does not contradict the pass-rate findings. It does add granularity at the top of the scale: pass rate nearly saturates in Q3 and Q4 (92.6% and 99.2%), while the mean margin keeps rising from +0.212 to +0.483.
|
---

## 5. Period Stratification

| Metric | Pre-2024 (2016–2023) | Post-2024 (2024–2026) | Δ |
|--------|----------|-----------|-----|
| N | 1911 | 1075 | |
| Mean margin | -0.081 | +0.128 | +0.209 |
| Mann-Whitney U | | | U=702132, p=6.6e-47 |
| Cohen's d | | | +0.582 |

The mean margin rose by 0.209 between the two periods, a medium effect by Cohen's conventional thresholds (0.2 small, 0.5 medium, 0.8 large).
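
The report does not state which variant of Cohen's d it uses; the common one divides the difference in means by the pooled standard deviation. Under that assumption, Δ = +0.209 and d = +0.582 imply a pooled SD of about 0.36. A sketch of the pooled-SD version, with made-up toy data:

```python
from __future__ import annotations

import math
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """(mean_b - mean_a) / pooled SD, using sample variances (n - 1)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Pooled SD implied by the table, assuming the pooled-SD definition: delta / d.
    print(round(0.209 / 0.582, 3))  # 0.359
    pre = [-0.3, -0.1, 0.0, 0.1]   # toy margins, not report data
    post = [0.0, 0.1, 0.3, 0.4]
    print(round(cohens_d(pre, post), 3))
```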
|
---

## 6. Yearly Breakdown

| Year | N | Mean Margin | Mean Centrist Support (strict) | % Passed |
|------|---|-------------|-------------------------------|----------|
| 2016 | 6 | +0.397 | 0.667 | 100.0% |
| 2018 | 5 | +0.538 | 1.000 | 100.0% |
| 2019 | 195 | -0.057 | 0.380 | 42.6% |
| 2020 | 469 | -0.074 | 0.300 | 40.5% |
| 2021 | 425 | -0.106 | 0.175 | 34.4% |
| 2022 | 446 | -0.093 | 0.201 | 32.5% |
| 2023 | 365 | -0.077 | 0.255 | 34.2% |
| 2024 | 469 | +0.175 | 0.595 | 69.5% |
| 2025 | 455 | +0.089 | 0.474 | 57.4% |
| 2026 | 151 | +0.099 | 0.334 | 47.7% |

2017 does not appear in the table, and 2016 and 2018 contain only 6 and 5 motions, so their values are not comparable with the later years.
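
The yearly rows can be reconciled with the period split in Section 5: summing 2016–2023 and 2024–2026 reproduces the period sizes, and the N-weighted yearly means reproduce the period means, which confirms that the post-2024 period includes 2024 itself. A short check using the table values:

```python
from __future__ import annotations

# (year, n, mean_margin) copied from the yearly table above.
YEARLY = [
    (2016, 6, 0.397), (2018, 5, 0.538), (2019, 195, -0.057),
    (2020, 469, -0.074), (2021, 425, -0.106), (2022, 446, -0.093),
    (2023, 365, -0.077), (2024, 469, 0.175), (2025, 455, 0.089),
    (2026, 151, 0.099),
]


def period_summary(rows: list[tuple[int, int, float]], first_year: int, last_year: int) -> tuple[int, float]:
    """Total N and N-weighted mean margin for years in [first_year, last_year]."""
    selected = [(n, m) for year, n, m in rows if first_year <= year <= last_year]
    total = sum(n for n, _ in selected)
    weighted_mean = sum(n * m for n, m in selected) / total
    return total, weighted_mean


if __name__ == "__main__":
    pre_n, pre_mean = period_summary(YEARLY, 2016, 2023)
    post_n, post_mean = period_summary(YEARLY, 2024, 2026)
    print(pre_n, round(pre_mean, 3))    # 1911 -0.081
    print(post_n, round(post_mean, 3))  # 1075 0.128
```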
|
---

## 7. Interpretation

**Finding:** Higher centrist support is associated with higher voting margins (ρ = 0.812, p < 1e-300). This supports centrist support as a predictor of parliamentary success on a continuous scale, not just at the binary pass/fail threshold.

**Margin vs pass rate:** The voting margin complements rather than replaces the binary pass rate. Because the margin counts every party equally while a motion is decided by a majority of seats, its sign does not determine the outcome: an 8-7 split of parties can still fail if the seven opposing parties hold more seats. What the margin adds is the *strength* of cross-spectrum consensus. Overall, pass rate is far from constant in this sample (45.5% of right-wing motions passed), but it saturates at 92.6–99.2% in Q3 and Q4, where the margin still separates motions.
|
---

## 8. Limitations

- **Per-party aggregation:** All parties are weighted equally regardless of seat count. Two `voor` votes from VVD (24 seats) and PVV (37 seats) raise the margin exactly as much as two from SGP (3 seats) and DENK (3 seats). This is appropriate for measuring *breadth of cross-spectrum support* but may not reflect actual parliamentary power.
- **Voting discipline:** Party-line voting is near-universal in the Dutch parliament, so the per-party aggregation loses little information.
- **No within-party splits:** The `voting_results` data record each party's majority position, not individual MP votes, so intra-party dissent is invisible.
- **Missing data:** Motions without `voting_results` are excluded.
- **Overlapping measures:** If centrist support is derived from the same per-party votes, centrist parties' votes enter both centrist support and the margin, so part of the correlation in Section 2 is expected by construction.
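
The equal-weighting limitation can be made concrete by computing both the per-party margin and a seat-weighted margin for the same vote. The seat counts for VVD, PVV, SGP and DENK follow the bullet above; "A" and "B" are hypothetical parties with made-up sizes, and the pass rule (more seats `voor` than `tegen`) is a simplification assumed for illustration:

```python
from __future__ import annotations

# Illustrative seat counts; "A" and "B" are hypothetical parties.
SEATS = {"VVD": 24, "PVV": 37, "SGP": 3, "DENK": 3, "A": 40, "B": 20}


def weighted_margin(votes: dict[str, str], weights: dict[str, int]) -> float:
    """(voor - tegen) / (voor + tegen + afwezig), each party weighted by weights[party]."""
    tally = {"voor": 0, "tegen": 0, "afwezig": 0}
    for party, direction in votes.items():
        if direction in tally:
            tally[direction] += weights[party]
    total = sum(tally.values())
    return (tally["voor"] - tally["tegen"]) / total if total else 0.0


def party_margin(votes: dict[str, str]) -> float:
    """Per-party formula from Section 1: every party has weight 1."""
    return weighted_margin(votes, {party: 1 for party in votes})


def seat_margin(votes: dict[str, str], seats: dict[str, int]) -> float:
    """Seat-weighted variant of the same formula."""
    return weighted_margin(votes, seats)


def passes(votes: dict[str, str], seats: dict[str, int]) -> bool:
    """Simplified outcome rule (assumption): more seats voor than tegen."""
    voor = sum(seats[p] for p, v in votes.items() if v == "voor")
    tegen = sum(seats[p] for p, v in votes.items() if v == "tegen")
    return voor > tegen


if __name__ == "__main__":
    large = {"VVD": "voor", "PVV": "voor", "A": "tegen", "B": "tegen"}
    small = {"SGP": "voor", "DENK": "voor", "A": "tegen", "B": "tegen"}
    for label, votes in (("VVD+PVV", large), ("SGP+DENK", small)):
        print(label, party_margin(votes), round(seat_margin(votes, SEATS), 3), passes(votes, SEATS))
```

Both votes have a per-party margin of 0.0, yet under these illustrative seat counts the first passes with a seat-weighted margin of about +0.008 and the second fails at about -0.818.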
|
---
|
 |
*Report generated by `analysis/right_wing/voting_margin.py`*