Motion Extremity Classification with LLMs

Implementation Status

Script: scripts/classify_motions.py - ready to run once a valid API key is configured

Requirements:

  • Valid OpenRouter API key in .env (current key returns "User not found")
  • ~28,000 motions to classify

Usage:

# Classify all motions (will take hours)
.venv/bin/python scripts/classify_motions.py --delay 0.5

# Test with small sample first
.venv/bin/python scripts/classify_motions.py --limit 10 --delay 2

# Analyze existing classifications
.venv/bin/python scripts/classify_motions.py --analyze-only
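As a rough check on the "will take hours" note, the delay alone gives a lower bound on runtime. This is a minimal sketch that assumes `--delay` is a fixed pause in seconds between sequential requests (an assumption about the script, not confirmed here) and ignores API latency; `min_runtime_hours` is a hypothetical helper.

```python
def min_runtime_hours(n_motions: int, delay_s: float) -> float:
    """Lower bound on runtime from the inter-request delay alone."""
    return n_motions * delay_s / 3600


# ~28,000 motions with --delay 0.5 (assumed to mean 0.5 s between requests)
full_run = min_runtime_hours(28_000, 0.5)
print(f"{full_run:.1f} h")
```

Under these assumptions the full run needs just under four hours of waiting before any request time is counted.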

Why LLMs?

Rule-based keyword matching is too crude:

  • Flags only 3-4% of motions as "high extremity"
  • Can't understand nuance ("verbod", Dutch for "ban", also appears in mundane contexts)
  • Can't assess policy impact magnitude
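The limitation can be illustrated with a toy keyword matcher. This is a sketch only: the keyword list and both motion titles are invented for illustration and are not taken from the dataset or from the project's actual rules.

```python
# Hypothetical keyword list; "verbod" = ban, "afschaffen" = abolish.
EXTREME_KEYWORDS = ["verbod", "afschaffen"]


def keyword_flag(title: str) -> bool:
    """Flag a motion as 'high extremity' if any keyword occurs in its title."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in EXTREME_KEYWORDS)


# Invented titles: a routine enforcement matter and a far-reaching proposal.
mundane = "Motie over handhaving van het verbod op stoepparkeren"
far_reaching = "Motie over een volledige asielstop"

print(keyword_flag(mundane))       # flagged, although the motion is routine
print(keyword_flag(far_reaching))  # missed, although the impact is large
```

The matcher reacts to the surface word, not to what the motion would change, which is the gap the LLM approach is meant to close.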

LLMs can:

  • Understand policy context and implications
  • Assess deviation from consensus/norms
  • Interpret Dutch political terminology

Proposed LLM Classification Schema

Output Format

{
  "extremity_score": 1-5,
  "policy_domain": "migration|identity|economy|social|climate|foreign_policy|justice|education|health|other",
  "policy_direction": "restrictive|permissive|neutral",
  "deviation_type": "procedural|semantic|structural",
  "consensus_level": "broad|partial|narrow|opposition",
  "rationale": "1-2 sentence explanation"
}
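Because model output is not guaranteed to follow the schema, each response can be checked before it is stored. The following is a minimal sketch; `validate_classification` and `ALLOWED_VALUES` are hypothetical names, and the allowed values are copied from the schema above.

```python
import json

# Allowed values per field, copied from the proposed schema.
ALLOWED_VALUES = {
    "policy_domain": {"migration", "identity", "economy", "social", "climate",
                      "foreign_policy", "justice", "education", "health", "other"},
    "policy_direction": {"restrictive", "permissive", "neutral"},
    "deviation_type": {"procedural", "semantic", "structural"},
    "consensus_level": {"broad", "partial", "narrow", "opposition"},
}


def validate_classification(raw: str) -> dict:
    """Parse a model response; raise ValueError if it violates the schema."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    score = data.get("extremity_score")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"extremity_score must be an integer 1-5, got {score!r}")

    for field, allowed in ALLOWED_VALUES.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{field} has invalid value {value!r}")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("rationale must be a non-empty string")
    return data
```

Responses that fail validation can be logged and retried rather than written to the database.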

Extremity Scale (1-5)

| Score | Label | Description | Examples |
|-------|-------|-------------|----------|
| 1 | Mainstream | Standard governance, routine | Budget adjustments, procedural changes |
| 2 | Minor deviation | Small policy tweaks within consensus | Minor fee changes, small program adjustments |
| 3 | Moderate deviation | Meaningful but within coalition consensus | Immigration processing changes, targeted regulations |
| 4 | Major deviation | Challenges status quo meaningfully | Tighter migration rules, significant policy reversals |
| 5 | Extreme | Fundamental/populist, outside consensus | Complete bans, anti-democratic motions |

Policy Direction

  • restrictive: Limits freedoms, tightens rules, reduces access
  • permissive: Expands freedoms, loosens rules, increases access
  • neutral: Procedural, administrative, technical

Consensus Level

  • broad: Passed with 80% or more of parties voting the same way
  • partial: Passed with 60% to under 80% agreement
  • narrow: Passed with 50% to under 60% agreement (close vote)
  • opposition: Coalition parties voted against
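Since these levels follow from the vote record, they could also be computed directly instead of being inferred by the model. This is a sketch under stated assumptions: `consensus_level` is a hypothetical helper, a share of exactly 80% counts as broad and exactly 60% as partial, and "opposition" takes precedence over the percentage bands. None of these tie-breaking choices is fixed by the definitions above.

```python
def consensus_level(majority_share: float, coalition_opposed: bool = False) -> str:
    """Map the share of parties on the majority side (0.5-1.0) to a label."""
    if not 0.5 <= majority_share <= 1.0:
        raise ValueError("majority_share must be between 0.5 and 1.0")
    if coalition_opposed:
        # Assumption: coalition opposition overrides the percentage bands.
        return "opposition"
    if majority_share >= 0.8:
        return "broad"
    if majority_share >= 0.6:
        return "partial"
    return "narrow"


print(consensus_level(0.85))        # broad
print(consensus_level(0.70))        # partial
print(consensus_level(0.55))        # narrow
print(consensus_level(0.90, True))  # opposition
```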

LLM Prompt

SYSTEM:
You are an expert on Dutch parliamentary politics. Classify parliamentary motions 
on policy extremity using the provided schema.

CLASSIFICATION_RUBRIC:
- Score 1 (Mainstream): Routine governance, budget adjustments, procedural changes
- Score 2 (Minor): Small policy tweaks within consensus
- Score 3 (Moderate): Meaningful changes but within coalition consensus
- Score 4 (Major): Challenges status quo, significant policy shifts
- Score 5 (Extreme): Fundamental changes, populist, outside consensus

Consider:
- Policy impact magnitude
- Deviation from current norms/policies
- Coalition/opposition dynamics
- Dutch political context

USER:
Classify this motion:

Title: {title}
Description: {description}
Voting result: {passed/rejected}, {party_coalition} parties voted for

Respond in JSON format.
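The user message can be filled in from a motion record with a small helper. This is a sketch, not the project's actual implementation: the field names `title`, `description`, `passed`, and `parties_for` are assumptions about how motions are stored.

```python
USER_TEMPLATE = """Classify this motion:

Title: {title}
Description: {description}
Voting result: {outcome}, {parties_for} parties voted for

Respond in JSON format."""


def build_prompt(motion: dict) -> str:
    """Render the user message for one motion (field names are assumed)."""
    return USER_TEMPLATE.format(
        title=motion["title"],
        description=motion.get("description") or "(no description)",
        outcome="passed" if motion["passed"] else "rejected",
        parties_for=", ".join(motion["parties_for"]),
    )


# Invented example record
example = {
    "title": "Motie over de begroting",
    "description": "",
    "passed": True,
    "parties_for": ["VVD", "D66"],
}
print(build_prompt(example))
```

Keeping the phrase "Respond in JSON format" in the template matters if the request uses a JSON response mode, which expects the word JSON to appear in the messages.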

Batch Processing Strategy

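import asyncio
import json

from openai import AsyncOpenAI

BATCH_SIZE = 50  # number of concurrent requests per batch

# SYSTEM_PROMPT, build_prompt, load_motions and save_to_database are assumed
# to be defined elsewhere in the project.


async def classify_motion_batch(motions: list[dict], model: str = "gpt-4o") -> list[dict]:
    """Classify motions in parallel batches; failed requests are skipped."""

    # Reads OPENAI_API_KEY from the environment by default. To use another
    # OpenAI-compatible provider, pass base_url and api_key explicitly.
    client = AsyncOpenAI()

    async def classify_one(motion: dict) -> dict:
        response = await client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": build_prompt(motion)},
            ],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        result["motion_id"] = motion["id"]
        return result

    results = []
    for i in range(0, len(motions), BATCH_SIZE):
        batch = motions[i:i + BATCH_SIZE]
        # return_exceptions=True keeps one failed request from discarding
        # the rest of the batch.
        outcomes = await asyncio.gather(
            *(classify_one(m) for m in batch), return_exceptions=True
        )
        results.extend(r for r in outcomes if not isinstance(r, Exception))

    return results


async def main():
    motions = load_motions()  # Load from database
    classifications = await classify_motion_batch(motions)
    save_to_database(classifications)


if __name__ == "__main__":
    asyncio.run(main())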
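Sending 50 requests at once can run into rate limits or transient network errors. One common way to handle this is to retry each call with exponential backoff. The wrapper below is a generic sketch; `with_retries` and its parameters are hypothetical and not part of the script or of the OpenAI client.

```python
import asyncio
import random


async def with_retries(make_call, max_attempts: int = 4, base_delay: float = 1.0):
    """Await make_call(), retrying failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt
            # Jitter spreads out retries from requests that failed together.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

Inside the batch function, a call could then be wrapped as `await with_retries(lambda: classify_one(m))`. In practice it is usually better to catch only the client's rate-limit and connection errors rather than every exception.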

Cost Estimate

| Dataset Size | Model | Est. Cost | Est. Time |
|--------------|-------|-----------|-----------|
| 35,000 motions | gpt-4o-mini | ~$5-10 | 30-60 min |
| 35,000 motions | gpt-4o | ~$50-100 | 2-4 hours |

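gpt-4o-mini should be sufficient for this classification task. Note that these estimates assume 35,000 motions, more than the ~28,000 listed under Requirements.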
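The order of magnitude of these figures can be reproduced with simple token arithmetic. Everything numeric below is an assumption for illustration: roughly 600 input and 100 output tokens per motion, and per-million-token prices that should be checked against the provider's current price list. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    n_motions: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Total cost in USD given per-motion token counts and per-million prices."""
    total_in = n_motions * input_tokens
    total_out = n_motions * output_tokens
    return (total_in * usd_per_m_input + total_out * usd_per_m_output) / 1_000_000


# Assumed token counts and prices; verify before budgeting.
mini = estimate_cost(35_000, 600, 100, usd_per_m_input=0.15, usd_per_m_output=0.60)
full = estimate_cost(35_000, 600, 100, usd_per_m_input=2.50, usd_per_m_output=10.00)
print(f"gpt-4o-mini: ${mini:.2f}, gpt-4o: ${full:.2f}")
```

Under these assumptions the totals fall inside the ranges given in the table; longer motion descriptions would raise the input token count and the cost proportionally.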

Analysis After Classification

Once classified, we can analyze:

# Extremity by period
df.groupby(['period', 'extremity_score']).size().unstack(fill_value=0)

# Domain-Extremity heatmap
pivot = df.pivot_table(values='motion_id', 
                        index='policy_domain', 
                        columns='extremity_score', 
                        aggfunc='count')

# Passed vs rejected extremity
df.groupby('passed')['extremity_score'].mean()

# Migration policy direction by period (coalition shift analysis)
df[df['policy_domain'] == 'migration'].groupby(['period', 'policy_direction']).size()

Expected Insights

  1. Extremity distribution over time - Has the share of motions scoring 4-5 increased?
  2. Domain-extremity correlation - Which domains produce extreme policies?
  3. Direction-extremity - Restrictive vs permissive extremity by period
  4. Consensus-extremity - Are extreme policies passing with broad or narrow consensus?
  5. Coalition voting - Which parties support extreme policies?
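The first question can be answered with a per-period share of high-scoring motions. This sketch uses a tiny invented DataFrame in place of the real classifications; the column names match those used in the analysis snippets above.

```python
import pandas as pd

# Invented rows for illustration only
df = pd.DataFrame({
    "motion_id": [1, 2, 3, 4, 5, 6],
    "period": ["2019", "2019", "2019", "2023", "2023", "2023"],
    "extremity_score": [1, 2, 4, 3, 4, 5],
})

# Share of motions per period with a score of 4 or 5
high_share = (
    df.assign(high=df["extremity_score"] >= 4)
      .groupby("period")["high"]
      .mean()
)
print(high_share)
```

Using a share rather than a raw count keeps periods with different numbers of motions comparable.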