
# Motion Extremity Classification with LLMs

## Implementation Status

**Script**: `scripts/classify_motions.py` (ready to run once a valid API key is configured)

**Requirements**:
- Valid OpenRouter API key in `.env` (current key returns "User not found")
- Input data: ~28,000 motions to classify

**Usage**:
```bash
# Classify all motions (will take hours)
.venv/bin/python scripts/classify_motions.py --delay 0.5

# Test with small sample first
.venv/bin/python scripts/classify_motions.py --limit 10 --delay 2

# Analyze existing classifications
.venv/bin/python scripts/classify_motions.py --analyze-only
```
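
The "will take hours" note follows from the delay alone. A rough lower bound can be sketched as below, assuming `--delay` is a per-motion pause in seconds and that requests run one at a time (neither is confirmed here; `min_runtime_hours` is a hypothetical helper, not part of the script):

```python
def min_runtime_hours(n_motions: int, delay_s: float) -> float:
    """Lower bound on wall-clock time: pauses only, ignoring API latency."""
    return n_motions * delay_s / 3600


# ~28,000 motions at --delay 0.5 spend about 3.9 hours in pauses alone.
print(f"{min_runtime_hours(28_000, 0.5):.1f} h")
```

Actual run time will be longer, since each request also waits on the API response.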

## Why LLMs?

Rule-based keyword matching is too crude:
- Flags only 3-4% of motions as "high extremity"
- Can't handle nuance ("verbod", Dutch for "ban", also appears in mundane contexts)
- Can't assess the magnitude of a policy's impact

LLMs can:
- Understand policy context and implications
- Assess deviation from consensus/norms
- Interpret Dutch political terminology

## Proposed LLM Classification Schema

### Output Format
```json
{
  "extremity_score": 1-5,
  "policy_domain": "migration|identity|economy|social|climate|foreign_policy|justice|education|health|other",
  "policy_direction": "restrictive|permissive|neutral",
  "deviation_type": "procedural|semantic|structural",
  "consensus_level": "broad|partial|narrow|opposition",
  "rationale": "1-2 sentence explanation"
}
```
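
Model output does not always match the requested format, so it may help to check each response before saving it. The following is a minimal sketch of such a check, built only from the field values listed above; `ALLOWED` and `validate_classification` are hypothetical names, not part of `scripts/classify_motions.py`:

```python
ALLOWED = {
    "policy_domain": {"migration", "identity", "economy", "social", "climate",
                      "foreign_policy", "justice", "education", "health", "other"},
    "policy_direction": {"restrictive", "permissive", "neutral"},
    "deviation_type": {"procedural", "semantic", "structural"},
    "consensus_level": {"broad", "partial", "narrow", "opposition"},
}


def validate_classification(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result fits the schema."""
    problems = []

    score = result.get("extremity_score")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 5:
        problems.append(f"extremity_score must be an integer 1-5, got {score!r}")

    for field, allowed in ALLOWED.items():
        value = result.get(field)
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field} has unexpected value {value!r}")

    rationale = result.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        problems.append("rationale must be a non-empty string")

    return problems
```

Responses with a non-empty problem list could be logged and retried rather than written to the database.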

### Extremity Scale (1-5)

| Score | Label | Description | Examples |
|-------|-------|-------------|----------|
| 1 | Mainstream | Standard governance, routine | Budget adjustments, procedural changes |
| 2 | Minor deviation | Small policy tweaks within consensus | Minor fee changes, small program adjustments |
| 3 | Moderate deviation | Meaningful but within coalition consensus | Immigration processing changes, targeted regulations |
| 4 | Major deviation | Challenges status quo meaningfully | Tighter migration rules, significant policy reversals |
| 5 | Extreme | Fundamental/populist, outside consensus | Complete bans, anti-democratic motions |

### Policy Direction

- **restrictive**: Limits freedoms, tightens rules, reduces access
- **permissive**: Expands freedoms, loosens rules, increases access
- **neutral**: Procedural, administrative, technical

### Consensus Level

- **broad**: Passed with 80% or more of parties voting the same way
- **partial**: Passed with 60% to under 80% agreement
- **narrow**: Passed with 50% to under 60% agreement (close vote)
- **opposition**: Coalition parties voted against
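
Because the vote data is already known, this field could also be derived directly instead of asked of the model. A minimal sketch, under several assumptions that the text does not settle: agreement is the share of parties on the majority side, boundary values (exactly 60% or 80%) fall into the higher bucket, and a coalition "against" vote takes precedence over the thresholds. `consensus_level` and its parameters are hypothetical names:

```python
def consensus_level(parties_for: int, parties_against: int,
                    coalition_voted_against: bool = False) -> str:
    """Map party vote counts to a consensus label (assumed interpretation)."""
    if coalition_voted_against:
        return "opposition"

    total = parties_for + parties_against
    if total == 0:
        raise ValueError("no votes recorded")

    # Share of parties on the larger side, so always between 0.5 and 1.0.
    agreement = max(parties_for, parties_against) / total
    if agreement >= 0.8:
        return "broad"
    if agreement >= 0.6:
        return "partial"
    return "narrow"


print(consensus_level(parties_for=12, parties_against=3))
```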

## LLM Prompt

```
SYSTEM:
You are an expert on Dutch parliamentary politics. Classify parliamentary motions
on policy extremity using the provided schema.

CLASSIFICATION_RUBRIC:
- Score 1 (Mainstream): Routine governance, budget adjustments, procedural changes
- Score 2 (Minor): Small policy tweaks within consensus
- Score 3 (Moderate): Meaningful changes but within coalition consensus
- Score 4 (Major): Challenges status quo, significant policy shifts
- Score 5 (Extreme): Fundamental changes, populist, outside consensus

Consider:
- Policy impact magnitude
- Deviation from current norms/policies
- Coalition/opposition dynamics
- Dutch political context

USER:
Classify this motion:

Title: {title}
Description: {description}
Voting result: {passed/rejected}, {party_coalition} parties voted for

Respond in JSON format.
```
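
Filling the USER part of the template could look like the sketch below. The motion keys (`title`, `description`, `passed`, `parties_for`) are assumptions about the record layout, and the sample motion is invented for illustration:

```python
USER_TEMPLATE = """Classify this motion:

Title: {title}
Description: {description}
Voting result: {outcome}, {parties_for} parties voted for

Respond in JSON format."""


def build_prompt(motion: dict) -> str:
    """Render the user message for one motion (assumed record keys)."""
    outcome = "passed" if motion["passed"] else "rejected"
    # Fall back to "no" so the sentence still reads if nobody voted in favor.
    parties = ", ".join(motion["parties_for"]) or "no"
    return USER_TEMPLATE.format(
        title=motion["title"],
        description=motion.get("description") or "(no description)",
        outcome=outcome,
        parties_for=parties,
    )


# Invented sample record, for illustration only.
sample = {
    "title": "Motion on asylum processing times",
    "description": None,
    "passed": True,
    "parties_for": ["VVD", "D66"],
}
print(build_prompt(sample))
```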

## Batch Processing Strategy

```python
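import asyncio
import json

from openai import AsyncOpenAI

# SYSTEM_PROMPT, build_prompt(), load_motions(), and save_to_database() are
# project helpers defined elsewhere (not shown here).

BATCH_SIZE = 50  # number of concurrent requests per batch


async def classify_motion_batch(motions: list[dict], model: str = "gpt-4o") -> list[dict]:
    """Classify motions in sequential batches of BATCH_SIZE concurrent requests."""

    # AsyncOpenAI() reads OPENAI_API_KEY from the environment. To route
    # through OpenRouter instead, pass api_key and base_url explicitly.
    client = AsyncOpenAI()

    async def classify_one(motion: dict) -> dict:
        response = await client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": build_prompt(motion)},
            ],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        result["motion_id"] = motion["id"]
        return result

    results = []
    for i in range(0, len(motions), BATCH_SIZE):
        batch = motions[i:i + BATCH_SIZE]
        # return_exceptions=True keeps one failed request from aborting the batch.
        outcomes = await asyncio.gather(
            *(classify_one(m) for m in batch), return_exceptions=True
        )
        for motion, outcome in zip(batch, outcomes):
            if isinstance(outcome, Exception):
                print(f"motion {motion['id']} failed: {outcome}")
            else:
                results.append(outcome)

    return results


async def main():
    motions = load_motions()  # Load from database
    classifications = await classify_motion_batch(motions)
    save_to_database(classifications)


if __name__ == "__main__":
    asyncio.run(main())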
```
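
Fixed batches wait for the slowest request in each group before starting the next one. One alternative is to cap the number of in-flight requests with a semaphore. This is a sketch of that variation, not what `scripts/classify_motions.py` does; `classify_bounded` and `fake_classify` are hypothetical names, and the fake classifier stands in for the real API call:

```python
import asyncio
from typing import Awaitable, Callable


async def classify_bounded(
    motions: list[dict],
    classify_one: Callable[[dict], Awaitable[dict]],
    max_concurrency: int = 50,
) -> list[dict]:
    """Run classify_one over motions with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(motion: dict) -> dict:
        async with semaphore:
            return await classify_one(motion)

    # gather returns results in the same order as the input motions.
    return await asyncio.gather(*(guarded(m) for m in motions))


async def fake_classify(motion: dict) -> dict:
    """Stand-in for the real API call, for demonstration only."""
    await asyncio.sleep(0)
    return {"motion_id": motion["id"], "extremity_score": 1}


demo = asyncio.run(classify_bounded([{"id": i} for i in range(5)], fake_classify, 2))
print([r["motion_id"] for r in demo])
```

A new request starts as soon as any running one finishes, so a single slow response no longer stalls the other 49 slots.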

## Cost Estimate

| Dataset Size | Model | Est. Cost | Est. Time |
|-------------|-------|-----------|-----------|
| 35,000 motions | gpt-4o-mini | ~$5-10 | 30-60 min |
| 35,000 motions | gpt-4o | ~$50-100 | 2-4 hours |

`gpt-4o-mini` should be sufficient for this classification task, at roughly a tenth of the estimated cost of `gpt-4o`.
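
Estimates like these can be reproduced from token counts and per-token prices. In the sketch below, every number passed to `estimate_cost_usd` (a hypothetical helper) is a placeholder assumption: the tokens per motion are guesses, and the per-million-token prices are illustrative and should be replaced with the provider's current rates:

```python
def estimate_cost_usd(n_motions: int, input_tokens: int, output_tokens: int,
                      price_in_per_m: float, price_out_per_m: float) -> float:
    """Total cost given tokens per motion and prices per million tokens."""
    per_motion = (input_tokens * price_in_per_m
                  + output_tokens * price_out_per_m) / 1_000_000
    return n_motions * per_motion


# Placeholder inputs: ~500 prompt tokens and ~100 response tokens per motion,
# with illustrative prices of $0.15 (input) and $0.60 (output) per million tokens.
cost = estimate_cost_usd(35_000, 500, 100, price_in_per_m=0.15, price_out_per_m=0.60)
print(f"about ${cost:.0f}")
```

Longer motion descriptions or a longer system prompt raise the input token count and scale the estimate accordingly.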

## Analysis After Classification

Once classified, we can analyze:

```python
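# df: one row per classified motion, with columns motion_id, period, passed,
# extremity_score, policy_domain, and policy_direction.

# Extremity by period
extremity_by_period = df.groupby(['period', 'extremity_score']).size().unstack(fill_value=0)

# Domain-Extremity heatmap
pivot = df.pivot_table(values='motion_id',
                       index='policy_domain',
                       columns='extremity_score',
                       aggfunc='count',
                       fill_value=0)

# Passed vs rejected extremity
extremity_by_outcome = df.groupby('passed')['extremity_score'].mean()

# Coalition shift analysis: migration policy direction by period
migration_direction = (df[df['policy_domain'] == 'migration']
                       .groupby(['period', 'policy_direction']).size())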
```

## Expected Insights

1. **Extremity distribution over time** - Has the share of motions scoring 4-5 increased?
2. **Domain-extremity correlation** - Which domains produce extreme policies?
3. **Direction-extremity** - Restrictive vs permissive extremity by period
4. **Consensus-extremity** - Are extreme policies passing with broad or narrow consensus?
5. **Coalition voting** - Which parties support extreme policies?
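
The first question can be answered with a per-period share of high scores. A dependency-free sketch follows, assuming each classified motion is a dict with `period` and `extremity_score` keys; `high_extremity_share`, the period labels, and the sample rows are all invented for illustration:

```python
from collections import defaultdict


def high_extremity_share(rows: list[dict], threshold: int = 4) -> dict[str, float]:
    """Fraction of motions per period with extremity_score >= threshold."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        totals[row["period"]] += 1
        if row["extremity_score"] >= threshold:
            high[row["period"]] += 1
    return {period: high[period] / totals[period] for period in sorted(totals)}


# Invented sample rows, for illustration only.
rows = [
    {"period": "period_a", "extremity_score": s} for s in (1, 2, 4, 1)
] + [
    {"period": "period_b", "extremity_score": s} for s in (4, 5, 2, 1)
]
print(high_extremity_share(rows))
```

Comparing shares rather than raw counts keeps periods with different numbers of motions comparable.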