# Motion Extremity Classification with LLMs
## Implementation Status
**Script**: `scripts/classify_motions.py` - Ready to run
**Requirements**:
- A valid OpenRouter API key in `.env` (the current key is rejected with "User not found")
- ~28,000 motions to classify
**Usage**:
```bash
# Classify all motions (will take hours)
.venv/bin/python scripts/classify_motions.py --delay 0.5
# Test with small sample first
.venv/bin/python scripts/classify_motions.py --limit 10 --delay 2
# Analyze existing classifications
.venv/bin/python scripts/classify_motions.py --analyze-only
```
## Why LLMs?
Rule-based keyword matching is too crude:
- Flags only 3-4% of motions as "high extremity"
- Can't handle nuance ("verbod" also appears in mundane contexts)
- Can't assess the magnitude of a policy's impact
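A minimal sketch of why keyword matching misfires. The keyword list and both example motions are made up for illustration; the source does not give the actual rule set.

```python
import re

# Hypothetical keyword list (assumption, not the project's real rules).
EXTREME_KEYWORDS = ("verbod", "afschaffen", "stopzetten")


def keyword_extremity(text: str) -> bool:
    """Flag a motion as 'high extremity' if any keyword appears as a whole word."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(keyword in words for keyword in EXTREME_KEYWORDS)


# Invented example titles.
mundane = "Motie over handhaving van het verbod op parkeren op de stoep"
sweeping = "Motie over het volledig sluiten van de grenzen voor asielzoekers"

print(keyword_extremity(mundane))   # True: flagged, although the content is routine
print(keyword_extremity(sweeping))  # False: missed, because no keyword is present
```

The first motion is a false positive and the second a false negative, which is the kind of context-dependence a keyword list cannot resolve.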
LLMs can:
- Understand policy context and implications
- Assess deviation from consensus/norms
- Interpret Dutch political terminology
## Proposed LLM Classification Schema
### Output Format
```json
{
  "extremity_score": 1-5,
  "policy_domain": "migration|identity|economy|social|climate|foreign_policy|justice|education|health|other",
  "policy_direction": "restrictive|permissive|neutral",
  "deviation_type": "procedural|semantic|structural",
  "consensus_level": "broad|partial|narrow|opposition",
  "rationale": "1-2 sentence explanation"
}
```
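Model output does not always follow the requested format, so it may be worth validating each response before storing it. The following is a sketch of such a check against the schema above; the function name `validate_classification` is hypothetical and not taken from `scripts/classify_motions.py`.

```python
import json

# Allowed values copied from the output format above.
ALLOWED = {
    "policy_domain": {"migration", "identity", "economy", "social", "climate",
                      "foreign_policy", "justice", "education", "health", "other"},
    "policy_direction": {"restrictive", "permissive", "neutral"},
    "deviation_type": {"procedural", "semantic", "structural"},
    "consensus_level": {"broad", "partial", "narrow", "opposition"},
}


def validate_classification(raw: str) -> dict:
    """Parse a JSON response and raise ValueError if it violates the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    score = data.get("extremity_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"extremity_score must be an integer 1-5, got {score!r}")

    for field, allowed in ALLOWED.items():
        if data.get(field) not in allowed:
            raise ValueError(f"{field} has invalid value {data.get(field)!r}")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("rationale must be a non-empty string")
    return data
```

Responses that fail validation can be logged and retried rather than written to the database.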
### Extremity Scale (1-5)
| Score | Label | Description | Examples |
|-------|-------|-------------|----------|
| 1 | Mainstream | Standard governance, routine | Budget adjustments, procedural changes |
| 2 | Minor deviation | Small policy tweaks within consensus | Minor fee changes, small program adjustments |
| 3 | Moderate deviation | Meaningful but within coalition consensus | Immigration processing changes, targeted regulations |
| 4 | Major deviation | Challenges status quo meaningfully | Tighter migration rules, significant policy reversals |
| 5 | Extreme | Fundamental/populist, outside consensus | Complete bans, anti-democratic motions |
### Policy Direction
- **restrictive**: Limits freedoms, tightens rules, reduces access
- **permissive**: Expands freedoms, loosens rules, increases access
- **neutral**: Procedural, administrative, technical
### Consensus Level
- **broad**: Passed with 80% or more of parties voting the same way
- **partial**: Passed with 60% to just under 80% agreement
- **narrow**: Passed with 50% to just under 60% agreement (close vote)
- **opposition**: Coalition parties voted against
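Because the consensus level follows from the vote itself, it could also be derived from voting data rather than left to the model. A sketch under two assumptions that the schema does not state: `opposition` takes precedence over the percentage bands, and agreement below 50% is treated as undefined.

```python
def consensus_level(agreement: float, coalition_opposed: bool = False) -> str:
    """Map the share of parties voting the same way (0.0-1.0) to a consensus label.

    `coalition_opposed` is a hypothetical flag meaning that coalition parties
    voted against the motion.
    """
    if not 0.0 <= agreement <= 1.0:
        raise ValueError("agreement must be between 0.0 and 1.0")
    if coalition_opposed:
        return "opposition"  # assumed to override the percentage bands
    if agreement >= 0.8:
        return "broad"
    if agreement >= 0.6:
        return "partial"
    if agreement >= 0.5:
        return "narrow"
    raise ValueError("the schema defines no label for agreement below 50%")


print(consensus_level(0.85))                          # broad
print(consensus_level(0.60))                          # partial
print(consensus_level(0.55))                          # narrow
print(consensus_level(0.90, coalition_opposed=True))  # opposition
```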
## LLM Prompt
```
SYSTEM:
You are an expert on Dutch parliamentary politics. Classify parliamentary motions
on policy extremity using the provided schema.
CLASSIFICATION_RUBRIC:
- Score 1 (Mainstream): Routine governance, budget adjustments, procedural changes
- Score 2 (Minor): Small policy tweaks within consensus
- Score 3 (Moderate): Meaningful changes but within coalition consensus
- Score 4 (Major): Challenges status quo, significant policy shifts
- Score 5 (Extreme): Fundamental changes, populist, outside consensus
Consider:
- Policy impact magnitude
- Deviation from current norms/policies
- Coalition/opposition dynamics
- Dutch political context
USER:
Classify this motion:
Title: {title}
Description: {description}
Voting result: {passed/rejected}, {party_coalition} parties voted for
Respond in JSON format.
```
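The USER part of the prompt can be filled from a motion record roughly as follows. This is a sketch: the record fields `title`, `description`, `passed` and `parties_for` are assumed names, and the two placeholders `{passed/rejected}` and `{party_coalition}` are renamed to `{outcome}` and `{parties_for}` so the template works with `str.format`.

```python
USER_TEMPLATE = (
    "Classify this motion:\n"
    "Title: {title}\n"
    "Description: {description}\n"
    "Voting result: {outcome}, {parties_for} parties voted for\n"
    "Respond in JSON format."
)


def build_prompt(motion: dict) -> str:
    """Render the user message for one motion (field names are assumptions)."""
    return USER_TEMPLATE.format(
        title=motion["title"].strip(),
        description=motion["description"].strip(),
        outcome="passed" if motion["passed"] else "rejected",
        parties_for=", ".join(motion["parties_for"]),
    )


# Invented example record.
example = {
    "title": "Motie over strengere grenscontroles",
    "description": "Verzoekt de regering de grenscontroles uit te breiden.",
    "passed": False,
    "parties_for": ["PVV", "FVD"],
}
print(build_prompt(example))
```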
## Batch Processing Strategy
```python
import asyncio
import json

from openai import AsyncOpenAI

# SYSTEM_PROMPT, build_prompt(), load_motions() and save_to_database()
# are not shown here and must be defined elsewhere.


async def classify_motion_batch(motions: list[dict], model: str = "gpt-4o") -> list[dict]:
    """Classify motions in parallel batches of 50."""
    # AsyncOpenAI() reads OPENAI_API_KEY from the environment by default;
    # to route through OpenRouter, pass base_url and api_key explicitly.
    client = AsyncOpenAI()

    async def classify_one(motion: dict) -> dict:
        prompt = build_prompt(motion)
        response = await client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        result["motion_id"] = motion["id"]
        return result

    # Send 50 requests at a time and wait for each batch to finish.
    results = []
    for i in range(0, len(motions), 50):
        batch = motions[i:i + 50]
        batch_results = await asyncio.gather(*[classify_one(m) for m in batch])
        results.extend(batch_results)
    return results


async def main():
    motions = load_motions()  # Load from database
    classifications = await classify_motion_batch(motions)
    save_to_database(classifications)


if __name__ == "__main__":
    asyncio.run(main())
```
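With fixed batches, a single failed request makes `asyncio.gather` raise and the whole run stops. One possible alternative, sketched here with a stand-in classifier instead of a real API call, is to bound concurrency with `asyncio.Semaphore` and retry each motion with exponential backoff. The names `classify_all` and `fake_classify`, and all default values, are illustrative assumptions.

```python
import asyncio


async def classify_all(motions, classify_one, max_concurrency=10,
                       max_retries=3, base_delay=0.5):
    """Run classify_one over all motions with bounded concurrency and retries."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(motion):
        async with semaphore:  # at most max_concurrency calls in flight
            for attempt in range(max_retries):
                try:
                    return await classify_one(motion)
                except Exception as exc:
                    if attempt == max_retries - 1:
                        # Record the failure instead of aborting the whole run.
                        return {"motion_id": motion["id"], "error": str(exc)}
                    await asyncio.sleep(base_delay * 2 ** attempt)

    # gather() returns results in the same order as the input motions.
    return await asyncio.gather(*(guarded(m) for m in motions))


# Stand-in for the real API call: fails once for motion 2, then succeeds.
calls = {}


async def fake_classify(motion):
    calls[motion["id"]] = calls.get(motion["id"], 0) + 1
    if motion["id"] == 2 and calls[2] == 1:
        raise RuntimeError("transient failure")
    return {"motion_id": motion["id"], "extremity_score": 1}


results = asyncio.run(
    classify_all([{"id": i} for i in range(1, 6)], fake_classify, base_delay=0.0)
)
print([r["motion_id"] for r in results])  # [1, 2, 3, 4, 5]
```

Motions that still fail after the last retry come back with an `error` key, so they can be re-queued in a later run.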
## Cost Estimate
| Dataset Size | Model | Est. Cost | Est. Time |
|-------------|-------|-----------|-----------|
| 35,000 motions | gpt-4o-mini | ~$5-10 | 30-60 min |
| 35,000 motions | gpt-4o | ~$50-100 | 2-4 hours |
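`gpt-4o-mini` is likely sufficient for this classification task and is the cheaper option; spot-check a small sample against manual labels before committing to the full run.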
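The cost figures can be sanity-checked with simple token arithmetic. In the sketch below, the token counts per motion and the per-million-token prices are assumptions chosen for illustration, not measured values or guaranteed current rates; substitute the provider's actual pricing.

```python
def estimate_cost(n_motions: int, input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate total API cost in USD from per-motion token counts."""
    input_cost = n_motions * input_tokens / 1_000_000 * usd_per_m_input
    output_cost = n_motions * output_tokens / 1_000_000 * usd_per_m_output
    return input_cost + output_cost


# Assumed: 500 prompt tokens and 100 response tokens per motion,
# priced at $0.15 / $0.60 per million input / output tokens.
cost = estimate_cost(35_000, 500, 100, usd_per_m_input=0.15, usd_per_m_output=0.60)
print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions the total is roughly $4.7 (17.5M input tokens plus 3.5M output tokens), close to the low end of the `gpt-4o-mini` estimate in the table.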
## Analysis After Classification
Once classified, we can analyze:
```python
# Extremity by period
df.groupby(['period', 'extremity_score']).size().unstack(fill_value=0)

# Domain-Extremity heatmap
pivot = df.pivot_table(values='motion_id',
                       index='policy_domain',
                       columns='extremity_score',
                       aggfunc='count')

# Passed vs rejected extremity
df.groupby('passed')['extremity_score'].mean()

# Coalition shift analysis
df[df['policy_domain'] == 'migration'].groupby(['period', 'policy_direction']).size()
```
## Expected Insights
1. **Extremity distribution over time** - Has the share of motions scoring 4-5 increased?
2. **Domain-extremity correlation** - Which domains produce extreme policies?
3. **Direction-extremity** - Restrictive vs permissive extremity by period
4. **Consensus-extremity** - Are extreme policies passing with broad or narrow consensus?
5. **Coalition voting** - Which parties support extreme policies?
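The first question can be checked directly on the classifier's output, before anything is loaded into a DataFrame. The sketch below works on a plain list of dicts; the rows and period labels are invented to show the calculation and are not real results.

```python
from collections import defaultdict


def high_extremity_share(rows: list[dict], threshold: int = 4) -> dict[str, float]:
    """Return, per period, the fraction of motions scoring >= threshold."""
    totals = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        totals[row["period"]] += 1
        if row["extremity_score"] >= threshold:
            high[row["period"]] += 1
    return {period: high[period] / totals[period] for period in sorted(totals)}


# Invented toy data, for illustration only.
rows = [
    {"period": "2012-2017", "extremity_score": 1},
    {"period": "2012-2017", "extremity_score": 2},
    {"period": "2012-2017", "extremity_score": 4},
    {"period": "2012-2017", "extremity_score": 3},
    {"period": "2017-2021", "extremity_score": 5},
    {"period": "2017-2021", "extremity_score": 2},
]
print(high_extremity_share(rows))  # {'2012-2017': 0.25, '2017-2021': 0.5}
```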