# Motion Extremity Classification with LLMs

## Implementation Status

**Script**: `scripts/classify_motions.py` - Ready to run

**Requirements**:
- Valid OpenRouter API key in `.env` (current key returns "User not found")
- ~28,000 motions to classify

**Usage**:
```bash
# Classify all motions (will take hours)
.venv/bin/python scripts/classify_motions.py --delay 0.5

# Test with small sample first
.venv/bin/python scripts/classify_motions.py --limit 10 --delay 2

# Analyze existing classifications
.venv/bin/python scripts/classify_motions.py --analyze-only
```

## Why LLMs?

Rule-based keyword matching is too crude:
- It flags only 3-4% of motions as "high extremity"
- It can't handle nuance ("verbod", Dutch for "ban", appears in mundane contexts)
- It can't assess the magnitude of a policy's impact

LLMs can:
- Understand policy context and implications
- Assess deviation from consensus/norms
- Interpret Dutch political terminology

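To make the keyword limitation concrete, here is a minimal sketch of a naive matcher. The keyword set and both motion titles are invented for illustration and are not taken from the dataset or from the project's actual rules.

```python
# Hypothetical keyword list; the project's real rule set is not shown in this document.
EXTREME_KEYWORDS = {"verbod", "afschaffen"}


def keyword_flag(text: str, keywords: set = EXTREME_KEYWORDS) -> bool:
    """Return True if any keyword occurs as a substring of the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


# Invented example titles (not real motions):
# a routine parking-ban motion and a far-reaching blanket ban.
mundane = "Motie over handhaving van het parkeerverbod rond scholen"
far_reaching = "Motie over een algeheel verbod op nieuwe asielaanvragen"

# Both titles contain "verbod", so the matcher cannot tell them apart.
print(keyword_flag(mundane), keyword_flag(far_reaching))
```

A matcher like this treats both titles identically, which is the gap the LLM-based approach is meant to close.
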
## Proposed LLM Classification Schema

### Output Format
```json
{
  "extremity_score": 1-5,
  "policy_domain": "migration|identity|economy|social|climate|foreign_policy|justice|education|health|other",
  "policy_direction": "restrictive|permissive|neutral",
  "deviation_type": "procedural|semantic|structural",
  "consensus_level": "broad|partial|narrow|opposition",
  "rationale": "1-2 sentence explanation"
}
```

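Model output does not always follow the requested schema, so it may be worth validating each response before storing it. The following is a minimal sketch using only the standard library; the function name `validate_classification` and the choice to raise `ValueError` are assumptions, not part of the existing script.

```python
import json

# Allowed values copied from the schema above.
ALLOWED_VALUES = {
    "policy_domain": {
        "migration", "identity", "economy", "social", "climate",
        "foreign_policy", "justice", "education", "health", "other",
    },
    "policy_direction": {"restrictive", "permissive", "neutral"},
    "deviation_type": {"procedural", "semantic", "structural"},
    "consensus_level": {"broad", "partial", "narrow", "opposition"},
}


def validate_classification(raw: str) -> dict:
    """Parse a JSON response and check it against the schema (hypothetical helper)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")

    score = data.get("extremity_score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"extremity_score must be an integer from 1 to 5, got {score!r}")

    for field, allowed in ALLOWED_VALUES.items():
        value = data.get(field)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{field} has unexpected value {value!r}")

    rationale = data.get("rationale")
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("rationale must be a non-empty string")

    return data
```

Rejected responses could then be retried or logged for manual review rather than silently written to the database.
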
### Extremity Scale (1-5)

| Score | Label | Description | Examples |
|-------|-------|-------------|----------|
| 1 | Mainstream | Standard governance, routine | Budget adjustments, procedural changes |
| 2 | Minor deviation | Small policy tweaks within consensus | Minor fee changes, small program adjustments |
| 3 | Moderate deviation | Meaningful but within coalition consensus | Immigration processing changes, targeted regulations |
| 4 | Major deviation | Challenges status quo meaningfully | Tighter migration rules, significant policy reversals |
| 5 | Extreme | Fundamental/populist, outside consensus | Complete bans, anti-democratic motions |

### Policy Direction

- **restrictive**: Limits freedoms, tightens rules, reduces access
- **permissive**: Expands freedoms, loosens rules, increases access
- **neutral**: Procedural, administrative, technical

### Consensus Level

- **broad**: Passed with 80%+ of parties voting the same way
- **partial**: Passed with 60-80% agreement
- **narrow**: Passed with 50-60% agreement (close vote)
- **opposition**: Coalition parties voted against

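If the vote data is available, the consensus level could also be derived from it directly and compared with the model's answer. The sketch below rests on several assumptions: `agreement` is the fraction of parties voting the same way, shared boundaries (0.8 and 0.6) fall into the higher band, and `opposition` takes precedence whenever coalition parties voted against. The thresholds above only cover motions with at least 50% agreement, so lower values return `None`.

```python
from typing import Optional


def consensus_level(agreement: float, coalition_opposed: bool = False) -> Optional[str]:
    """Map a party-agreement fraction (0.0-1.0) to a consensus label.

    Hypothetical helper: boundary handling and precedence are assumptions.
    """
    if not 0.0 <= agreement <= 1.0:
        raise ValueError("agreement must be between 0 and 1")
    if coalition_opposed:
        return "opposition"
    if agreement >= 0.8:
        return "broad"
    if agreement >= 0.6:
        return "partial"
    if agreement >= 0.5:
        return "narrow"
    # Below 50% the list above defines no label.
    return None
```
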
## LLM Prompt

```
SYSTEM:
You are an expert on Dutch parliamentary politics. Classify parliamentary motions
on policy extremity using the provided schema.

CLASSIFICATION_RUBRIC:
- Score 1 (Mainstream): Routine governance, budget adjustments, procedural changes
- Score 2 (Minor): Small policy tweaks within consensus
- Score 3 (Moderate): Meaningful changes but within coalition consensus
- Score 4 (Major): Challenges status quo, significant policy shifts
- Score 5 (Extreme): Fundamental changes, populist, outside consensus

Consider:
- Policy impact magnitude
- Deviation from current norms/policies
- Coalition/opposition dynamics
- Dutch political context

USER:
Classify this motion:

Title: {title}
Description: {description}
Voting result: {passed/rejected}, {party_coalition} parties voted for

Respond in JSON format.
```

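One way to fill the USER part of this template is sketched below. The motion keys `title`, `description`, `passed` and `parties_for` are assumptions about how motions are stored, and `{party_coalition}` is interpreted here as a comma-separated list of supporting parties. `SYSTEM_PROMPT` would hold the SYSTEM part of the prompt verbatim and is omitted to avoid repeating it.

```python
# User-message template mirroring the USER section of the prompt above.
USER_TEMPLATE = """Classify this motion:

Title: {title}
Description: {description}
Voting result: {outcome}, {parties_for} parties voted for

Respond in JSON format."""


def build_prompt(motion: dict) -> str:
    """Render the user message for one motion (key names are assumed)."""
    outcome = "passed" if motion["passed"] else "rejected"
    parties_for = ", ".join(motion["parties_for"])
    return USER_TEMPLATE.format(
        title=motion["title"],
        description=motion["description"],
        outcome=outcome,
        parties_for=parties_for,
    )
```
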
## Batch Processing Strategy

```python
import asyncio
import json

from openai import AsyncOpenAI

# build_prompt, SYSTEM_PROMPT, load_motions and save_to_database are assumed
# to be defined elsewhere in the project.


async def classify_motion_batch(motions: list[dict], model: str = "gpt-4o") -> list[dict]:
    """Process motions in parallel batches."""
    # Reads OPENAI_API_KEY from the environment by default; pass api_key and
    # base_url explicitly to route requests through another provider.
    client = AsyncOpenAI()

    async def classify_one(motion: dict) -> dict:
        prompt = build_prompt(motion)

        response = await client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            # Ask the API to return a single JSON object
            response_format={"type": "json_object"},
        )

        result = json.loads(response.choices[0].message.content)
        result["motion_id"] = motion["id"]
        return result

    # Process 50 motions concurrently, one batch at a time
    results = []
    for i in range(0, len(motions), 50):
        batch = motions[i:i + 50]
        batch_results = await asyncio.gather(*[classify_one(m) for m in batch])
        results.extend(batch_results)

    return results


async def main():
    motions = load_motions()  # Load from database
    classifications = await classify_motion_batch(motions)
    save_to_database(classifications)


asyncio.run(main())
```

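With `asyncio.gather` as used above, a single failed request raises and discards the rest of its batch. A possible refinement is to cap concurrency with a semaphore and retry transient failures with exponential backoff. The helper below is a hypothetical sketch (the name `gather_bounded` and its parameters are not part of the existing script); it works with any async worker, so it can be tried without an API key.

```python
import asyncio
import random


async def gather_bounded(items, worker, max_concurrency=10, max_retries=3, base_delay=0.5):
    """Run worker(item) for every item with bounded concurrency and retries.

    Returns a list aligned with items. An item that still fails after
    max_retries attempts yields its last exception instead of a result,
    so one bad motion does not abort the whole run.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        last_error = None
        for attempt in range(max_retries):
            try:
                # At most max_concurrency workers run at the same time.
                async with semaphore:
                    return await worker(item)
            except Exception as exc:  # broad on purpose in this sketch
                last_error = exc
                if attempt < max_retries - 1:
                    # Exponential backoff with jitter before the next attempt.
                    delay = base_delay * (2 ** attempt)
                    await asyncio.sleep(delay + random.uniform(0, delay))
        return last_error

    return await asyncio.gather(*(run_one(item) for item in items))
```

In the batch function this could replace the fixed slices of 50, for example `await gather_bounded(motions, classify_one)`, followed by a pass that separates results from exceptions.
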
## Cost Estimate

| Dataset Size | Model | Est. Cost | Est. Time |
|-------------|-------|-----------|-----------|
| 35,000 motions | gpt-4o-mini | ~$5-10 | 30-60 min |
| 35,000 motions | gpt-4o | ~$50-100 | 2-4 hours |

`gpt-4o-mini` should be sufficient for this classification task.

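The arithmetic behind an estimate like this can be sketched as tokens times price. Every number passed to the function below is an assumption for illustration: the per-motion token counts are guesses and the per-million-token prices are placeholders, so check the provider's current price list before relying on the result.

```python
def estimate_cost(
    n_motions: int,
    input_tokens_per_motion: int,
    output_tokens_per_motion: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate total API cost in USD from token counts and per-token prices."""
    input_cost = n_motions * input_tokens_per_motion / 1_000_000 * usd_per_million_input
    output_cost = n_motions * output_tokens_per_motion / 1_000_000 * usd_per_million_output
    return input_cost + output_cost


# Assumed: ~600 prompt tokens and ~100 response tokens per motion,
# with placeholder prices of $0.15 (input) and $0.60 (output) per million tokens.
cost = estimate_cost(35_000, 600, 100, 0.15, 0.60)
print(f"Estimated cost: ${cost:.2f}")
```

Under these assumptions the total comes to about $5, at the low end of the `gpt-4o-mini` row; longer motion descriptions or retries would push it higher.
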
## Analysis After Classification

Once classified, we can analyze:

```python
# Assumes df is a pandas DataFrame with one row per classified motion

# Extremity by period
df.groupby(['period', 'extremity_score']).size().unstack(fill_value=0)

# Domain-Extremity heatmap
pivot = df.pivot_table(values='motion_id',
                       index='policy_domain',
                       columns='extremity_score',
                       aggfunc='count')

# Passed vs rejected extremity
df.groupby('passed')['extremity_score'].mean()

# Coalition shift analysis: migration motions by period and direction
df[df['policy_domain'] == 'migration'].groupby(['period', 'policy_direction']).size()
```

## Expected Insights

1. **Extremity distribution over time** - Has the share of motions scoring 4-5 increased?
2. **Domain-extremity correlation** - Which domains produce extreme policies?
3. **Direction-extremity** - Restrictive vs permissive extremity by period
4. **Consensus-extremity** - Are extreme policies passing with broad or narrow consensus?
5. **Coalition voting** - Which parties support extreme policies?