---
module: llm-classification
tags: [polarization, nlp, prompt-design, democratic-norms]
problem_type: classification-schema-design
date: 2026-04-05
reviewed_by:
- correctness-reviewer
- domain-expert (Dutch politics)
- clarity-reviewer
---
# LLM Motion Classification: Prompt Design Lessons
## Problem
The goal was to classify 28,000 Dutch parliamentary motions by "extremity" to measure polarization over time.
The initial prompt conflated several distinct concepts:
- Democratic norm erosion
- Populist rhetoric style
- Group targeting
- Restrictiveness vs permissiveness
## Initial v1 Design (Flawed)
```text
EXTREMITY_SCORE (1-5):
- 1: Mainstream
- 5: "Undermines checks & balances, threatens rule of law,
discriminates groups, populist rhetoric"
```
**Problems identified:**
1. Populist rhetoric is style, not substance, so it shouldn't share a score with democratic erosion
2. "Extreme" is undefined: compared to what baseline?
3. The boundary between scores 4 and 5 is unclear
4. TARGETED_GROUP is redundant with EXTREMITY_SCORE
5. Any EU deviation always scores 5 (too broad)
6. Dutch-specific patterns are missing (Nexit, referendum abolition)
## Refined v2 Design (Four Orthogonal Dimensions)
### 1. DEMOCRATIC_EROSION (0-4) — Substance only
| Score | Label | Criteria |
|-------|-------|----------|
| 0 | None | No impact on democratic norms |
| 1 | Minor | Small procedural deviations |
| 2 | Moderate | Significant policy change, within constitutional framework |
| 3 | Significant | Fundamental change to checks & balances |
| 4 | Critical | Undermines rule of law or press freedom, or introduces systematic discrimination |
**Decision rules:**
- Score 4 ONLY if: (a) direct attack on judiciary/press, OR (b) systematic discrimination in law, OR (c) call to violate international treaties
- Score 3 if: (a) abolish referendum, OR (b) fundamentally question EU cooperation, OR (c) significantly expand executive powers
### 2. POPULIST_STYLE (0-1) — Style only
Independent of democratic impact. A motion can be populist in style (POPULIST_STYLE = 1) yet have no impact on democratic norms (DEMOCRATIC_EROSION = 0).
**Indicators:**
- "Het volk" vs "de elite/den Haag"
- "Wij vs zij" framing
- Call for "direct democracy" without checks
- Emotionally charged language
### 3. GROUP_TARGETING (0-2) — Targeting only
| Score | Label |
|-------|-------|
| 0 | Universal — general policy |
| 1 | Indirect — general policy that disproportionately affects groups |
| 2 | Direct — explicitly targets specific population group |
### 4. RESTRICTIVENESS (-1 to +1) — Direction only
| Score | Label |
|-------|-------|
| -1 | Expansive |
| 0 | Neutral |
| +1 | Restrictive |
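The four ranges above can be enforced when parsing classifier output. The following is a minimal sketch, assuming the model returns one integer per dimension under upper-case keys; `MotionScores`, `RANGES`, and `parse_scores` are hypothetical names and are not taken from `scripts/classify_motions.py`.

```python
from dataclasses import dataclass

# Allowed inclusive ranges, copied from the four dimension definitions.
RANGES = {
    "democratic_erosion": (0, 4),
    "populist_style": (0, 1),
    "group_targeting": (0, 2),
    "restrictiveness": (-1, 1),
}


@dataclass(frozen=True)
class MotionScores:
    """One classified motion (hypothetical container, not the project's own type)."""

    democratic_erosion: int
    populist_style: int
    group_targeting: int
    restrictiveness: int

    def __post_init__(self):
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an int, got {value!r}")
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")


def parse_scores(raw: dict) -> MotionScores:
    """Build MotionScores from a dict keyed like DEMOCRATIC_EROSION (assumed format)."""
    return MotionScores(**{name: raw[name.upper()] for name in RANGES})
```

Rejecting out-of-range values at parse time keeps a malformed response (for example a leftover v1-style score of 5) from silently entering the dataset.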
## Key Lessons Learned
### 1. Separate Style from Substance
Populist rhetoric ≠ democratic erosion. A mainstream party using strong language isn't anti-democratic for that reason alone. Conflating the two causes false positives.
### 2. Make Dimensions Orthogonal
- DEMOCRATIC_EROSION × RESTRICTIVENESS: A policy can be erosive AND restrictive, or erosive AND expansive
- POPULIST_STYLE × DEMOCRATIC_EROSION: A motion can score POPULIST_STYLE = 1 with DEMOCRATIC_EROSION = 0, and vice versa
- GROUP_TARGETING × RESTRICTIVENESS: Restrictive ≠ targeted (and vice versa)
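Whether the dimensions behave independently in practice can be spot-checked by cross-tabulating pairs of scores on a labelled sample. This is a rough sketch, not part of the documented pipeline; the record format and `toy_sample` are illustrative assumptions, not real results.

```python
from collections import Counter
from itertools import product


def crosstab(records, dim_a, dim_b):
    """Count how often each (dim_a, dim_b) value pair occurs."""
    return Counter((r[dim_a], r[dim_b]) for r in records)


def empty_cells(records, dim_a, values_a, dim_b, values_b):
    """Return the value pairs that never occur in the sample."""
    counts = crosstab(records, dim_a, dim_b)
    return [pair for pair in product(values_a, values_b) if counts[pair] == 0]


# Made-up records for illustration only.
toy_sample = [
    {"POPULIST_STYLE": 1, "DEMOCRATIC_EROSION": 0},
    {"POPULIST_STYLE": 0, "DEMOCRATIC_EROSION": 3},
    {"POPULIST_STYLE": 0, "DEMOCRATIC_EROSION": 0},
]
missing = empty_cells(
    toy_sample, "POPULIST_STYLE", [0, 1], "DEMOCRATIC_EROSION", [0, 3]
)
```

Cells that stay empty on a large sample may indicate that the prompt still couples two dimensions, although they can also simply reflect rare combinations in the data.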
### 3. Add Decision Rules for Boundaries
Vague transitions ("significant" → "critical") cause inconsistency. Define specific triggers:
```
Score 4 ONLY when: (a) OR (b) OR (c)
Score 3 when: (a) OR (b) OR (c)
```
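Explicit triggers also make scores checkable after the fact. The sketch below assumes the classifier reports which triggers it saw alongside the DEMOCRATIC_EROSION score, which this document does not state; the trigger names are hypothetical labels for the six conditions in the decision rules.

```python
# Hypothetical trigger labels for the score-4 and score-3 decision rules.
SCORE_4_TRIGGERS = {
    "attack_on_judiciary_or_press",
    "systematic_discrimination_in_law",
    "call_to_violate_treaties",
}
SCORE_3_TRIGGERS = {
    "abolish_referendum",
    "question_eu_cooperation",
    "expand_executive_powers",
}


def rule_violations(score, triggers):
    """Return a list of decision-rule violations for one motion (empty if none)."""
    triggers = set(triggers)
    problems = []
    unknown = triggers - SCORE_4_TRIGGERS - SCORE_3_TRIGGERS
    if unknown:
        problems.append(f"unknown triggers: {sorted(unknown)}")
    # "Score 4 ONLY if": a score of 4 needs at least one score-4 trigger.
    if score == 4 and not triggers & SCORE_4_TRIGGERS:
        problems.append("score 4 without a score-4 trigger")
    # "Score 3 if": a score-3 trigger should not coexist with a lower score.
    if triggers & SCORE_3_TRIGGERS and score < 3:
        problems.append("score-3 trigger present but score below 3")
    return problems
```

The check treats the score-4 rule as a necessary condition only, following the "ONLY if" wording; it does not force a 4 whenever a score-4 trigger is present.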
### 4. Gradate EU Deviation
Not all EU deviation is equal:
- Dutch implementation of EU policy → erosion 0-1
- Violate EU rules → erosion 2-3
- Nexit / leave EU → erosion 3-4
### 5. Include Domain-Specific Patterns
Dutch context matters:
- Referendum abolition = DEMOCRATIC_EROSION 3
- "Den Haag" / "establishment" attacks = check POPULIST_STYLE
- Nexit = DEMOCRATIC_EROSION 3-4 depending on framing
### 6. Define Reference Baselines
"Abnormal" compared to what? State the baseline explicitly. Candidates:
- 2016 consensus
- EU norms
- Historical Dutch practice
- International standards
## Testing Recommendations
1. **Calibration set**: 50 motions with expert annotations before production
2. **Boundary cases**: Test score 3/4 transitions explicitly
3. **Cross-rater reliability**: Multiple classifiers on same motions
4. **Domain-specific test cases**: Migration, EU, constitutional reform
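For the cross-rater reliability check, one common agreement statistic is Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal pure-Python sketch for two raters follows; the choice of kappa is an assumption here, since the document does not name a metric.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

Unweighted kappa treats every disagreement as equally serious. For an ordinal scale such as DEMOCRATIC_EROSION, a weighted variant that penalises a 0 vs 4 disagreement more than a 3 vs 4 one may be a better fit.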
## Files
- `scripts/classify_motions.py` — Implementation with v2 prompt
- `docs/research/motion-classification-prompt-v2.md` — Full prompt documentation