---
title: Subagent-based two-dimensional extremity rescoring and mechanism analysis
type: feat
status: active
date: 2026-05-08
origin: docs/plans/2026-05-08-002-feat-overton-window-shift-plan.md
---

# Subagent-Based Two-Dimensional Extremity Rescoring and Mechanism Analysis

## Summary

The current Overton analysis has a known weakness: the LLM extremity score conflates stylistic radicalism (inflammatory language) with material policy impact (rights restricted, groups affected). The manual audit of 20 motions suggested 75% agreement, which is enough to trust the broad findings but not the fine-grained extremity-stratified analysis. This plan replaces the OpenRouter-based scoring pipeline with project-local subagents (deepseek v4 flash) that score motions via native reasoning. The subagent skill is a durable project asset, reusable in future LLM analyses. In addition, we analyze which specific types of right-wing motions gained centrist support post-2024 (mechanism analysis) and compare content extremity shifts over time using the new dual-dimension scores.

---

## Problem Frame

The current `extremity_scorer.py` calls OpenRouter (mistral-small) with a single-dimension prompt asking "how radical is this?" on a 1-5 scale. This conflates two dimensions:

- **Stylistic extremity**: How inflammatory or harsh is the language?
- **Material impact**: How much would this policy actually restrict rights, affect groups, or reshape institutions?

The current scores cannot separate "rude but harmless" from "measured but devastating." The findings report flags this as the primary measurement concern (LLM audit at 75% agreement, systematic overrating of anti-institutional language).

Additionally, the Overton analysis tells us *that* centrist support rose post-2024 but not *which kinds* of right-wing motions drove the shift. Mechanism analysis fills this gap.

---

## Requirements

- R1. Write a project-local skill at `.opencode/skills/score-extremity/` that defines a two-dimensional scoring prompt, JSON output schema, and subagent-spawning workflow. The skill is a durable asset for future LLM analyses.
- R2. Score a stratified sample of 100 right-wing motions (25 per extremity bucket, 1-2 / 2-3 / 3-4 / 4-5) for both stylistic extremity and material impact. Compute the Pearson correlation (r) between the two dimensions.
- R3. If r ≥ 0.7, confirm that the single-dimensional scores are directionally usable. If r < 0.7, flag that the separate dimensions matter and extend the sample.
- R4. Mechanism analysis: classify the 2,986 right-wing motions by policy mechanism (what specific institutional change the motion proposes) and compute which mechanisms gained the most centrist support post-2024.
- R5. Build the scoring infrastructure test-first. Each scoring subagent and the orchestration layer have unit tests that mock the subagent dispatch.
- R6. Update the findings report with the dual-dimension correlation, the mechanism analysis results, and a refreshed content extremity narrative.

---

## Scope Boundaries

- In scope: Writing the skill, stratified 100-motion sample, mechanism classification, test infrastructure, report update.
- Out of scope: Re-scoring all 2,986 motions (deferred until r is measured). Interactive dashboard. Streamlit UI changes.
- The skill lives at `.opencode/skills/score-extremity/SKILL.md`: one file, no Python dependencies.

---

## Context & Research

### Relevant Code and Patterns

- `analysis/right_wing/extremity_scorer.py`: current single-dimension scoring (prompt template, JSON schema, batch orchestration)
- `analysis/right_wing/direction3_migration_antidemocratic.py`: analysis script pattern (DuckDB queries, matplotlib charts, markdown output)
- `reports/overton_window/findings_report.md`: current report with Section 8 next steps
- `tests/right_wing/`: empty directory, target for new test files

### Institutional Learnings

- `docs/solutions/best-practices/overton-window-shift-methodology-2026-05-24.md`: Step 7 describes 2D rescoring and manual audit
- `docs/solutions/insights/llm-motion-classification-prompt-design.md`: prior work on orthogonal prompt dimensions

### Key Technical Decisions

- **Subagents, not OpenRouter API calls.** deepseek v4 flash subagents score motions natively via reasoning. No API keys, no rate limits, no cost. The orchestrating script spawns subagents via the `task` tool and collects structured JSON.
- **Skill as prompt artifact, not code.** The `.opencode/skills/score-extremity/SKILL.md` defines the scoring prompt, JSON schema, and subagent-spawning instructions in natural language. The orchestrating Python script reads the skill, formats prompts, and spawns subagents.
- **Batch size: 10 motions per subagent.** Each subagent scores 10 motions for both dimensions. 100 motions = 10 subagents. Parallel dispatch via one `task` call per batch.
- **Stratified sample across all 4 extremity buckets.** 25 per bucket, drawn using the existing LLM scores. This tests whether the two dimensions diverge more in high-extremity buckets (where inflammatory language may dominate).
- **Mechanism taxonomy derived from the data.** The subagent derives mechanism categories from the motion text (e.g., "detention/removal", "benefit restriction", "institutional bypass", "symbolic/declarative", "rights limitation", "procedural hurdle"). No pre-defined taxonomy.
- **Storage: two DB tables.** `extremity_scores_2d` (motion_id, stylistic_score, material_score, stylistic_rationale, material_rationale) and `motion_mechanisms` (motion_id, mechanism_category, centrist_support_delta). Existing tables unchanged.

### Open Questions

#### Resolved During Planning

- **Q: How to test subagent spawning?** Mock the subagent dispatch layer. The skill produces a JSON contract; test that the orchestrator correctly parses and stores results.
- **Q: Which 100 motions to sample?** Stratified random from the 2,986 classified right-wing motions, 25 per extremity bucket, seeded for reproducibility.

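The seeded, stratified draw described in the second answer could look like the following minimal sketch. The names `bucket_of` and `stratified_sample` are hypothetical, the input is assumed to be `(motion_id, extremity_score)` pairs, and the half-open bucket edges are an assumption: the plan lists the buckets as 1-2 / 2-3 / 3-4 / 4-5 without saying which bucket owns a boundary score.

```python
import random
from collections import defaultdict


def bucket_of(score):
    """Assumption: buckets are half-open [lo, hi); the last one also includes 5."""
    if score < 2:
        return "1-2"
    if score < 3:
        return "2-3"
    if score < 4:
        return "3-4"
    return "4-5"


def stratified_sample(motions, n_per_bucket=25, seed=42):
    """Draw up to n_per_bucket motion ids per bucket, reproducibly.

    motions: iterable of (motion_id, extremity_score) pairs.
    Returns a dict mapping bucket label to a sorted list of motion ids.
    """
    by_bucket = defaultdict(list)
    for motion_id, score in motions:
        by_bucket[bucket_of(score)].append(motion_id)

    rng = random.Random(seed)  # local generator, so the global RNG state is untouched
    sample = {}
    for bucket in sorted(by_bucket):
        ids = sorted(by_bucket[bucket])  # sort so the draw ignores input order
        k = min(n_per_bucket, len(ids))  # small bucket: take what is available
        sample[bucket] = sorted(rng.sample(ids, k))
    return sample
```

Sorting the ids before sampling means the same seed yields the same sample even if the database returns rows in a different order.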

#### Deferred to Implementation

- **Q: Exact mechanism taxonomy.** Derived by the subagent from the data, not pre-specified.
- **Q: Whether to extend the sample beyond 100.** Depends on the r value between dimensions.

---

## Output Structure

```
.opencode/skills/score-extremity/
  SKILL.md                       # Scoring prompt, JSON schema, subagent workflow

analysis/right_wing/
  extremity_rescore_2d.py        # Orchestrator: reads skill, spawns 10 subagents, collects results
  mechanism_analysis.py          # Mechanism classification + centrist support breakdown

tests/right_wing/
  test_extremity_rescore_2d.py   # Unit tests for orchestrator
  test_mechanism_analysis.py     # Unit tests for mechanism pipeline
```

---

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*

```
┌──────────────────────┐
│ .opencode/skills/    │
│ score-extremity/     │  ← Read by orchestrator
│ SKILL.md             │
│ - Prompt template    │
│ - JSON schema        │
│ - Subagent workflow  │
└──────────┬───────────┘
           │ read
┌──────────▼───────────┐
│ extremity_rescore_2d │
│ .py (orchestrator)   │
│                      │
│ 1. Query 100 motions │
│ 2. Format 10 batches │
│ 3. Spawn 10 subagents│──→ subagent scores 10 motions
│ 4. Collect JSON      │    returns {motion_id: {stylistic_score, material_score}}
│ 5. Validate + store  │
└──────────────────────┘
```

---

## Implementation Units

### U1. Write the Scoring Skill

**Goal:** Create a project-local skill that an orchestrator can read to configure subagent-based two-dimensional extremity scoring.

**Requirements:** R1

**Dependencies:** None

**Files:**
- Create: `.opencode/skills/score-extremity/SKILL.md`

**Approach:**
- YAML frontmatter: `name: score-extremity`, `description: "Two-dimensional extremity scoring for Dutch parliamentary motions. Use when scoring policy radicalism along stylistic vs material impact dimensions."`
- Body: the two-dimensional scoring prompt (in Dutch, matching the existing PROMPT_TEMPLATE style). Define two scores: `stijl_extremiteit` (1–5, inflammatory language) and `materiele_impact` (1–5, substantive rights/policy effect).
- Body: the JSON output schema matching the prompt.
- Body: instructions for how the orchestrator should spawn subagents (batch size 10, parallel dispatch, collect results, validate JSON).

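To make the file layout above concrete, here is a minimal sketch of how an orchestrator might split such a skill file into frontmatter and body. It is an illustration under stated assumptions, not the project's loader: `parse_skill` is a hypothetical helper, and the frontmatter is assumed to be flat `key: value` pairs, so the sketch avoids a YAML dependency. A real implementation may prefer a proper YAML parser.

```python
from pathlib import Path


def parse_skill(text):
    """Split SKILL.md content into (frontmatter dict, body string).

    Assumption: frontmatter is flat `key: value` lines between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        raise ValueError("frontmatter is not closed by a second '---' line") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip().strip('"')

    for required in ("name", "description"):
        if not meta.get(required):
            raise ValueError(f"missing required frontmatter field: {required}")

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def load_skill(path=".opencode/skills/score-extremity/SKILL.md"):
    """Read the skill file from disk; a missing file raises a clear error."""
    skill_path = Path(path)
    if not skill_path.is_file():
        raise FileNotFoundError(f"skill file not found: {skill_path}")
    return parse_skill(skill_path.read_text(encoding="utf-8"))
```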

**Patterns to follow:**
- Existing PROMPT_TEMPLATE in `analysis/right_wing/extremity_scorer.py` for prompt structure
- `~/.config/opencode/skills/ce-work/SKILL.md` for YAML frontmatter conventions

**Test scenarios:**
- Edge case: skill file has valid YAML frontmatter with required `name` and `description` fields.
- Edge case: skill body contains the expected sections (prompt template, JSON schema, usage instructions).
- Happy path: orchestrator can read the skill file and extract prompt + schema.

**Verification:**
- `opencode` detects the skill at startup (listed in available_skills).
- The skill contains a clear two-dimensional scoring prompt in Dutch.

---

### U2. Build Orchestrator + Subagent Scoring Pipeline (TDD)

**Goal:** Build the orchestrating script that reads the skill, queries 100 motions, spawns subagents, collects and validates results, and stores two-dimensional scores in the database.

**Requirements:** R2, R5

**Dependencies:** U1

**Files:**
- Create: `analysis/right_wing/extremity_rescore_2d.py`
- Create: `tests/right_wing/test_extremity_rescore_2d.py`

**Approach:**
1. **Test-first:** Write tests for the orchestrator before implementation:
   - Test that `load_skill()` returns prompt and schema from SKILL.md
   - Test that `format_batches(motions, batch_size=10)` splits correctly
   - Test that `validate_subagent_result(result, schema)` catches malformed JSON
   - Test that `store_scores(db_path, results)` writes to `extremity_scores_2d` table
   - Mock the subagent dispatch to return synthetic JSON
2. **Implementation:**
   - `load_skill()`: reads `.opencode/skills/score-extremity/SKILL.md`, parses YAML frontmatter, returns the body (prompt and schema)
   - `sample_motions(db_path, n_per_bucket=25, seed=42)`: stratified query from `right_wing_motions` JOIN `extremity_scores`
   - `format_batches()`: groups motions into batches of 10, builds prompts with motion text + layman explanation
   - `spawn_and_collect()`: orchestrator reads the skill, manually formats context for each subagent batch, spawns via `task` tool with return JSON contract
   - `validate_and_store()`: validates each result against the schema (`validate_subagent_result`), then writes to the DB (`store_scores`)
3. **Database:** `CREATE TABLE IF NOT EXISTS extremity_scores_2d (motion_id INTEGER PRIMARY KEY, stylistic_score INTEGER, material_score INTEGER, stylistic_rationale TEXT, material_rationale TEXT)`

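The batching and storage steps above can be sketched as follows. This is a minimal illustration, not the orchestrator itself. Two things are assumptions: `sqlite3` stands in for the project's database so the sketch runs on the standard library alone, and `store_scores` takes an open connection rather than the `db_path` named in the plan, which makes it easy to test against an in-memory database.

```python
import sqlite3

# DDL copied from the Database step of the plan.
DDL = """
CREATE TABLE IF NOT EXISTS extremity_scores_2d (
    motion_id INTEGER PRIMARY KEY,
    stylistic_score INTEGER,
    material_score INTEGER,
    stylistic_rationale TEXT,
    material_rationale TEXT
)
"""


def format_batches(motions, batch_size=10):
    """Split a list of motions into consecutive batches; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [motions[i:i + batch_size] for i in range(0, len(motions), batch_size)]


def store_scores(conn, results):
    """Insert validated results; re-running replaces rows with the same motion_id."""
    conn.execute(DDL)
    rows = [
        (
            r["motion_id"],
            r["stylistic_score"],
            r["material_score"],
            r.get("stylistic_rationale"),  # rationales are optional in this sketch
            r.get("material_rationale"),
        )
        for r in results
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO extremity_scores_2d VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```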

**Execution note:** Implement test-first. Write failing tests, then implementation.

**Patterns to follow:**
- `analysis/right_wing/extremity_scorer.py`: existing DB write patterns
- `tests/agent_tools/test_database_tools.py`: temp DB fixture patterns

**Test scenarios:**
- Happy path: load_skill returns non-empty prompt and schema.
- Happy path: format_batches with 100 motions produces 10 batches of 10.
- Happy path: validate_and_store with valid JSON inserts 10 rows into extremity_scores_2d.
- Edge case: missing SKILL.md raises clear error.
- Edge case: a bucket with fewer than 25 motions samples what's available.
- Edge case: subagent returns missing field in JSON, so the validator rejects.
- Edge case: subagent returns score outside 1–5 range, so the validator rejects.

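Mocking the dispatch layer, as these scenarios require, might look like the sketch below. `score_batches` is a hypothetical stand-in for the orchestrator's core loop, written so that the dispatch function is injected and can be replaced by a `unittest.mock.Mock` in tests. The synthetic payload follows the JSON contract used in this plan.

```python
from unittest.mock import Mock


def score_batches(batches, dispatch):
    """Hypothetical orchestration core: one dispatch call per batch, results flattened."""
    scored = []
    for batch in batches:
        response = dispatch(batch)  # in production this would spawn a subagent
        scored.extend(response["motions"])
    return scored


def test_score_batches_calls_dispatch_once_per_batch():
    batches = [[{"motion_id": 1}, {"motion_id": 2}], [{"motion_id": 3}]]

    # The mock returns synthetic scores for whatever batch it receives.
    fake_dispatch = Mock(side_effect=lambda batch: {
        "motions": [
            {
                "motion_id": m["motion_id"],
                "stijl_extremiteit": 3,
                "materiele_impact": 2,
                "rationale": "synthetic",
            }
            for m in batch
        ]
    })

    scored = score_batches(batches, fake_dispatch)

    assert fake_dispatch.call_count == 2
    assert [row["motion_id"] for row in scored] == [1, 2, 3]
```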

**Verification:**
- All tests pass before any subagent is spawned.
- `extremity_scores_2d` table exists with correct schema.
- Orchestrator can be configured with a `--dry-run` flag that validates the pipeline without spawning subagents.

---

### U3. Execute the 100-Motion Rescoring

**Goal:** Run the orchestrator to score 100 motions, compute the correlation between stylistic and material extremity, and report the results.

**Requirements:** R2, R3

**Dependencies:** U2

**Files:**
- Modify: `analysis/right_wing/extremity_rescore_2d.py` (any fixes from live run)
- Output: `reports/overton_window/extremity_2d_correlation.md`

**Approach:**
1. Run the orchestrator with actual subagent dispatch (no `--dry-run`)
2. Spawn 10 subagents in parallel, each scoring 10 motions
3. Collect all results, validate against schema
4. Compute Pearson r between stylistic_score and material_score
5. Write a short correlation report with:
   - Overall r and per-bucket r
   - Scatter plot of stylistic vs material scores
   - Conclusion: "dimensions are separable" if r < 0.7, "single score sufficient" if r ≥ 0.7
   - Recommendation for next steps (extend sample, re-score all, or proceed)

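Steps 4 and 5 rest on a single statistic, so a small worked sketch may help. The function below computes Pearson r directly from its definition using only the standard library. `interpret` is a hypothetical helper that applies the 0.7 cut-off from this plan, and treating exactly 0.7 as "single score sufficient" is an assumption. Note that r is undefined when one dimension is constant, which can happen inside a single bucket, so the sketch raises in that case instead of returning a number.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def interpret(r, threshold=0.7):
    """Map r to the report's conclusion; r equal to the threshold counts as sufficient."""
    if r >= threshold:
        return "single score sufficient"
    return "dimensions are separable"
```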

**Technical design:**
The orchestrator calls the `task` tool with the skill's prompt and each batch's motion data. The Dutch keys `stijl_extremiteit` and `materiele_impact` correspond to the `stylistic_score` and `material_score` columns. Each subagent returns:
```json
{
  "motions": [
    {"motion_id": 123, "stijl_extremiteit": 3, "materiele_impact": 4, "rationale": "..."}
  ]
}
```
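
A validator for this contract could be sketched as below. It is a hedged illustration: the signature differs from the `validate_subagent_result(result, schema)` named in U2 because it checks the fields directly instead of taking a schema object, and the renaming of the Dutch keys to `stylistic_score` and `material_score` is an assumption about how the orchestrator maps results to the table columns.

```python
import json

REQUIRED_KEYS = ("motion_id", "stijl_extremiteit", "materiele_impact", "rationale")
SCORE_KEYS = ("stijl_extremiteit", "materiele_impact")


def validate_subagent_result(raw, expected_ids):
    """Parse one subagent response and return rows keyed by column name.

    Raises ValueError on malformed JSON, missing fields, non-integer or
    out-of-range scores, or motion ids that do not match the batch.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc

    motions = payload.get("motions") if isinstance(payload, dict) else None
    if not isinstance(motions, list):
        raise ValueError("top-level object must contain a 'motions' list")

    rows = []
    for item in motions:
        if not isinstance(item, dict):
            raise ValueError("each motion entry must be an object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        for key in SCORE_KEYS:
            value = item[key]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{key} must be an integer from 1 to 5, got {value!r}")
        rows.append({
            "motion_id": item["motion_id"],
            "stylistic_score": item["stijl_extremiteit"],
            "material_score": item["materiele_impact"],
            "rationale": item["rationale"],
        })

    if {row["motion_id"] for row in rows} != set(expected_ids):
        raise ValueError("returned motion ids do not match the batch")
    return rows
```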

**Verification:**
- 100 motions have both stylistic_score and material_score.
- Correlation report written with clear r value.
- All scores are integers 1–5.

---

### U4. Mechanism Analysis

**Goal:** Classify right-wing motions by policy mechanism and compute which mechanisms gained the most centrist support post-2024.

**Requirements:** R4, R5

**Dependencies:** None (reads existing DB tables)

**Files:**
- Create: `analysis/right_wing/mechanism_analysis.py`
- Create: `tests/right_wing/test_mechanism_analysis.py`
- Output: `reports/overton_window/mechanism_analysis.md`

**Approach:**
1. **Subagent-based classification:** Spawn subagents to classify motions by mechanism. Each subagent receives 25 motions and returns JSON mapping `motion_id -> mechanism_category`. The subagent derives categories from the data (not a pre-defined taxonomy).
2. **Test-first:** Write tests for the orchestration layer (query, batch formatting, table creation, result validation).
3. **Compute centrist support per mechanism:** Using `centrist_support_strict` from `right_wing_motions`, compute pre/post-2024 centrist support and delta per mechanism category.
4. **Report:** Table of mechanism categories ranked by centrist support delta, with N per category. Top-5 mechanisms visualization.

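Step 3 reduces to a grouped mean and a difference, sketched here under stated assumptions. `support_delta_by_mechanism` is a hypothetical name, centrist support is assumed to be a per-motion number, and the caller is assumed to supply the pre/post flag, since the plan does not fix the exact cut-off date. Categories with fewer than 5 motions are flagged as unreliable, matching the edge case listed for this unit.

```python
from collections import defaultdict
from statistics import mean


def support_delta_by_mechanism(rows, min_n=5):
    """Rank mechanism categories by the change in mean centrist support.

    rows: iterable of (mechanism_category, is_post_2024, centrist_support).
    Returns a list of dicts sorted by delta, largest first; categories that
    lack either period have delta None and sort last.
    """
    groups = defaultdict(lambda: {"pre": [], "post": []})
    for mechanism, is_post, support in rows:
        groups[mechanism]["post" if is_post else "pre"].append(support)

    table = []
    for mechanism, periods in groups.items():
        pre = mean(periods["pre"]) if periods["pre"] else None
        post = mean(periods["post"]) if periods["post"] else None
        delta = post - pre if pre is not None and post is not None else None
        n = len(periods["pre"]) + len(periods["post"])
        table.append({
            "mechanism": mechanism,
            "n": n,
            "pre_cs": pre,
            "post_cs": post,
            "delta": delta,
            "unreliable": n < min_n,  # too few motions to trust the delta
        })

    table.sort(key=lambda r: (r["delta"] is None, -(r["delta"] or 0.0)))
    return table
```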

**Execution note:** Implement test-first. Mock subagent dispatch in tests.

**Patterns to follow:**
- `analysis/right_wing/direction3_migration_antidemocratic.py`: category breakdown patterns
- `analysis/right_wing/overton_breakpoint_analysis.py`: pre/post comparison patterns

**Test scenarios:**
- Happy path: mechanism analysis script runs on real DB and produces a markdown report.
- Happy path: table has mechanism categories with N, pre-CS, post-CS, delta columns.
- Edge case: subagent returns unknown mechanism category, so the orchestrator normalizes or flags.
- Edge case: mechanism category with <5 motions flagged as unreliable.

**Verification:**
- `reports/overton_window/mechanism_analysis.md` exists with mechanism breakdown.
- Report includes centrist support delta per mechanism.
- Top mechanism is identified with supporting evidence from motion titles.

---

### U5. Update Findings Report

**Goal:** Integrate dual-dimension correlation and mechanism analysis into the Overton findings report.

**Requirements:** R6

**Dependencies:** U3, U4

**Files:**
- Modify: `reports/overton_window/findings_report.md`

**Approach:**
1. Add a new Section 3b (or update Section 3 Content Extremity) with:
   - Two-dimensional scoring results and correlation
   - Whether the single-dimensional scores are confirmed or need revision
   - Updated content extremity narrative with caveats refined by dual-dimension insight
2. Add a new Section 7 (Mechanism Analysis) with:
   - Which mechanisms drove the centrist support surge
   - Migration vs non-migration mechanism differences
3. Update Section 8 (Next Steps) to reflect completed 2D rescoring and mechanism work

**Verification:**
- Report is internally consistent.
- New sections reference the right figures and tables.
- Next steps don't list work that's already done.

---

## System-Wide Impact

- **New DB tables:** `extremity_scores_2d`, `motion_mechanisms`. Additive, no existing data modified.
- **New skill:** `.opencode/skills/score-extremity/SKILL.md`. No code impact, only a prompt artifact.
- **No UI changes, no agent_tools changes, no pipeline changes.**
- **Tests:** New tests in `tests/right_wing/` do not affect existing test suite.

---

## Risks & Dependencies

| Risk | Mitigation |
|---|---|
| Subagent capacity limits (too many parallel dispatches) | Batch size 10 = 10 parallel subagents. Well within limits for 100 motions. If extending to 2,986, use hybrid approach (larger batches or fallback to API). |
| Subagent returns malformed JSON | Validator layer rejects and retries individual batches (max 2 retries). |
| Two dimensions correlate highly (r > 0.9) | Confirms the single-dimensional scores are directionally valid. Write this finding up as a confirmatory result, which is still valuable. |
| Mechanism taxonomy is too coarse to discriminate | Subagent derives from data, not pre-defined taxonomy. Iterative refinement in the subagent prompt if first pass is noisy. |

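
The retry mitigation for malformed JSON can be sketched as a small wrapper. `score_with_retries` is a hypothetical helper, with `dispatch` and `validate` injected as plain callables. Reading "max 2 retries" as one initial attempt plus up to two further attempts, three in total, is an assumption.

```python
def score_with_retries(batch, dispatch, validate, max_retries=2):
    """Dispatch one batch, re-dispatching when validation fails.

    Assumption: max_retries counts attempts after the first one, so the
    batch is tried at most 1 + max_retries times before giving up.
    """
    last_error = None
    for _attempt in range(1 + max_retries):
        raw = dispatch(batch)
        try:
            return validate(raw)  # success: return the validated rows
        except ValueError as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(
        f"batch failed validation after {1 + max_retries} attempts"
    ) from last_error
```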

---

## Sources & References

- Origin plan: `docs/plans/2026-05-08-002-feat-overton-window-shift-plan.md`
- Findings report: `reports/overton_window/findings_report.md`
- Methodology doc: `docs/solutions/best-practices/overton-window-shift-methodology-2026-05-24.md`
- Existing scorer: `analysis/right_wing/extremity_scorer.py`
- Skill format reference: `~/.config/opencode/skills/ce-work/SKILL.md`