---
title: Subagent-based two-dimensional extremity rescoring and mechanism analysis
type: feat
status: active
date: 2026-05-08
origin: docs/plans/2026-05-08-002-feat-overton-window-shift-plan.md
---

# Subagent-Based Two-Dimensional Extremity Rescoring and Mechanism Analysis

## Summary

The current Overton analysis has a known weakness: the LLM extremity score conflates stylistic radicalism (inflammatory language) with material policy impact (rights restricted, groups affected). A manual audit of 20 motions showed 75% agreement, which is enough to trust the broad findings but not the fine-grained, extremity-stratified analysis. This plan replaces the OpenRouter-based scoring pipeline with project-local subagents (deepseek v4 flash) that score motions through their own reasoning. The subagent skill becomes a durable project asset for future LLM analyses. In addition, we analyze which specific types of right-wing motions gained centrist support after 2024 (mechanism analysis) and compare content extremity shifts over time with the new dual-dimension scores.

---

## Problem Frame

The current `extremity_scorer.py` calls OpenRouter (mistral-small) with a single-dimension prompt that asks "how radical is this?" on a 1–5 scale. This conflates two dimensions:

- **Stylistic extremity**: How inflammatory or harsh is the language?
- **Material impact**: How much would the policy actually restrict rights, affect groups, or reshape institutions?

The current scores cannot separate "rude but harmless" from "measured but devastating." The findings report flags this as the primary measurement concern (75% agreement in the LLM audit, plus systematic overrating of anti-institutional language).

Additionally, the Overton analysis tells us *that* centrist support rose after 2024, but not *which kinds* of right-wing motions drove the shift. Mechanism analysis fills this gap.

---

## Requirements

- R1. Write a project-local skill at `.opencode/skills/score-extremity/` that defines a two-dimensional scoring prompt, a JSON output schema, and a subagent-spawning workflow. The skill is a durable asset for future LLM analyses.
- R2. Score a stratified sample of 100 right-wing motions (25 per existing extremity bucket: 1–2, 2–3, 3–4, 4–5) on both stylistic extremity and material impact. Compute the correlation between the two dimensions.
- R3. If r ≥ 0.7, confirm that the single-dimensional scores are directionally usable. If r < 0.7, flag that the separate dimensions matter and extend the sample.
- R4. Mechanism analysis: classify the 2,986 right-wing motions by policy mechanism (the specific institutional change each motion proposes) and compute which mechanisms gained the most centrist support after 2024.
- R5. Build the scoring infrastructure test-first. Each orchestration layer (2D scoring and mechanism classification) has unit tests that mock the subagent dispatch.
- R6. Update the findings report with the dual-dimension correlation, the mechanism analysis results, and a refreshed content extremity narrative.

---

## Scope Boundaries

- In scope: writing the skill, the stratified 100-motion sample, mechanism classification, test infrastructure, and the report update.
- Out of scope: re-scoring all 2,986 motions (deferred until r is measured), an interactive dashboard, and Streamlit UI changes.
- The skill lives at `.opencode/skills/score-extremity/SKILL.md`: a single file with no Python dependencies.

---

## Context & Research

### Relevant Code and Patterns

- `analysis/right_wing/extremity_scorer.py`: current single-dimension scoring (prompt template, JSON schema, batch orchestration)
- `analysis/right_wing/direction3_migration_antidemocratic.py`: analysis script pattern (DuckDB queries, matplotlib charts, markdown output)
- `reports/overton_window/findings_report.md`: current report, with next steps in Section 8
- `tests/right_wing/`: empty directory, target for the new test files

### Institutional Learnings

- `docs/solutions/best-practices/overton-window-shift-methodology-2026-05-24.md`: Step 7 describes the 2D rescoring and manual audit
- `docs/solutions/insights/llm-motion-classification-prompt-design.md`: prior work on orthogonal prompt dimensions

### Key Technical Decisions

- **Subagents, not OpenRouter API calls.** Subagents running deepseek v4 flash score motions through their own reasoning, so there are no OpenRouter API keys, rate limits, or per-request costs. The orchestrating script spawns subagents via the `task` tool and collects structured JSON.
- **Skill as a prompt artifact, not code.** `.opencode/skills/score-extremity/SKILL.md` defines the scoring prompt, JSON schema, and subagent-spawning instructions in natural language. The orchestrating Python script reads the skill, formats prompts, and spawns subagents.
- **Batch size: 10 motions per subagent.** Each subagent scores 10 motions on both dimensions, so 100 motions need 10 subagents, dispatched in parallel with one `task` call per batch.
- **Stratified sample across all 4 extremity buckets.** 25 motions per bucket, drawn using the existing LLM scores. This tests whether the two dimensions diverge more in high-extremity buckets, where inflammatory language may dominate.
- **Mechanism taxonomy derived from the data.** The subagent derives mechanism categories from the motion text (e.g., "detention/removal", "benefit restriction", "institutional bypass", "symbolic/declarative", "rights limitation", "procedural hurdle"). There is no pre-defined taxonomy.
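- **Storage: two DB tables.** `extremity_scores_2d` (motion_id, stylistic_score, material_score, rationale) and `motion_mechanisms` (motion_id, mechanism_category). Centrist support deltas are computed per mechanism category at analysis time (U4) rather than stored per motion, since each motion falls either before or after 2024 and has no delta of its own. Existing tables are unchanged.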

### Open Questions

#### Resolved During Planning

- **Q: How to test subagent spawning?** Mock the subagent dispatch layer. The skill defines a JSON contract; the tests check that the orchestrator correctly parses and stores results that follow it.
- **Q: Which 100 motions to sample?** A stratified random sample from the 2,986 classified right-wing motions, 25 per extremity bucket, seeded for reproducibility.

#### Deferred to Implementation

- **Q: Exact mechanism taxonomy**: derived by the subagent from the data, not pre-specified.
- **Q: Whether to extend the sample beyond 100**: depends on the r value between the two dimensions.

---

## Output Structure

```
.opencode/skills/score-extremity/
  SKILL.md                        # Scoring prompt, JSON schema, subagent workflow

analysis/right_wing/
  extremity_rescore_2d.py         # Orchestrator: reads skill, spawns 10 subagents, collects results
  mechanism_analysis.py           # Mechanism classification + centrist support breakdown

tests/right_wing/
  test_extremity_rescore_2d.py    # Unit tests for orchestrator
  test_mechanism_analysis.py      # Unit tests for mechanism pipeline
```

---

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification.*

```
┌──────────────────────┐
│ .opencode/skills/    │
│ score-extremity/     │  ← Read by orchestrator
│ SKILL.md             │
│ - Prompt template    │
│ - JSON schema        │
│ - Subagent workflow  │
└──────────┬───────────┘
           │ read
┌──────────▼────────────┐
│ extremity_rescore_2d  │
│ .py (orchestrator)    │
│                       │
│ 1. Query 100 motions  │
│ 2. Format 10 batches  │
│ 3. Spawn 10 subagents │──→ subagent scores 10 motions
│ 4. Collect JSON       │    returns {motion_id: {stylistic_score, material_score}}
│ 5. Validate + store   │
└───────────────────────┘
```

---

## Implementation Units

### U1. Write the Scoring Skill

**Goal:** Create a project-local skill that an orchestrator can read to configure subagent-based two-dimensional extremity scoring.

**Requirements:** R1

**Dependencies:** None

**Files:**
- Create: `.opencode/skills/score-extremity/SKILL.md`

**Approach:**
- YAML frontmatter: `name: score-extremity`, `description: "Two-dimensional extremity scoring for Dutch parliamentary motions. Use when scoring policy radicalism along stylistic vs material impact dimensions."`
- Body: the two-dimensional scoring prompt (in Dutch, matching the existing PROMPT_TEMPLATE style). Define two scores: `stijl_extremiteit` (1–5, inflammatory language) and `materiele_impact` (1–5, substantive rights/policy effect).
- Body: the JSON output schema matching the prompt.
- Body: instructions for how the orchestrator should spawn subagents (batch size 10, parallel dispatch, collect results, validate JSON).

**Patterns to follow:**
- The existing PROMPT_TEMPLATE in `analysis/right_wing/extremity_scorer.py` for prompt structure
- `~/.config/opencode/skills/ce-work/SKILL.md` for YAML frontmatter conventions

**Test scenarios:**
- Edge case: skill file has valid YAML frontmatter with the required `name` and `description` fields.
- Edge case: skill body contains the expected sections (prompt template, JSON schema, usage instructions).
- Happy path: orchestrator can read the skill file and extract prompt + schema.
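
A minimal sketch of `load_skill()` that covers these scenarios, assuming flat `key: value` frontmatter and hypothetical section headings (`## Prompt`, `## JSON schema`, `## Subagent workflow`); the plan names the sections but not their titles. If the project already depends on PyYAML, `yaml.safe_load` on the frontmatter would be the more robust choice.

```python
from pathlib import Path

REQUIRED_FIELDS = ("name", "description")
# Hypothetical headings: the plan lists these sections but not their exact titles.
REQUIRED_SECTIONS = ("## Prompt", "## JSON schema", "## Subagent workflow")
SKILL_PATH = Path(".opencode/skills/score-extremity/SKILL.md")


def parse_skill(text: str) -> tuple[dict, str]:
    """Split SKILL.md text into (frontmatter dict, body) and check required parts.

    Only flat `key: value` frontmatter is supported; surrounding quotes are stripped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = lines.index("---", 1)            # closing fence of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    missing = [f for f in REQUIRED_FIELDS if not meta.get(f)]
    if missing:
        raise ValueError(f"frontmatter missing required fields: {missing}")
    body = "\n".join(lines[end + 1:])
    absent = [s for s in REQUIRED_SECTIONS if s not in body]
    if absent:
        raise ValueError(f"skill body missing sections: {absent}")
    return meta, body


def load_skill(path: Path = SKILL_PATH) -> tuple[dict, str]:
    """Read and parse the skill file, failing loudly if it does not exist."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"scoring skill not found at {path}")
    return parse_skill(path.read_text(encoding="utf-8"))
```

Separating `parse_skill` from file access keeps the frontmatter and section checks testable on plain strings, with no temporary files.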

**Verification:**
- `opencode` detects the skill at startup (it is listed in available_skills).
- The skill contains a clear two-dimensional scoring prompt in Dutch.

---

### U2. Build Orchestrator + Subagent Scoring Pipeline (TDD)

**Goal:** Build the orchestrating script that reads the skill, queries 100 motions, spawns subagents, collects and validates results, and stores two-dimensional scores in the database.

**Requirements:** R2, R5

**Dependencies:** U1

**Files:**
- Create: `analysis/right_wing/extremity_rescore_2d.py`
- Create: `tests/right_wing/test_extremity_rescore_2d.py`

**Approach:**
1. **Test-first:** Write tests for the orchestrator before the implementation:
   - Test that `load_skill()` returns the prompt and schema from SKILL.md
   - Test that `format_batches(motions, batch_size=10)` splits motions correctly
   - Test that `validate_subagent_result(result, schema)` catches malformed JSON
   - Test that `store_scores(db_path, results)` writes to the `extremity_scores_2d` table
   - Mock the subagent dispatch to return synthetic JSON
2. **Implementation:**
   - `load_skill()`: reads `.opencode/skills/score-extremity/SKILL.md`, parses the YAML frontmatter, returns the body
   - `sample_motions(db_path, n_per_bucket=25, seed=42)`: stratified query from `right_wing_motions` JOIN `extremity_scores`
   - `format_batches()`: groups motions into batches of 10 and builds prompts with the motion text + layman explanation
   - `spawn_and_collect()`: formats the skill prompt plus batch context for each subagent and spawns it via the `task` tool, requiring the JSON return contract
   - `validate_subagent_result()` and `store_scores()`: validate each result against the schema, then write it to the DB
3. **Database:** `CREATE TABLE IF NOT EXISTS extremity_scores_2d (motion_id INTEGER PRIMARY KEY, stylistic_score INTEGER, material_score INTEGER, rationale TEXT)`, with one `rationale` column matching the single `rationale` field in the subagent JSON contract (U3)
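
The sampling and batching steps could look like this. Two assumptions: the existing single-dimensional scores are fetched as `(motion_id, extremity_score)` rows and sampled in Python rather than in SQL, and bucket edges are half-open (`[1, 2)`, `[2, 3)`, `[3, 4)`, `[4, 5]`), since the "1–2 / 2–3" wording leaves shared boundaries ambiguous. The motion dict keys used by `format_batches` are hypothetical.

```python
import random
from collections import defaultdict

# Assumed half-open bucket edges on the existing 1-5 score; exactly 5.0 joins the top bucket.
BUCKETS = ((1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0))


def bucket_of(score: float) -> int:
    """Index of the extremity bucket that contains `score`."""
    for i, (lo, hi) in enumerate(BUCKETS):
        if lo <= score < hi or (i == len(BUCKETS) - 1 and score == hi):
            return i
    raise ValueError(f"extremity score out of range: {score}")


def sample_motions(rows, n_per_bucket: int = 25, seed: int = 42) -> list:
    """Seeded stratified sample from (motion_id, extremity_score) rows.

    A bucket with fewer than n_per_bucket motions contributes all of them.
    """
    by_bucket = defaultdict(list)
    for motion_id, score in rows:
        by_bucket[bucket_of(score)].append(motion_id)
    rng = random.Random(seed)
    sample = []
    for b in range(len(BUCKETS)):
        ids = sorted(by_bucket[b])  # fix the order so only the seed drives selection
        sample.extend(rng.sample(ids, min(n_per_bucket, len(ids))))
    return sample


def format_batches(motions, prompt_template: str, batch_size: int = 10) -> list:
    """One prompt per batch: the skill prompt, then each motion's text and explanation.

    `motions` are dicts with hypothetical keys: motion_id, text, layman_explanation.
    """
    prompts = []
    for i in range(0, len(motions), batch_size):
        blocks = [
            f"[motion_id={m['motion_id']}]\n{m['text']}\nExplanation: {m['layman_explanation']}"
            for m in motions[i:i + batch_size]
        ]
        prompts.append(prompt_template + "\n\n" + "\n\n".join(blocks))
    return prompts
```

Sorting IDs before sampling makes the sample depend only on the data and the seed, not on the row order the database happens to return.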

**Execution note:** Implement test-first. Write failing tests, then the implementation.

**Patterns to follow:**
- `analysis/right_wing/extremity_scorer.py`: existing DB write patterns
- `tests/agent_tools/test_database_tools.py`: temp DB fixture patterns

**Test scenarios:**
- Happy path: `load_skill()` returns a non-empty prompt and schema.
- Happy path: `format_batches()` with 100 motions produces 10 batches of 10.
- Happy path: `store_scores()` with a validated batch of 10 results inserts 10 rows into `extremity_scores_2d`.
- Edge case: a missing SKILL.md raises a clear error.
- Edge case: a bucket with fewer than 25 motions samples what is available.
- Edge case: subagent returns JSON with a missing field; the validator rejects it.
- Edge case: subagent returns a score outside the 1–5 range; the validator rejects it.

**Verification:**
- All tests pass before any subagent is spawned.
- The `extremity_scores_2d` table exists with the correct schema.
- The orchestrator accepts a `--dry-run` flag that validates the pipeline without spawning subagents.
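
One way to wire up `--dry-run`: the flag swaps the subagent dispatch for a synthetic, schema-shaped reply, so batching and reply checking run end to end without spawning anything. The `dispatch` hook (here taking a batch of motion IDs), the synthetic reply, and the placeholder motion IDs are illustrative assumptions.

```python
import argparse
import json
from typing import Callable, Optional, Sequence

# Hypothetical live hook: takes a batch of motion IDs, returns the subagent's JSON reply.
BatchDispatch = Callable[[Sequence[int]], str]


def synthetic_reply(batch: Sequence[int]) -> str:
    """Schema-shaped placeholder reply used instead of a subagent during dry runs."""
    return json.dumps({"motions": [
        {"motion_id": m, "stijl_extremiteit": 1, "materiele_impact": 1, "rationale": "dry-run"}
        for m in batch
    ]})


def main(argv: Optional[Sequence[str]] = None,
         dispatch: Optional[BatchDispatch] = None) -> int:
    parser = argparse.ArgumentParser(description="Two-dimensional extremity rescoring")
    parser.add_argument("--dry-run", action="store_true",
                        help="run batching and reply checks without spawning subagents")
    args = parser.parse_args(argv)

    if args.dry_run:
        send = synthetic_reply
    elif dispatch is None:
        parser.error("a live run needs a subagent dispatch hook")  # exits with status 2
    else:
        send = dispatch

    motion_ids = list(range(1, 101))  # placeholder for the stratified sample
    batches = [motion_ids[i:i + 10] for i in range(0, len(motion_ids), 10)]
    checked = 0
    for batch in batches:
        reply = json.loads(send(batch))
        if [m["motion_id"] for m in reply["motions"]] != batch:
            raise ValueError(f"reply does not cover batch starting at {batch[0]}")
        checked += 1
    # A live run would store the validated replies here; a dry run stops after checking.
    return checked
```

The script's entry point would pass `sys.argv[1:]` to `main`; keeping `dispatch` as a parameter leaves the live path mockable in tests.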

---

### U3. Execute the 100-Motion Rescoring

**Goal:** Run the orchestrator to score 100 motions, compute the correlation between stylistic and material extremity, and report the results.

**Requirements:** R2, R3

**Dependencies:** U2

**Files:**
- Modify: `analysis/right_wing/extremity_rescore_2d.py` (any fixes from the live run)
- Output: `reports/overton_window/extremity_2d_correlation.md`

**Approach:**
1. Run the orchestrator with actual subagent dispatch (no `--dry-run`).
2. Spawn 10 subagents in parallel, each scoring 10 motions.
3. Collect all results and validate them against the schema.
4. Compute Pearson r between `stylistic_score` and `material_score`.
5. Write a short correlation report with:
   - Overall r and per-bucket r
   - A scatter plot of stylistic vs material scores
   - Conclusion: "dimensions are separable" if r < 0.7, "single score sufficient" if r ≥ 0.7
   - Recommendation for next steps (extend the sample, re-score all motions, or proceed)
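
Steps 4 and 5 reduce to computing Pearson r overall and per bucket and applying the R3 threshold. This sketch computes r from its definition, r = Σ(x−x̄)(y−ȳ) / √(Σ(x−x̄)² · Σ(y−ȳ)²), and returns `None` when a dimension has no variance (r is undefined), which can happen inside a narrow bucket. The record layout and function names are hypothetical. Each per-bucket r rests on only 25 motions, so it will be much noisier than the overall r.

```python
from collections import defaultdict
from math import sqrt

R3_THRESHOLD = 0.7


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # r is undefined without variance in both dimensions
    return sxy / sqrt(sxx * syy)


def conclusion(r, threshold: float = R3_THRESHOLD) -> str:
    """Apply the R3 decision rule to a correlation coefficient."""
    if r is None:
        return "undefined (no variance)"
    return "single score sufficient" if r >= threshold else "dimensions are separable"


def correlation_summary(records):
    """records: (bucket_label, stylistic_score, material_score) tuples.

    Returns (overall r, {bucket_label: r}).
    """
    per_bucket = defaultdict(lambda: ([], []))
    for bucket, stylistic, material in records:
        per_bucket[bucket][0].append(stylistic)
        per_bucket[bucket][1].append(material)
    overall = pearson_r([rec[1] for rec in records], [rec[2] for rec in records])
    return overall, {b: pearson_r(xs, ys) for b, (xs, ys) in sorted(per_bucket.items())}
```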

**Technical design:**
The orchestrator calls the `task` tool with the skill's prompt and each batch's motion data. Each subagent returns:

```json
{
  "motions": [
    {"motion_id": 123, "stijl_extremiteit": 3, "materiele_impact": 4, "rationale": "..."}
  ]
}
```
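
Storing a validated reply means mapping the Dutch contract fields onto the table columns (`stijl_extremiteit` → `stylistic_score`, `materiele_impact` → `material_score`). The contract above returns a single `rationale`, so this sketch stores it in one `rationale` column; keeping separate stylistic and material rationales would need a second field in the contract. Python's built-in `sqlite3` stands in for the project's DuckDB database, `INSERT OR REPLACE` (SQLite syntax here) keeps retried batches idempotent, and the CHECK constraints are an added safeguard rather than part of the plan.

```python
import json
import sqlite3

# sqlite3 stands in for the project's DuckDB file; CHECK constraints are an added safeguard.
DDL = """
CREATE TABLE IF NOT EXISTS extremity_scores_2d (
    motion_id       INTEGER PRIMARY KEY,
    stylistic_score INTEGER CHECK (stylistic_score BETWEEN 1 AND 5),
    material_score  INTEGER CHECK (material_score BETWEEN 1 AND 5),
    rationale       TEXT
)
"""


def to_rows(raw: str) -> list:
    """Map one (already validated) reply onto extremity_scores_2d column order."""
    return [
        (m["motion_id"], m["stijl_extremiteit"], m["materiele_impact"], m["rationale"])
        for m in json.loads(raw)["motions"]
    ]


def store_scores(conn: sqlite3.Connection, rows: list) -> int:
    """Write rows in one transaction; re-storing a motion (e.g. after a retry) replaces it."""
    with conn:  # commits on success, rolls back this batch's inserts on any error
        conn.execute(DDL)
        conn.executemany(
            "INSERT OR REPLACE INTO extremity_scores_2d "
            "(motion_id, stylistic_score, material_score, rationale) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```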

**Verification:**
- All 100 motions have both `stylistic_score` and `material_score`.
- The correlation report is written with a clear r value.
- All scores are integers from 1 to 5.

---

### U4. Mechanism Analysis

**Goal:** Classify right-wing motions by policy mechanism and compute which mechanisms gained the most centrist support after 2024.

**Requirements:** R4, R5

**Dependencies:** None (reads existing DB tables)

**Files:**
- Create: `analysis/right_wing/mechanism_analysis.py`
- Create: `tests/right_wing/test_mechanism_analysis.py`
- Output: `reports/overton_window/mechanism_analysis.md`

**Approach:**
1. **Test-first:** Write tests for the orchestration layer (query, batch formatting, table creation, result validation).
2. **Subagent-based classification:** Spawn subagents to classify motions by mechanism. Each subagent receives 25 motions and returns JSON mapping `motion_id -> mechanism_category`. The subagent derives the categories from the data rather than from a pre-defined taxonomy.
3. **Compute centrist support per mechanism:** Using `centrist_support_strict` from `right_wing_motions`, compute pre- and post-2024 centrist support per mechanism category, and the delta (post minus pre).
4. **Report:** A table of mechanism categories ranked by centrist support delta, with N per category, plus a visualization of the top 5 mechanisms.
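
Step 3 could be computed as below. Two assumptions: `BREAKPOINT` is a placeholder date (the real cut-off should come from the project's breakpoint analysis), and centrist support is aggregated by averaging the per-motion `centrist_support_strict` values in each category and period. The sketch takes delta = post − pre, sorts by delta, and flags categories with fewer than 5 motions, as the test scenarios require.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Placeholder cut-off: the real breakpoint should come from the Overton breakpoint analysis.
BREAKPOINT = date(2024, 1, 1)
MIN_N = 5  # categories with fewer motions are flagged as unreliable


def mechanism_deltas(rows) -> list:
    """rows: (mechanism_category, motion_date, centrist_support_strict) tuples.

    Returns one dict per category, sorted by delta (largest centrist gain first);
    categories missing a pre or post period have no delta and sort last.
    """
    groups = defaultdict(lambda: {"pre": [], "post": []})
    for category, motion_date, support in rows:
        groups[category]["post" if motion_date >= BREAKPOINT else "pre"].append(support)
    table = []
    for category, g in groups.items():
        pre = mean(g["pre"]) if g["pre"] else None
        post = mean(g["post"]) if g["post"] else None
        n = len(g["pre"]) + len(g["post"])
        table.append({
            "mechanism": category,
            "n": n,
            "pre_cs": pre,
            "post_cs": post,
            "delta": None if pre is None or post is None else post - pre,
            "unreliable": n < MIN_N,
        })
    table.sort(key=lambda row: (row["delta"] is None, -(row["delta"] or 0.0)))
    return table
```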

**Execution note:** Implement test-first. Mock the subagent dispatch in tests.

**Patterns to follow:**
- `analysis/right_wing/direction3_migration_antidemocratic.py`: category breakdown patterns
- `analysis/right_wing/overton_breakpoint_analysis.py`: pre/post comparison patterns

**Test scenarios:**
- Happy path: the mechanism analysis script runs on the real DB and produces a markdown report.
- Happy path: the report table has mechanism category, N, pre-CS, post-CS, and delta columns (CS = centrist support).
- Edge case: a subagent returns an unknown mechanism category; the orchestrator normalizes or flags it.
- Edge case: a mechanism category with fewer than 5 motions is flagged as unreliable.
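
Because each subagent derives categories independently, labels can drift between batches ("Detention / Removal" vs "detention/removal"). A minimal normalization pass could canonicalize spelling, resolve a hand-maintained alias table, and flag anything outside the accepted category set for review. `ALIASES`, `normalize_label`, and `normalize_batch` are hypothetical, and how the accepted set is first built is left open by the plan.

```python
import re

# Hypothetical alias table, extended by hand as label variants appear in subagent output.
ALIASES = {
    "detention": "detention/removal",
    "removal": "detention/removal",
    "deportation": "detention/removal",
}


def normalize_label(label: str) -> str:
    """Canonical spelling: lowercase, single spaces, no spaces around '/', aliases resolved."""
    label = re.sub(r"\s+", " ", label.strip().lower())
    label = re.sub(r"\s*/\s*", "/", label)
    return ALIASES.get(label, label)


def normalize_batch(reply: dict, accepted_categories: set) -> tuple:
    """Split one {motion_id: category} reply into (accepted, flagged) dicts.

    Labels outside `accepted_categories` are flagged for review rather than
    silently becoming a new category.
    """
    accepted, flagged = {}, {}
    for motion_id, raw_label in reply.items():
        label = normalize_label(raw_label)
        if label in accepted_categories:
            accepted[motion_id] = label
        else:
            flagged[motion_id] = label
    return accepted, flagged
```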

**Verification:**
- `reports/overton_window/mechanism_analysis.md` exists with the mechanism breakdown.
- The report includes the centrist support delta per mechanism.
- The top mechanism is identified, with supporting evidence from motion titles.

---

### U5. Update Findings Report

**Goal:** Integrate the dual-dimension correlation and the mechanism analysis into the Overton findings report.

**Requirements:** R6

**Dependencies:** U3, U4

**Files:**
- Modify: `reports/overton_window/findings_report.md`

**Approach:**
1. Add a new Section 3b (or update Section 3, Content Extremity) with:
   - Two-dimensional scoring results and correlation
   - Whether the single-dimensional scores are confirmed or need revision
   - An updated content extremity narrative with caveats refined by the dual-dimension insight
2. Add a new Section 7 (Mechanism Analysis) with:
   - Which mechanisms drove the centrist support surge
   - Differences between migration and non-migration mechanisms
3. Update Section 8 (Next Steps) to reflect the completed 2D rescoring and mechanism work.

**Verification:**
- The report is internally consistent.
- New sections reference the correct figures and tables.
- Next steps no longer list work that is already done.

---

## System-Wide Impact

- **New DB tables:** `extremity_scores_2d` and `motion_mechanisms` are additive; no existing data is modified.
- **New skill:** `.opencode/skills/score-extremity/SKILL.md` is a prompt artifact only, with no code impact.
- **No UI changes, no agent_tools changes, no pipeline changes.**
- **Tests:** New tests in `tests/right_wing/` do not affect the existing test suite.

---

## Risks & Dependencies

| Risk | Mitigation |
|---|---|
| Subagent capacity limits (too many parallel dispatches) | At batch size 10, the 100-motion sample needs only 10 parallel subagents. Re-scoring all 2,986 motions would need 299 batches; in that case use a hybrid approach (larger batches or a fallback to the API). U4's mechanism pass over 2,986 motions at 25 per batch already needs 120 subagent calls. |
| Subagent returns malformed JSON | Validator layer rejects and retries individual batches (max 2 retries). |
| The two dimensions correlate highly (r ≥ 0.7, the R3 threshold) | Confirms that the single-dimensional scores are directionally valid. Write this up as a confirmatory result; it is still valuable. |
| Mechanism taxonomy is too coarse to discriminate | The subagent derives categories from the data rather than from a pre-defined taxonomy. Refine the subagent prompt iteratively if the first pass is noisy. |
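
The malformed-JSON mitigation (retry only the failing batch, at most 2 retries) can be sketched as a small wrapper. It assumes the validator signals contract violations by raising `ValueError`, which is also what `json.loads` raises for malformed input (`json.JSONDecodeError` is a `ValueError` subclass); `run_batch_with_retries` and `BatchFailed` are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class BatchFailed(RuntimeError):
    """Raised when a batch still violates the JSON contract after all retries."""


def run_batch_with_retries(prompt: str,
                           dispatch: Callable[[str], str],
                           validate: Callable[[str], T],
                           max_retries: int = 2) -> T:
    """Dispatch one batch; on a contract violation, re-dispatch only this batch.

    Makes at most 1 + max_retries calls. `validate` must raise ValueError
    for a bad reply and return the parsed result otherwise.
    """
    errors = []
    for attempt in range(1, max_retries + 2):
        raw = dispatch(prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            errors.append(f"attempt {attempt}: {exc}")
    raise BatchFailed("; ".join(errors))
```

Raising a dedicated exception after the last attempt lets the orchestrator record the failed batch and carry on with the others instead of aborting the whole run.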

---

## Sources & References

- Origin plan: `docs/plans/2026-05-08-002-feat-overton-window-shift-plan.md`
- Findings report: `reports/overton_window/findings_report.md`
- Methodology doc: `docs/solutions/best-practices/overton-window-shift-methodology-2026-05-24.md`
- Existing scorer: `analysis/right_wing/extremity_scorer.py`
- Skill format reference: `~/.config/opencode/skills/ce-work/SKILL.md`