---
title: Agent-Native Architecture Plan for Stemwijzer
type: refactor
status: active
date: 2026-05-01
origin: STRATEGY.md (agent-native architecture track)
---

# Agent-Native Architecture Plan for Stemwijzer

## Overview

Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to give an agent the same capabilities as a human operator for every operation. That includes running the pipeline, diagnosing drift, and answering research questions about parliamentary voting patterns.

**Current state:** The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries.

**Target state:** An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data. It operates in a loop until the desired outcome is achieved.

---

## Problem Frame

- **Pipeline operators** need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting their output.
- **Analysts/researchers** want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code.
- **Developers** need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually.
- **Content maintainers** need to verify that SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently this is done ad hoc.

---

## Requirements Trace

- R1. The agent can achieve anything a pipeline operator can achieve (parity)
- R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability)
- R3. The agent can diagnose pipeline health and suggest remediation (self-service operations)
- R4. The agent can generate and validate content (SVD labels, motion summaries)
- R5. New capabilities can be added by writing prompts, not code (composability)

---

## Scope Boundaries

- **In scope:** Agent primitives for data operations, pipeline control, analysis, and diagnostics
- **Deferred:** Real-time agent UI inside Streamlit (a future phase would add a chat interface to the explorer)
- **Deferred:** Autonomous pipeline scheduling (`scheduler.py` exists, but agent control of scheduling is deferred to v2)
- **Not working on:** Natural language to SQL for end users (this plan targets agent operators, not voter-facing features)

---

## Key Technical Decisions

- **Files as universal interface:** DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files that the agent writes and the human reads.
- **Database tools over file tools for structured data:** For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations.
- **Pipeline as state machine:** The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything."
- **Shared workspace:** The agent and the human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, and the same `docs/solutions/` knowledge base.
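
Treating the pipeline as a state machine lets a tool work out the smallest set of stages to re-run, instead of running everything. Below is a minimal sketch. It assumes the six stages, in the order listed above, form a strict chain in which each stage depends on every stage before it. It also assumes a health check reports each stage as fresh or stale. The stage identifiers and the `stages_to_rerun` helper are hypothetical.

```python
from __future__ import annotations

# Hypothetical stage identifiers, in the pipeline order given in this plan.
PIPELINE_STAGES = ["ingestion", "vote_extraction", "svd",
                   "text_embeddings", "fusion", "similarity"]


def stages_to_rerun(fresh: dict[str, bool]) -> list[str]:
    """Return the first stale stage plus everything downstream of it.

    Assumes a strict chain: re-running a stage invalidates every later stage.
    Stages missing from `fresh` count as stale.
    """
    for position, stage in enumerate(PIPELINE_STAGES):
        if not fresh.get(stage, False):
            return PIPELINE_STAGES[position:]
    return []  # everything is fresh; nothing to re-run


plan = stages_to_rerun({"ingestion": True, "vote_extraction": True, "svd": False,
                        "text_embeddings": True, "fusion": True, "similarity": True})
# plan == ["svd", "text_embeddings", "fusion", "similarity"]
```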
---

## Implementation Units

- [ ] U1. **Database query primitives**
  - **Goal:** Give the agent structured access to the DuckDB database
  - **Requirements:** R1, R2, R4
  - **Dependencies:** None
  - **Files:**
    - Create: `agent_tools/database.py`
    - Test: `tests/agent_tools/test_database_tools.py`
  - **Approach:** Wrap DuckDB queries as atomic tools:
    - `query_motions(filter, limit, order)` → returns motion rows as JSON
    - `query_votes(motion_id, party)` → returns vote counts
    - `query_svd_vectors(window_id, entity_type)` → returns vectors
    - `query_party_positions(window_id)` → returns party axis scores
    - `query_pipeline_status()` → returns freshness metrics from health checks
  - **Patterns to follow:** `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns
  - **Test scenarios:**
    - Happy path: a query returns valid JSON for known filters
    - Edge case: an empty result set returns `[]`, not an error
    - Error path: invalid SQL or an invalid filter returns a structured error with a suggestion
  - **Verification:** The agent can answer "How many motions in 2024?" using only the tool

- [ ] U2. **Pipeline control primitives**
  - **Goal:** Let the agent run, monitor, and diagnose pipeline stages
  - **Requirements:** R1, R3
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/pipeline.py`
    - Test: `tests/agent_tools/test_pipeline_tools.py`
  - **Approach:** Stage-aware pipeline tools:
    - `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status
    - `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering
    - `pipeline_check_health()` → returns a health report (reuses the `health/` module)
    - `pipeline_get_logs(stage, lines)` → returns recent logs for a stage
    - `pipeline_validate_output(stage)` → checks that the stage's output exists and looks reasonable
  - **Patterns to follow:** `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern
  - **Test scenarios:**
    - Happy path: a dry run returns planned actions without executing them
    - Integration: running `pipeline_run_stage("svd", "2024", dry_run=False)` produces the expected `svd_vectors` rows
    - Error path: running a stage with missing dependencies returns a clear error
  - **Verification:** The agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run

- [ ] U3. **Analysis and report generation primitives**
  - **Goal:** Let the agent perform analytical tasks and write reports
  - **Requirements:** R2, R4
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/analysis.py`
    - Create: `agent_tools/reports.py`
    - Test: `tests/agent_tools/test_analysis_tools.py`
  - **Approach:**
    - `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data
    - `analyze_axis_stability(component, windows)` → returns stability scores
    - `generate_report(type, parameters, output_path)` → writes a markdown report to `reports/`
    - `validate_svd_labels(component)` → compares theme labels to actual party positions
  - **Patterns to follow:** `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py`
  - **Test scenarios:**
    - Happy path: `analyze_party_shift` returns structured data for a known party
    - Integration: `generate_report("drift", {"windows": ["2020", "2024"]})` produces valid markdown
    - Edge case: requesting analysis for a nonexistent window returns an empty result
  - **Verification:** The agent can answer "Which parties shifted most on economic axes?" by running the analysis and summarizing the results

- [ ] U4. **Content validation primitives**
  - **Goal:** Let the agent validate content and suggest improvements
  - **Requirements:** R4
  - **Dependencies:** U1, U3
  - **Files:**
    - Create: `agent_tools/content.py`
    - Test: `tests/agent_tools/test_content_tools.py`
  - **Approach:**
    - `validate_motion_coverage(start_date, end_date)` → returns coverage gaps
    - `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality
    - `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests a label
    - `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings
  - **Patterns to follow:** `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation
  - **Test scenarios:**
    - Happy path: `validate_motion_coverage` returns an accurate gap list
    - Edge case: when all motions are covered, it returns an empty gap list
  - **Verification:** The agent can run weekly content quality checks and produce a report

- [ ] U5. **System prompt and context injection**
  - **Goal:** Define agent behavior and inject runtime context
  - **Requirements:** R1, R2, R3, R4, R5
  - **Dependencies:** U1-U4
  - **Files:**
    - Create: `agent_tools/SYSTEM_PROMPT.md`
    - Create: `agent_tools/context.py`
  - **Approach:**
    - `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions
    - `context.py`: Injects runtime context (current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list)
    - `context.md` pattern: The agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline
  - **Patterns to follow:** the `context.md` pattern from the `ce-agent-native-architecture` skill; `AGENTS.md` for project conventions
  - **Test scenarios:**
    - Context injection produces valid markdown with current DB stats
    - The system prompt loads and parses without errors
  - **Verification:** An agent session starts with full context of the pipeline state

- [ ] U6. **Agent-native testing and parity verification**
  - **Goal:** Ensure the agent can do everything humans can do
  - **Requirements:** R1
  - **Dependencies:** U1-U5
  - **Files:**
    - Create: `tests/agent_tools/test_parity.py`
    - Modify: `tests/conftest.py` (add agent tool fixtures)
  - **Approach:**
    - Parity tests: for each human action (run pipeline, check health, generate report), verify that the agent tool achieves the same outcome
    - Integration tests: the agent runs a full diagnostic loop (check health → identify issue → run fix → verify)
    - `test_parity.py`: a matrix of human action → agent tool → expected outcome
  - **Test scenarios:**
    - Parity: "Human runs health check CLI" vs. "Agent calls `pipeline_check_health()`" → same result
    - Integration: the agent detects stale data, runs the pipeline, and verifies freshness
  - **Verification:** All parity tests pass
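
For U1, here is a minimal sketch of `query_motions` that returns JSON rows, returns `[]` on an empty result, and returns a structured error instead of raising. It rests on several assumptions:

- The table is a hypothetical `motions(id, title, date)` with ISO date strings.
- The `year` filter and the ordering whitelist are illustrative choices.
- The connection is DB-API-style. DuckDB's Python connection and the standard library's `sqlite3` both expose `execute`, `description`, and `fetchall`, so an in-memory SQLite database stands in for `data/motions.db` in the usage lines.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Any

# Hypothetical schema: motions(id, title, date) with ISO-8601 date strings.
ORDERABLE_COLUMNS = {"id", "title", "date"}


def _error(message: str, suggestion: str, retryable: bool = False) -> str:
    """Structured error the agent can reason about instead of an exception."""
    return json.dumps({"error": message, "suggestion": suggestion, "retryable": retryable})


def query_motions(conn: Any, filters: dict[str, Any] | None = None,
                  limit: int = 50, order: str = "date") -> str:
    """Return matching motions as a JSON array; '[]' when nothing matches."""
    filters = filters or {}
    if order not in ORDERABLE_COLUMNS:
        # Column names cannot be bound as parameters, so whitelist them.
        return _error(f"cannot order by {order!r}",
                      f"use one of {sorted(ORDERABLE_COLUMNS)}")
    sql = "SELECT id, title, date FROM motions"
    params: list[Any] = []
    if "year" in filters:
        sql += " WHERE substr(date, 1, 4) = ?"
        params.append(str(filters["year"]))
    sql += f" ORDER BY {order} LIMIT ?"
    params.append(int(limit))
    try:
        cursor = conn.execute(sql, params)
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    except Exception as exc:  # tool boundary: never raise into the agent loop
        return _error(str(exc), "check the filters and that the motions table exists")
    return json.dumps(rows)


# Usage, with an in-memory SQLite database standing in for data/motions.db.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE motions (id INTEGER, title TEXT, date TEXT)")
demo.executemany("INSERT INTO motions VALUES (?, ?, ?)",
                 [(1, "Motie A", "2024-03-01"), (2, "Motie B", "2023-11-20")])
motions_2024 = json.loads(query_motions(demo, {"year": 2024}))
no_motions = json.loads(query_motions(demo, {"year": 1999}))
bad_order = json.loads(query_motions(demo, order="votes"))
```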
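
For U2, here is a sketch of `pipeline_run_stage` that shows the dry-run and missing-dependency behavior from its test scenarios. Several pieces are assumptions standing in for the real health checks and the stages in `pipeline/run_pipeline.py`:

- the stage identifiers;
- the rule that each stage needs only the stage before it;
- the keyword-only `completed` set and `runner` callable.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stage identifiers; each stage is assumed to need the one before it.
STAGE_ORDER = ["ingestion", "vote_extraction", "svd",
               "text_embeddings", "fusion", "similarity"]


def pipeline_run_stage(stage: str, window_id: str, dry_run: bool, *,
                       completed: set[str],
                       runner: Callable[[str, str], None]) -> dict:
    """Run one stage for one window, or with dry_run only report the plan."""
    if stage not in STAGE_ORDER:
        return {"error": f"unknown stage {stage!r}",
                "suggestion": f"use one of {STAGE_ORDER}", "retryable": False}
    position = STAGE_ORDER.index(stage)
    prerequisite = STAGE_ORDER[position - 1] if position > 0 else None
    if prerequisite is not None and prerequisite not in completed:
        # Missing dependency: say what to run first instead of failing midway.
        return {"error": f"{stage!r} needs {prerequisite!r} for window {window_id!r}",
                "suggestion": f"run {prerequisite!r} for this window first",
                "retryable": True}
    if dry_run:
        # Planned action only; the runner is never called.
        return {"status": "planned", "stage": stage, "window_id": window_id}
    runner(stage, window_id)
    return {"status": "completed", "stage": stage, "window_id": window_id}


# Usage with a stub runner that records calls instead of touching data/motions.db.
calls: list[tuple[str, str]] = []


def record(stage: str, window: str) -> None:
    calls.append((stage, window))


planned = pipeline_run_stage("svd", "2024", True,
                             completed={"ingestion", "vote_extraction"}, runner=record)
blocked = pipeline_run_stage("fusion", "2024", False,
                             completed={"ingestion"}, runner=record)
ran = pipeline_run_stage("svd", "2024", False,
                         completed={"ingestion", "vote_extraction"}, runner=record)
```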
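
For U3, here is a minimal sketch of the computation behind `analyze_party_shift`. It is written as a hypothetical `party_shifts` helper that ranks every party, so the agent can answer "which parties shifted most?". The sketch assumes:

- Party positions arrive as per-window lists of axis scores, as `query_party_positions` might return them.
- The axes are already aligned across windows. Singular vectors are only determined up to sign, so an unaligned component can appear to flip between windows.

Passing a `component` gives the signed shift on that one axis; omitting it gives the Euclidean distance over all axes.

```python
from __future__ import annotations

import math

# Assumed shape: positions[window_id][party] -> that party's scores on each SVD axis.
Positions = dict[str, dict[str, list[float]]]


def party_shifts(positions: Positions, window_start: str, window_end: str,
                 component: int | None = None) -> list[dict]:
    """Rank parties by how far they moved between two windows (largest first).

    With `component`, the shift is the signed change on that axis; without it,
    the Euclidean distance over all axes. Returns [] if either window is unknown.
    """
    if window_start not in positions or window_end not in positions:
        return []
    start, end = positions[window_start], positions[window_end]
    shifts = []
    for party in sorted(start.keys() & end.keys()):  # parties present in both windows
        if component is None:
            shift = math.dist(start[party], end[party])
        else:
            shift = end[party][component] - start[party][component]
        shifts.append({"party": party, "shift": shift})
    return sorted(shifts, key=lambda row: abs(row["shift"]), reverse=True)


# Stub positions on two axes; values are illustrative, not real results.
positions = {
    "2020": {"A": [0.25, 0.5], "B": [-0.5, 0.0]},
    "2024": {"A": [0.75, 0.5], "B": [-0.25, 0.0]},
}
ranking = party_shifts(positions, "2020", "2024", component=0)
missing = party_shifts(positions, "2020", "2030")
```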
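
For U4, here is a sketch of `validate_motion_coverage` under one assumed definition of a gap: a calendar month in the requested range with no motions at all. The motion dates are passed in directly here. In the real tool they would presumably come from U1's `query_motions`, so the extra `motion_dates` argument is an assumption of this sketch.

```python
from __future__ import annotations

from datetime import date


def validate_motion_coverage(motion_dates: list[date], start_date: date,
                             end_date: date) -> list[str]:
    """Return the months (YYYY-MM) between start_date and end_date with no motions.

    A "gap" is assumed to mean a calendar month with zero motions.
    """
    covered = {(d.year, d.month) for d in motion_dates if start_date <= d <= end_date}
    gaps = []
    year, month = start_date.year, start_date.month
    while (year, month) <= (end_date.year, end_date.month):
        if (year, month) not in covered:
            gaps.append(f"{year:04d}-{month:02d}")
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps


dates = [date(2024, 1, 15), date(2024, 3, 2)]
gaps = validate_motion_coverage(dates, date(2024, 1, 1), date(2024, 4, 30))
fully_covered = validate_motion_coverage(dates, date(2024, 1, 1), date(2024, 1, 31))
```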
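
For U5, here is a minimal sketch of what `context.py` might render. It assumes the status, latest window, and party list come from U1's query tools, and that known issues are the titles of notes in `docs/solutions/`. The helper names and the cap of five recent notes are hypothetical; the cap is there to keep the injected context small.

```python
from __future__ import annotations

from pathlib import Path


def list_known_issues(solutions_dir: Path, limit: int = 5) -> list[str]:
    """Titles (file stems) of the most recently modified notes in solutions_dir."""
    if not solutions_dir.is_dir():
        return []
    notes = sorted(solutions_dir.glob("*.md"), key=lambda p: p.stat().st_mtime, reverse=True)
    return [note.stem for note in notes[:limit]]


def build_context(status: dict[str, str], latest_window: str,
                  known_issues: list[str], parties: list[str]) -> str:
    """Render a compact markdown block to prepend to the agent's system prompt."""
    lines = ["## Pipeline status"]
    lines += [f"- {stage}: {state}" for stage, state in status.items()]
    lines += ["", f"## Latest SVD window: {latest_window}", "", "## Known issues"]
    lines += [f"- {issue}" for issue in known_issues] or ["- none recorded"]
    lines += ["", "## Active parties", ", ".join(parties)]
    return "\n".join(lines) + "\n"


context = build_context(
    {"ingestion": "fresh", "svd": "stale"},
    "2024",
    list_known_issues(Path("docs/solutions")),
    ["A", "B"],
)
```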
---

## Output Structure

```
agent_tools/                # New directory
├── __init__.py
├── SYSTEM_PROMPT.md        # Agent behavior definition
├── context.py              # Runtime context injection
├── context.md              # Accumulated agent knowledge
├── database.py             # DB query primitives
├── pipeline.py             # Pipeline control primitives
├── analysis.py             # Analysis primitives
├── reports.py              # Report generation
└── content.py              # Content validation primitives

tests/agent_tools/          # New test directory
├── __init__.py
├── test_database_tools.py
├── test_pipeline_tools.py
├── test_analysis_tools.py
├── test_content_tools.py
└── test_parity.py

reports/                    # Agent-generated reports (gitignored)
```

---

## System-Wide Impact

- **Interaction graph:** Agent tools call into the `database.py`, `pipeline/`, `analysis/`, and `health/` modules. These modules are already well factored and read-only where appropriate.
- **Error propagation:** Agent tools return structured errors (JSON with `error`, `suggestion`, and `retryable` fields) rather than raising exceptions. This lets the agent reason about failures.
- **State lifecycle:** Agent-generated reports in `reports/` are ephemeral (gitignored). Agent updates to `context.md` are durable and committed.
- **Unchanged invariants:** The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor.
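
One way to get these structured errors without repeating `try`/`except` in every tool is a decorator applied at the tool boundary. This is a minimal sketch under several assumptions:

- `agent_tool` and the stand-in `count_parties` tool are hypothetical names.
- Successful results are wrapped in a `{"result": ...}` envelope.
- Only timeouts and connection errors count as retryable.

```python
from __future__ import annotations

import functools
import json
from typing import Any, Callable

# Assumed policy: transient failures are worth retrying, everything else is not.
RETRYABLE_ERRORS = (TimeoutError, ConnectionError)


def agent_tool(suggestion: str) -> Callable[[Callable[..., Any]], Callable[..., str]]:
    """Decorator: return tool results and failures as JSON instead of raising."""
    def decorate(func: Callable[..., Any]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            try:
                return json.dumps({"result": func(*args, **kwargs)})
            except Exception as exc:  # tool boundary: the agent gets data, not a traceback
                return json.dumps({
                    "error": f"{type(exc).__name__}: {exc}",
                    "suggestion": suggestion,
                    "retryable": isinstance(exc, RETRYABLE_ERRORS),
                })
        return wrapper
    return decorate


@agent_tool(suggestion="list the available windows before querying one")
def count_parties(window_id: str) -> int:
    # Stand-in tool body with stub data.
    parties_per_window = {"2024": 15}
    return parties_per_window[window_id]


ok = json.loads(count_parties("2024"))
failed = json.loads(count_parties("1999"))
```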
---

## Risks & Dependencies

| Risk | Mitigation |
|------|-----------|
| DuckDB concurrency (read-only agent + writing pipeline) | Agent tools open read-only connections; only the pipeline opens a read-write connection. Across processes, DuckDB allows either one read-write process or several read-only processes on a database file, but not both at once. Agent queries must therefore not overlap a pipeline run: the agent checks pipeline status first and retries if the file is locked. |
| Agent tools become stale as the pipeline evolves | Tools are thin wrappers around stable module interfaces, and the U6 parity tests catch drift. |
| Context injection grows too large | Context is scoped to the task: `context.py` generates minimal relevant context, not full DB dumps. |
| Security: the agent has DB access | The agent runs in the same trust boundary as the developer, so this adds no new security surface. |

---

## Documentation / Operational Notes

- Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists
- Document the parity test matrix in `tests/agent_tools/README.md`
- `reports/` should be gitignored; agent reports are ephemeral outputs
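
U6's parity matrix, which these notes propose documenting in `tests/agent_tools/README.md`, can be kept as data so the README table and the tests are less likely to drift apart. Below is a minimal stdlib-only sketch. `ParityCase` and `parity_failures` are hypothetical names, and the lambdas are stubs standing in for the real CLI and agent-tool calls. The `normalize` hook lets two outcomes that differ only in representation, such as CLI text versus a Python value, compare equal. In `test_parity.py`, each case could also feed `pytest.mark.parametrize`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ParityCase:
    """One row of the matrix: human action -> agent tool -> expected outcome."""
    human_action: str
    human: Callable[[], Any]  # e.g. would wrap the scripts/health_check.py CLI
    agent: Callable[[], Any]  # e.g. would call pipeline_check_health()
    normalize: Callable[[Any], Any] = lambda outcome: outcome


def parity_failures(cases: list[ParityCase]) -> list[str]:
    """Return the human actions whose agent counterpart produced a different outcome."""
    return [
        case.human_action
        for case in cases
        if case.normalize(case.human()) != case.normalize(case.agent())
    ]


# Stub outcomes, not real data.
cases = [
    ParityCase("run health check",
               human=lambda: {"svd": "stale", "ingestion": "fresh"},
               agent=lambda: {"ingestion": "fresh", "svd": "stale"}),
    ParityCase("count motions",
               human=lambda: "7", agent=lambda: 7, normalize=int),
    ParityCase("generate drift report",
               human=lambda: "# Drift report\n", agent=lambda: "# Drift report"),
]
failures = parity_failures(cases)
```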
|
---
|
|
|
|
## Sources & References
|
|
|
|
- **Origin:** STRATEGY.md (agent-native architecture track)
|
|
- **Skill:** `ce-agent-native-architecture` (parity, granularity, composability, emergent capability)
|
|
- **Related code:** `health/`, `pipeline/`, `analysis/`, `database.py`
|
|
- **Related docs:** `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks)
|
|
|