---
title: Agent-Native Architecture Plan for Stemwijzer
type: refactor
status: active
date: 2026-05-01
origin: STRATEGY.md (agent-native architecture track)
---

# Agent-Native Architecture Plan for Stemwijzer

## Overview

Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make an agent as capable as a human operator for every operation, whether that's running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns.

**Current state:** The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries.

**Target state:** An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data, operating in a loop until outcomes are achieved.

---

## Problem Frame

- **Pipeline operators** need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output.
- **Analysts/researchers** want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code.
- **Developers** need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually.
- **Content maintainers** need to verify that SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently this is done ad hoc.

---

## Requirements Trace

- R1. The agent can achieve anything a pipeline operator can achieve (parity)
- R2.
The agent can answer open-ended analytical questions about parliamentary data (emergent capability)
- R3. The agent can diagnose pipeline health and suggest remediation (self-service operations)
- R4. The agent can generate and validate content (SVD labels, motion summaries)
- R5. New capabilities can be added by writing prompts, not code (composability)

---

## Scope Boundaries

- **In scope:** Agent primitives for data operations, pipeline control, analysis, and diagnostics
- **Deferred:** Real-time agent UI inside Streamlit (future phase: add chat interface to explorer)
- **Deferred:** Autonomous pipeline scheduling (`scheduler.py` exists but agent control is v2)
- **Not working on:** Natural language to SQL for end users (this plan targets agent operators, not voter-facing features)

---

## Key Technical Decisions

- **Files as universal interface:** DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads.
- **Database tools over file tools for structured data:** For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations.
- **Pipeline as state machine:** The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything."
- **Shared workspace:** Agent and human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, the same `docs/solutions/` knowledge base.

---

## Implementation Units

- [ ] U1.
**Database query primitives**

  - **Goal:** Give the agent structured access to the DuckDB database
  - **Requirements:** R1, R2, R4
  - **Dependencies:** None
  - **Files:**
    - Create: `agent_tools/database.py`
    - Test: `tests/agent_tools/test_database_tools.py`
  - **Approach:** Wrap DuckDB queries as atomic tools:
    - `query_motions(filter, limit, order)` → returns motion rows as JSON
    - `query_votes(motion_id, party)` → returns vote counts
    - `query_svd_vectors(window_id, entity_type)` → returns vectors
    - `query_party_positions(window_id)` → returns party axis scores
    - `query_pipeline_status()` → returns freshness metrics from health checks
  - **Patterns to follow:** `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns
  - **Test scenarios:**
    - Happy path: query returns valid JSON for known filters
    - Edge case: empty result set returns `[]` not error
    - Error path: invalid SQL/filter returns structured error with suggestion
  - **Verification:** Agent can answer "How many motions in 2024?" using only the tool

- [ ] U2.
**Pipeline control primitives**

  - **Goal:** Let the agent run, monitor, and diagnose pipeline stages
  - **Requirements:** R1, R3
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/pipeline.py`
    - Test: `tests/agent_tools/test_pipeline_tools.py`
  - **Approach:** Stage-aware pipeline tools:
    - `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status
    - `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering
    - `pipeline_check_health()` → returns health report (reuses `health/` module)
    - `pipeline_get_logs(stage, lines)` → returns recent logs for a stage
    - `pipeline_validate_output(stage)` → checks output exists and looks reasonable
  - **Patterns to follow:** `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern
  - **Test scenarios:**
    - Happy path: dry-run returns planned actions without executing
    - Integration: running `pipeline_run_stage("svd", "2024")` produces expected `svd_vectors` rows
    - Error path: running a stage with missing dependencies returns clear error
  - **Verification:** Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run

- [ ] U3.
**Analysis and report generation primitives**

  - **Goal:** Let the agent perform analytical tasks and write reports
  - **Requirements:** R2, R4
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/analysis.py`
    - Create: `agent_tools/reports.py`
    - Test: `tests/agent_tools/test_analysis_tools.py`
  - **Approach:**
    - `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data
    - `analyze_axis_stability(component, windows)` → returns stability scores
    - `generate_report(type, parameters, output_path)` → writes markdown report to `reports/`
    - `validate_svd_labels(component)` → compares theme labels to actual party positions
  - **Patterns to follow:** `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py`
  - **Test scenarios:**
    - Happy path: `analyze_party_shift` returns structured data for known party
    - Integration: `generate_report("drift", {"windows": ["2020", "2024"]})` produces valid markdown
    - Edge case: requesting analysis for nonexistent window returns empty result
  - **Verification:** Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results

- [ ] U4.
**Content validation primitives**

  - **Goal:** Let the agent validate and suggest content improvements
  - **Requirements:** R4
  - **Dependencies:** U1, U3
  - **Files:**
    - Create: `agent_tools/content.py`
    - Test: `tests/agent_tools/test_content_tools.py`
  - **Approach:**
    - `validate_motion_coverage(start_date, end_date)` → returns coverage gaps
    - `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality
    - `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests label
    - `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings
  - **Patterns to follow:** `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation
  - **Test scenarios:**
    - Happy path: `validate_motion_coverage` returns accurate gap list
    - Edge case: all motions covered returns empty gaps
  - **Verification:** Agent can run weekly content quality checks and produce a report

- [ ] U5. **System prompt and context injection**

  - **Goal:** Define agent behavior and inject runtime context
  - **Requirements:** R1, R2, R3, R4, R5
  - **Dependencies:** U1-U4
  - **Files:**
    - Create: `agent_tools/SYSTEM_PROMPT.md`
    - Create: `agent_tools/context.py`
  - **Approach:**
    - `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions
    - `context.py`: Injects runtime context: current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list
    - `context.md` pattern: Agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline
  - **Patterns to follow:** `ce-agent-native-architecture` context.md pattern; `AGENTS.md` for project conventions
  - **Test scenarios:**
    - Context injection produces valid markdown with current DB stats
    - System prompt loads and parses without errors
  - **Verification:** Agent session starts with full context of pipeline state

- [ ] U6.
**Agent-native testing and parity verification**

  - **Goal:** Ensure agent can do everything humans can do
  - **Requirements:** R1
  - **Dependencies:** U1-U5
  - **Files:**
    - Create: `tests/agent_tools/test_parity.py`
    - Modify: `tests/conftest.py` (add agent tool fixtures)
  - **Approach:**
    - Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome
    - Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify)
    - `test_parity.py`: Matrix of human action → agent tool → expected outcome
  - **Test scenarios:**
    - Parity: "Human runs health check CLI" vs "Agent calls pipeline_check_health()" → same result
    - Integration: Agent detects stale data, runs pipeline, verifies freshness
  - **Verification:** All parity tests pass

---

## Output Structure

```
agent_tools/                  # New directory
├── __init__.py
├── SYSTEM_PROMPT.md          # Agent behavior definition
├── context.py                # Runtime context injection
├── context.md                # Accumulated agent knowledge
├── database.py               # DB query primitives
├── pipeline.py               # Pipeline control primitives
├── analysis.py               # Analysis primitives
├── reports.py                # Report generation
└── content.py                # Content validation primitives

tests/agent_tools/            # New test directory
├── __init__.py
├── test_database_tools.py
├── test_pipeline_tools.py
├── test_analysis_tools.py
├── test_content_tools.py
└── test_parity.py

reports/                      # Agent-generated reports (gitignored)
```

---

## System-Wide Impact

- **Interaction graph:** Agent tools call into `database.py`, `pipeline/`, `analysis/`, `health/` modules. These modules are already well-factored and read-only where appropriate.
- **Error propagation:** Agent tools return structured errors (JSON with `error`, `suggestion`, `retryable` fields) rather than raising exceptions. This lets the agent reason about failures.
- **State lifecycle:** Agent-generated reports in `reports/` are ephemeral (gitignored).
Agent updates to `context.md` are durable and committed.
- **Unchanged invariants:** The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor.

---

## Risks & Dependencies

| Risk | Mitigation |
|------|-----------|
| DuckDB concurrency (read-only agent + write pipeline) | Agent query tools use read-only connections; the pipeline uses a write connection. DuckDB allows either one read-write process or several read-only processes per database file, so agent reads cannot overlap a running pipeline write from another process. In that case the tool should return a structured error with `retryable` set. |
| Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. |
| Context injection grows too large | Context is scoped to the task. `context.py` generates minimal relevant context, not full DB dumps. |
| Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. |

---

## Documentation / Operational Notes

- Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists
- Document the parity test matrix in `tests/agent_tools/README.md`
- `reports/` should be gitignored; agent reports are ephemeral outputs

---

## Sources & References

- **Origin:** STRATEGY.md (agent-native architecture track)
- **Skill:** `ce-agent-native-architecture` (parity, granularity, composability, emergent capability)
- **Related code:** `health/`, `pipeline/`, `analysis/`, `database.py`
- **Related docs:** `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks)
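
---

## Appendix: Structured Error Sketch

The error-propagation convention in this plan (tools return JSON with `error`, `suggestion`, and `retryable` fields instead of raising) and the U1 edge case (an empty result returns `[]`) could be sketched roughly as below. This is a minimal illustration, not the planned implementation: the `agent_tool` decorator, the `query_motions_by_year` function, and the in-memory `_SAMPLE_MOTIONS` list are hypothetical stand-ins, and a real tool in `agent_tools/database.py` would query DuckDB rather than a Python list.

```python
import functools
import json
from typing import Any, Callable, Dict, List


def agent_tool(suggestion: str, retryable: bool = False) -> Callable:
    """Hypothetical decorator: return a JSON error payload instead of raising."""

    def decorate(func: Callable[..., Any]) -> Callable[..., str]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> str:
            try:
                # Success: serialize the rows directly, so no rows becomes "[]".
                return json.dumps(func(*args, **kwargs))
            except Exception as exc:
                # Failure: give the agent fields it can reason about.
                return json.dumps(
                    {
                        "error": str(exc),
                        "suggestion": suggestion,
                        "retryable": retryable,
                    }
                )

        return wrapper

    return decorate


# Assumption: stand-in rows so the sketch runs without a database.
_SAMPLE_MOTIONS: List[Dict[str, Any]] = [
    {"id": 1, "year": 2024, "title": "Motion A"},
    {"id": 2, "year": 2023, "title": "Motion B"},
]


@agent_tool(suggestion="Pass a four-digit year such as 2024.")
def query_motions_by_year(year: int, limit: int = 10) -> List[Dict[str, Any]]:
    """Hypothetical, simplified variant of the planned query_motions tool."""
    if not isinstance(year, int) or not 1000 <= year <= 9999:
        raise ValueError(f"invalid year filter: {year!r}")
    return [m for m in _SAMPLE_MOTIONS if m["year"] == year][:limit]


if __name__ == "__main__":
    print(query_motions_by_year(2024))         # one matching row
    print(query_motions_by_year(1999))         # empty result set
    print(query_motions_by_year("last year"))  # structured error payload
```

Keeping the success path as bare JSON rows and the failure path as an object with fixed keys lets a caller distinguish the two by shape alone, which is one way to satisfy both the "empty result set returns `[]`" and "structured error with suggestion" test scenarios.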