| title | type | status | date | origin |
|---|---|---|---|---|
| Agent-Native Architecture Plan for Stemwijzer | refactor | active | 2026-05-01 | STRATEGY.md (agent-native architecture track) |
Agent-Native Architecture Plan for Stemwijzer
Overview
Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make an agent as capable as a human operator for every operation, whether that is running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns.
Current state: The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries.
Target state: An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data—operating in a loop until outcomes are achieved.
Problem Frame
- Pipeline operators need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output.
- Analysts/researchers want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code.
- Developers need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually.
- Content maintainers need to verify SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently ad-hoc.
Requirements Trace
- R1. The agent can achieve anything a pipeline operator can achieve (parity)
- R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability)
- R3. The agent can diagnose pipeline health and suggest remediation (self-service operations)
- R4. The agent can generate and validate content (SVD labels, motion summaries)
- R5. New capabilities can be added by writing prompts, not code (composability)
Scope Boundaries
- In scope: Agent primitives for data operations, pipeline control, analysis, and diagnostics
- Deferred: Real-time agent UI inside Streamlit (future phase—add chat interface to explorer)
- Deferred: Autonomous pipeline scheduling (scheduler.py exists but agent control is v2)
- Not working on: Natural language to SQL for end users (this plan targets agent operators, not voter-facing features)
Key Technical Decisions
- Files as universal interface: DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads.
- Database tools over file tools for structured data: For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations.
- Pipeline as state machine: The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything."
- Shared workspace: Agent and human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, the same `docs/solutions/` knowledge base.
Implementation Units
- U1. Database query primitives
  - Goal: Give the agent structured access to the DuckDB database
  - Requirements: R1, R2, R4
  - Dependencies: None
  - Files:
    - Create: `agent_tools/database.py`
    - Test: `tests/agent_tools/test_database_tools.py`
  - Approach: Wrap DuckDB queries as atomic tools:
    - `query_motions(filter, limit, order)` → returns motion rows as JSON
    - `query_votes(motion_id, party)` → returns vote counts
    - `query_svd_vectors(window_id, entity_type)` → returns vectors
    - `query_party_positions(window_id)` → returns party axis scores
    - `query_pipeline_status()` → returns freshness metrics from health checks
  - Patterns to follow: `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns
  - Test scenarios:
    - Happy path: query returns valid JSON for known filters
    - Edge case: empty result set returns `[]`, not an error
    - Error path: invalid SQL/filter returns structured error with suggestion
  - Verification: Agent can answer "How many motions in 2024?" using only the tool
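The shape of one U1 primitive might look like the sketch below. It is illustrative only: the `motions` table and its `id`, `title`, and `date` columns are assumed rather than taken from the real schema, the signature is simplified to `year` and `limit`, and the standard-library `sqlite3` module stands in for DuckDB so the example runs without extra installs. DuckDB's Python connection offers a similar `execute(...).fetchall()` shape, but that swap is untested here.

```python
import json
import sqlite3  # stand-in for DuckDB so the sketch runs without extra installs


def query_motions(conn, year=None, limit=10):
    """Return motion rows as a JSON string; bad input yields a structured error."""
    if not isinstance(limit, int) or limit < 1:
        return json.dumps({
            "error": f"invalid limit: {limit!r}",
            "suggestion": "pass a positive integer, e.g. limit=10",
            "retryable": True,
        })
    sql = "SELECT id, title, date FROM motions"
    params = []
    if year is not None:
        # Assumed schema: `date` is stored as ISO text (YYYY-MM-DD).
        sql += " WHERE substr(date, 1, 4) = ?"
        params.append(str(year))
    sql += " ORDER BY date LIMIT ?"
    params.append(limit)
    cur = conn.execute(sql, params)
    columns = [col[0] for col in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur.fetchall()])


def demo_connection():
    """Build a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE motions (id INTEGER, title TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO motions VALUES (?, ?, ?)",
        [(1, "Example motion A", "2024-03-01"),
         (2, "Example motion B", "2023-11-15")],
    )
    return conn
```

Calling `query_motions(demo_connection(), year=2024)` returns a JSON array with the single matching row, and a year with no motions returns `[]`, matching the edge case listed in the test scenarios.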
- U2. Pipeline control primitives
  - Goal: Let the agent run, monitor, and diagnose pipeline stages
  - Requirements: R1, R3
  - Dependencies: U1
  - Files:
    - Create: `agent_tools/pipeline.py`
    - Test: `tests/agent_tools/test_pipeline_tools.py`
  - Approach: Stage-aware pipeline tools:
    - `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status
    - `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering
    - `pipeline_check_health()` → returns health report (reuses `health/` module)
    - `pipeline_get_logs(stage, lines)` → returns recent logs for a stage
    - `pipeline_validate_output(stage)` → checks output exists and looks reasonable
  - Patterns to follow: `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern
  - Test scenarios:
    - Happy path: dry-run returns planned actions without executing
    - Integration: running `pipeline_run_stage("svd", "2024")` produces expected `svd_vectors` rows
    - Error path: running a stage with missing dependencies returns clear error
  - Verification: Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run
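A minimal sketch of the stage-aware behaviour described in U2 follows. It assumes a strictly linear stage order, which the real pipeline may not have. The stage identifiers, the `completed` argument, and the `runner` callback are hypothetical names introduced for illustration.

```python
# Assumed linear order, taken from the stage list in Key Technical Decisions.
STAGES = ["ingestion", "vote_extraction", "svd",
          "text_embeddings", "fusion", "similarity"]


def pipeline_run_stage(stage, window_id, dry_run=True, completed=(), runner=None):
    """Plan or run one stage, given the stages already completed.

    `runner` is a hypothetical callable that would do the real work.
    """
    if stage not in STAGES:
        return {"error": f"unknown stage: {stage}",
                "suggestion": f"choose one of {STAGES}",
                "retryable": False}
    # Every stage earlier in the order counts as a dependency in this sketch.
    missing = [s for s in STAGES[:STAGES.index(stage)] if s not in completed]
    if missing:
        return {"error": f"missing upstream stages: {missing}",
                "suggestion": f"run {missing[0]} first",
                "retryable": True}
    if dry_run:
        return {"stage": stage, "window_id": window_id,
                "status": "planned", "executed": False}
    if runner is not None:
        runner(stage)
    return {"stage": stage, "window_id": window_id,
            "status": "done", "executed": True}
```

Defaulting `dry_run` to `True` is a design choice in this sketch: an agent has to opt in explicitly before anything mutates the database.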
- U3. Analysis and report generation primitives
  - Goal: Let the agent perform analytical tasks and write reports
  - Requirements: R2, R4
  - Dependencies: U1
  - Files:
    - Create: `agent_tools/analysis.py`
    - Create: `agent_tools/reports.py`
    - Test: `tests/agent_tools/test_analysis_tools.py`
  - Approach:
    - `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data
    - `analyze_axis_stability(component, windows)` → returns stability scores
    - `generate_report(type, parameters, output_path)` → writes markdown report to `reports/`
    - `validate_svd_labels(component)` → compares theme labels to actual party positions
  - Patterns to follow: `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py`
  - Test scenarios:
    - Happy path: `analyze_party_shift` returns structured data for known party
    - Integration: `generate_report("drift", {windows: ["2020", "2024"]})` produces valid markdown
    - Edge case: requesting analysis for nonexistent window returns empty result
  - Verification: Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results
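One way `analyze_party_shift` could compute a shift is sketched below. Euclidean distance between axis-score vectors is an assumed choice standing in for the plan's `metric` parameter. The `positions` mapping (window → party → scores) is also a hypothetical input shape, not the real storage layout.

```python
import math


def analyze_party_shift(positions, party, window_start, window_end):
    """Return per-axis deltas and the Euclidean shift for one party.

    `positions` maps window_id -> party -> list of axis scores (assumed shape).
    """
    start = positions.get(window_start, {}).get(party)
    end = positions.get(window_end, {}).get(party)
    if start is None or end is None:
        return {}  # nonexistent window or party: empty result, not an error
    deltas = [e - s for s, e in zip(start, end)]
    return {
        "party": party,
        "deltas": deltas,
        "shift": math.sqrt(sum(d * d for d in deltas)),
    }
```

With made-up scores such as `{"2020": {"PartyA": [0.0, 0.0]}, "2024": {"PartyA": [3.0, 4.0]}}`, the shift for `PartyA` is 5.0, and an unknown window yields `{}` as in the edge-case scenario.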
- U4. Content validation primitives
  - Goal: Let the agent validate and suggest content improvements
  - Requirements: R4
  - Dependencies: U1, U3
  - Files:
    - Create: `agent_tools/content.py`
    - Test: `tests/agent_tools/test_content_tools.py`
  - Approach:
    - `validate_motion_coverage(start_date, end_date)` → returns coverage gaps
    - `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality
    - `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests label
    - `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings
  - Patterns to follow: `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation
  - Test scenarios:
    - Happy path: `validate_motion_coverage` returns accurate gap list
    - Edge case: all motions covered returns empty gaps
  - Verification: Agent can run weekly content quality checks and produce a report
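The plan does not define what counts as a coverage gap, so the sketch below assumes one definition: any stretch longer than `max_gap_days` with no motions inside the requested range. The `motion_dates` argument and the 30-day default are hypothetical.

```python
from datetime import date


def validate_motion_coverage(motion_dates, start_date, end_date, max_gap_days=30):
    """Return stretches longer than `max_gap_days` that contain no motions."""
    inside = sorted(d for d in set(motion_dates) if start_date <= d <= end_date)
    # Treat the range boundaries as fence posts so leading and trailing gaps count.
    points = [start_date] + inside + [end_date]
    gaps = []
    for prev, nxt in zip(points, points[1:]):
        days = (nxt - prev).days
        if days > max_gap_days:
            gaps.append({"from": prev.isoformat(),
                         "to": nxt.isoformat(),
                         "days": days})
    return gaps
```

When every interval is within the threshold the function returns an empty list, which is the "all motions covered" edge case from the test scenarios.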
- U5. System prompt and context injection
  - Goal: Define agent behavior and inject runtime context
  - Requirements: R1, R2, R3, R4, R5
  - Dependencies: U1-U4
  - Files:
    - Create: `agent_tools/SYSTEM_PROMPT.md`
    - Create: `agent_tools/context.py`
  - Approach:
    - `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions
    - `context.py`: Injects runtime context: current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list
    - `context.md` pattern: Agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline
  - Patterns to follow: `ce-agent-native-architecture` context.md pattern; `AGENTS.md` for project conventions
  - Test scenarios:
    - Context injection produces valid markdown with current DB stats
    - System prompt loads and parses without errors
  - Verification: Agent session starts with full context of pipeline state
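Context injection could be as small as a function that renders a few known facts into markdown, as sketched below. The function name `build_context`, its arguments, and the section headings are assumptions. In practice the inputs would come from the U1 and U2 tools rather than being passed in by hand.

```python
def build_context(status, latest_window, known_issues, parties):
    """Render runtime context as a short markdown document for the agent."""
    lines = ["# Runtime context", ""]
    lines.append("## Pipeline status")
    for stage, state in status.items():
        lines.append(f"- {stage}: {state}")
    lines += ["", "## Latest SVD window", f"- {latest_window}", ""]
    lines.append("## Known issues")
    # Fall back to an explicit placeholder so the section is never empty.
    lines += [f"- {issue}" for issue in known_issues] or ["- none recorded"]
    lines += ["", "## Active parties", ", ".join(sorted(parties))]
    return "\n".join(lines) + "\n"
```

Keeping the output to a handful of short sections is one way to address the risk of context growing too large.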
- U6. Agent-native testing and parity verification
  - Goal: Ensure agent can do everything humans can do
  - Requirements: R1
  - Dependencies: U1-U5
  - Files:
    - Create: `tests/agent_tools/test_parity.py`
    - Modify: `tests/conftest.py` (add agent tool fixtures)
  - Approach:
    - Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome
    - Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify)
    - `test_parity.py`: Matrix of human action → agent tool → expected outcome
  - Test scenarios:
    - Parity: "Human runs health check CLI" vs "Agent calls `pipeline_check_health()`" → same result
    - Integration: Agent detects stale data, runs pipeline, verifies freshness
  - Verification: All parity tests pass
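The parity matrix can be expressed as plain data plus a small checker, as in this sketch. Both callables are stand-ins with made-up return values, since the real human path runs through the health check CLI and the real agent path through the U2 tool. `PARITY_MATRIX` and `run_parity_matrix` are hypothetical names.

```python
def human_health_check():
    """Stand-in for the outcome of the human CLI path (made-up value)."""
    return {"status": "ok"}


def pipeline_check_health():
    """Stand-in for the agent tool (made-up value)."""
    return {"status": "ok"}


PARITY_MATRIX = [
    # (description, human action, agent tool)
    ("health check", human_health_check, pipeline_check_health),
]


def run_parity_matrix(matrix):
    """Return the descriptions of rows where human and agent outcomes differ."""
    return [name for name, human, agent in matrix if human() != agent()]
```

An empty list from `run_parity_matrix(PARITY_MATRIX)` means every row is at parity. Adding a capability then means adding one row to the matrix.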
Output Structure
agent_tools/ # New directory
├── __init__.py
├── SYSTEM_PROMPT.md # Agent behavior definition
├── context.py # Runtime context injection
├── context.md # Accumulated agent knowledge
├── database.py # DB query primitives
├── pipeline.py # Pipeline control primitives
├── analysis.py # Analysis primitives
├── reports.py # Report generation
└── content.py # Content validation primitives
tests/agent_tools/ # New test directory
├── __init__.py
├── test_database_tools.py
├── test_pipeline_tools.py
├── test_analysis_tools.py
├── test_content_tools.py
└── test_parity.py
reports/ # Agent-generated reports (gitignored)
System-Wide Impact
- Interaction graph: Agent tools call into `database.py`, `pipeline/`, `analysis/`, `health/` modules. These modules are already well-factored and read-only where appropriate.
- Error propagation: Agent tools return structured errors (JSON with `error`, `suggestion`, `retryable` fields) rather than raising exceptions. This lets the agent reason about failures.
- State lifecycle: Agent-generated reports in `reports/` are ephemeral (gitignored). Agent updates to `context.md` are durable and committed.
- Unchanged invariants: The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor.
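The error-propagation convention could be enforced with a decorator, so individual tools do not each reimplement it. The sketch below is one possible approach: the decorator name `structured_errors`, the `result` wrapper key, and the in-memory data are all assumptions.

```python
import functools
import json


def structured_errors(suggestion, retryable=False):
    """Decorator: turn exceptions raised by a tool into a JSON error payload."""
    def decorate(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            try:
                return json.dumps({"result": tool(*args, **kwargs)})
            except Exception as exc:
                return json.dumps({
                    "error": f"{type(exc).__name__}: {exc}",
                    "suggestion": suggestion,
                    "retryable": retryable,
                })
        return wrapper
    return decorate


@structured_errors(suggestion="check that the window_id exists", retryable=False)
def query_party_positions(window_id):
    # Made-up in-memory data standing in for a database lookup.
    data = {"2024": {"PartyA": [0.1, -0.2]}}
    return data[window_id]
```

A call with an unknown window returns a payload carrying `error`, `suggestion`, and `retryable`, which the agent can inspect instead of handling a traceback.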
Risks & Dependencies
| Risk | Mitigation |
|---|---|
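| DuckDB concurrency (read-only agent + write pipeline) | Agent uses read-only connections; pipeline uses write connections. DuckDB permits either one read-write process or several read-only processes on a database file at a time, so agent reads from a separate process must wait for, or retry around, pipeline writes. |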
| Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. |
| Context injection grows too large | Context is scoped to the task. context.py generates minimal relevant context, not full DB dumps. |
| Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. |
Documentation / Operational Notes
- Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists
- Document the parity test matrix in `tests/agent_tools/README.md`
- `reports/` should be gitignored; agent reports are ephemeral outputs
Sources & References
- Origin: STRATEGY.md (agent-native architecture track)
- Skill: `ce-agent-native-architecture` (parity, granularity, composability, emergent capability)
- Related code: `health/`, `pipeline/`, `analysis/`, `database.py`
- Related docs: `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks)