title: Agent-Native Architecture Plan for Stemwijzer
type: refactor
status: active
date: 2026-05-01
origin: STRATEGY.md (agent-native architecture track)

Agent-Native Architecture Plan for Stemwijzer

Overview

Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make an agent as capable as a human operator at every operation, whether that is running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns.

Current state: The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries.

Target state: An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data—operating in a loop until outcomes are achieved.


Problem Frame

  • Pipeline operators need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output.
  • Analysts/researchers want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code.
  • Developers need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually.
  • Content maintainers need to verify SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently this is done ad hoc, with no repeatable procedure.

Requirements Trace

  • R1. The agent can achieve anything a pipeline operator can achieve (parity)
  • R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability)
  • R3. The agent can diagnose pipeline health and suggest remediation (self-service operations)
  • R4. The agent can generate and validate content (SVD labels, motion summaries)
  • R5. New capabilities can be added by writing prompts, not code (composability)

Scope Boundaries

  • In scope: Agent primitives for data operations, pipeline control, analysis, and diagnostics
  • Deferred: Real-time agent UI inside Streamlit (future phase—add chat interface to explorer)
  • Deferred: Autonomous pipeline scheduling (scheduler.py exists but agent control is v2)
  • Not working on: Natural language to SQL for end users (this plan targets agent operators, not voter-facing features)

Key Technical Decisions

  • Files as universal interface: DuckDB is already file-based (data/motions.db). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads.
  • Database tools over file tools for structured data: For querying motions, votes, and embeddings, the agent needs query_database primitives that wrap DuckDB/SQL, not raw file operations.
  • Pipeline as state machine: The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything."
  • Shared workspace: Agent and human operate on the same data/motions.db, the same thoughts/explorer/ outputs, the same docs/solutions/ knowledge base.
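The "pipeline as state machine" decision can be illustrated with a small dry-run planner. This is a minimal sketch, not code from pipeline/run_pipeline.py: the stage identifiers are illustrative spellings of the stages listed above, and the `STAGES` tuple, the `plan_run` helper, and the assumption that the pipeline is strictly linear are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable

# Stage order as listed in this plan; the identifiers are illustrative spellings.
STAGES = (
    "ingestion",
    "vote_extraction",
    "svd",
    "text_embeddings",
    "fusion",
    "similarity",
)


def plan_run(target: str, completed: Iterable[str]) -> dict:
    """Return the stages that would need to run, in order, to reach `target`.

    Hypothetical helper: assumes each stage depends on every stage before it.
    Nothing is executed; the result is a plan, as a dry run would report it.
    """
    if target not in STAGES:
        return {
            "error": f"unknown stage: {target}",
            "suggestion": "use one of: " + ", ".join(STAGES),
            "retryable": False,
        }
    done = set(completed)
    upto = STAGES[: STAGES.index(target) + 1]
    # Find the first stage that has not completed; everything from that
    # point through the target has to run, because later outputs would
    # otherwise be built on missing or stale inputs.
    first_missing = next(
        (i for i, stage in enumerate(upto) if stage not in done), len(upto)
    )
    return {"target": target, "to_run": list(upto[first_missing:])}


plan = plan_run("svd", completed=["ingestion"])
```

With only ingestion completed, the plan for the SVD stage includes vote extraction first. A stage-aware tool built this way can tell the agent what a run would do before anything is executed.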

Implementation Units

  • U1. Database query primitives

    • Goal: Give the agent structured access to the DuckDB database
    • Requirements: R1, R2, R4
    • Dependencies: None
    • Files:
      • Create: agent_tools/database.py
      • Test: tests/agent_tools/test_database_tools.py
    • Approach: Wrap DuckDB queries as atomic tools:
      • query_motions(filter, limit, order) → returns motion rows as JSON
      • query_votes(motion_id, party) → returns vote counts
      • query_svd_vectors(window_id, entity_type) → returns vectors
      • query_party_positions(window_id) → returns party axis scores
      • query_pipeline_status() → returns freshness metrics from health checks
    • Patterns to follow: health/checks.py already has DB query patterns; analysis/explorer_data.py has read-only query patterns
    • Test scenarios:
      • Happy path: query returns valid JSON for known filters
      • Edge case: empty result set returns [] not error
      • Error path: invalid SQL/filter returns structured error with suggestion
    • Verification: Agent can answer "How many motions in 2024?" using only the tool
  • U2. Pipeline control primitives

    • Goal: Let the agent run, monitor, and diagnose pipeline stages
    • Requirements: R1, R3
    • Dependencies: U1
    • Files:
      • Create: agent_tools/pipeline.py
      • Test: tests/agent_tools/test_pipeline_tools.py
    • Approach: Stage-aware pipeline tools:
      • pipeline_run_stage(stage, window_id, dry_run) → runs one stage, returns status
      • pipeline_run_full(dry_run) → orchestrates all stages with dependency ordering
      • pipeline_check_health() → returns health report (reuses health/ module)
      • pipeline_get_logs(stage, lines) → returns recent logs for a stage
      • pipeline_validate_output(stage) → checks output exists and looks reasonable
    • Patterns to follow: pipeline/run_pipeline.py has the stage orchestration; scripts/health_check.py has the CLI pattern
    • Test scenarios:
      • Happy path: dry-run returns planned actions without executing
      • Integration: running pipeline_run_stage("svd", "2024") produces expected svd_vectors rows
      • Error path: running a stage with missing dependencies returns clear error
    • Verification: Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run
  • U3. Analysis and report generation primitives

    • Goal: Let the agent perform analytical tasks and write reports
    • Requirements: R2, R4
    • Dependencies: U1
    • Files:
      • Create: agent_tools/analysis.py
      • Create: agent_tools/reports.py
      • Test: tests/agent_tools/test_analysis_tools.py
    • Approach:
      • analyze_party_shift(party, window_start, window_end, metric) → computes and returns shift data
      • analyze_axis_stability(component, windows) → returns stability scores
      • generate_report(type, parameters, output_path) → writes markdown report to reports/
      • validate_svd_labels(component) → compares theme labels to actual party positions
    • Patterns to follow: analysis/political_axis.py, scripts/motion_drift.py, scripts/validate_svd_themes.py
    • Test scenarios:
      • Happy path: analyze_party_shift returns structured data for known party
      • Integration: generate_report("drift", {"windows": ["2020", "2024"]}) produces valid markdown
      • Edge case: requesting analysis for nonexistent window returns empty result
    • Verification: Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results
  • U4. Content validation primitives

    • Goal: Let the agent validate and suggest content improvements
    • Requirements: R4
    • Dependencies: U1, U3
    • Files:
      • Create: agent_tools/content.py
      • Test: tests/agent_tools/test_content_tools.py
    • Approach:
      • validate_motion_coverage(start_date, end_date) → returns coverage gaps
      • validate_layman_explanations(sample_size) → samples motions, checks explanation quality
      • suggest_svd_label(component, top_n_motions) → analyzes top motions, suggests label
      • check_embedding_quality(window_id) → returns coverage stats for fused embeddings
    • Patterns to follow: summarizer.py for explanation logic; scripts/validate_svd_themes.py for theme validation
    • Test scenarios:
      • Happy path: validate_motion_coverage returns accurate gap list
      • Edge case: all motions covered returns empty gaps
    • Verification: Agent can run weekly content quality checks and produce a report
  • U5. System prompt and context injection

    • Goal: Define agent behavior and inject runtime context
    • Requirements: R1, R2, R3, R4, R5
    • Dependencies: U1-U4
    • Files:
      • Create: agent_tools/SYSTEM_PROMPT.md
      • Create: agent_tools/context.py
    • Approach:
      • SYSTEM_PROMPT.md: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions
      • context.py: Injects runtime context—current pipeline status, latest SVD window, known issues from docs/solutions/, active party list
      • context.md pattern: Agent maintains agent_tools/context.md with accumulated learnings about the pipeline
    • Patterns to follow: ce-agent-native-architecture context.md pattern; AGENTS.md for project conventions
    • Test scenarios:
      • Context injection produces valid markdown with current DB stats
      • System prompt loads and parses without errors
    • Verification: Agent session starts with full context of pipeline state
  • U6. Agent-native testing and parity verification

    • Goal: Ensure the agent can do everything a human operator can do
    • Requirements: R1
    • Dependencies: U1-U5
    • Files:
      • Create: tests/agent_tools/test_parity.py
      • Modify: tests/conftest.py (add agent tool fixtures)
    • Approach:
      • Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome
      • Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify)
      • test_parity.py: Matrix of human action → agent tool → expected outcome
    • Test scenarios:
      • Parity: "Human runs health check CLI" vs "Agent calls pipeline_check_health()" → same result
      • Integration: Agent detects stale data, runs pipeline, verifies freshness
    • Verification: All parity tests pass
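The parity matrix described in U6 could be expressed as a list of cases, each pairing a human action with its agent tool and comparing the outcomes. The sketch below is only an illustration: both callables are stand-ins that return canned data, and the `ParityCase` class, the `find_parity_gaps` helper, and the shape of the health report are assumptions, since this plan does not show what the health check CLI or pipeline_check_health() return.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ParityCase:
    name: str
    human_action: Callable[[], dict]  # what the operator would run by hand
    agent_tool: Callable[[], dict]    # what the agent calls instead


def human_health_check_cli() -> dict:
    # Stand-in for running the health check CLI and parsing its output.
    return {"status": "ok", "stale_tables": []}


def agent_pipeline_check_health() -> dict:
    # Stand-in for the agent tool; assumed to return the same shape.
    return {"status": "ok", "stale_tables": []}


PARITY_MATRIX = [
    ParityCase("health check", human_health_check_cli, agent_pipeline_check_health),
]


def find_parity_gaps(matrix: list[ParityCase]) -> list[str]:
    """Return the names of cases where the agent result differs from the human result."""
    return [case.name for case in matrix if case.human_action() != case.agent_tool()]


gaps = find_parity_gaps(PARITY_MATRIX)
```

In tests/agent_tools/test_parity.py the same matrix could drive one parametrized test per row, so a new human action is covered by adding a row rather than a new test function.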

Output Structure

agent_tools/                    # New directory
├── __init__.py
├── SYSTEM_PROMPT.md            # Agent behavior definition
├── context.py                  # Runtime context injection
├── context.md                  # Accumulated agent knowledge
├── database.py                 # DB query primitives
├── pipeline.py                 # Pipeline control primitives
├── analysis.py                 # Analysis primitives
├── reports.py                  # Report generation
└── content.py                  # Content validation primitives

tests/agent_tools/              # New test directory
├── __init__.py
├── test_database_tools.py
├── test_pipeline_tools.py
├── test_analysis_tools.py
├── test_content_tools.py
└── test_parity.py

reports/                        # Agent-generated reports (gitignored)

System-Wide Impact

  • Interaction graph: Agent tools call into database.py, pipeline/, analysis/, health/ modules. These modules are already well-factored and read-only where appropriate.
  • Error propagation: Agent tools return structured errors (JSON with error, suggestion, retryable fields) rather than raising exceptions. This lets the agent reason about failures.
  • State lifecycle: Agent-generated reports in reports/ are ephemeral (gitignored). Agent updates to context.md are durable and committed.
  • Unchanged invariants: The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor.
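The error-propagation convention above can be sketched as a query wrapper that converts exceptions into the structured JSON shape (error, suggestion, retryable). To stay runnable without extra dependencies, this sketch uses the standard-library sqlite3 module as a stand-in for DuckDB. The `run_query` name, the two-column `motions` table, the sample rows, and the suggestion text are all assumptions made for illustration.

```python
import json
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Run a read query and return JSON: a list of row objects, or a structured error."""
    try:
        cursor = conn.execute(sql, params)
        columns = [col[0] for col in (cursor.description or [])]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return json.dumps(rows)  # an empty result is [] rather than an error
    except sqlite3.Error as exc:
        # Return the failure as data so the agent can reason about it.
        return json.dumps({
            "error": str(exc),
            "suggestion": "check table and column names against the schema",
            "retryable": False,
        })


# Illustrative in-memory data; not the real schema of data/motions.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE motions (id INTEGER, year INTEGER)")
conn.executemany("INSERT INTO motions VALUES (?, ?)", [(1, 2023), (2, 2024), (3, 2024)])

ok = json.loads(run_query(conn, "SELECT COUNT(*) AS n FROM motions WHERE year = ?", (2024,)))
empty = json.loads(run_query(conn, "SELECT id FROM motions WHERE year = ?", (1999,)))
bad = json.loads(run_query(conn, "SELECT * FROM no_such_table"))
```

Here `ok` holds a single row with the count, `empty` is an empty list, and `bad` is an object carrying the three error fields instead of a raised exception.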

Risks & Dependencies

| Risk | Mitigation |
| --- | --- |
| DuckDB concurrency (read-only agent + write pipeline) | Agent uses read-only connections; pipeline uses write connections. DuckDB allows either one read-write process or several read-only processes on a database file, not both at once, so agent reads must be retried or scheduled around pipeline writes. |
| Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. |
| Context injection grows too large | Context is scoped to the task. context.py generates minimal relevant context, not full DB dumps. |
| Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. |
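The mitigation for context growth (scoping context to the task) might look like the following sketch. It is hypothetical: the `build_context` function, its parameters, the section names, and every value in the sample data are placeholders, not the planned interface or output of context.py.

```python
from __future__ import annotations


def build_context(
    sections: dict[str, list[str]], wanted: list[str], max_lines: int = 20
) -> str:
    """Render only the requested sections as markdown, capped at max_lines lines."""
    lines: list[str] = []
    for name in wanted:
        items = sections.get(name)
        if not items:
            continue  # skip unknown or empty sections instead of failing
        lines.append(f"## {name}")
        lines.extend(f"- {item}" for item in items)
    if len(lines) > max_lines:
        omitted = len(lines) - max_lines
        # Truncate and say so, rather than silently dropping content.
        lines = lines[:max_lines] + [f"({omitted} more lines omitted)"]
    return "\n".join(lines)


# Placeholder data for illustration only.
sections = {
    "Pipeline status": ["placeholder status line 1", "placeholder status line 2"],
    "Known issues": ["placeholder issue from docs/solutions/"],
    "Active parties": ["placeholder party A", "placeholder party B"],
}
context = build_context(sections, wanted=["Pipeline status", "Known issues"])
```

A diagnostic task would request only the status and known-issue sections, while an analysis task might request the party list instead, so the injected context stays small in both cases.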

Documentation / Operational Notes

  • Add agent_tools/ to AGENTS.md so future agents know the capability surface exists
  • Document the parity test matrix in tests/agent_tools/README.md
  • reports/ should be gitignored; agent reports are ephemeral outputs

Sources & References

  • Origin: STRATEGY.md (agent-native architecture track)
  • Skill: ce-agent-native-architecture (parity, granularity, composability, emergent capability)
  • Related code: health/, pipeline/, analysis/, database.py
  • Related docs: docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md (P4 tracks)