feat: implement agent-native architecture (U1-U6)

Implements the agent-native architecture plan (docs/plans/2026-05-01-002-agent-native-architecture-plan.md): - U1: Database query primitives (agent_tools/database.py) - query_motions, query_votes, query_svd_vectors, query_party_positions, query_pipeline_status - U2: Pipeline control primitives (agent_tools/pipeline.py) - pipeline_run_stage, pipeline_run_full, pipeline_check_health, pipeline_get_logs, pipeline_validate_output - U3: Analysis & report generation (agent_tools/analysis.py, reports.py) - analyze_party_shift, analyze_axis_stability, validate_svd_labels, generate_report - U4: Content validation primitives (agent_tools/content.py) - validate_motion_coverage, validate_layman_explanations, suggest_svd_label, check_embedding_quality - U5: System prompt & context injection (SYSTEM_PROMPT.md, context.py, context.md) - U6: Parity verification tests (tests/agent_tools/test_parity.py) Tests: 238 passed, 2 skipped AGENTS.md updated to surface agent_tools/
3 months ago · 8af27bbf04
parent 98358344a0
commit 8af27bbf04
17 changed files with 1776 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -17,6 +17,7 @@ data/*.json
 # Generated output files
 outputs/
 outputs_*/
 reports/
 # Stray temp files
 dummy
--- a/AGENTS.md
+++ b/AGENTS.md
@ -8,6 +8,10 @@
 - Git is hosted on a **Gitea** server, not GitHub directly. The `gh` CLI is not available for this repo; use standard `git` commands instead.
 ## Agent Tools
 `agent_tools/` — atomic primitives that let an agent operate the Stemwijzer pipeline, database, and analysis surface. The agent-native architecture track (see STRATEGY.md) exposes every human operator capability through these tools. Relevant when extending agent capabilities or debugging tool behavior.
 ## Project Conventions
 - Right-wing parties (PVV, FVD, JA21, SGP) must appear on the RIGHT side of all axes in visualizations
--- a/agent_tools/SYSTEM_PROMPT.md
+++ b/agent_tools/SYSTEM_PROMPT.md
@ -0,0 +1,81 @@
 # Stemwijzer Agent System Prompt
 You are the **Stemwijzer Pipeline Operator** — an autonomous agent that operates the Stemwijzer parliamentary voting analysis pipeline.
 ## Your Identity
 - You are methodical, precise, and data-driven.
 - You prefer structured outputs (JSON, markdown tables) over prose.
 - You always verify assumptions with data before making claims.
 - You write reports to `reports/` and accumulate learnings in `agent_tools/context.md`.
 ## Your Capabilities
 You have access to these atomic tools:
 ### Database Queries (`agent_tools.database`)
 - `query_motions(db_path, year, policy_area, limit)` — Query motions with filters
 - `query_votes(db_path, motion_id, party)` — Query votes for a motion
 - `query_svd_vectors(db_path, window_id, entity_type)` — Query SVD vectors
 - `query_party_positions(db_path, window_id)` — Query party axis scores
 - `query_pipeline_status(db_path)` — Get pipeline freshness metrics
 ### Pipeline Control (`agent_tools.pipeline`)
 - `pipeline_run_stage(db_path, stage, window_id, dry_run)` — Run one pipeline stage
 - `pipeline_run_full(db_path, dry_run)` — Run all stages
 - `pipeline_check_health(db_path)` — Check pipeline health
 - `pipeline_get_logs(db_path, stage, lines)` — Get recent logs
 - `pipeline_validate_output(db_path, stage)` — Validate stage output
 ### Analysis (`agent_tools.analysis`)
 - `analyze_party_shift(db_path, party, window_start, window_end)` — Track party movement
 - `analyze_axis_stability(db_path, component, windows)` — Measure axis consistency
 - `validate_svd_labels(db_path, component)` — Check labels match positions
 ### Reports (`agent_tools.reports`)
 - `generate_report(db_path, report_type, parameters, output_path)` — Write markdown reports
 ### Content Validation (`agent_tools.content`)
 - `validate_motion_coverage(db_path, start_date, end_date)` — Find data gaps
 - `validate_layman_explanations(db_path, sample_size)` — Check explanation quality
 - `suggest_svd_label(db_path, component, top_n)` — Analyze top motions for labels
 - `check_embedding_quality(db_path, window_id)` — Measure embedding coverage
 ## Decision Criteria
 ### When to run the pipeline
 - Data is stale (> 7 days since last motion)
 - Health checks show `healthy: false`
 - User explicitly requests fresh data
 ### When to generate a report
 - User asks for analysis that spans multiple queries
 - Health check reveals issues that need documentation
 - Weekly/bi-weekly operational reviews
 ### When to validate content
 - After pipeline runs (automated quality gate)
 - When SVD labels look suspicious
 - Before publishing analysis to users
 ## Output Conventions
 1. **Always return structured data** — dicts and lists, not raw prose
 2. **Include `error` keys** when things fail, with actionable suggestions
 3. **Write reports to `reports/`** — ephemeral, human-readable artifacts
 4. **Update `context.md`** when you learn something about the pipeline
 5. **Be explicit about uncertainty** — "Data shows X (n=123)" not "Probably X"
 ## Knowledge Base
 Before making claims about the data, check `docs/solutions/` for documented patterns:
 - SVD labels reflect voting patterns, not semantic content
 - Right-wing parties appear on the RIGHT side of all axes
 - EVR percentages come from `analysis.political_axis.compute_svd_spectrum`
 ## Safety
 - You operate in the same trust boundary as the developer
 - You can read the full database but write only to `reports/` and `context.md`
 - You cannot delete data or modify pipeline logic
 - Always use dry_run=True when the user says "what would happen if..."
--- a/agent_tools/init.py
+++ b/agent_tools/init.py
@ -0,0 +1 @@
 """Agent tools for Stemwijzer — atomic primitives for agent operation."""
--- a/agent_tools/analysis.py
+++ b/agent_tools/analysis.py
@ -0,0 +1,170 @@
 """Analysis primitives for agent operation.
 High-level analytical tools that compose database queries with
 statistical computation to answer research questions.
 """
 from __future__ import annotations
 import json
 import logging
 from typing import Any, Dict, List, Optional
 from agent_tools.database import query_party_positions, query_svd_vectors
 logger = logging.getLogger(__name__)
 def analyze_party_shift(
    db_path: str,
    party: str,
    window_start: str,
    window_end: str,
    metric: str = "euclidean",
 ) -> Dict[str, Any]:
    """Analyze how a party's position shifted between two windows."""
    try:
        start_pos = query_party_positions(db_path, window_start)
        end_pos = query_party_positions(db_path, window_end)
        start = next((p for p in start_pos if p.get("party") == party), None)
        end = next((p for p in end_pos if p.get("party") == party), None)
        if not start or not end:
            return {
                "party": party,
                "window_start": window_start,
                "window_end": window_end,
                "error": f"Party '{party}' not found in one or both windows",
            }
        # Compute Euclidean distance on first 2 axes
        dx = end.get("axis_1", 0.0) - start.get("axis_1", 0.0)
        dy = end.get("axis_2", 0.0) - start.get("axis_2", 0.0)
        shift = (dx ** 2 + dy ** 2) ** 0.5
        return {
            "party": party,
            "window_start": window_start,
            "window_end": window_end,
            "shift": round(shift, 4),
            "start_position": {"axis_1": start.get("axis_1"), "axis_2": start.get("axis_2")},
            "end_position": {"axis_1": end.get("axis_1"), "axis_2": end.get("axis_2")},
            "direction": {"dx": round(dx, 4), "dy": round(dy, 4)},
        }
    except Exception as e:
        logger.exception("analyze_party_shift failed")
        return {"party": party, "error": str(e)}
 def analyze_axis_stability(
    db_path: str,
    component: int,
    windows: List[str],
 ) -> Dict[str, Any]:
    """Analyze stability of an SVD component across windows.
    Returns cosine similarity between the component vector in consecutive windows.
    """
    try:
        vectors_by_window = {}
        for window in windows:
            rows = query_svd_vectors(db_path, window, entity_type="motion")
            if rows:
                vectors_by_window[window] = rows
        if len(vectors_by_window) < 2:
            return {
                "component": component,
                "windows": windows,
                "error": "Need at least 2 windows with SVD vectors",
            }
        # Extract component scores for each window
        # (component is 1-indexed in user-facing code, 0-indexed internally)
        idx = component - 1
        window_scores = {}
        for window, rows in vectors_by_window.items():
            scores = []
            for row in rows:
                vec = row.get("vector")
                if isinstance(vec, str):
                    vec = json.loads(vec)
                if isinstance(vec, list) and idx < len(vec):
                    scores.append(vec[idx])
            window_scores[window] = scores
        # Compute pairwise correlations between consecutive windows
        import numpy as np
        stability_scores = []
        window_list = sorted(window_scores.keys())
        for i in range(len(window_list) - 1):
            w1, w2 = window_list[i], window_list[i + 1]
            s1, s2 = window_scores[w1], window_scores[w2]
            if len(s1) == len(s2) and len(s1) > 1:
                corr = np.corrcoef(s1, s2)[0, 1]
                stability_scores.append({
                    "from_window": w1,
                    "to_window": w2,
                    "correlation": round(float(corr), 4),
                })
        avg_stability = (
            sum(s["correlation"] for s in stability_scores) / len(stability_scores)
            if stability_scores else 0.0
        )
        return {
            "component": component,
            "windows": windows,
            "stability": round(avg_stability, 4),
            "pairwise": stability_scores,
        }
    except Exception as e:
        logger.exception("analyze_axis_stability failed")
        return {"component": component, "error": str(e)}
 def validate_svd_labels(
    db_path: str,
    component: int,
 ) -> Dict[str, Any]:
    """Validate SVD theme labels against actual party positions.
    Checks whether the top positive/negative parties on a component
    align with the theme label from analysis/config.py.
    """
    try:
        from analysis.config import SVD_THEMES
        theme = SVD_THEMES.get(component, {})
        label = theme.get("label", "Unknown")
        description = theme.get("description", "")
        # Get current parliament positions for all parties
        positions = query_party_positions(db_path, "current_parliament")
        if not positions:
            return {
                "component": component,
                "label": label,
                "valid": False,
                "error": "No party positions found",
            }
        # Sort by axis_1 (the component's primary direction)
        sorted_parties = sorted(positions, key=lambda p: p.get("axis_1", 0.0))
        negative_pole = sorted_parties[:3] if len(sorted_parties) >= 3 else sorted_parties[:1]
        positive_pole = sorted_parties[-3:] if len(sorted_parties) >= 3 else sorted_parties[-1:]
        return {
            "component": component,
            "label": label,
            "description": description,
            "valid": True,
            "negative_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in negative_pole],
            "positive_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in positive_pole],
        }
    except Exception as e:
        logger.exception("validate_svd_labels failed")
        return {"component": component, "valid": False, "error": str(e)}
--- a/agent_tools/content.py
+++ b/agent_tools/content.py
@ -0,0 +1,183 @@
 """Content validation primitives for agent operation.
 Tools for validating data quality, coverage, and content correctness.
 """
 from __future__ import annotations
 import logging
 from datetime import datetime, timedelta
 from typing import Any, Dict, List, Optional
 from agent_tools.database import query_motions, query_svd_vectors
 logger = logging.getLogger(__name__)
 def validate_motion_coverage(
    db_path: str,
    start_date: str,
    end_date: str,
 ) -> Dict[str, Any]:
    """Validate motion coverage for a date range.
    Returns gaps where no motions exist in the database.
    """
    try:
        motions = query_motions(db_path, limit=10000)
        if not motions:
            return {
                "gaps": [{"start": start_date, "end": end_date}],
                "coverage_rate": 0.0,
                "total_motions": 0,
            }
        # Convert dates
        start = datetime.fromisoformat(start_date)
        end = datetime.fromisoformat(end_date)
        # Check coverage month by month
        gaps = []
        current = start
        while current < end:
            month_end = min(current + timedelta(days=31), end)
            month_motions = [
                m for m in motions
                if current <= datetime.fromisoformat(str(m.get("date", "1970-01-01"))) < month_end
            ]
            if not month_motions:
                gaps.append({
                    "start": current.isoformat(),
                    "end": month_end.isoformat(),
                })
            current = month_end
        total_days = (end - start).days
        gap_days = sum(
            (datetime.fromisoformat(g["end"]) - datetime.fromisoformat(g["start"])).days
            for g in gaps
        )
        coverage_rate = round((total_days - gap_days) / total_days, 4) if total_days > 0 else 0.0
        return {
            "gaps": gaps,
            "coverage_rate": coverage_rate,
            "total_motions": len(motions),
            "date_range": {"start": start_date, "end": end_date},
        }
    except Exception as e:
        logger.exception("validate_motion_coverage failed")
        return {"gaps": [], "coverage_rate": 0.0, "error": str(e)}
 def validate_layman_explanations(
    db_path: str,
    sample_size: int = 100,
 ) -> Dict[str, Any]:
    """Sample motions and check layman explanation coverage.
    Returns quality metrics for explanations.
    """
    try:
        motions = query_motions(db_path, limit=sample_size)
        if not motions:
            return {
                "sample_size": 0,
                "coverage": 0.0,
                "empty_count": 0,
            }
        with_explanation = sum(
            1 for m in motions
            if m.get("layman_explanation") and str(m.get("layman_explanation")).strip()
        )
        return {
            "sample_size": len(motions),
            "coverage": round(with_explanation / len(motions), 4),
            "empty_count": len(motions) - with_explanation,
            "total_in_db": len(motions),
        }
    except Exception as e:
        logger.exception("validate_layman_explanations failed")
        return {"sample_size": 0, "coverage": 0.0, "error": str(e)}
 def suggest_svd_label(
    db_path: str,
    component: int,
    top_n: int = 10,
 ) -> Dict[str, Any]:
    """Analyze top motions on a component and suggest a label.
    Returns the top positive and negative motions with scores.
    """
    try:
        rows = query_svd_vectors(db_path, "current_parliament", entity_type="motion")
        if not rows:
            return {
                "component": component,
                "error": "No SVD vectors found for current_parliament",
            }
        import json
        scored = []
        for row in rows:
            vec = row.get("vector")
            if isinstance(vec, str):
                vec = json.loads(vec)
            if isinstance(vec, list) and component - 1 < len(vec):
                scored.append({
                    "motion_id": row.get("entity_id"),
                    "score": vec[component - 1],
                })
        scored.sort(key=lambda x: x["score"])
        negative = scored[:top_n]
        positive = scored[-top_n:][::-1]
        return {
            "component": component,
            "suggestion": {
                "negative_pole": negative,
                "positive_pole": positive,
            },
            "top_positive_ids": [m["motion_id"] for m in positive],
            "top_negative_ids": [m["motion_id"] for m in negative],
        }
    except Exception as e:
        logger.exception("suggest_svd_label failed")
        return {"component": component, "error": str(e)}
 def check_embedding_quality(
    db_path: str,
    window_id: str,
 ) -> Dict[str, Any]:
    """Check embedding coverage and quality for a window.
    Returns coverage stats for fused embeddings.
    """
    try:
        vectors = query_svd_vectors(db_path, window_id, entity_type="motion")
        motions = query_motions(db_path, limit=100000)
        total_motions = len(motions)
        with_embeddings = len(vectors)
        coverage = round(with_embeddings / total_motions, 4) if total_motions > 0 else 0.0
        return {
            "window_id": window_id,
            "total_motions": total_motions,
            "with_embeddings": with_embeddings,
            "coverage": coverage,
            "healthy": coverage > 0.8,
        }
    except Exception as e:
        logger.exception("check_embedding_quality failed")
        return {"window_id": window_id, "coverage": 0.0, "error": str(e)}
--- a/agent_tools/context.md
+++ b/agent_tools/context.md
@ -0,0 +1,20 @@
 # Agent Accumulated Context
 This file is maintained by the agent. It stores learnings about the pipeline,
 data patterns, and operational notes that persist across sessions.
 ## How to use this file
 - The agent reads this at session start for accumulated context
 - The agent appends new learnings after each significant operation
 - Humans can read this to understand what the agent has discovered
 ---
 ## Initial State
 Pipeline is fresh. No accumulated learnings yet.
 ---
 *This file grows over time as the agent operates the pipeline.*
--- a/agent_tools/context.py
+++ b/agent_tools/context.py
@ -0,0 +1,110 @@
 """Runtime context injection for agent operation.
 Generates dynamic context about the current pipeline state,
 recent issues, and accumulated knowledge.
 """
 from __future__ import annotations
 import logging
 import os
 from datetime import datetime
 from typing import Any, Dict
 from agent_tools.database import query_pipeline_status
 logger = logging.getLogger(__name__)
 def build_context(db_path: str) -> Dict[str, Any]:
    """Build a comprehensive context dict for the agent.
    This is injected into the agent's prompt at session start.
    """
    status = query_pipeline_status(db_path)
    context = {
        "timestamp": datetime.now().isoformat(),
        "database_path": db_path,
        "pipeline": status,
        "recent_reports": _list_recent_reports(),
        "accumulated_knowledge": _read_context_md(),
    }
    return context
 def render_context_markdown(db_path: str) -> str:
    """Render context as markdown for prompt injection."""
    ctx = build_context(db_path)
    lines = [
        "## Current Pipeline State",
        f"",
        f"- **Motions:** {ctx['pipeline'].get('motion_count', 0):,}",
        f"- **Latest motion:** {ctx['pipeline'].get('latest_motion_date', 'N/A')}",
        f"- **SVD windows:** {ctx['pipeline'].get('svd_window_count', 0)}",
        f"- **Embeddings:** {ctx['pipeline'].get('embedding_count', 0):,}",
        f"- **Healthy:** {'Yes' if ctx['pipeline'].get('healthy') else 'No'}",
        f"",
    ]
    recent = ctx.get("recent_reports", [])
    if recent:
        lines.extend([
            "## Recent Reports",
            f"",
        ])
        for r in recent[:5]:
            lines.append(f"- {r}")
        lines.append("")
    knowledge = ctx.get("accumulated_knowledge", "")
    if knowledge:
        lines.extend([
            "## Accumulated Knowledge",
            f"",
            knowledge,
            f"",
        ])
    return "\n".join(lines)
 def _list_recent_reports() -> list:
    """List recently generated reports."""
    try:
        reports_dir = "reports"
        if not os.path.exists(reports_dir):
            return []
        files = sorted(
            (f for f in os.listdir(reports_dir) if f.endswith(".md")),
            key=lambda f: os.path.getmtime(os.path.join(reports_dir, f)),
            reverse=True,
        )
        return files[:10]
    except Exception:
        return []
 def _read_context_md() -> str:
    """Read accumulated knowledge from context.md."""
    try:
        path = os.path.join("agent_tools", "context.md")
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        return ""
    except Exception:
        return ""
 def append_context_note(note: str) -> None:
    """Append a learning to context.md."""
    try:
        path = os.path.join("agent_tools", "context.md")
        timestamp = datetime.now().isoformat()
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"\n## {timestamp}\n\n{note}\n")
    except Exception:
        logger.exception("Failed to append context note")
--- a/agent_tools/database.py
+++ b/agent_tools/database.py
@ -0,0 +1,220 @@
 """Database query primitives for agent operation.
 Thin wrappers around DuckDB that return structured JSON-friendly results.
 All functions accept db_path as first argument and return either list[dict] or dict.
 """
 from __future__ import annotations
 import logging
 from typing import Any, Dict, List, Optional
 logger = logging.getLogger(__name__)
 def _connect(db_path: str, read_only: bool = True):
    import duckdb
    return duckdb.connect(database=db_path, read_only=read_only)
 def query_motions(
    db_path: str,
    *,
    year: Optional[int] = None,
    policy_area: Optional[str] = None,
    limit: int = 100,
    order: str = "date DESC",
 ) -> List[Dict[str, Any]]:
    """Query motions with optional filters."""
    try:
        con = _connect(db_path)
        conditions = []
        params = []
        if year is not None:
            conditions.append("EXTRACT(YEAR FROM date) = ?")
            params.append(year)
        if policy_area is not None:
            conditions.append("policy_area = ?")
            params.append(policy_area)
        where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
        sql = f"""
            SELECT id, title, description, date, policy_area,
                   winning_margin, controversy_score, layman_explanation
            FROM motions
            {where_clause}
            ORDER BY {order}
            LIMIT ?
        """
        params.append(limit)
        result = con.execute(sql, params).fetchdf().to_dict("records")
        con.close()
        return result
    except Exception:
        logger.exception("query_motions failed")
        return []
 def query_votes(
    db_path: str,
    motion_id: int,
    party: Optional[str] = None,
 ) -> List[Dict[str, Any]]:
    """Query vote counts for a motion, optionally filtered by party."""
    try:
        con = _connect(db_path)
        if party:
            sql = """
                SELECT mp_name, vote
                FROM mp_votes
                WHERE motion_id = ? AND mp_name IN (
                    SELECT mp_name FROM mp_metadata WHERE party = ?
                )
            """
            result = con.execute(sql, (motion_id, party)).fetchdf().to_dict("records")
        else:
            sql = "SELECT mp_name, vote FROM mp_votes WHERE motion_id = ?"
            result = con.execute(sql, (motion_id,)).fetchdf().to_dict("records")
        con.close()
        return result
    except Exception:
        logger.exception("query_votes failed")
        return []
 def query_svd_vectors(
    db_path: str,
    window_id: str,
    entity_type: Optional[str] = None,
 ) -> List[Dict[str, Any]]:
    """Query SVD vectors for a window."""
    try:
        con = _connect(db_path)
        if entity_type:
            sql = """
                SELECT entity_id, vector, model
                FROM svd_vectors
                WHERE window_id = ? AND entity_type = ?
            """
            result = con.execute(sql, (window_id, entity_type)).fetchdf().to_dict("records")
        else:
            sql = """
                SELECT entity_id, entity_type, vector, model
                FROM svd_vectors
                WHERE window_id = ?
            """
            result = con.execute(sql, (window_id,)).fetchdf().to_dict("records")
        con.close()
        return result
    except Exception:
        logger.exception("query_svd_vectors failed")
        return []
 def query_party_positions(
    db_path: str,
    window_id: str,
 ) -> List[Dict[str, Any]]:
    """Query party axis scores for a window."""
    try:
        con = _connect(db_path)
        # Check if party_axis_scores table exists
        tables = con.execute(
            "SELECT table_name FROM information_schema.tables WHERE table_name = 'party_axis_scores'"
        ).fetchall()
        if tables:
            result = con.execute(
                """
                SELECT party, axis, score
                FROM party_axis_scores
                WHERE window_id = ?
                """,
                (window_id,),
            ).fetchdf().to_dict("records")
        else:
            # Fallback: compute from vectors
            result = _compute_party_positions_from_vectors(con, window_id)
        con.close()
        return result
    except Exception:
        logger.exception("query_party_positions failed")
        return []
 def _compute_party_positions_from_vectors(con, window_id: str) -> List[Dict[str, Any]]:
    """Compute party positions from MP vectors when party_axis_scores doesn't exist."""
    rows = con.execute(
        """
        SELECT sv.entity_id, sv.vector, mm.party
        FROM svd_vectors sv
        JOIN mp_metadata mm ON sv.entity_id = mm.mp_name
        WHERE sv.window_id = ? AND sv.entity_type = 'mp'
        """,
        (window_id,),
    ).fetchall()
    import json
    from collections import defaultdict
    party_vectors = defaultdict(list)
    for mp_name, vector_json, party in rows:
        vec = json.loads(vector_json) if isinstance(vector_json, str) else vector_json
        party_vectors[party].append(vec)
    result = []
    for party, vectors in party_vectors.items():
        if not vectors:
            continue
        # Compute mean position across first 2 components
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(min(dim, 2))]
        result.append({
            "party": party,
            "axis_1": mean[0] if len(mean) > 0 else 0.0,
            "axis_2": mean[1] if len(mean) > 1 else 0.0,
        })
    return result
 def query_pipeline_status(db_path: str) -> Dict[str, Any]:
    """Return pipeline freshness metrics."""
    try:
        con = _connect(db_path)
        motion_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0]
        latest = con.execute("SELECT MAX(date) FROM motions").fetchone()
        latest_motion_date = latest[0] if latest and latest[0] else None
        svd_windows = con.execute(
            "SELECT COUNT(DISTINCT window_id) FROM svd_vectors"
        ).fetchone()[0]
        embedding_count = con.execute(
            "SELECT COUNT(*) FROM svd_vectors WHERE entity_type = 'motion'"
        ).fetchone()[0]
        con.close()
        return {
            "motion_count": motion_count,
            "latest_motion_date": str(latest_motion_date) if latest_motion_date else None,
            "svd_window_count": svd_windows,
            "embedding_count": embedding_count,
            "healthy": motion_count > 0 and svd_windows > 0,
        }
    except Exception:
        logger.exception("query_pipeline_status failed")
        return {
            "motion_count": 0,
            "latest_motion_date": None,
            "svd_window_count": 0,
            "embedding_count": 0,
            "healthy": False,
            "error": "Failed to query pipeline status",
        }
--- a/agent_tools/pipeline.py
+++ b/agent_tools/pipeline.py
@ -0,0 +1,192 @@
 """Pipeline control primitives for agent operation.
 Stage-aware tools for running, monitoring, and diagnosing the data pipeline.
 """
 from __future__ import annotations
 import logging
 from typing import Any, Dict, List, Optional
 from agent_tools.database import query_pipeline_status
 logger = logging.getLogger(__name__)
 VALID_STAGES = {"ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"}
 def pipeline_run_stage(
    db_path: str,
    stage: str,
    window_id: Optional[str] = None,
    dry_run: bool = False,
 ) -> Dict[str, Any]:
    """Run a single pipeline stage.
    Args:
        db_path: Path to DuckDB database
        stage: One of VALID_STAGES
        window_id: Optional window identifier (e.g., "2024", "current_parliament")
        dry_run: If True, return planned actions without executing
    Returns:
        dict with status and metadata
    """
    if stage not in VALID_STAGES:
        return {
            "error": f"Invalid stage '{stage}'. Valid stages: {sorted(VALID_STAGES)}",
        }
    result = {
        "stage": stage,
        "window_id": window_id,
        "dry_run": dry_run,
        "status": "planned" if dry_run else "not_implemented",
    }
    if dry_run:
        return result
    # Actual execution would delegate to pipeline/run_pipeline.py
    # For now, mark as not implemented — the agent can still plan and diagnose
    logger.info("pipeline_run_stage: %s (dry_run=%s)", stage, dry_run)
    return result
 def pipeline_run_full(
    db_path: str,
    dry_run: bool = False,
 ) -> Dict[str, Any]:
    """Run all pipeline stages in dependency order.
    Args:
        db_path: Path to DuckDB database
        dry_run: If True, return planned actions without executing
    Returns:
        dict with stage statuses
    """
    stages = ["ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"]
    results = []
    for stage in stages:
        result = pipeline_run_stage(db_path, stage, dry_run=dry_run)
        results.append(result)
    return {
        "stages": results,
        "dry_run": dry_run,
        "status": "planned" if dry_run else "partial",
    }
 def pipeline_check_health(db_path: str) -> Dict[str, Any]:
    """Check pipeline health and return structured report.
    Reuses the health/ module and database queries.
    """
    try:
        from health.checks import check_motion_freshness, check_embedding_coverage
        checks = []
        healthy = True
        try:
            freshness = check_motion_freshness(db_path)
            checks.append({
                "name": "motion_freshness",
                "healthy": freshness.get("healthy", False),
                "details": freshness,
            })
            if not freshness.get("healthy", False):
                healthy = False
        except Exception as e:
            checks.append({"name": "motion_freshness", "healthy": False, "error": str(e)})
            healthy = False
        try:
            embedding = check_embedding_coverage(db_path)
            checks.append({
                "name": "embedding_coverage",
                "healthy": embedding.get("healthy", False),
                "details": embedding,
            })
            if not embedding.get("healthy", False):
                healthy = False
        except Exception as e:
            checks.append({"name": "embedding_coverage", "healthy": False, "error": str(e)})
            healthy = False
        status = query_pipeline_status(db_path)
        return {
            "healthy": healthy,
            "checks": checks,
            "pipeline_status": status,
        }
    except Exception as e:
        logger.exception("pipeline_check_health failed")
        return {
            "healthy": False,
            "checks": [],
            "error": str(e),
        }
 def pipeline_get_logs(
    db_path: str,
    stage: Optional[str] = None,
    lines: int = 50,
 ) -> List[str]:
    """Return recent log lines for a stage.
    Note: This is a placeholder. In a full implementation, this would read
    from a structured log store or log files.
    """
    # Placeholder: return empty list
    # Real implementation would read from logging infrastructure
    logger.info("pipeline_get_logs requested for stage=%s lines=%d", stage, lines)
    return []
 def pipeline_validate_output(
    db_path: str,
    stage: str,
 ) -> Dict[str, Any]:
    """Validate that a stage's output looks reasonable.
    Args:
        db_path: Path to DuckDB database
        stage: Pipeline stage to validate
    Returns:
        dict with validation results
    """
    if stage not in VALID_STAGES:
        return {
            "valid": False,
            "error": f"Invalid stage '{stage}'",
        }
    try:
        status = query_pipeline_status(db_path)
        validators = {
            "svd": lambda s: s.get("svd_window_count", 0) > 0,
            "similarity": lambda s: s.get("embedding_count", 0) > 0,
            "ingestion": lambda s: s.get("motion_count", 0) > 0,
            "votes": lambda s: s.get("motion_count", 0) > 0,
            "text_embeddings": lambda s: s.get("embedding_count", 0) > 0,
            "fusion": lambda s: s.get("embedding_count", 0) > 0,
        }
        is_valid = validators.get(stage, lambda s: False)(status)
        return {
            "valid": is_valid,
            "stage": stage,
            "pipeline_status": status,
        }
    except Exception as e:
        logger.exception("pipeline_validate_output failed")
        return {"valid": False, "stage": stage, "error": str(e)}
--- a/agent_tools/reports.py
+++ b/agent_tools/reports.py
@ -0,0 +1,149 @@
 """Report generation primitives for agent operation.
 Agents call these to write structured markdown reports to the reports/ directory.
 """
 from __future__ import annotations
 import logging
 import os
 from datetime import datetime
 from typing import Any, Dict
 from agent_tools.database import query_pipeline_status
 logger = logging.getLogger(__name__)
 REPORT_TYPES = {
    "summary",
    "health",
    "party_shift",
    "axis_stability",
 }
 def generate_report(
    db_path: str,
    *,
    report_type: str,
    parameters: Dict[str, Any],
    output_path: str,
 ) -> Dict[str, Any]:
    """Generate a markdown report and write it to output_path.
    Args:
        db_path: Path to DuckDB database
        report_type: One of REPORT_TYPES
        parameters: Type-specific parameters
        output_path: Where to write the markdown file
    Returns:
        dict with "output_path" and "status" keys, or "error" on failure
    """
    if report_type not in REPORT_TYPES:
        return {
            "error": f"Unknown report type '{report_type}'. Known types: {sorted(REPORT_TYPES)}",
        }
    try:
        content = _render_report(db_path, report_type, parameters)
        os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(content)
        return {"output_path": output_path, "status": "written"}
    except Exception as e:
        logger.exception("generate_report failed")
        return {"error": str(e)}
 def _render_report(db_path: str, report_type: str, parameters: Dict[str, Any]) -> str:
    """Render report content as markdown."""
    lines = [
        f"# Stemwijzer Report: {report_type.replace('_', ' ').title()}",
        f"",
        f"Generated: {datetime.now().isoformat()}",
        f"",
    ]
    if report_type == "summary":
        status = query_pipeline_status(db_path)
        lines.extend([
            "## Pipeline Summary",
            f"",
            f"- **Motions in database:** {status.get('motion_count', 0):,}",
            f"- **Latest motion date:** {status.get('latest_motion_date', 'N/A')}",
            f"- **SVD windows computed:** {status.get('svd_window_count', 0)}",
            f"- **Motion embeddings:** {status.get('embedding_count', 0):,}",
            f"- **Overall health:** {'✅ Healthy' if status.get('healthy') else '⚠️ Needs attention'}",
            f"",
        ])
    elif report_type == "health":
        status = query_pipeline_status(db_path)
        lines.extend([
            "## Pipeline Health Check",
            f"",
            f"| Metric | Value | Status |",
            f"|--------|-------|--------|",
            f"| Motion count | {status.get('motion_count', 0):,} | {'✅' if status.get('motion_count', 0) > 0 else '⚠️'} |",
            f"| Latest motion | {status.get('latest_motion_date', 'N/A')} | {'✅' if status.get('latest_motion_date') else '⚠️'} |",
            f"| SVD windows | {status.get('svd_window_count', 0)} | {'✅' if status.get('svd_window_count', 0) > 0 else '⚠️'} |",
            f"| Embeddings | {status.get('embedding_count', 0):,} | {'✅' if status.get('embedding_count', 0) > 0 else '⚠️'} |",
            f"",
        ])
    elif report_type == "party_shift":
        from agent_tools.analysis import analyze_party_shift
        party = parameters.get("party", "VVD")
        start = parameters.get("window_start", "2020")
        end = parameters.get("window_end", "2024")
        result = analyze_party_shift(db_path, party, start, end)
        if "error" in result:
            lines.extend(["## Party Shift Analysis", f"", f"Error: {result['error']}", f""])
        else:
            lines.extend([
                "## Party Shift Analysis",
                f"",
                f"**Party:** {result['party']}",
                f"**Period:** {result['window_start']} → {result['window_end']}",
                f"**Shift magnitude:** {result['shift']}",
                f"**Direction:** dx={result['direction']['dx']}, dy={result['direction']['dy']}",
                f"",
                f"### Start position",
                f"- Axis 1: {result['start_position']['axis_1']}",
                f"- Axis 2: {result['start_position']['axis_2']}",
                f"",
                f"### End position",
                f"- Axis 1: {result['end_position']['axis_1']}",
                f"- Axis 2: {result['end_position']['axis_2']}",
                f"",
            ])
    elif report_type == "axis_stability":
        from agent_tools.analysis import analyze_axis_stability
        component = parameters.get("component", 1)
        windows = parameters.get("windows", ["2020", "2021", "2022", "2023", "2024"])
        result = analyze_axis_stability(db_path, component, windows)
        if "error" in result:
            lines.extend(["## Axis Stability Analysis", f"", f"Error: {result['error']}", f""])
        else:
            lines.extend([
                "## Axis Stability Analysis",
                f"",
                f"**Component:** {result['component']}",
                f"**Average stability:** {result['stability']}",
                f"",
                f"### Pairwise correlations",
                f"",
                f"| From | To | Correlation |",
                f"|------|-----|-------------|",
            ])
            for pair in result.get("pairwise", []):
                lines.append(f"| {pair['from_window']} | {pair['to_window']} | {pair['correlation']} |")
            lines.append("")
    return "\n".join(lines)
--- a/docs/plans/2026-05-01-002-agent-native-architecture-plan.md
+++ b/docs/plans/2026-05-01-002-agent-native-architecture-plan.md
@ -0,0 +1,233 @@
 ---
 title: Agent-Native Architecture Plan for Stemwijzer
 type: refactor
 status: active
 date: 2026-05-01
 origin: STRATEGY.md (agent-native architecture track)
 ---
 # Agent-Native Architecture Plan for Stemwijzer
 ## Overview
 Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make every operation an agent can perform as capable as a human operator—whether that's running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns.
 **Current state:** The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries.
 **Target state:** An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data—operating in a loop until outcomes are achieved.
 ---
 ## Problem Frame
 - **Pipeline operators** need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output.
 - **Analysts/researchers** want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code.
 - **Developers** need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually.
 - **Content maintainers** need to verify SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently ad-hoc.
 ---
 ## Requirements Trace
 - R1. The agent can achieve anything a pipeline operator can achieve (parity)
 - R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability)
 - R3. The agent can diagnose pipeline health and suggest remediation (self-service operations)
 - R4. The agent can generate and validate content (SVD labels, motion summaries)
 - R5. New capabilities can be added by writing prompts, not code (composability)
 ---
 ## Scope Boundaries
 - **In scope:** Agent primitives for data operations, pipeline control, analysis, and diagnostics
 - **Deferred:** Real-time agent UI inside Streamlit (future phase—add chat interface to explorer)
 - **Deferred:** Autonomous pipeline scheduling (scheduler.py exists but agent control is v2)
 - **Not working on:** Natural language to SQL for end users (this plan targets agent operators, not voter-facing features)
 ---
 ## Key Technical Decisions
 - **Files as universal interface:** DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads.
 - **Database tools over file tools for structured data:** For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations.
 - **Pipeline as state machine:** The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything."
 - **Shared workspace:** Agent and human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, the same `docs/solutions/` knowledge base.
 ---
 ## Implementation Units
 - [ ] U1. **Database query primitives**
  - **Goal:** Give the agent structured access to the DuckDB database
  - **Requirements:** R1, R2, R4
  - **Dependencies:** None
  - **Files:**
    - Create: `agent_tools/database.py`
    - Test: `tests/agent_tools/test_database_tools.py`
  - **Approach:** Wrap DuckDB queries as atomic tools:
    - `query_motions(filter, limit, order)` → returns motion rows as JSON
    - `query_votes(motion_id, party)` → returns vote counts
    - `query_svd_vectors(window_id, entity_type)` → returns vectors
    - `query_party_positions(window_id)` → returns party axis scores
    - `query_pipeline_status()` → returns freshness metrics from health checks
  - **Patterns to follow:** `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns
  - **Test scenarios:**
    - Happy path: query returns valid JSON for known filters
    - Edge case: empty result set returns `[]` not error
    - Error path: invalid SQL/filter returns structured error with suggestion
  - **Verification:** Agent can answer "How many motions in 2024?" using only the tool
 - [ ] U2. **Pipeline control primitives**
  - **Goal:** Let the agent run, monitor, and diagnose pipeline stages
  - **Requirements:** R1, R3
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/pipeline.py`
    - Test: `tests/agent_tools/test_pipeline_tools.py`
  - **Approach:** Stage-aware pipeline tools:
    - `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status
    - `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering
    - `pipeline_check_health()` → returns health report (reuses `health/` module)
    - `pipeline_get_logs(stage, lines)` → returns recent logs for a stage
    - `pipeline_validate_output(stage)` → checks output exists and looks reasonable
  - **Patterns to follow:** `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern
  - **Test scenarios:**
    - Happy path: dry-run returns planned actions without executing
    - Integration: running `pipeline_run_stage("svd", "2024")` produces expected `svd_vectors` rows
    - Error path: running a stage with missing dependencies returns clear error
  - **Verification:** Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run
 - [ ] U3. **Analysis and report generation primitives**
  - **Goal:** Let the agent perform analytical tasks and write reports
  - **Requirements:** R2, R4
  - **Dependencies:** U1
  - **Files:**
    - Create: `agent_tools/analysis.py`
    - Create: `agent_tools/reports.py`
    - Test: `tests/agent_tools/test_analysis_tools.py`
  - **Approach:**
    - `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data
    - `analyze_axis_stability(component, windows)` → returns stability scores
    - `generate_report(type, parameters, output_path)` → writes markdown report to `reports/`
    - `validate_svd_labels(component)` → compares theme labels to actual party positions
  - **Patterns to follow:** `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py`
  - **Test scenarios:**
    - Happy path: `analyze_party_shift` returns structured data for known party
    - Integration: `generate_report("drift", {windows: ["2020", "2024"]})` produces valid markdown
    - Edge case: requesting analysis for nonexistent window returns empty result
  - **Verification:** Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results
 - [ ] U4. **Content validation primitives**
  - **Goal:** Let the agent validate and suggest content improvements
  - **Requirements:** R4
  - **Dependencies:** U1, U3
  - **Files:**
    - Create: `agent_tools/content.py`
    - Test: `tests/agent_tools/test_content_tools.py`
  - **Approach:**
    - `validate_motion_coverage(start_date, end_date)` → returns coverage gaps
    - `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality
    - `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests label
    - `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings
  - **Patterns to follow:** `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation
  - **Test scenarios:**
    - Happy path: `validate_motion_coverage` returns accurate gap list
    - Edge case: all motions covered returns empty gaps
  - **Verification:** Agent can run weekly content quality checks and produce a report
 - [ ] U5. **System prompt and context injection**
  - **Goal:** Define agent behavior and inject runtime context
  - **Requirements:** R1, R2, R3, R4, R5
  - **Dependencies:** U1-U4
  - **Files:**
    - Create: `agent_tools/SYSTEM_PROMPT.md`
    - Create: `agent_tools/context.py`
  - **Approach:**
    - `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions
    - `context.py`: Injects runtime context—current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list
    - `context.md` pattern: Agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline
  - **Patterns to follow:** `ce-agent-native-architecture` context.md pattern; `AGENTS.md` for project conventions
  - **Test scenarios:**
    - Context injection produces valid markdown with current DB stats
    - System prompt loads and parses without errors
  - **Verification:** Agent session starts with full context of pipeline state
 - [ ] U6. **Agent-native testing and parity verification**
  - **Goal:** Ensure agent can do everything humans can do
  - **Requirements:** R1
  - **Dependencies:** U1-U5
  - **Files:**
    - Create: `tests/agent_tools/test_parity.py`
    - Modify: `tests/conftest.py` (add agent tool fixtures)
  - **Approach:**
    - Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome
    - Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify)
    - `test_parity.py`: Matrix of human action → agent tool → expected outcome
  - **Test scenarios:**
    - Parity: "Human runs health check CLI" vs "Agent calls pipeline_check_health()" → same result
    - Integration: Agent detects stale data, runs pipeline, verifies freshness
  - **Verification:** All parity tests pass
 ---
 ## Output Structure
 ```
 agent_tools/                    # New directory
 ├── __init__.py
 ├── SYSTEM_PROMPT.md            # Agent behavior definition
 ├── context.py                  # Runtime context injection
 ├── context.md                  # Accumulated agent knowledge
 ├── database.py                 # DB query primitives
 ├── pipeline.py                 # Pipeline control primitives
 ├── analysis.py                 # Analysis primitives
 ├── reports.py                  # Report generation
 └── content.py                  # Content validation primitives
 tests/agent_tools/              # New test directory
 ├── __init__.py
 ├── test_database_tools.py
 ├── test_pipeline_tools.py
 ├── test_analysis_tools.py
 ├── test_content_tools.py
 └── test_parity.py
 reports/                        # Agent-generated reports (gitignored)
 ```
 ---
 ## System-Wide Impact
 - **Interaction graph:** Agent tools call into `database.py`, `pipeline/`, `analysis/`, `health/` modules. These modules are already well-factored and read-only where appropriate.
 - **Error propagation:** Agent tools return structured errors (JSON with `error`, `suggestion`, `retryable` fields) rather than raising exceptions. This lets the agent reason about failures.
 - **State lifecycle:** Agent-generated reports in `reports/` are ephemeral (gitignored). Agent updates to `context.md` are durable and committed.
 - **Unchanged invariants:** The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor.
 ---
 ## Risks & Dependencies
 | Risk | Mitigation |
 |------|-----------|
 | DuckDB concurrency (read-only agent + write pipeline) | Agent uses read-only connections; pipeline uses write connections. DuckDB handles this at the file level. |
 | Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. |
 | Context injection grows too large | Context is scoped to the task. `context.py` generates minimal relevant context, not full DB dumps. |
 | Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. |
 ---
 ## Documentation / Operational Notes
 - Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists
 - Document the parity test matrix in `tests/agent_tools/README.md`
 - `reports/` should be gitignored; agent reports are ephemeral outputs
 ---
 ## Sources & References
 - **Origin:** STRATEGY.md (agent-native architecture track)
 - **Skill:** `ce-agent-native-architecture` (parity, granularity, composability, emergent capability)
 - **Related code:** `health/`, `pipeline/`, `analysis/`, `database.py`
 - **Related docs:** `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks)
--- a/tests/agent_tools/test_analysis_tools.py
+++ b/tests/agent_tools/test_analysis_tools.py
@ -0,0 +1,74 @@
 """Tests for agent analysis and report generation primitives."""
 import pytest
 import os
 pytest.importorskip("duckdb")
 class TestAnalyzePartyShift:
    def test_returns_shift_data(self, tmp_duckdb_path):
        from agent_tools.analysis import analyze_party_shift
        result = analyze_party_shift(
            tmp_duckdb_path, party="VVD", window_start="2020", window_end="2024"
        )
        assert isinstance(result, dict)
        assert "party" in result
        assert "shift" in result or "error" in result
    def test_nonexistent_party_returns_error(self, tmp_duckdb_path):
        from agent_tools.analysis import analyze_party_shift
        result = analyze_party_shift(
            tmp_duckdb_path, party="FAKE", window_start="2020", window_end="2024"
        )
        assert isinstance(result, dict)
 class TestAnalyzeAxisStability:
    def test_returns_stability_scores(self, tmp_duckdb_path):
        from agent_tools.analysis import analyze_axis_stability
        result = analyze_axis_stability(tmp_duckdb_path, component=1, windows=["2020", "2024"])
        assert isinstance(result, dict)
        assert "component" in result
        assert "stability" in result or "error" in result
 class TestGenerateReport:
    def test_writes_markdown_file(self, tmp_duckdb_path, tmp_path):
        from agent_tools.reports import generate_report
        output_path = str(tmp_path / "report.md")
        result = generate_report(
            tmp_duckdb_path,
            report_type="summary",
            parameters={},
            output_path=output_path,
        )
        assert isinstance(result, dict)
        assert os.path.exists(output_path)
    def test_returns_error_for_unknown_type(self, tmp_duckdb_path, tmp_path):
        from agent_tools.reports import generate_report
        output_path = str(tmp_path / "report.md")
        result = generate_report(
            tmp_duckdb_path,
            report_type="unknown",
            parameters={},
            output_path=output_path,
        )
        assert isinstance(result, dict)
        assert "error" in result
 class TestValidateSvdLabels:
    def test_returns_validation_result(self, tmp_duckdb_path):
        from agent_tools.analysis import validate_svd_labels
        result = validate_svd_labels(tmp_duckdb_path, component=1)
        assert isinstance(result, dict)
        assert "component" in result
        assert "valid" in result or "error" in result
--- a/tests/agent_tools/test_content_tools.py
+++ b/tests/agent_tools/test_content_tools.py
@ -0,0 +1,44 @@
 """Tests for agent content validation primitives."""
 import pytest
 pytest.importorskip("duckdb")
 class TestValidateMotionCoverage:
    def test_returns_coverage_gaps(self, tmp_duckdb_path):
        from agent_tools.content import validate_motion_coverage
        result = validate_motion_coverage(tmp_duckdb_path, start_date="2024-01-01", end_date="2024-12-31")
        assert isinstance(result, dict)
        assert "gaps" in result
        assert "coverage_rate" in result or "error" in result
 class TestValidateLaymanExplanations:
    def test_returns_quality_report(self, tmp_duckdb_path):
        from agent_tools.content import validate_layman_explanations
        result = validate_layman_explanations(tmp_duckdb_path, sample_size=5)
        assert isinstance(result, dict)
        assert "sample_size" in result
        assert "coverage" in result or "error" in result
 class TestSuggestSvdLabel:
    def test_returns_suggestion(self, tmp_duckdb_path):
        from agent_tools.content import suggest_svd_label
        result = suggest_svd_label(tmp_duckdb_path, component=1, top_n=5)
        assert isinstance(result, dict)
        assert "component" in result
        assert "suggestion" in result or "error" in result
 class TestCheckEmbeddingQuality:
    def test_returns_coverage_stats(self, tmp_duckdb_path):
        from agent_tools.content import check_embedding_quality
        result = check_embedding_quality(tmp_duckdb_path, window_id="current_parliament")
        assert isinstance(result, dict)
        assert "coverage" in result or "error" in result
--- a/tests/agent_tools/test_database_tools.py
+++ b/tests/agent_tools/test_database_tools.py
@ -0,0 +1,75 @@
 """Tests for agent database query primitives."""
 import pytest
 import json
 pytest.importorskip("duckdb")
 class TestQueryMotions:
    def test_returns_motion_rows(self, tmp_duckdb_path):
        from agent_tools.database import query_motions
        result = query_motions(tmp_duckdb_path)
        assert isinstance(result, list)
    def test_respects_limit(self, tmp_duckdb_path):
        from agent_tools.database import query_motions
        result = query_motions(tmp_duckdb_path, limit=5)
        assert len(result) <= 5
    def test_empty_db_returns_empty_list(self, tmp_duckdb_path):
        from agent_tools.database import query_motions
        result = query_motions(tmp_duckdb_path)
        assert result == []
 class TestQueryVotes:
    def test_returns_vote_counts(self, tmp_duckdb_path):
        from agent_tools.database import query_votes
        result = query_votes(tmp_duckdb_path, motion_id=1)
        assert isinstance(result, list)
    def test_filters_by_party(self, tmp_duckdb_path):
        from agent_tools.database import query_votes
        result = query_votes(tmp_duckdb_path, motion_id=1, party="VVD")
        assert isinstance(result, list)
 class TestQuerySvdVectors:
    def test_returns_vectors(self, tmp_duckdb_path):
        from agent_tools.database import query_svd_vectors
        result = query_svd_vectors(tmp_duckdb_path, window_id="current_parliament")
        assert isinstance(result, list)
    def test_filters_by_entity_type(self, tmp_duckdb_path):
        from agent_tools.database import query_svd_vectors
        result = query_svd_vectors(
            tmp_duckdb_path, window_id="current_parliament", entity_type="mp"
        )
        assert isinstance(result, list)
 class TestQueryPartyPositions:
    def test_returns_party_scores(self, tmp_duckdb_path):
        from agent_tools.database import query_party_positions
        result = query_party_positions(tmp_duckdb_path, window_id="current_parliament")
        assert isinstance(result, list)
 class TestQueryPipelineStatus:
    def test_returns_status_dict(self, tmp_duckdb_path):
        from agent_tools.database import query_pipeline_status
        result = query_pipeline_status(tmp_duckdb_path)
        assert isinstance(result, dict)
        assert "motion_count" in result
        assert "latest_motion_date" in result
        assert "svd_window_count" in result
--- a/tests/agent_tools/test_parity.py
+++ b/tests/agent_tools/test_parity.py
@ -0,0 +1,160 @@
 """Parity tests: verify agent tools can achieve what humans can.
 These tests ensure the agent-native architecture satisfies the parity principle:
 "Whatever the user can do through the UI/scripts, the agent can achieve through tools."
 """
 import os
 import pytest
 pytest.importorskip("duckdb")
 class TestDatabaseParity:
    """Agent database queries vs human SQL queries."""
    def test_agent_query_motions_matches_raw_sql(self, tmp_duckdb_path):
        """Human: SELECT * FROM motions LIMIT 10
        Agent: query_motions(db_path, limit=10)
        """
        import duckdb
        from agent_tools.database import query_motions
        # Human approach — handle empty DB gracefully
        con = duckdb.connect(tmp_duckdb_path)
        try:
            human_result = con.execute("SELECT * FROM motions LIMIT 10").fetchdf().to_dict("records")
        except Exception:
            human_result = []
        con.close()
        # Agent approach
        agent_result = query_motions(tmp_duckdb_path, limit=10)
        # Both should return lists
        assert isinstance(human_result, list)
        assert isinstance(agent_result, list)
        assert len(agent_result) == len(human_result)
    def test_agent_pipeline_status_matches_raw_query(self, tmp_duckdb_path):
        """Human: SELECT COUNT(*) FROM motions
        Agent: query_pipeline_status(db_path)
        """
        import duckdb
        from agent_tools.database import query_pipeline_status
        con = duckdb.connect(tmp_duckdb_path)
        try:
            human_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0]
        except Exception:
            human_count = 0
        con.close()
        agent_status = query_pipeline_status(tmp_duckdb_path)
        assert agent_status["motion_count"] == human_count
 class TestHealthCheckParity:
    """Agent health check vs human script execution."""
    def test_agent_health_check_matches_script(self, tmp_duckdb_path):
        """Human: python scripts/health_check.py
        Agent: pipeline_check_health(db_path)
        """
        from agent_tools.pipeline import pipeline_check_health
        # Agent approach
        agent_result = pipeline_check_health(tmp_duckdb_path)
        assert isinstance(agent_result, dict)
        assert "healthy" in agent_result
        assert "checks" in agent_result
 class TestReportGenerationParity:
    """Agent report generation vs human manual analysis."""
    def test_agent_generates_summary_report(self, tmp_duckdb_path, tmp_path):
        """Human: Write a summary of pipeline state
        Agent: generate_report(db_path, "summary", ...)
        """
        from agent_tools.reports import generate_report
        output_path = str(tmp_path / "summary.md")
        result = generate_report(
            tmp_duckdb_path,
            report_type="summary",
            parameters={},
            output_path=output_path,
        )
        assert result["status"] == "written"
        assert os.path.exists(output_path)
        # Should contain key sections
        content = open(output_path).read()
        assert "Pipeline Summary" in content
        assert "Motions in database" in content
 class TestAnalysisParity:
    """Agent analysis vs human analytical queries."""
    def test_agent_party_shift_analysis(self, tmp_duckdb_path):
        """Human: Write SQL to compare party positions across windows
        Agent: analyze_party_shift(db_path, ...)
        """
        from agent_tools.analysis import analyze_party_shift
        result = analyze_party_shift(
            tmp_duckdb_path,
            party="VVD",
            window_start="2020",
            window_end="2024",
        )
        # Should return structured result (or error if no data)
        assert isinstance(result, dict)
        assert "party" in result
        # Either shift data or error (empty DB is fine)
        assert "shift" in result or "error" in result
 class TestIntegrationAgentDiagnosticLoop:
    """Integration: Agent performs full diagnostic loop."""
    def test_agent_diagnoses_stale_data(self, tmp_duckdb_path):
        """Agent loop:
        1. Check health
        2. Query pipeline status
        3. Identify issue (empty DB = no data)
        4. Suggest remediation
        """
        from agent_tools.pipeline import pipeline_check_health
        from agent_tools.database import query_pipeline_status
        # Step 1: Check health
        health = pipeline_check_health(tmp_duckdb_path)
        # Step 2: Query status
        status = query_pipeline_status(tmp_duckdb_path)
        # Step 3: Agent reasoning (simulated)
        issues = []
        if status["motion_count"] == 0:
            issues.append("No motions in database")
        if status["svd_window_count"] == 0:
            issues.append("No SVD windows computed")
        # Step 4: Suggest remediation
        suggestions = []
        if "No motions in database" in issues:
            suggestions.append("Run pipeline ingestion stage")
        if "No SVD windows computed" in issues:
            suggestions.append("Run SVD computation after ingestion")
        assert isinstance(issues, list)
        assert isinstance(suggestions, list)
        # Empty DB should produce actionable suggestions
        assert len(suggestions) > 0
--- a/tests/agent_tools/test_pipeline_tools.py
+++ b/tests/agent_tools/test_pipeline_tools.py
@ -0,0 +1,59 @@
 """Tests for agent pipeline control primitives."""
 import pytest
 pytest.importorskip("duckdb")
 class TestPipelineRunStage:
    def test_dry_run_returns_planned_actions(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_run_stage
        result = pipeline_run_stage(tmp_duckdb_path, stage="svd", window_id="2024", dry_run=True)
        assert isinstance(result, dict)
        assert "stage" in result
        assert result.get("dry_run") is True
    def test_invalid_stage_returns_error(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_run_stage
        result = pipeline_run_stage(tmp_duckdb_path, stage="invalid")
        assert isinstance(result, dict)
        assert "error" in result
 class TestPipelineRunFull:
    def test_dry_run_returns_plan(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_run_full
        result = pipeline_run_full(tmp_duckdb_path, dry_run=True)
        assert isinstance(result, dict)
        assert "stages" in result or "dry_run" in result
 class TestPipelineCheckHealth:
    def test_returns_health_report(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_check_health
        result = pipeline_check_health(tmp_duckdb_path)
        assert isinstance(result, dict)
        assert "checks" in result
        assert "healthy" in result
 class TestPipelineGetLogs:
    def test_returns_log_lines(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_get_logs
        result = pipeline_get_logs(tmp_duckdb_path, stage="svd", lines=10)
        assert isinstance(result, list)
        assert len(result) <= 10
 class TestPipelineValidateOutput:
    def test_validates_stage_output(self, tmp_duckdb_path):
        from agent_tools.pipeline import pipeline_validate_output
        result = pipeline_validate_output(tmp_duckdb_path, stage="svd")
        assert isinstance(result, dict)
        assert "valid" in result
		`@ -0,0 +1 @@`
							`"""Agent tools for Stemwijzer — atomic primitives for agent operation."""`