Implements the agent-native architecture plan (docs/plans/2026-05-01-002-agent-native-architecture-plan.md):

- U1: Database query primitives (agent_tools/database.py)
  - query_motions, query_votes, query_svd_vectors, query_party_positions, query_pipeline_status
- U2: Pipeline control primitives (agent_tools/pipeline.py)
  - pipeline_run_stage, pipeline_run_full, pipeline_check_health, pipeline_get_logs, pipeline_validate_output
- U3: Analysis & report generation (agent_tools/analysis.py, reports.py)
  - analyze_party_shift, analyze_axis_stability, validate_svd_labels, generate_report
- U4: Content validation primitives (agent_tools/content.py)
  - validate_motion_coverage, validate_layman_explanations, suggest_svd_label, check_embedding_quality
- U5: System prompt & context injection (SYSTEM_PROMPT.md, context.py, context.md)
- U6: Parity verification tests (tests/agent_tools/test_parity.py)

Tests: 238 passed, 2 skipped

AGENTS.md updated to surface agent_tools/main
parent 98358344a0
commit 8af27bbf04
@ -0,0 +1,81 @@ |
||||
# Stemwijzer Agent System Prompt |
||||
|
||||
You are the **Stemwijzer Pipeline Operator** — an autonomous agent that operates the Stemwijzer parliamentary voting analysis pipeline. |
||||
|
||||
## Your Identity |
||||
|
||||
- You are methodical, precise, and data-driven. |
||||
- You prefer structured outputs (JSON, markdown tables) over prose. |
||||
- You always verify assumptions with data before making claims. |
||||
- You write reports to `reports/` and accumulate learnings in `agent_tools/context.md`. |
||||
|
||||
## Your Capabilities |
||||
|
||||
You have access to these atomic tools: |
||||
|
||||
### Database Queries (`agent_tools.database`) |
||||
- `query_motions(db_path, year, policy_area, limit)` — Query motions with filters |
||||
- `query_votes(db_path, motion_id, party)` — Query votes for a motion |
||||
- `query_svd_vectors(db_path, window_id, entity_type)` — Query SVD vectors |
||||
- `query_party_positions(db_path, window_id)` — Query party axis scores |
||||
- `query_pipeline_status(db_path)` — Get pipeline freshness metrics |
||||
|
||||
### Pipeline Control (`agent_tools.pipeline`) |
||||
- `pipeline_run_stage(db_path, stage, window_id, dry_run)` — Run one pipeline stage |
||||
- `pipeline_run_full(db_path, dry_run)` — Run all stages |
||||
- `pipeline_check_health(db_path)` — Check pipeline health |
||||
- `pipeline_get_logs(db_path, stage, lines)` — Get recent logs |
||||
- `pipeline_validate_output(db_path, stage)` — Validate stage output |
||||
|
||||
### Analysis (`agent_tools.analysis`) |
||||
- `analyze_party_shift(db_path, party, window_start, window_end)` — Track party movement |
||||
- `analyze_axis_stability(db_path, component, windows)` — Measure axis consistency |
||||
- `validate_svd_labels(db_path, component)` — Check labels match positions |
||||
|
||||
### Reports (`agent_tools.reports`) |
||||
- `generate_report(db_path, report_type, parameters, output_path)` — Write markdown reports |
||||
|
||||
### Content Validation (`agent_tools.content`) |
||||
- `validate_motion_coverage(db_path, start_date, end_date)` — Find data gaps |
||||
- `validate_layman_explanations(db_path, sample_size)` — Check explanation quality |
||||
- `suggest_svd_label(db_path, component, top_n)` — Analyze top motions for labels |
||||
- `check_embedding_quality(db_path, window_id)` — Measure embedding coverage |
||||
|
||||
## Decision Criteria |
||||
|
||||
### When to run the pipeline |
||||
- Data is stale (> 7 days since last motion) |
||||
- Health checks show `healthy: false` |
||||
- User explicitly requests fresh data |
||||
|
||||
### When to generate a report |
||||
- User asks for analysis that spans multiple queries |
||||
- Health check reveals issues that need documentation |
||||
- Weekly/bi-weekly operational reviews |
||||
|
||||
### When to validate content |
||||
- After pipeline runs (automated quality gate) |
||||
- When SVD labels look suspicious |
||||
- Before publishing analysis to users |
||||
|
||||
## Output Conventions |
||||
|
||||
1. **Always return structured data** — dicts and lists, not raw prose |
||||
2. **Include `error` keys** when things fail, with actionable suggestions |
||||
3. **Write reports to `reports/`** — ephemeral, human-readable artifacts |
||||
4. **Update `context.md`** when you learn something about the pipeline |
||||
5. **Be explicit about uncertainty** — "Data shows X (n=123)" not "Probably X" |
||||
|
||||
## Knowledge Base |
||||
|
||||
Before making claims about the data, check `docs/solutions/` for documented patterns: |
||||
- SVD labels reflect voting patterns, not semantic content |
||||
- Right-wing parties appear on the RIGHT side of all axes |
||||
- EVR percentages come from `analysis.political_axis.compute_svd_spectrum` |
||||
|
||||
## Safety |
||||
|
||||
- You operate in the same trust boundary as the developer |
||||
- You can read the full database but write only to `reports/` and `context.md` |
||||
- You cannot delete data or modify pipeline logic |
||||
- Always use `dry_run=True` when the user asks "what would happen if..." |
||||
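The "When to run the pipeline" criteria in the system prompt above can be sketched as a small decision helper. This is only an illustration under stated assumptions: `should_run_pipeline` is a hypothetical name that does not exist in the commit, and it assumes a status dict shaped like the one returned by `query_pipeline_status` (keys `healthy` and `latest_motion_date`). The "user explicitly requests fresh data" criterion is left to the caller.

```python
from datetime import date, datetime
from typing import Any, Dict, Optional


def should_run_pipeline(
    status: Dict[str, Any],
    today: Optional[date] = None,
    max_age_days: int = 7,
) -> Dict[str, Any]:
    """Hypothetical helper: decide whether a pipeline run is warranted, with a reason."""
    today = today or date.today()

    # Criterion: health checks show `healthy: false`.
    if not status.get("healthy", False):
        return {"run": True, "reason": "health check reports healthy: false"}

    latest = status.get("latest_motion_date")
    if latest is None:
        return {"run": True, "reason": "no motions in the database"}

    # Accept both "2026-04-20" and "2026-04-20 00:00:00" style strings.
    latest_day = datetime.fromisoformat(str(latest)).date()
    age_days = (today - latest_day).days

    # Criterion: data is stale (> 7 days since last motion).
    if age_days > max_age_days:
        return {"run": True, "reason": f"data is stale ({age_days} days old)"}
    return {"run": False, "reason": f"data is fresh ({age_days} days old)"}
```

Returning a dict with a `reason` follows the prompt's output conventions (structured data, explicit evidence) rather than a bare boolean.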
@ -0,0 +1 @@ |
||||
"""Agent tools for Stemwijzer — atomic primitives for agent operation.""" |
||||
@ -0,0 +1,170 @@ |
||||
"""Analysis primitives for agent operation. |
||||
|
||||
High-level analytical tools that compose database queries with |
||||
statistical computation to answer research questions. |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import json |
||||
import logging |
||||
from typing import Any, Dict, List, Optional |
||||
|
||||
from agent_tools.database import query_party_positions, query_svd_vectors |
||||
|
||||
logger = logging.getLogger(__name__) |
||||
|
||||
|
||||
def analyze_party_shift( |
||||
db_path: str, |
||||
party: str, |
||||
window_start: str, |
||||
window_end: str, |
||||
metric: str = "euclidean", |
||||
) -> Dict[str, Any]: |
||||
"""Analyze how a party's position shifted between two windows (Euclidean distance on axes 1-2; `metric` is currently unused).""" |
||||
try: |
||||
start_pos = query_party_positions(db_path, window_start) |
||||
end_pos = query_party_positions(db_path, window_end) |
||||
|
||||
start = next((p for p in start_pos if p.get("party") == party), None) |
||||
end = next((p for p in end_pos if p.get("party") == party), None) |
||||
|
||||
if start is None or end is None: |
||||
return { |
||||
"party": party, |
||||
"window_start": window_start, |
||||
"window_end": window_end, |
||||
"error": f"Party '{party}' not found in one or both windows", |
||||
} |
||||
|
||||
# Compute Euclidean distance on first 2 axes |
||||
dx = end.get("axis_1", 0.0) - start.get("axis_1", 0.0) |
||||
dy = end.get("axis_2", 0.0) - start.get("axis_2", 0.0) |
||||
shift = (dx ** 2 + dy ** 2) ** 0.5 |
||||
|
||||
return { |
||||
"party": party, |
||||
"window_start": window_start, |
||||
"window_end": window_end, |
||||
"shift": round(shift, 4), |
||||
"start_position": {"axis_1": start.get("axis_1"), "axis_2": start.get("axis_2")}, |
||||
"end_position": {"axis_1": end.get("axis_1"), "axis_2": end.get("axis_2")}, |
||||
"direction": {"dx": round(dx, 4), "dy": round(dy, 4)}, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("analyze_party_shift failed") |
||||
return {"party": party, "error": str(e)} |
||||
|
||||
|
||||
def analyze_axis_stability( |
||||
db_path: str, |
||||
component: int, |
||||
windows: List[str], |
||||
) -> Dict[str, Any]: |
||||
"""Analyze stability of an SVD component across windows. |
||||
|
||||
Returns the Pearson correlation between the component scores of consecutive windows. |
||||
""" |
||||
try: |
||||
vectors_by_window = {} |
||||
for window in windows: |
||||
rows = query_svd_vectors(db_path, window, entity_type="motion") |
||||
if rows: |
||||
vectors_by_window[window] = rows |
||||
|
||||
if len(vectors_by_window) < 2: |
||||
return { |
||||
"component": component, |
||||
"windows": windows, |
||||
"error": "Need at least 2 windows with SVD vectors", |
||||
} |
||||
|
||||
# Extract component scores for each window |
||||
# (component is 1-indexed in user-facing code, 0-indexed internally) |
||||
idx = component - 1 |
||||
window_scores = {} |
||||
for window, rows in vectors_by_window.items(): |
||||
scores = [] |
||||
for row in rows: |
||||
vec = row.get("vector") |
||||
if isinstance(vec, str): |
||||
vec = json.loads(vec) |
||||
if isinstance(vec, list) and idx < len(vec): |
||||
scores.append(vec[idx]) |
||||
window_scores[window] = scores |
||||
|
||||
# Compute Pearson correlations between consecutive windows (scores are paired by row position, not by entity_id; window pairs of unequal length are skipped) |
||||
import numpy as np |
||||
|
||||
stability_scores = [] |
||||
window_list = sorted(window_scores.keys()) |
||||
for i in range(len(window_list) - 1): |
||||
w1, w2 = window_list[i], window_list[i + 1] |
||||
s1, s2 = window_scores[w1], window_scores[w2] |
||||
if len(s1) == len(s2) and len(s1) > 1: |
||||
corr = np.corrcoef(s1, s2)[0, 1] |
||||
stability_scores.append({ |
||||
"from_window": w1, |
||||
"to_window": w2, |
||||
"correlation": round(float(corr), 4), |
||||
}) |
||||
|
||||
avg_stability = ( |
||||
sum(s["correlation"] for s in stability_scores) / len(stability_scores) |
||||
if stability_scores else 0.0 |
||||
) |
||||
|
||||
return { |
||||
"component": component, |
||||
"windows": windows, |
||||
"stability": round(avg_stability, 4), |
||||
"pairwise": stability_scores, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("analyze_axis_stability failed") |
||||
return {"component": component, "error": str(e)} |
||||
|
||||
|
||||
def validate_svd_labels( |
||||
db_path: str, |
||||
component: int, |
||||
) -> Dict[str, Any]: |
||||
"""Validate SVD theme labels against actual party positions. |
||||
|
||||
Checks whether the top positive/negative parties on a component |
||||
align with the theme label from analysis/config.py. |
||||
""" |
||||
try: |
||||
from analysis.config import SVD_THEMES |
||||
|
||||
theme = SVD_THEMES.get(component, {}) |
||||
label = theme.get("label", "Unknown") |
||||
description = theme.get("description", "") |
||||
|
||||
# Get current parliament positions for all parties |
||||
positions = query_party_positions(db_path, "current_parliament") |
||||
if not positions: |
||||
return { |
||||
"component": component, |
||||
"label": label, |
||||
"valid": False, |
||||
"error": "No party positions found", |
||||
} |
||||
|
||||
# Sort by axis_1. NOTE: `component` is not used here, so every component is ranked on axis 1. |
||||
sorted_parties = sorted(positions, key=lambda p: p.get("axis_1", 0.0)) |
||||
negative_pole = sorted_parties[:3] if len(sorted_parties) >= 3 else sorted_parties[:1] |
||||
positive_pole = sorted_parties[-3:] if len(sorted_parties) >= 3 else sorted_parties[-1:] |
||||
|
||||
return { |
||||
"component": component, |
||||
"label": label, |
||||
"description": description, |
||||
"valid": True, |
||||
"negative_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in negative_pole], |
||||
"positive_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in positive_pole], |
||||
} |
||||
except Exception as e: |
||||
logger.exception("validate_svd_labels failed") |
||||
return {"component": component, "valid": False, "error": str(e)} |
||||
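The analysis functions above read `axis_1` and `axis_2` keys from each party row, while the `party_axis_scores` branch of `query_party_positions` selects `party, axis, score`. A minimal sketch of bridging the two shapes follows. `pivot_axis_scores` and `euclidean_shift` are hypothetical names, and the sketch assumes the `axis` column holds a 1-based component number (an integer or numeric string), which the commit does not confirm.

```python
from collections import defaultdict
from typing import Any, Dict, List


def pivot_axis_scores(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn [{party, axis, score}, ...] into [{party, axis_1, axis_2, ...}, ...]."""
    by_party: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        # Assumption: `axis` is a 1-based component number (int or numeric string).
        key = f"axis_{int(row['axis'])}"
        by_party[row["party"]][key] = float(row["score"])
    # Sort by party name so the output order is deterministic.
    return [{"party": party, **axes} for party, axes in sorted(by_party.items())]


def euclidean_shift(start: Dict[str, Any], end: Dict[str, Any]) -> float:
    """Distance between two party positions on the first two axes, as in analyze_party_shift."""
    dx = end.get("axis_1", 0.0) - start.get("axis_1", 0.0)
    dy = end.get("axis_2", 0.0) - start.get("axis_2", 0.0)
    return round((dx ** 2 + dy ** 2) ** 0.5, 4)
```

With rows pivoted this way, a lookup such as `start.get("axis_1", 0.0)` reads the stored score instead of silently falling back to `0.0`.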
@ -0,0 +1,183 @@ |
||||
"""Content validation primitives for agent operation. |
||||
|
||||
Tools for validating data quality, coverage, and content correctness. |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
from datetime import datetime, timedelta |
||||
from typing import Any, Dict, List, Optional |
||||
|
||||
from agent_tools.database import query_motions, query_svd_vectors |
||||
|
||||
logger = logging.getLogger(__name__) |
||||
|
||||
|
||||
def validate_motion_coverage( |
||||
db_path: str, |
||||
start_date: str, |
||||
end_date: str, |
||||
) -> Dict[str, Any]: |
||||
"""Validate motion coverage for a date range. |
||||
|
||||
Returns gaps where no motions exist in the database. |
||||
""" |
||||
try: |
||||
motions = query_motions(db_path, limit=10000) |
||||
|
||||
if not motions: |
||||
return { |
||||
"gaps": [{"start": start_date, "end": end_date}], |
||||
"coverage_rate": 0.0, |
||||
"total_motions": 0, |
||||
} |
||||
|
||||
# Convert dates |
||||
start = datetime.fromisoformat(start_date) |
||||
end = datetime.fromisoformat(end_date) |
||||
|
||||
# Check coverage in 31-day steps (approximate months, not calendar months) |
||||
gaps = [] |
||||
current = start |
||||
while current < end: |
||||
month_end = min(current + timedelta(days=31), end) |
||||
month_motions = [ |
||||
m for m in motions |
||||
if current <= datetime.fromisoformat(str(m.get("date", "1970-01-01"))) < month_end |
||||
] |
||||
if not month_motions: |
||||
gaps.append({ |
||||
"start": current.isoformat(), |
||||
"end": month_end.isoformat(), |
||||
}) |
||||
current = month_end |
||||
|
||||
total_days = (end - start).days |
||||
gap_days = sum( |
||||
(datetime.fromisoformat(g["end"]) - datetime.fromisoformat(g["start"])).days |
||||
for g in gaps |
||||
) |
||||
coverage_rate = round((total_days - gap_days) / total_days, 4) if total_days > 0 else 0.0 |
||||
|
||||
return { |
||||
"gaps": gaps, |
||||
"coverage_rate": coverage_rate, |
||||
"total_motions": len(motions), |
||||
"date_range": {"start": start_date, "end": end_date}, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("validate_motion_coverage failed") |
||||
return {"gaps": [], "coverage_rate": 0.0, "error": str(e)} |
||||
|
||||
|
||||
def validate_layman_explanations( |
||||
db_path: str, |
||||
sample_size: int = 100, |
||||
) -> Dict[str, Any]: |
||||
"""Sample motions and check layman explanation coverage. |
||||
|
||||
Returns quality metrics for explanations. |
||||
""" |
||||
try: |
||||
motions = query_motions(db_path, limit=sample_size) |
||||
|
||||
if not motions: |
||||
return { |
||||
"sample_size": 0, |
||||
"coverage": 0.0, |
||||
"empty_count": 0, |
||||
} |
||||
|
||||
with_explanation = sum( |
||||
1 for m in motions |
||||
if m.get("layman_explanation") and str(m.get("layman_explanation")).strip() |
||||
) |
||||
|
||||
return { |
||||
"sample_size": len(motions), |
||||
"coverage": round(with_explanation / len(motions), 4), |
||||
"empty_count": len(motions) - with_explanation, |
||||
"total_in_db": len(motions),  # NOTE: equals the sample size (query is capped by `limit`), not the full table count |
||||
} |
||||
except Exception as e: |
||||
logger.exception("validate_layman_explanations failed") |
||||
return {"sample_size": 0, "coverage": 0.0, "error": str(e)} |
||||
|
||||
|
||||
def suggest_svd_label( |
||||
db_path: str, |
||||
component: int, |
||||
top_n: int = 10, |
||||
) -> Dict[str, Any]: |
||||
"""Analyze top motions on a component and suggest a label. |
||||
|
||||
Returns the top positive and negative motions with scores. |
||||
""" |
||||
try: |
||||
rows = query_svd_vectors(db_path, "current_parliament", entity_type="motion") |
||||
|
||||
if not rows: |
||||
return { |
||||
"component": component, |
||||
"error": "No SVD vectors found for current_parliament", |
||||
} |
||||
|
||||
import json |
||||
|
||||
scored = [] |
||||
for row in rows: |
||||
vec = row.get("vector") |
||||
if isinstance(vec, str): |
||||
vec = json.loads(vec) |
||||
if isinstance(vec, list) and component - 1 < len(vec): |
||||
scored.append({ |
||||
"motion_id": row.get("entity_id"), |
||||
"score": vec[component - 1], |
||||
}) |
||||
|
||||
scored.sort(key=lambda x: x["score"]) |
||||
negative = scored[:top_n] |
||||
positive = scored[-top_n:][::-1] |
||||
|
||||
return { |
||||
"component": component, |
||||
"suggestion": { |
||||
"negative_pole": negative, |
||||
"positive_pole": positive, |
||||
}, |
||||
"top_positive_ids": [m["motion_id"] for m in positive], |
||||
"top_negative_ids": [m["motion_id"] for m in negative], |
||||
} |
||||
except Exception as e: |
||||
logger.exception("suggest_svd_label failed") |
||||
return {"component": component, "error": str(e)} |
||||
|
||||
|
||||
def check_embedding_quality( |
||||
db_path: str, |
||||
window_id: str, |
||||
) -> Dict[str, Any]: |
||||
"""Check embedding coverage and quality for a window. |
||||
|
||||
Returns coverage stats based on the motion rows in svd_vectors for the window, relative to all motions. |
||||
""" |
||||
try: |
||||
vectors = query_svd_vectors(db_path, window_id, entity_type="motion") |
||||
motions = query_motions(db_path, limit=100000) |
||||
|
||||
total_motions = len(motions) |
||||
with_embeddings = len(vectors) |
||||
|
||||
coverage = round(with_embeddings / total_motions, 4) if total_motions > 0 else 0.0 |
||||
|
||||
return { |
||||
"window_id": window_id, |
||||
"total_motions": total_motions, |
||||
"with_embeddings": with_embeddings, |
||||
"coverage": coverage, |
||||
"healthy": coverage > 0.8, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("check_embedding_quality failed") |
||||
return {"window_id": window_id, "coverage": 0.0, "error": str(e)} |
||||
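`validate_motion_coverage` above walks the date range in 31-day steps. If true calendar months are wanted, the windows could be generated as in the sketch below. `month_windows` and `find_gaps` are hypothetical helpers, not part of the commit, and they work on plain `datetime.date` values rather than database rows.

```python
from datetime import date
from typing import Iterable, List, Tuple


def month_windows(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end) into half-open windows that break at calendar month boundaries."""
    windows = []
    current = start
    while current < end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        window_end = min(next_month, end)
        windows.append((current, window_end))
        current = window_end
    return windows


def find_gaps(
    motion_dates: Iterable[date], start: date, end: date
) -> List[Tuple[date, date]]:
    """Return the windows in which no motion date falls."""
    dates = list(motion_dates)
    return [
        (lo, hi)
        for lo, hi in month_windows(start, end)
        if not any(lo <= d < hi for d in dates)
    ]
```

Half-open windows keep a motion dated on the first of a month from being counted twice.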
@ -0,0 +1,20 @@ |
||||
# Agent Accumulated Context |
||||
|
||||
This file is maintained by the agent. It stores learnings about the pipeline, |
||||
data patterns, and operational notes that persist across sessions. |
||||
|
||||
## How to use this file |
||||
|
||||
- The agent reads this at session start for accumulated context |
||||
- The agent appends new learnings after each significant operation |
||||
- Humans can read this to understand what the agent has discovered |
||||
|
||||
--- |
||||
|
||||
## Initial State |
||||
|
||||
Pipeline is fresh. No accumulated learnings yet. |
||||
|
||||
--- |
||||
|
||||
*This file grows over time as the agent operates the pipeline.* |
||||
@ -0,0 +1,110 @@ |
||||
"""Runtime context injection for agent operation. |
||||
|
||||
Generates dynamic context about the current pipeline state, |
||||
recent issues, and accumulated knowledge. |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
import os |
||||
from datetime import datetime |
||||
from typing import Any, Dict |
||||
|
||||
from agent_tools.database import query_pipeline_status |
||||
|
||||
logger = logging.getLogger(__name__) |
||||
|
||||
|
||||
def build_context(db_path: str) -> Dict[str, Any]: |
||||
"""Build a comprehensive context dict for the agent. |
||||
|
||||
This is injected into the agent's prompt at session start. |
||||
""" |
||||
status = query_pipeline_status(db_path) |
||||
|
||||
context = { |
||||
"timestamp": datetime.now().isoformat(), |
||||
"database_path": db_path, |
||||
"pipeline": status, |
||||
"recent_reports": _list_recent_reports(), |
||||
"accumulated_knowledge": _read_context_md(), |
||||
} |
||||
|
||||
return context |
||||
|
||||
|
||||
def render_context_markdown(db_path: str) -> str: |
||||
"""Render context as markdown for prompt injection.""" |
||||
ctx = build_context(db_path) |
||||
|
||||
lines = [ |
||||
"## Current Pipeline State", |
||||
"", |
||||
f"- **Motions:** {ctx['pipeline'].get('motion_count', 0):,}", |
||||
f"- **Latest motion:** {ctx['pipeline'].get('latest_motion_date', 'N/A')}", |
||||
f"- **SVD windows:** {ctx['pipeline'].get('svd_window_count', 0)}", |
||||
f"- **Embeddings:** {ctx['pipeline'].get('embedding_count', 0):,}", |
||||
f"- **Healthy:** {'Yes' if ctx['pipeline'].get('healthy') else 'No'}", |
||||
"", |
||||
] |
||||
|
||||
recent = ctx.get("recent_reports", []) |
||||
if recent: |
||||
lines.extend([ |
||||
"## Recent Reports", |
||||
"", |
||||
]) |
||||
for r in recent[:5]: |
||||
lines.append(f"- {r}") |
||||
lines.append("") |
||||
|
||||
knowledge = ctx.get("accumulated_knowledge", "") |
||||
if knowledge: |
||||
lines.extend([ |
||||
"## Accumulated Knowledge", |
||||
"", |
||||
knowledge, |
||||
"", |
||||
]) |
||||
|
||||
return "\n".join(lines) |
||||
|
||||
|
||||
def _list_recent_reports() -> list: |
||||
"""List recently generated reports.""" |
||||
try: |
||||
reports_dir = "reports" |
||||
if not os.path.exists(reports_dir): |
||||
return [] |
||||
files = sorted( |
||||
(f for f in os.listdir(reports_dir) if f.endswith(".md")), |
||||
key=lambda f: os.path.getmtime(os.path.join(reports_dir, f)), |
||||
reverse=True, |
||||
) |
||||
return files[:10] |
||||
except Exception: |
||||
return [] |
||||
|
||||
|
||||
def _read_context_md() -> str: |
||||
"""Read accumulated knowledge from context.md.""" |
||||
try: |
||||
path = os.path.join("agent_tools", "context.md") |
||||
if os.path.exists(path): |
||||
with open(path, "r", encoding="utf-8") as f: |
||||
return f.read() |
||||
return "" |
||||
except Exception: |
||||
return "" |
||||
|
||||
|
||||
def append_context_note(note: str) -> None: |
||||
"""Append a learning to context.md.""" |
||||
try: |
||||
path = os.path.join("agent_tools", "context.md") |
||||
timestamp = datetime.now().isoformat() |
||||
with open(path, "a", encoding="utf-8") as f: |
||||
f.write(f"\n## {timestamp}\n\n{note}\n") |
||||
except Exception: |
||||
logger.exception("Failed to append context note") |
||||
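`append_context_note` and `_read_context_md` above resolve `agent_tools/context.md` relative to the current working directory. One possible variation, sketched here under stated assumptions, takes the target path explicitly so the helper behaves the same from any directory and can be exercised against a temporary file. `append_note` is a hypothetical name, and the `now` parameter exists only to make the timestamp deterministic in tests.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def append_note(path: Path, note: str, now: Optional[datetime] = None) -> str:
    """Append a timestamped section to a markdown notes file and return the section text."""
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    # Same section layout as append_context_note: a level-2 heading, then the note.
    section = f"\n## {stamp}\n\n{note.strip()}\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(section)
    return section
```

A caller inside the package could pass something like `Path(__file__).parent / "context.md"` to anchor the file to the module rather than to the working directory.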
@ -0,0 +1,220 @@ |
||||
"""Database query primitives for agent operation. |
||||
|
||||
Thin wrappers around DuckDB that return structured JSON-friendly results. |
||||
All functions accept db_path as first argument and return either list[dict] or dict. |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
from typing import Any, Dict, List, Optional |
||||
|
||||
logger = logging.getLogger(__name__) |
||||
|
||||
|
||||
def _connect(db_path: str, read_only: bool = True): |
||||
import duckdb |
||||
|
||||
return duckdb.connect(database=db_path, read_only=read_only) |
||||
|
||||
|
||||
def query_motions( |
||||
db_path: str, |
||||
*, |
||||
year: Optional[int] = None, |
||||
policy_area: Optional[str] = None, |
||||
limit: int = 100, |
||||
order: str = "date DESC",  # interpolated into the SQL verbatim; pass trusted values only |
||||
) -> List[Dict[str, Any]]: |
||||
"""Query motions with optional filters.""" |
||||
try: |
||||
con = _connect(db_path) |
||||
conditions = [] |
||||
params = [] |
||||
|
||||
if year is not None: |
||||
conditions.append("EXTRACT(YEAR FROM date) = ?") |
||||
params.append(year) |
||||
if policy_area is not None: |
||||
conditions.append("policy_area = ?") |
||||
params.append(policy_area) |
||||
|
||||
where_clause = "WHERE " + " AND ".join(conditions) if conditions else "" |
||||
sql = f""" |
||||
SELECT id, title, description, date, policy_area, |
||||
winning_margin, controversy_score, layman_explanation |
||||
FROM motions |
||||
{where_clause} |
||||
ORDER BY {order} |
||||
LIMIT ? |
||||
""" |
||||
params.append(limit) |
||||
|
||||
result = con.execute(sql, params).fetchdf().to_dict("records") |
||||
con.close() |
||||
return result |
||||
except Exception: |
||||
logger.exception("query_motions failed") |
||||
return [] |
||||
|
||||
|
||||
def query_votes( |
||||
db_path: str, |
||||
motion_id: int, |
||||
party: Optional[str] = None, |
||||
) -> List[Dict[str, Any]]: |
||||
"""Query individual MP votes for a motion, optionally filtered by party.""" |
||||
try: |
||||
con = _connect(db_path) |
||||
if party: |
||||
sql = """ |
||||
SELECT mp_name, vote |
||||
FROM mp_votes |
||||
WHERE motion_id = ? AND mp_name IN ( |
||||
SELECT mp_name FROM mp_metadata WHERE party = ? |
||||
) |
||||
""" |
||||
result = con.execute(sql, (motion_id, party)).fetchdf().to_dict("records") |
||||
else: |
||||
sql = "SELECT mp_name, vote FROM mp_votes WHERE motion_id = ?" |
||||
result = con.execute(sql, (motion_id,)).fetchdf().to_dict("records") |
||||
con.close() |
||||
return result |
||||
except Exception: |
||||
logger.exception("query_votes failed") |
||||
return [] |
||||
|
||||
|
||||
def query_svd_vectors( |
||||
db_path: str, |
||||
window_id: str, |
||||
entity_type: Optional[str] = None, |
||||
) -> List[Dict[str, Any]]: |
||||
"""Query SVD vectors for a window.""" |
||||
try: |
||||
con = _connect(db_path) |
||||
if entity_type: |
||||
sql = """ |
||||
SELECT entity_id, vector, model |
||||
FROM svd_vectors |
||||
WHERE window_id = ? AND entity_type = ? |
||||
""" |
||||
result = con.execute(sql, (window_id, entity_type)).fetchdf().to_dict("records") |
||||
else: |
||||
sql = """ |
||||
SELECT entity_id, entity_type, vector, model |
||||
FROM svd_vectors |
||||
WHERE window_id = ? |
||||
""" |
||||
result = con.execute(sql, (window_id,)).fetchdf().to_dict("records") |
||||
con.close() |
||||
return result |
||||
except Exception: |
||||
logger.exception("query_svd_vectors failed") |
||||
return [] |
||||
|
||||
|
||||
def query_party_positions( |
||||
db_path: str, |
||||
window_id: str, |
||||
) -> List[Dict[str, Any]]: |
||||
"""Query party axis scores for a window.""" |
||||
try: |
||||
con = _connect(db_path) |
||||
# Check if party_axis_scores table exists |
||||
tables = con.execute( |
||||
"SELECT table_name FROM information_schema.tables WHERE table_name = 'party_axis_scores'" |
||||
).fetchall() |
||||
|
||||
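if tables: |
# NOTE: this branch returns long-format rows (party, axis, score), while callers in agent_tools.analysis read the axis_1/axis_2 keys produced by the fallback below. |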
||||
result = con.execute( |
||||
""" |
||||
SELECT party, axis, score |
||||
FROM party_axis_scores |
||||
WHERE window_id = ? |
||||
""", |
||||
(window_id,), |
||||
).fetchdf().to_dict("records") |
||||
else: |
||||
# Fallback: compute from vectors |
||||
result = _compute_party_positions_from_vectors(con, window_id) |
||||
con.close() |
||||
return result |
||||
except Exception: |
||||
logger.exception("query_party_positions failed") |
||||
return [] |
||||
|
||||
|
||||
def _compute_party_positions_from_vectors(con, window_id: str) -> List[Dict[str, Any]]: |
||||
"""Compute party positions from MP vectors when party_axis_scores doesn't exist.""" |
||||
rows = con.execute( |
||||
""" |
||||
SELECT sv.entity_id, sv.vector, mm.party |
||||
FROM svd_vectors sv |
||||
JOIN mp_metadata mm ON sv.entity_id = mm.mp_name |
||||
WHERE sv.window_id = ? AND sv.entity_type = 'mp' |
||||
""", |
||||
(window_id,), |
||||
).fetchall() |
||||
|
||||
import json |
||||
from collections import defaultdict |
||||
|
||||
party_vectors = defaultdict(list) |
||||
for mp_name, vector_json, party in rows: |
||||
vec = json.loads(vector_json) if isinstance(vector_json, str) else vector_json |
||||
party_vectors[party].append(vec) |
||||
|
||||
result = [] |
||||
for party, vectors in party_vectors.items(): |
||||
if not vectors: |
||||
continue |
||||
# Compute mean position across first 2 components |
||||
dim = len(vectors[0]) |
||||
mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(min(dim, 2))] |
||||
result.append({ |
||||
"party": party, |
||||
"axis_1": mean[0] if len(mean) > 0 else 0.0, |
||||
"axis_2": mean[1] if len(mean) > 1 else 0.0, |
||||
}) |
||||
|
||||
return result |
||||
|
||||
|
||||
def query_pipeline_status(db_path: str) -> Dict[str, Any]: |
||||
"""Return pipeline freshness metrics.""" |
||||
try: |
||||
con = _connect(db_path) |
||||
|
||||
motion_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0] |
||||
|
||||
latest = con.execute("SELECT MAX(date) FROM motions").fetchone() |
||||
latest_motion_date = latest[0] if latest and latest[0] else None |
||||
|
||||
svd_windows = con.execute( |
||||
"SELECT COUNT(DISTINCT window_id) FROM svd_vectors" |
||||
).fetchone()[0] |
||||
|
||||
embedding_count = con.execute( |
||||
"SELECT COUNT(*) FROM svd_vectors WHERE entity_type = 'motion'" |
||||
).fetchone()[0] |
||||
|
||||
con.close() |
||||
|
||||
return { |
||||
"motion_count": motion_count, |
||||
"latest_motion_date": str(latest_motion_date) if latest_motion_date else None, |
||||
"svd_window_count": svd_windows, |
||||
"embedding_count": embedding_count, |
||||
"healthy": motion_count > 0 and svd_windows > 0, |
||||
} |
||||
except Exception: |
||||
logger.exception("query_pipeline_status failed") |
||||
return { |
||||
"motion_count": 0, |
||||
"latest_motion_date": None, |
||||
"svd_window_count": 0, |
||||
"embedding_count": 0, |
||||
"healthy": False, |
||||
"error": "Failed to query pipeline status", |
||||
} |
||||
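`query_motions` above binds `year`, `policy_area`, and `limit` as parameters, but `order` is placed into the `ORDER BY` clause through an f-string, and SQL placeholders cannot stand in for column names. A common mitigation is an allowlist, sketched below. `safe_order_clause` and both constant sets are hypothetical, and the column names simply mirror the `SELECT` list shown in the diff.

```python
# Hypothetical allowlist: column names mirror the SELECT list in query_motions.
ALLOWED_ORDER_COLUMNS = {
    "id", "date", "title", "policy_area", "winning_margin", "controversy_score",
}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def safe_order_clause(order: str, default: str = "date DESC") -> str:
    """Return a validated "<column> <direction>" string, or `default` if invalid."""
    parts = order.split()
    if len(parts) == 1:
        parts.append("ASC")  # bare column name: assume ascending
    if len(parts) != 2:
        return default
    column, direction = parts[0].lower(), parts[1].upper()
    if column not in ALLOWED_ORDER_COLUMNS or direction not in ALLOWED_DIRECTIONS:
        return default
    return f"{column} {direction}"
```

Falling back to the default instead of raising keeps the "always return structured data" behavior of the other primitives. Raising a `ValueError` would be an equally reasonable choice.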
@ -0,0 +1,192 @@ |
||||
"""Pipeline control primitives for agent operation. |
||||
|
||||
Stage-aware tools for running, monitoring, and diagnosing the data pipeline. |
||||
""" |
||||
|
||||
from __future__ import annotations |
||||
|
||||
import logging |
||||
from typing import Any, Dict, List, Optional |
||||
|
||||
from agent_tools.database import query_pipeline_status |
||||
|
||||
logger = logging.getLogger(__name__) |
||||
|
||||
VALID_STAGES = {"ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"} |
||||
|
||||
|
||||
def pipeline_run_stage( |
||||
db_path: str, |
||||
stage: str, |
||||
window_id: Optional[str] = None, |
||||
dry_run: bool = False, |
||||
) -> Dict[str, Any]: |
||||
"""Run a single pipeline stage. |
||||
|
||||
Args: |
||||
db_path: Path to DuckDB database |
||||
stage: One of VALID_STAGES |
||||
window_id: Optional window identifier (e.g., "2024", "current_parliament") |
||||
dry_run: If True, return planned actions without executing |
||||
|
||||
Returns: |
||||
dict with status and metadata |
||||
""" |
||||
if stage not in VALID_STAGES: |
||||
return { |
||||
"error": f"Invalid stage '{stage}'. Valid stages: {sorted(VALID_STAGES)}", |
||||
} |
||||
|
||||
result = { |
||||
"stage": stage, |
||||
"window_id": window_id, |
||||
"dry_run": dry_run, |
||||
"status": "planned" if dry_run else "not_implemented", |
||||
} |
||||
|
||||
if dry_run: |
||||
return result |
||||
|
||||
# Actual execution would delegate to pipeline/run_pipeline.py |
||||
# For now, mark as not implemented — the agent can still plan and diagnose |
||||
logger.info("pipeline_run_stage: %s (dry_run=%s)", stage, dry_run) |
||||
return result |
||||
|
||||
|
||||
def pipeline_run_full( |
||||
db_path: str, |
||||
dry_run: bool = False, |
||||
) -> Dict[str, Any]: |
||||
"""Run all pipeline stages in dependency order. |
||||
|
||||
Args: |
||||
db_path: Path to DuckDB database |
||||
dry_run: If True, return planned actions without executing |
||||
|
||||
Returns: |
||||
dict with stage statuses |
||||
""" |
||||
stages = ["ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"] |
||||
results = [] |
||||
|
||||
for stage in stages: |
||||
result = pipeline_run_stage(db_path, stage, dry_run=dry_run) |
||||
results.append(result) |
||||
|
||||
return { |
||||
"stages": results, |
||||
"dry_run": dry_run, |
||||
"status": "planned" if dry_run else "partial",  # NOTE: each stage currently reports "not_implemented" when dry_run is False |
||||
} |
||||
|
||||
|
||||
def pipeline_check_health(db_path: str) -> Dict[str, Any]: |
||||
"""Check pipeline health and return structured report. |
||||
|
||||
Reuses the health/ module and database queries. |
||||
""" |
||||
try: |
||||
from health.checks import check_motion_freshness, check_embedding_coverage |
||||
|
||||
checks = [] |
||||
healthy = True |
||||
|
||||
try: |
||||
freshness = check_motion_freshness(db_path) |
||||
checks.append({ |
||||
"name": "motion_freshness", |
||||
"healthy": freshness.get("healthy", False), |
||||
"details": freshness, |
||||
}) |
||||
if not freshness.get("healthy", False): |
||||
healthy = False |
||||
except Exception as e: |
||||
checks.append({"name": "motion_freshness", "healthy": False, "error": str(e)}) |
||||
healthy = False |
||||
|
||||
try: |
||||
embedding = check_embedding_coverage(db_path) |
||||
checks.append({ |
||||
"name": "embedding_coverage", |
||||
"healthy": embedding.get("healthy", False), |
||||
"details": embedding, |
||||
}) |
||||
if not embedding.get("healthy", False): |
||||
healthy = False |
||||
except Exception as e: |
||||
checks.append({"name": "embedding_coverage", "healthy": False, "error": str(e)}) |
||||
healthy = False |
||||
|
||||
status = query_pipeline_status(db_path) |
||||
|
||||
return { |
||||
"healthy": healthy, |
||||
"checks": checks, |
||||
"pipeline_status": status, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("pipeline_check_health failed") |
||||
return { |
||||
"healthy": False, |
||||
"checks": [], |
||||
"error": str(e), |
||||
} |
||||
|
||||
|
||||
def pipeline_get_logs( |
||||
db_path: str, |
||||
stage: Optional[str] = None, |
||||
lines: int = 50, |
||||
) -> List[str]: |
||||
"""Return recent log lines for a stage. |
||||
|
||||
Note: This is a placeholder. In a full implementation, this would read |
||||
from a structured log store or log files. |
||||
""" |
||||
# Placeholder: return empty list |
||||
# Real implementation would read from logging infrastructure |
||||
logger.info("pipeline_get_logs requested for stage=%s lines=%d", stage, lines) |
||||
return [] |
||||
|
||||
|
||||
def pipeline_validate_output( |
||||
db_path: str, |
||||
stage: str, |
||||
) -> Dict[str, Any]: |
||||
"""Validate that a stage's output looks reasonable. |
||||
|
||||
Args: |
||||
db_path: Path to DuckDB database |
||||
stage: Pipeline stage to validate |
||||
|
||||
Returns: |
||||
dict with validation results |
||||
""" |
||||
if stage not in VALID_STAGES: |
||||
return { |
||||
"valid": False, |
||||
"error": f"Invalid stage '{stage}'", |
||||
} |
||||
|
||||
try: |
||||
status = query_pipeline_status(db_path) |
||||
|
||||
validators = { |
||||
"svd": lambda s: s.get("svd_window_count", 0) > 0, |
||||
"similarity": lambda s: s.get("embedding_count", 0) > 0, |
||||
"ingestion": lambda s: s.get("motion_count", 0) > 0, |
||||
"votes": lambda s: s.get("motion_count", 0) > 0, |
||||
"text_embeddings": lambda s: s.get("embedding_count", 0) > 0, |
||||
"fusion": lambda s: s.get("embedding_count", 0) > 0, |
||||
} |
||||
|
||||
is_valid = validators.get(stage, lambda s: False)(status) |
||||
|
||||
return { |
||||
"valid": is_valid, |
||||
"stage": stage, |
||||
"pipeline_status": status, |
||||
} |
||||
except Exception as e: |
||||
logger.exception("pipeline_validate_output failed") |
||||
return {"valid": False, "stage": stage, "error": str(e)} |
||||
@ -0,0 +1,233 @@ |
||||
--- |
||||
title: Agent-Native Architecture Plan for Stemwijzer |
||||
type: refactor |
||||
status: active |
||||
date: 2026-05-01 |
||||
origin: STRATEGY.md (agent-native architecture track) |
||||
--- |
||||
|
||||
# Agent-Native Architecture Plan for Stemwijzer |
||||
|
||||
## Overview |
||||
|
||||
Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make an agent as capable as a human operator for every operation, whether that is running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns. |
||||
|
||||
**Current state:** The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries. |
||||
|
||||
**Target state:** An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data—operating in a loop until outcomes are achieved. |
||||
|
||||
--- |
||||
|
||||
## Problem Frame |
||||
|
||||
- **Pipeline operators** need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output. |
||||
- **Analysts/researchers** want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code. |
||||
- **Developers** need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually. |
||||
- **Content maintainers** need to verify that SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently this is done ad hoc. |
||||
|
||||
--- |
||||
|
||||
## Requirements Trace |
||||
|
||||
- R1. The agent can achieve anything a pipeline operator can achieve (parity) |
||||
- R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability) |
||||
- R3. The agent can diagnose pipeline health and suggest remediation (self-service operations) |
||||
- R4. The agent can generate and validate content (SVD labels, motion summaries) |
||||
- R5. New capabilities can be added by writing prompts, not code (composability) |
||||
|
||||
--- |
||||
|
||||
## Scope Boundaries |
||||
|
||||
- **In scope:** Agent primitives for data operations, pipeline control, analysis, and diagnostics |
||||
- **Deferred:** Real-time agent UI inside Streamlit (future phase—add chat interface to explorer) |
||||
- **Deferred:** Autonomous pipeline scheduling (scheduler.py exists but agent control is v2) |
||||
- **Not working on:** Natural language to SQL for end users (this plan targets agent operators, not voter-facing features) |
||||
|
||||
--- |
||||
|
||||
## Key Technical Decisions |
||||
|
||||
- **Files as universal interface:** DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads. |
||||
- **Database tools over file tools for structured data:** For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations. |
||||
- **Pipeline as state machine:** The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything." |
||||
- **Shared workspace:** Agent and human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, the same `docs/solutions/` knowledge base. |
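
The "pipeline as state machine" decision above can be sketched as a small ordered stage list. This is a minimal illustration, not the code in `pipeline/run_pipeline.py`: the stage names mirror the keys used by `pipeline_validate_output`, the strictly linear order is an assumption taken from the stage sequence in the bullet above, and `STAGE_ORDER`, `stages_through`, and `next_stage` are hypothetical names.

```python
from typing import List, Optional

# Assumed linear order, taken from the stage sequence described in this plan.
STAGE_ORDER: List[str] = [
    "ingestion",
    "votes",
    "svd",
    "text_embeddings",
    "fusion",
    "similarity",
]


def stages_through(target: str) -> List[str]:
    """Return every stage that must run, in order, to reach `target`."""
    if target not in STAGE_ORDER:
        raise ValueError(f"Invalid stage '{target}'")
    return STAGE_ORDER[: STAGE_ORDER.index(target) + 1]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None at the end."""
    idx = STAGE_ORDER.index(current)
    return STAGE_ORDER[idx + 1] if idx + 1 < len(STAGE_ORDER) else None


if __name__ == "__main__":
    print(stages_through("svd"))
    print(next_stage("fusion"))
```

A stage-aware tool could use a helper like `stages_through` to report which upstream stages a requested stage depends on before running anything.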
||||
|
||||
--- |
||||
|
||||
## Implementation Units |
||||
|
||||
- [ ] U1. **Database query primitives** |
||||
- **Goal:** Give the agent structured access to the DuckDB database |
||||
- **Requirements:** R1, R2, R4 |
||||
- **Dependencies:** None |
||||
- **Files:** |
||||
- Create: `agent_tools/database.py` |
||||
- Test: `tests/agent_tools/test_database_tools.py` |
||||
- **Approach:** Wrap DuckDB queries as atomic tools: |
||||
- `query_motions(filter, limit, order)` → returns motion rows as JSON |
||||
- `query_votes(motion_id, party)` → returns vote counts |
||||
- `query_svd_vectors(window_id, entity_type)` → returns vectors |
||||
- `query_party_positions(window_id)` → returns party axis scores |
||||
- `query_pipeline_status()` → returns freshness metrics from health checks |
||||
- **Patterns to follow:** `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns |
||||
- **Test scenarios:** |
||||
- Happy path: query returns valid JSON for known filters |
||||
- Edge case: empty result set returns `[]` not error |
||||
- Error path: invalid SQL/filter returns structured error with suggestion |
||||
- **Verification:** Agent can answer "How many motions in 2024?" using only the tool |
||||
|
||||
- [ ] U2. **Pipeline control primitives** |
||||
- **Goal:** Let the agent run, monitor, and diagnose pipeline stages |
||||
- **Requirements:** R1, R3 |
||||
- **Dependencies:** U1 |
||||
- **Files:** |
||||
- Create: `agent_tools/pipeline.py` |
||||
- Test: `tests/agent_tools/test_pipeline_tools.py` |
||||
- **Approach:** Stage-aware pipeline tools: |
||||
- `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status |
||||
- `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering |
||||
- `pipeline_check_health()` → returns health report (reuses `health/` module) |
||||
- `pipeline_get_logs(stage, lines)` → returns recent logs for a stage |
||||
- `pipeline_validate_output(stage)` → checks output exists and looks reasonable |
||||
- **Patterns to follow:** `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern |
||||
- **Test scenarios:** |
||||
- Happy path: dry-run returns planned actions without executing |
||||
- Integration: running `pipeline_run_stage("svd", "2024")` produces expected `svd_vectors` rows |
||||
- Error path: running a stage with missing dependencies returns clear error |
||||
- **Verification:** Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run |
||||
|
||||
- [ ] U3. **Analysis and report generation primitives** |
||||
- **Goal:** Let the agent perform analytical tasks and write reports |
||||
- **Requirements:** R2, R4 |
||||
- **Dependencies:** U1 |
||||
- **Files:** |
||||
- Create: `agent_tools/analysis.py` |
||||
- Create: `agent_tools/reports.py` |
||||
- Test: `tests/agent_tools/test_analysis_tools.py` |
||||
- **Approach:** |
||||
- `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data |
||||
- `analyze_axis_stability(component, windows)` → returns stability scores |
||||
- `generate_report(type, parameters, output_path)` → writes markdown report to `reports/` |
||||
- `validate_svd_labels(component)` → compares theme labels to actual party positions |
||||
- **Patterns to follow:** `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py` |
||||
- **Test scenarios:** |
||||
- Happy path: `analyze_party_shift` returns structured data for known party |
||||
- Integration: `generate_report("drift", {windows: ["2020", "2024"]})` produces valid markdown |
||||
- Edge case: requesting analysis for nonexistent window returns empty result |
||||
- **Verification:** Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results |
||||
|
||||
- [ ] U4. **Content validation primitives** |
||||
- **Goal:** Let the agent validate and suggest content improvements |
||||
- **Requirements:** R4 |
||||
- **Dependencies:** U1, U3 |
||||
- **Files:** |
||||
- Create: `agent_tools/content.py` |
||||
- Test: `tests/agent_tools/test_content_tools.py` |
||||
- **Approach:** |
||||
- `validate_motion_coverage(start_date, end_date)` → returns coverage gaps |
||||
- `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality |
||||
- `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests label |
||||
- `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings |
||||
- **Patterns to follow:** `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation |
||||
- **Test scenarios:** |
||||
- Happy path: `validate_motion_coverage` returns accurate gap list |
||||
- Edge case: all motions covered returns empty gaps |
||||
- **Verification:** Agent can run weekly content quality checks and produce a report |
||||
|
||||
- [ ] U5. **System prompt and context injection** |
||||
- **Goal:** Define agent behavior and inject runtime context |
||||
- **Requirements:** R1, R2, R3, R4, R5 |
||||
- **Dependencies:** U1-U4 |
||||
- **Files:** |
||||
- Create: `agent_tools/SYSTEM_PROMPT.md` |
||||
- Create: `agent_tools/context.py` |
||||
- **Approach:** |
||||
- `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions |
||||
- `context.py`: Injects runtime context—current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list |
||||
- `context.md` pattern: Agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline |
||||
- **Patterns to follow:** `ce-agent-native-architecture` context.md pattern; `AGENTS.md` for project conventions |
||||
- **Test scenarios:** |
||||
- Context injection produces valid markdown with current DB stats |
||||
- System prompt loads and parses without errors |
||||
- **Verification:** Agent session starts with full context of pipeline state |
||||
|
||||
- [ ] U6. **Agent-native testing and parity verification** |
||||
- **Goal:** Ensure agent can do everything humans can do |
||||
- **Requirements:** R1 |
||||
- **Dependencies:** U1-U5 |
||||
- **Files:** |
||||
- Create: `tests/agent_tools/test_parity.py` |
||||
- Modify: `tests/conftest.py` (add agent tool fixtures) |
||||
- **Approach:** |
||||
- Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome |
||||
- Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify) |
||||
- `test_parity.py`: Matrix of human action → agent tool → expected outcome |
||||
- **Test scenarios:** |
||||
- Parity: "Human runs health check CLI" vs "Agent calls pipeline_check_health()" → same result |
||||
- Integration: Agent detects stale data, runs pipeline, verifies freshness |
||||
- **Verification:** All parity tests pass |
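
The parity matrix described in U6 (human action → agent tool → expected outcome) could be represented as plain data. The sketch below is one possible shape under stated assumptions: `ParityCase`, `PARITY_MATRIX`, and `missing_keys` are hypothetical names, and the expected keys are the ones the tests in this change assert on (`healthy`, `checks`, `motion_count`).

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ParityCase:
    """One row of the matrix: what a human does and which tool mirrors it."""

    human_action: str
    agent_tool: str
    expected_keys: Tuple[str, ...]


# Hypothetical matrix; rows are drawn from the parity scenarios in this plan.
PARITY_MATRIX: List[ParityCase] = [
    ParityCase(
        human_action="python scripts/health_check.py",
        agent_tool="pipeline_check_health",
        expected_keys=("healthy", "checks"),
    ),
    ParityCase(
        human_action="SELECT COUNT(*) FROM motions",
        agent_tool="query_pipeline_status",
        expected_keys=("motion_count",),
    ),
]


def missing_keys(case: ParityCase, result: Dict[str, Any]) -> List[str]:
    """Return the expected keys that the tool result does not contain."""
    return [key for key in case.expected_keys if key not in result]
```

A parity test could then loop over `PARITY_MATRIX`, call the named tool, and assert that `missing_keys` returns an empty list.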
||||
|
||||
--- |
||||
|
||||
## Output Structure |
||||
|
||||
``` |
||||
agent_tools/ # New directory |
||||
├── __init__.py |
||||
├── SYSTEM_PROMPT.md # Agent behavior definition |
||||
├── context.py # Runtime context injection |
||||
├── context.md # Accumulated agent knowledge |
||||
├── database.py # DB query primitives |
||||
├── pipeline.py # Pipeline control primitives |
||||
├── analysis.py # Analysis primitives |
||||
├── reports.py # Report generation |
||||
└── content.py # Content validation primitives |
||||
|
||||
tests/agent_tools/ # New test directory |
||||
├── __init__.py |
||||
├── test_database_tools.py |
||||
├── test_pipeline_tools.py |
||||
├── test_analysis_tools.py |
||||
├── test_content_tools.py |
||||
└── test_parity.py |
||||
|
||||
reports/ # Agent-generated reports (gitignored) |
||||
``` |
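
For the `context.py` entry in the tree above, U5 describes injecting the current pipeline status as markdown. A minimal sketch of that rendering step follows. It assumes the status dict carries the keys the database tests check (`motion_count`, `latest_motion_date`, `svd_window_count`); `render_context` is a hypothetical name, not the module's actual API.

```python
from typing import Any, Dict

# Keys assumed to be present in the dict returned by query_pipeline_status.
STATUS_KEYS = ("motion_count", "latest_motion_date", "svd_window_count")


def render_context(status: Dict[str, Any]) -> str:
    """Render a pipeline status dict as a short markdown context block."""
    lines = ["# Pipeline Context", ""]
    for key in STATUS_KEYS:
        # Missing keys are shown explicitly instead of being dropped.
        value = status.get(key, "unknown")
        lines.append(f"- **{key}**: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_context({"motion_count": 0, "svd_window_count": 0}))
```

Keeping the output to a few scalar fields matches the plan's intent that context stays minimal and does not become a full database dump.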
||||
|
||||
--- |
||||
|
||||
## System-Wide Impact |
||||
|
||||
- **Interaction graph:** Agent tools call into `database.py`, `pipeline/`, `analysis/`, `health/` modules. These modules are already well-factored and read-only where appropriate. |
||||
- **Error propagation:** Agent tools return structured errors (JSON with `error`, `suggestion`, `retryable` fields) rather than raising exceptions. This lets the agent reason about failures. |
||||
- **State lifecycle:** Agent-generated reports in `reports/` are ephemeral (gitignored). Agent updates to `context.md` are durable and committed. |
||||
- **Unchanged invariants:** The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor. |
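
The error-propagation convention above (a dict with `error`, `suggestion`, and `retryable` fields instead of a raised exception) might look like the following. This is a sketch under assumptions: `tool_error`, `returns_structured_errors`, and `count_lines` are hypothetical helpers for illustration, and the suggestion text is invented.

```python
import functools
from typing import Any, Callable, Dict


def tool_error(error: str, suggestion: str = "", retryable: bool = False) -> Dict[str, Any]:
    """Build the structured error shape described in this plan."""
    return {"error": error, "suggestion": suggestion, "retryable": retryable}


def returns_structured_errors(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Convert exceptions raised by a tool into structured error dicts."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except FileNotFoundError as exc:
            return tool_error(str(exc), suggestion="Check that the path exists")
        except Exception as exc:  # any other failure is reported, not raised
            return tool_error(str(exc))

    return wrapper


@returns_structured_errors
def count_lines(path: str) -> Dict[str, Any]:
    """Toy tool used only to demonstrate the decorator."""
    with open(path, encoding="utf-8") as fh:
        return {"lines": sum(1 for _ in fh)}
```

Because the caller always receives a dict, an agent can branch on the presence of `error` and on `retryable` without exception handling of its own.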
||||
|
||||
--- |
||||
|
||||
## Risks & Dependencies |
||||
|
||||
| Risk | Mitigation | |
||||
|------|-----------| |
||||
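| DuckDB concurrency (read-only agent + write pipeline) | Agent uses read-only connections; pipeline uses write connections. DuckDB does not allow a read-write process and read-only processes to hold the same database file open at once, so agent reads must run when the pipeline is not writing, or retry when the file is locked. | |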
||||
| Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. | |
||||
| Context injection grows too large | Context is scoped to the task. `context.py` generates minimal relevant context, not full DB dumps. | |
||||
| Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. | |
||||
|
||||
--- |
||||
|
||||
## Documentation / Operational Notes |
||||
|
||||
- Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists |
||||
- Document the parity test matrix in `tests/agent_tools/README.md` |
||||
- `reports/` should be gitignored; agent reports are ephemeral outputs |
||||
|
||||
--- |
||||
|
||||
## Sources & References |
||||
|
||||
- **Origin:** STRATEGY.md (agent-native architecture track) |
||||
- **Skill:** `ce-agent-native-architecture` (parity, granularity, composability, emergent capability) |
||||
- **Related code:** `health/`, `pipeline/`, `analysis/`, `database.py` |
||||
- **Related docs:** `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks) |
||||
@ -0,0 +1,74 @@ |
||||
"""Tests for agent analysis and report generation primitives.""" |
||||
|
||||
import pytest |
||||
import os |
||||
|
||||
pytest.importorskip("duckdb") |
||||
|
||||
|
||||
class TestAnalyzePartyShift: |
||||
def test_returns_shift_data(self, tmp_duckdb_path): |
||||
from agent_tools.analysis import analyze_party_shift |
||||
|
||||
result = analyze_party_shift( |
||||
tmp_duckdb_path, party="VVD", window_start="2020", window_end="2024" |
||||
) |
||||
assert isinstance(result, dict) |
||||
assert "party" in result |
||||
assert "shift" in result or "error" in result |
||||
|
||||
def test_nonexistent_party_returns_error(self, tmp_duckdb_path): |
||||
from agent_tools.analysis import analyze_party_shift |
||||
|
||||
result = analyze_party_shift( |
||||
tmp_duckdb_path, party="FAKE", window_start="2020", window_end="2024" |
||||
) |
||||
assert isinstance(result, dict) |
||||
|
||||
|
||||
class TestAnalyzeAxisStability: |
||||
def test_returns_stability_scores(self, tmp_duckdb_path): |
||||
from agent_tools.analysis import analyze_axis_stability |
||||
|
||||
result = analyze_axis_stability(tmp_duckdb_path, component=1, windows=["2020", "2024"]) |
||||
assert isinstance(result, dict) |
||||
assert "component" in result |
||||
assert "stability" in result or "error" in result |
||||
|
||||
|
||||
class TestGenerateReport: |
||||
def test_writes_markdown_file(self, tmp_duckdb_path, tmp_path): |
||||
from agent_tools.reports import generate_report |
||||
|
||||
output_path = str(tmp_path / "report.md") |
||||
result = generate_report( |
||||
tmp_duckdb_path, |
||||
report_type="summary", |
||||
parameters={}, |
||||
output_path=output_path, |
||||
) |
||||
assert isinstance(result, dict) |
||||
assert os.path.exists(output_path) |
||||
|
||||
def test_returns_error_for_unknown_type(self, tmp_duckdb_path, tmp_path): |
||||
from agent_tools.reports import generate_report |
||||
|
||||
output_path = str(tmp_path / "report.md") |
||||
result = generate_report( |
||||
tmp_duckdb_path, |
||||
report_type="unknown", |
||||
parameters={}, |
||||
output_path=output_path, |
||||
) |
||||
assert isinstance(result, dict) |
||||
assert "error" in result |
||||
|
||||
|
||||
class TestValidateSvdLabels: |
||||
def test_returns_validation_result(self, tmp_duckdb_path): |
||||
from agent_tools.analysis import validate_svd_labels |
||||
|
||||
result = validate_svd_labels(tmp_duckdb_path, component=1) |
||||
assert isinstance(result, dict) |
||||
assert "component" in result |
||||
assert "valid" in result or "error" in result |
||||
@ -0,0 +1,44 @@ |
||||
"""Tests for agent content validation primitives.""" |
||||
|
||||
import pytest |
||||
|
||||
pytest.importorskip("duckdb") |
||||
|
||||
|
||||
class TestValidateMotionCoverage: |
||||
def test_returns_coverage_gaps(self, tmp_duckdb_path): |
||||
from agent_tools.content import validate_motion_coverage |
||||
|
||||
result = validate_motion_coverage(tmp_duckdb_path, start_date="2024-01-01", end_date="2024-12-31") |
||||
assert isinstance(result, dict) |
||||
assert "gaps" in result |
||||
assert "coverage_rate" in result or "error" in result |
||||
|
||||
|
||||
class TestValidateLaymanExplanations: |
||||
def test_returns_quality_report(self, tmp_duckdb_path): |
||||
from agent_tools.content import validate_layman_explanations |
||||
|
||||
result = validate_layman_explanations(tmp_duckdb_path, sample_size=5) |
||||
assert isinstance(result, dict) |
||||
assert "sample_size" in result |
||||
assert "coverage" in result or "error" in result |
||||
|
||||
|
||||
class TestSuggestSvdLabel: |
||||
def test_returns_suggestion(self, tmp_duckdb_path): |
||||
from agent_tools.content import suggest_svd_label |
||||
|
||||
result = suggest_svd_label(tmp_duckdb_path, component=1, top_n=5) |
||||
assert isinstance(result, dict) |
||||
assert "component" in result |
||||
assert "suggestion" in result or "error" in result |
||||
|
||||
|
||||
class TestCheckEmbeddingQuality: |
||||
def test_returns_coverage_stats(self, tmp_duckdb_path): |
||||
from agent_tools.content import check_embedding_quality |
||||
|
||||
result = check_embedding_quality(tmp_duckdb_path, window_id="current_parliament") |
||||
assert isinstance(result, dict) |
||||
assert "coverage" in result or "error" in result |
||||
@ -0,0 +1,75 @@ |
||||
"""Tests for agent database query primitives.""" |
||||
|
||||
import pytest |
||||
|
||||
pytest.importorskip("duckdb") |
||||
|
||||
|
||||
class TestQueryMotions: |
||||
def test_returns_motion_rows(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_motions |
||||
|
||||
result = query_motions(tmp_duckdb_path) |
||||
assert isinstance(result, list) |
||||
|
||||
def test_respects_limit(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_motions |
||||
|
||||
result = query_motions(tmp_duckdb_path, limit=5) |
||||
assert len(result) <= 5 |
||||
|
||||
def test_empty_db_returns_empty_list(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_motions |
||||
|
||||
result = query_motions(tmp_duckdb_path) |
||||
assert result == [] |
||||
|
||||
|
||||
class TestQueryVotes: |
||||
def test_returns_vote_counts(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_votes |
||||
|
||||
result = query_votes(tmp_duckdb_path, motion_id=1) |
||||
assert isinstance(result, list) |
||||
|
||||
def test_filters_by_party(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_votes |
||||
|
||||
result = query_votes(tmp_duckdb_path, motion_id=1, party="VVD") |
||||
assert isinstance(result, list) |
||||
|
||||
|
||||
class TestQuerySvdVectors: |
||||
def test_returns_vectors(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_svd_vectors |
||||
|
||||
result = query_svd_vectors(tmp_duckdb_path, window_id="current_parliament") |
||||
assert isinstance(result, list) |
||||
|
||||
def test_filters_by_entity_type(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_svd_vectors |
||||
|
||||
result = query_svd_vectors( |
||||
tmp_duckdb_path, window_id="current_parliament", entity_type="mp" |
||||
) |
||||
assert isinstance(result, list) |
||||
|
||||
|
||||
class TestQueryPartyPositions: |
||||
def test_returns_party_scores(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_party_positions |
||||
|
||||
result = query_party_positions(tmp_duckdb_path, window_id="current_parliament") |
||||
assert isinstance(result, list) |
||||
|
||||
|
||||
class TestQueryPipelineStatus: |
||||
def test_returns_status_dict(self, tmp_duckdb_path): |
||||
from agent_tools.database import query_pipeline_status |
||||
|
||||
result = query_pipeline_status(tmp_duckdb_path) |
||||
assert isinstance(result, dict) |
||||
assert "motion_count" in result |
||||
assert "latest_motion_date" in result |
||||
assert "svd_window_count" in result |
||||
@ -0,0 +1,160 @@ |
||||
"""Parity tests: verify agent tools can achieve what humans can. |
||||
|
||||
These tests ensure the agent-native architecture satisfies the parity principle: |
||||
"Whatever the user can do through the UI/scripts, the agent can achieve through tools." |
||||
""" |
||||
|
||||
import os |
||||
import pytest |
||||
|
||||
pytest.importorskip("duckdb") |
||||
|
||||
|
||||
class TestDatabaseParity: |
||||
"""Agent database queries vs human SQL queries.""" |
||||
|
||||
def test_agent_query_motions_matches_raw_sql(self, tmp_duckdb_path): |
||||
"""Human: SELECT * FROM motions LIMIT 10 |
||||
Agent: query_motions(db_path, limit=10) |
||||
""" |
||||
import duckdb |
||||
from agent_tools.database import query_motions |
||||
|
||||
# Human approach — handle empty DB gracefully |
||||
con = duckdb.connect(tmp_duckdb_path) |
||||
try: |
||||
human_result = con.execute("SELECT * FROM motions LIMIT 10").fetchdf().to_dict("records") |
||||
except Exception: |
||||
human_result = [] |
||||
con.close() |
||||
|
||||
# Agent approach |
||||
agent_result = query_motions(tmp_duckdb_path, limit=10) |
||||
|
||||
# Both should return lists |
||||
assert isinstance(human_result, list) |
||||
assert isinstance(agent_result, list) |
||||
assert len(agent_result) == len(human_result) |
||||
|
||||
def test_agent_pipeline_status_matches_raw_query(self, tmp_duckdb_path): |
||||
"""Human: SELECT COUNT(*) FROM motions |
||||
Agent: query_pipeline_status(db_path) |
||||
""" |
||||
import duckdb |
||||
from agent_tools.database import query_pipeline_status |
||||
|
||||
con = duckdb.connect(tmp_duckdb_path) |
||||
try: |
||||
human_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0] |
||||
except Exception: |
||||
human_count = 0 |
||||
con.close() |
||||
|
||||
agent_status = query_pipeline_status(tmp_duckdb_path) |
||||
|
||||
assert agent_status["motion_count"] == human_count |
||||
|
||||
|
||||
class TestHealthCheckParity: |
||||
"""Agent health check vs human script execution.""" |
||||
|
||||
def test_agent_health_check_matches_script(self, tmp_duckdb_path): |
||||
"""Human: python scripts/health_check.py |
||||
Agent: pipeline_check_health(db_path) |
||||
""" |
||||
from agent_tools.pipeline import pipeline_check_health |
||||
|
||||
# Agent approach |
||||
agent_result = pipeline_check_health(tmp_duckdb_path) |
||||
|
||||
assert isinstance(agent_result, dict) |
||||
assert "healthy" in agent_result |
||||
assert "checks" in agent_result |
||||
|
||||
|
||||
class TestReportGenerationParity: |
||||
"""Agent report generation vs human manual analysis.""" |
||||
|
||||
def test_agent_generates_summary_report(self, tmp_duckdb_path, tmp_path): |
||||
"""Human: Write a summary of pipeline state |
||||
Agent: generate_report(db_path, "summary", ...) |
||||
""" |
||||
from agent_tools.reports import generate_report |
||||
|
||||
output_path = str(tmp_path / "summary.md") |
||||
result = generate_report( |
||||
tmp_duckdb_path, |
||||
report_type="summary", |
||||
parameters={}, |
||||
output_path=output_path, |
||||
) |
||||
|
||||
assert result["status"] == "written" |
||||
assert os.path.exists(output_path) |
||||
|
||||
# Should contain key sections |
||||
content = (tmp_path / "summary.md").read_text() |
||||
assert "Pipeline Summary" in content |
||||
assert "Motions in database" in content |
||||
|
||||
|
||||
class TestAnalysisParity: |
||||
"""Agent analysis vs human analytical queries.""" |
||||
|
||||
def test_agent_party_shift_analysis(self, tmp_duckdb_path): |
||||
"""Human: Write SQL to compare party positions across windows |
||||
Agent: analyze_party_shift(db_path, ...) |
||||
""" |
||||
from agent_tools.analysis import analyze_party_shift |
||||
|
||||
result = analyze_party_shift( |
||||
tmp_duckdb_path, |
||||
party="VVD", |
||||
window_start="2020", |
||||
window_end="2024", |
||||
) |
||||
|
||||
# Should return structured result (or error if no data) |
||||
assert isinstance(result, dict) |
||||
assert "party" in result |
||||
# Either shift data or error (empty DB is fine) |
||||
assert "shift" in result or "error" in result |
||||
|
||||
|
||||
class TestIntegrationAgentDiagnosticLoop: |
||||
"""Integration: Agent performs full diagnostic loop.""" |
||||
|
||||
def test_agent_diagnoses_stale_data(self, tmp_duckdb_path): |
||||
"""Agent loop: |
||||
1. Check health |
||||
2. Query pipeline status |
||||
3. Identify issue (empty DB = no data) |
||||
4. Suggest remediation |
||||
""" |
||||
from agent_tools.pipeline import pipeline_check_health |
||||
from agent_tools.database import query_pipeline_status |
||||
|
||||
# Step 1: Check health |
||||
health = pipeline_check_health(tmp_duckdb_path) |
||||
|
||||
# Step 2: Query status |
||||
status = query_pipeline_status(tmp_duckdb_path) |
||||
|
||||
# Step 3: Agent reasoning (simulated) |
||||
issues = [] |
||||
if status["motion_count"] == 0: |
||||
issues.append("No motions in database") |
||||
if status["svd_window_count"] == 0: |
||||
issues.append("No SVD windows computed") |
||||
|
||||
# Step 4: Suggest remediation |
||||
suggestions = [] |
||||
if "No motions in database" in issues: |
||||
suggestions.append("Run pipeline ingestion stage") |
||||
if "No SVD windows computed" in issues: |
||||
suggestions.append("Run SVD computation after ingestion") |
||||
|
||||
assert isinstance(issues, list) |
||||
assert isinstance(suggestions, list) |
||||
# Empty DB should produce actionable suggestions |
||||
assert len(suggestions) > 0 |
||||
@ -0,0 +1,59 @@ |
||||
"""Tests for agent pipeline control primitives.""" |
||||
|
||||
import pytest |
||||
|
||||
pytest.importorskip("duckdb") |
||||
|
||||
|
||||
class TestPipelineRunStage: |
||||
def test_dry_run_returns_planned_actions(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_run_stage |
||||
|
||||
result = pipeline_run_stage(tmp_duckdb_path, stage="svd", window_id="2024", dry_run=True) |
||||
assert isinstance(result, dict) |
||||
assert "stage" in result |
||||
assert result.get("dry_run") is True |
||||
|
||||
def test_invalid_stage_returns_error(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_run_stage |
||||
|
||||
result = pipeline_run_stage(tmp_duckdb_path, stage="invalid") |
||||
assert isinstance(result, dict) |
||||
assert "error" in result |
||||
|
||||
|
||||
class TestPipelineRunFull: |
||||
def test_dry_run_returns_plan(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_run_full |
||||
|
||||
result = pipeline_run_full(tmp_duckdb_path, dry_run=True) |
||||
assert isinstance(result, dict) |
||||
assert "stages" in result or "dry_run" in result |
||||
|
||||
|
||||
class TestPipelineCheckHealth: |
||||
def test_returns_health_report(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_check_health |
||||
|
||||
result = pipeline_check_health(tmp_duckdb_path) |
||||
assert isinstance(result, dict) |
||||
assert "checks" in result |
||||
assert "healthy" in result |
||||
|
||||
|
||||
class TestPipelineGetLogs: |
||||
def test_returns_log_lines(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_get_logs |
||||
|
||||
result = pipeline_get_logs(tmp_duckdb_path, stage="svd", lines=10) |
||||
assert isinstance(result, list) |
||||
assert len(result) <= 10 |
||||
|
||||
|
||||
class TestPipelineValidateOutput: |
||||
def test_validates_stage_output(self, tmp_duckdb_path): |
||||
from agent_tools.pipeline import pipeline_validate_output |
||||
|
||||
result = pipeline_validate_output(tmp_duckdb_path, stage="svd") |
||||
assert isinstance(result, dict) |
||||
assert "valid" in result |
||||