Implements the agent-native architecture plan (docs/plans/2026-05-01-002-agent-native-architecture-plan.md): - U1: Database query primitives (agent_tools/database.py) - query_motions, query_votes, query_svd_vectors, query_party_positions, query_pipeline_status - U2: Pipeline control primitives (agent_tools/pipeline.py) - pipeline_run_stage, pipeline_run_full, pipeline_check_health, pipeline_get_logs, pipeline_validate_output - U3: Analysis & report generation (agent_tools/analysis.py, reports.py) - analyze_party_shift, analyze_axis_stability, validate_svd_labels, generate_report - U4: Content validation primitives (agent_tools/content.py) - validate_motion_coverage, validate_layman_explanations, suggest_svd_label, check_embedding_quality - U5: System prompt & context injection (SYSTEM_PROMPT.md, context.py, context.md) - U6: Parity verification tests (tests/agent_tools/test_parity.py) Tests: 238 passed, 2 skipped AGENTS.md updated to surface agent_tools/main
parent
98358344a0
commit
8af27bbf04
@ -0,0 +1,81 @@ |
|||||||
|
# Stemwijzer Agent System Prompt |
||||||
|
|
||||||
|
You are the **Stemwijzer Pipeline Operator** — an autonomous agent that operates the Stemwijzer parliamentary voting analysis pipeline. |
||||||
|
|
||||||
|
## Your Identity |
||||||
|
|
||||||
|
- You are methodical, precise, and data-driven. |
||||||
|
- You prefer structured outputs (JSON, markdown tables) over prose. |
||||||
|
- You always verify assumptions with data before making claims. |
||||||
|
- You write reports to `reports/` and accumulate learnings in `agent_tools/context.md`. |
||||||
|
|
||||||
|
## Your Capabilities |
||||||
|
|
||||||
|
You have access to these atomic tools: |
||||||
|
|
||||||
|
### Database Queries (`agent_tools.database`) |
||||||
|
- `query_motions(db_path, year, policy_area, limit)` — Query motions with filters |
||||||
|
- `query_votes(db_path, motion_id, party)` — Query votes for a motion |
||||||
|
- `query_svd_vectors(db_path, window_id, entity_type)` — Query SVD vectors |
||||||
|
- `query_party_positions(db_path, window_id)` — Query party axis scores |
||||||
|
- `query_pipeline_status(db_path)` — Get pipeline freshness metrics |
||||||
|
|
||||||
|
### Pipeline Control (`agent_tools.pipeline`) |
||||||
|
- `pipeline_run_stage(db_path, stage, window_id, dry_run)` — Run one pipeline stage |
||||||
|
- `pipeline_run_full(db_path, dry_run)` — Run all stages |
||||||
|
- `pipeline_check_health(db_path)` — Check pipeline health |
||||||
|
- `pipeline_get_logs(db_path, stage, lines)` — Get recent logs |
||||||
|
- `pipeline_validate_output(db_path, stage)` — Validate stage output |
||||||
|
|
||||||
|
### Analysis (`agent_tools.analysis`) |
||||||
|
- `analyze_party_shift(db_path, party, window_start, window_end)` — Track party movement |
||||||
|
- `analyze_axis_stability(db_path, component, windows)` — Measure axis consistency |
||||||
|
- `validate_svd_labels(db_path, component)` — Check labels match positions |
||||||
|
|
||||||
|
### Reports (`agent_tools.reports`) |
||||||
|
- `generate_report(db_path, report_type, parameters, output_path)` — Write markdown reports |
||||||
|
|
||||||
|
### Content Validation (`agent_tools.content`) |
||||||
|
- `validate_motion_coverage(db_path, start_date, end_date)` — Find data gaps |
||||||
|
- `validate_layman_explanations(db_path, sample_size)` — Check explanation quality |
||||||
|
- `suggest_svd_label(db_path, component, top_n)` — Analyze top motions for labels |
||||||
|
- `check_embedding_quality(db_path, window_id)` — Measure embedding coverage |
||||||
|
|
||||||
|
## Decision Criteria |
||||||
|
|
||||||
|
### When to run the pipeline |
||||||
|
- Data is stale (> 7 days since last motion) |
||||||
|
- Health checks show `healthy: false` |
||||||
|
- User explicitly requests fresh data |
||||||
|
|
||||||
|
### When to generate a report |
||||||
|
- User asks for analysis that spans multiple queries |
||||||
|
- Health check reveals issues that need documentation |
||||||
|
- Weekly/bi-weekly operational reviews |
||||||
|
|
||||||
|
### When to validate content |
||||||
|
- After pipeline runs (automated quality gate) |
||||||
|
- When SVD labels look suspicious |
||||||
|
- Before publishing analysis to users |
||||||
|
|
||||||
|
## Output Conventions |
||||||
|
|
||||||
|
1. **Always return structured data** — dicts and lists, not raw prose |
||||||
|
2. **Include `error` keys** when things fail, with actionable suggestions |
||||||
|
3. **Write reports to `reports/`** — ephemeral, human-readable artifacts |
||||||
|
4. **Update `context.md`** when you learn something about the pipeline |
||||||
|
5. **Be explicit about uncertainty** — "Data shows X (n=123)" not "Probably X" |
||||||
|
|
||||||
|
## Knowledge Base |
||||||
|
|
||||||
|
Before making claims about the data, check `docs/solutions/` for documented patterns: |
||||||
|
- SVD labels reflect voting patterns, not semantic content |
||||||
|
- Right-wing parties appear on the RIGHT side of all axes |
||||||
|
- EVR percentages come from `analysis.political_axis.compute_svd_spectrum` |
||||||
|
|
||||||
|
## Safety |
||||||
|
|
||||||
|
- You operate in the same trust boundary as the developer |
||||||
|
- You can read the full database but write only to `reports/` and `context.md` |
||||||
|
- You cannot delete data or modify pipeline logic |
||||||
|
- Always use dry_run=True when the user says "what would happen if..." |
||||||
@ -0,0 +1 @@ |
|||||||
|
"""Agent tools for Stemwijzer — atomic primitives for agent operation.""" |
||||||
@ -0,0 +1,170 @@ |
|||||||
|
"""Analysis primitives for agent operation. |
||||||
|
|
||||||
|
High-level analytical tools that compose database queries with |
||||||
|
statistical computation to answer research questions. |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import json |
||||||
|
import logging |
||||||
|
from typing import Any, Dict, List, Optional |
||||||
|
|
||||||
|
from agent_tools.database import query_party_positions, query_svd_vectors |
||||||
|
|
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
|
||||||
|
def analyze_party_shift( |
||||||
|
db_path: str, |
||||||
|
party: str, |
||||||
|
window_start: str, |
||||||
|
window_end: str, |
||||||
|
metric: str = "euclidean", |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Analyze how a party's position shifted between two windows.""" |
||||||
|
try: |
||||||
|
start_pos = query_party_positions(db_path, window_start) |
||||||
|
end_pos = query_party_positions(db_path, window_end) |
||||||
|
|
||||||
|
start = next((p for p in start_pos if p.get("party") == party), None) |
||||||
|
end = next((p for p in end_pos if p.get("party") == party), None) |
||||||
|
|
||||||
|
if not start or not end: |
||||||
|
return { |
||||||
|
"party": party, |
||||||
|
"window_start": window_start, |
||||||
|
"window_end": window_end, |
||||||
|
"error": f"Party '{party}' not found in one or both windows", |
||||||
|
} |
||||||
|
|
||||||
|
# Compute Euclidean distance on first 2 axes |
||||||
|
dx = end.get("axis_1", 0.0) - start.get("axis_1", 0.0) |
||||||
|
dy = end.get("axis_2", 0.0) - start.get("axis_2", 0.0) |
||||||
|
shift = (dx ** 2 + dy ** 2) ** 0.5 |
||||||
|
|
||||||
|
return { |
||||||
|
"party": party, |
||||||
|
"window_start": window_start, |
||||||
|
"window_end": window_end, |
||||||
|
"shift": round(shift, 4), |
||||||
|
"start_position": {"axis_1": start.get("axis_1"), "axis_2": start.get("axis_2")}, |
||||||
|
"end_position": {"axis_1": end.get("axis_1"), "axis_2": end.get("axis_2")}, |
||||||
|
"direction": {"dx": round(dx, 4), "dy": round(dy, 4)}, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("analyze_party_shift failed") |
||||||
|
return {"party": party, "error": str(e)} |
||||||
|
|
||||||
|
|
||||||
|
def analyze_axis_stability( |
||||||
|
db_path: str, |
||||||
|
component: int, |
||||||
|
windows: List[str], |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Analyze stability of an SVD component across windows. |
||||||
|
|
||||||
|
Returns cosine similarity between the component vector in consecutive windows. |
||||||
|
""" |
||||||
|
try: |
||||||
|
vectors_by_window = {} |
||||||
|
for window in windows: |
||||||
|
rows = query_svd_vectors(db_path, window, entity_type="motion") |
||||||
|
if rows: |
||||||
|
vectors_by_window[window] = rows |
||||||
|
|
||||||
|
if len(vectors_by_window) < 2: |
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"windows": windows, |
||||||
|
"error": "Need at least 2 windows with SVD vectors", |
||||||
|
} |
||||||
|
|
||||||
|
# Extract component scores for each window |
||||||
|
# (component is 1-indexed in user-facing code, 0-indexed internally) |
||||||
|
idx = component - 1 |
||||||
|
window_scores = {} |
||||||
|
for window, rows in vectors_by_window.items(): |
||||||
|
scores = [] |
||||||
|
for row in rows: |
||||||
|
vec = row.get("vector") |
||||||
|
if isinstance(vec, str): |
||||||
|
vec = json.loads(vec) |
||||||
|
if isinstance(vec, list) and idx < len(vec): |
||||||
|
scores.append(vec[idx]) |
||||||
|
window_scores[window] = scores |
||||||
|
|
||||||
|
# Compute pairwise correlations between consecutive windows |
||||||
|
import numpy as np |
||||||
|
|
||||||
|
stability_scores = [] |
||||||
|
window_list = sorted(window_scores.keys()) |
||||||
|
for i in range(len(window_list) - 1): |
||||||
|
w1, w2 = window_list[i], window_list[i + 1] |
||||||
|
s1, s2 = window_scores[w1], window_scores[w2] |
||||||
|
if len(s1) == len(s2) and len(s1) > 1: |
||||||
|
corr = np.corrcoef(s1, s2)[0, 1] |
||||||
|
stability_scores.append({ |
||||||
|
"from_window": w1, |
||||||
|
"to_window": w2, |
||||||
|
"correlation": round(float(corr), 4), |
||||||
|
}) |
||||||
|
|
||||||
|
avg_stability = ( |
||||||
|
sum(s["correlation"] for s in stability_scores) / len(stability_scores) |
||||||
|
if stability_scores else 0.0 |
||||||
|
) |
||||||
|
|
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"windows": windows, |
||||||
|
"stability": round(avg_stability, 4), |
||||||
|
"pairwise": stability_scores, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("analyze_axis_stability failed") |
||||||
|
return {"component": component, "error": str(e)} |
||||||
|
|
||||||
|
|
||||||
|
def validate_svd_labels( |
||||||
|
db_path: str, |
||||||
|
component: int, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Validate SVD theme labels against actual party positions. |
||||||
|
|
||||||
|
Checks whether the top positive/negative parties on a component |
||||||
|
align with the theme label from analysis/config.py. |
||||||
|
""" |
||||||
|
try: |
||||||
|
from analysis.config import SVD_THEMES |
||||||
|
|
||||||
|
theme = SVD_THEMES.get(component, {}) |
||||||
|
label = theme.get("label", "Unknown") |
||||||
|
description = theme.get("description", "") |
||||||
|
|
||||||
|
# Get current parliament positions for all parties |
||||||
|
positions = query_party_positions(db_path, "current_parliament") |
||||||
|
if not positions: |
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"label": label, |
||||||
|
"valid": False, |
||||||
|
"error": "No party positions found", |
||||||
|
} |
||||||
|
|
||||||
|
# Sort by axis_1 (the component's primary direction) |
||||||
|
sorted_parties = sorted(positions, key=lambda p: p.get("axis_1", 0.0)) |
||||||
|
negative_pole = sorted_parties[:3] if len(sorted_parties) >= 3 else sorted_parties[:1] |
||||||
|
positive_pole = sorted_parties[-3:] if len(sorted_parties) >= 3 else sorted_parties[-1:] |
||||||
|
|
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"label": label, |
||||||
|
"description": description, |
||||||
|
"valid": True, |
||||||
|
"negative_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in negative_pole], |
||||||
|
"positive_pole": [{"party": p["party"], "score": round(p.get("axis_1", 0.0), 4)} for p in positive_pole], |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("validate_svd_labels failed") |
||||||
|
return {"component": component, "valid": False, "error": str(e)} |
||||||
@ -0,0 +1,183 @@ |
|||||||
|
"""Content validation primitives for agent operation. |
||||||
|
|
||||||
|
Tools for validating data quality, coverage, and content correctness. |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import logging |
||||||
|
from datetime import datetime, timedelta |
||||||
|
from typing import Any, Dict, List, Optional |
||||||
|
|
||||||
|
from agent_tools.database import query_motions, query_svd_vectors |
||||||
|
|
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
|
||||||
|
def validate_motion_coverage( |
||||||
|
db_path: str, |
||||||
|
start_date: str, |
||||||
|
end_date: str, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Validate motion coverage for a date range. |
||||||
|
|
||||||
|
Returns gaps where no motions exist in the database. |
||||||
|
""" |
||||||
|
try: |
||||||
|
motions = query_motions(db_path, limit=10000) |
||||||
|
|
||||||
|
if not motions: |
||||||
|
return { |
||||||
|
"gaps": [{"start": start_date, "end": end_date}], |
||||||
|
"coverage_rate": 0.0, |
||||||
|
"total_motions": 0, |
||||||
|
} |
||||||
|
|
||||||
|
# Convert dates |
||||||
|
start = datetime.fromisoformat(start_date) |
||||||
|
end = datetime.fromisoformat(end_date) |
||||||
|
|
||||||
|
# Check coverage month by month |
||||||
|
gaps = [] |
||||||
|
current = start |
||||||
|
while current < end: |
||||||
|
month_end = min(current + timedelta(days=31), end) |
||||||
|
month_motions = [ |
||||||
|
m for m in motions |
||||||
|
if current <= datetime.fromisoformat(str(m.get("date", "1970-01-01"))) < month_end |
||||||
|
] |
||||||
|
if not month_motions: |
||||||
|
gaps.append({ |
||||||
|
"start": current.isoformat(), |
||||||
|
"end": month_end.isoformat(), |
||||||
|
}) |
||||||
|
current = month_end |
||||||
|
|
||||||
|
total_days = (end - start).days |
||||||
|
gap_days = sum( |
||||||
|
(datetime.fromisoformat(g["end"]) - datetime.fromisoformat(g["start"])).days |
||||||
|
for g in gaps |
||||||
|
) |
||||||
|
coverage_rate = round((total_days - gap_days) / total_days, 4) if total_days > 0 else 0.0 |
||||||
|
|
||||||
|
return { |
||||||
|
"gaps": gaps, |
||||||
|
"coverage_rate": coverage_rate, |
||||||
|
"total_motions": len(motions), |
||||||
|
"date_range": {"start": start_date, "end": end_date}, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("validate_motion_coverage failed") |
||||||
|
return {"gaps": [], "coverage_rate": 0.0, "error": str(e)} |
||||||
|
|
||||||
|
|
||||||
|
def validate_layman_explanations( |
||||||
|
db_path: str, |
||||||
|
sample_size: int = 100, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Sample motions and check layman explanation coverage. |
||||||
|
|
||||||
|
Returns quality metrics for explanations. |
||||||
|
""" |
||||||
|
try: |
||||||
|
motions = query_motions(db_path, limit=sample_size) |
||||||
|
|
||||||
|
if not motions: |
||||||
|
return { |
||||||
|
"sample_size": 0, |
||||||
|
"coverage": 0.0, |
||||||
|
"empty_count": 0, |
||||||
|
} |
||||||
|
|
||||||
|
with_explanation = sum( |
||||||
|
1 for m in motions |
||||||
|
if m.get("layman_explanation") and str(m.get("layman_explanation")).strip() |
||||||
|
) |
||||||
|
|
||||||
|
return { |
||||||
|
"sample_size": len(motions), |
||||||
|
"coverage": round(with_explanation / len(motions), 4), |
||||||
|
"empty_count": len(motions) - with_explanation, |
||||||
|
"total_in_db": len(motions), |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("validate_layman_explanations failed") |
||||||
|
return {"sample_size": 0, "coverage": 0.0, "error": str(e)} |
||||||
|
|
||||||
|
|
||||||
|
def suggest_svd_label( |
||||||
|
db_path: str, |
||||||
|
component: int, |
||||||
|
top_n: int = 10, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Analyze top motions on a component and suggest a label. |
||||||
|
|
||||||
|
Returns the top positive and negative motions with scores. |
||||||
|
""" |
||||||
|
try: |
||||||
|
rows = query_svd_vectors(db_path, "current_parliament", entity_type="motion") |
||||||
|
|
||||||
|
if not rows: |
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"error": "No SVD vectors found for current_parliament", |
||||||
|
} |
||||||
|
|
||||||
|
import json |
||||||
|
|
||||||
|
scored = [] |
||||||
|
for row in rows: |
||||||
|
vec = row.get("vector") |
||||||
|
if isinstance(vec, str): |
||||||
|
vec = json.loads(vec) |
||||||
|
if isinstance(vec, list) and component - 1 < len(vec): |
||||||
|
scored.append({ |
||||||
|
"motion_id": row.get("entity_id"), |
||||||
|
"score": vec[component - 1], |
||||||
|
}) |
||||||
|
|
||||||
|
scored.sort(key=lambda x: x["score"]) |
||||||
|
negative = scored[:top_n] |
||||||
|
positive = scored[-top_n:][::-1] |
||||||
|
|
||||||
|
return { |
||||||
|
"component": component, |
||||||
|
"suggestion": { |
||||||
|
"negative_pole": negative, |
||||||
|
"positive_pole": positive, |
||||||
|
}, |
||||||
|
"top_positive_ids": [m["motion_id"] for m in positive], |
||||||
|
"top_negative_ids": [m["motion_id"] for m in negative], |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("suggest_svd_label failed") |
||||||
|
return {"component": component, "error": str(e)} |
||||||
|
|
||||||
|
|
||||||
|
def check_embedding_quality( |
||||||
|
db_path: str, |
||||||
|
window_id: str, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Check embedding coverage and quality for a window. |
||||||
|
|
||||||
|
Returns coverage stats for fused embeddings. |
||||||
|
""" |
||||||
|
try: |
||||||
|
vectors = query_svd_vectors(db_path, window_id, entity_type="motion") |
||||||
|
motions = query_motions(db_path, limit=100000) |
||||||
|
|
||||||
|
total_motions = len(motions) |
||||||
|
with_embeddings = len(vectors) |
||||||
|
|
||||||
|
coverage = round(with_embeddings / total_motions, 4) if total_motions > 0 else 0.0 |
||||||
|
|
||||||
|
return { |
||||||
|
"window_id": window_id, |
||||||
|
"total_motions": total_motions, |
||||||
|
"with_embeddings": with_embeddings, |
||||||
|
"coverage": coverage, |
||||||
|
"healthy": coverage > 0.8, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("check_embedding_quality failed") |
||||||
|
return {"window_id": window_id, "coverage": 0.0, "error": str(e)} |
||||||
@ -0,0 +1,20 @@ |
|||||||
|
# Agent Accumulated Context |
||||||
|
|
||||||
|
This file is maintained by the agent. It stores learnings about the pipeline, |
||||||
|
data patterns, and operational notes that persist across sessions. |
||||||
|
|
||||||
|
## How to use this file |
||||||
|
|
||||||
|
- The agent reads this at session start for accumulated context |
||||||
|
- The agent appends new learnings after each significant operation |
||||||
|
- Humans can read this to understand what the agent has discovered |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Initial State |
||||||
|
|
||||||
|
Pipeline is fresh. No accumulated learnings yet. |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
*This file grows over time as the agent operates the pipeline.* |
||||||
@ -0,0 +1,110 @@ |
|||||||
|
"""Runtime context injection for agent operation. |
||||||
|
|
||||||
|
Generates dynamic context about the current pipeline state, |
||||||
|
recent issues, and accumulated knowledge. |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import logging |
||||||
|
import os |
||||||
|
from datetime import datetime |
||||||
|
from typing import Any, Dict |
||||||
|
|
||||||
|
from agent_tools.database import query_pipeline_status |
||||||
|
|
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
|
||||||
|
def build_context(db_path: str) -> Dict[str, Any]: |
||||||
|
"""Build a comprehensive context dict for the agent. |
||||||
|
|
||||||
|
This is injected into the agent's prompt at session start. |
||||||
|
""" |
||||||
|
status = query_pipeline_status(db_path) |
||||||
|
|
||||||
|
context = { |
||||||
|
"timestamp": datetime.now().isoformat(), |
||||||
|
"database_path": db_path, |
||||||
|
"pipeline": status, |
||||||
|
"recent_reports": _list_recent_reports(), |
||||||
|
"accumulated_knowledge": _read_context_md(), |
||||||
|
} |
||||||
|
|
||||||
|
return context |
||||||
|
|
||||||
|
|
||||||
|
def render_context_markdown(db_path: str) -> str: |
||||||
|
"""Render context as markdown for prompt injection.""" |
||||||
|
ctx = build_context(db_path) |
||||||
|
|
||||||
|
lines = [ |
||||||
|
"## Current Pipeline State", |
||||||
|
f"", |
||||||
|
f"- **Motions:** {ctx['pipeline'].get('motion_count', 0):,}", |
||||||
|
f"- **Latest motion:** {ctx['pipeline'].get('latest_motion_date', 'N/A')}", |
||||||
|
f"- **SVD windows:** {ctx['pipeline'].get('svd_window_count', 0)}", |
||||||
|
f"- **Embeddings:** {ctx['pipeline'].get('embedding_count', 0):,}", |
||||||
|
f"- **Healthy:** {'Yes' if ctx['pipeline'].get('healthy') else 'No'}", |
||||||
|
f"", |
||||||
|
] |
||||||
|
|
||||||
|
recent = ctx.get("recent_reports", []) |
||||||
|
if recent: |
||||||
|
lines.extend([ |
||||||
|
"## Recent Reports", |
||||||
|
f"", |
||||||
|
]) |
||||||
|
for r in recent[:5]: |
||||||
|
lines.append(f"- {r}") |
||||||
|
lines.append("") |
||||||
|
|
||||||
|
knowledge = ctx.get("accumulated_knowledge", "") |
||||||
|
if knowledge: |
||||||
|
lines.extend([ |
||||||
|
"## Accumulated Knowledge", |
||||||
|
f"", |
||||||
|
knowledge, |
||||||
|
f"", |
||||||
|
]) |
||||||
|
|
||||||
|
return "\n".join(lines) |
||||||
|
|
||||||
|
|
||||||
|
def _list_recent_reports() -> list: |
||||||
|
"""List recently generated reports.""" |
||||||
|
try: |
||||||
|
reports_dir = "reports" |
||||||
|
if not os.path.exists(reports_dir): |
||||||
|
return [] |
||||||
|
files = sorted( |
||||||
|
(f for f in os.listdir(reports_dir) if f.endswith(".md")), |
||||||
|
key=lambda f: os.path.getmtime(os.path.join(reports_dir, f)), |
||||||
|
reverse=True, |
||||||
|
) |
||||||
|
return files[:10] |
||||||
|
except Exception: |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def _read_context_md() -> str: |
||||||
|
"""Read accumulated knowledge from context.md.""" |
||||||
|
try: |
||||||
|
path = os.path.join("agent_tools", "context.md") |
||||||
|
if os.path.exists(path): |
||||||
|
with open(path, "r", encoding="utf-8") as f: |
||||||
|
return f.read() |
||||||
|
return "" |
||||||
|
except Exception: |
||||||
|
return "" |
||||||
|
|
||||||
|
|
||||||
|
def append_context_note(note: str) -> None: |
||||||
|
"""Append a learning to context.md.""" |
||||||
|
try: |
||||||
|
path = os.path.join("agent_tools", "context.md") |
||||||
|
timestamp = datetime.now().isoformat() |
||||||
|
with open(path, "a", encoding="utf-8") as f: |
||||||
|
f.write(f"\n## {timestamp}\n\n{note}\n") |
||||||
|
except Exception: |
||||||
|
logger.exception("Failed to append context note") |
||||||
@ -0,0 +1,220 @@ |
|||||||
|
"""Database query primitives for agent operation. |
||||||
|
|
||||||
|
Thin wrappers around DuckDB that return structured JSON-friendly results. |
||||||
|
All functions accept db_path as first argument and return either list[dict] or dict. |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import logging |
||||||
|
from typing import Any, Dict, List, Optional |
||||||
|
|
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
|
||||||
|
def _connect(db_path: str, read_only: bool = True): |
||||||
|
import duckdb |
||||||
|
|
||||||
|
return duckdb.connect(database=db_path, read_only=read_only) |
||||||
|
|
||||||
|
|
||||||
|
def query_motions( |
||||||
|
db_path: str, |
||||||
|
*, |
||||||
|
year: Optional[int] = None, |
||||||
|
policy_area: Optional[str] = None, |
||||||
|
limit: int = 100, |
||||||
|
order: str = "date DESC", |
||||||
|
) -> List[Dict[str, Any]]: |
||||||
|
"""Query motions with optional filters.""" |
||||||
|
try: |
||||||
|
con = _connect(db_path) |
||||||
|
conditions = [] |
||||||
|
params = [] |
||||||
|
|
||||||
|
if year is not None: |
||||||
|
conditions.append("EXTRACT(YEAR FROM date) = ?") |
||||||
|
params.append(year) |
||||||
|
if policy_area is not None: |
||||||
|
conditions.append("policy_area = ?") |
||||||
|
params.append(policy_area) |
||||||
|
|
||||||
|
where_clause = "WHERE " + " AND ".join(conditions) if conditions else "" |
||||||
|
sql = f""" |
||||||
|
SELECT id, title, description, date, policy_area, |
||||||
|
winning_margin, controversy_score, layman_explanation |
||||||
|
FROM motions |
||||||
|
{where_clause} |
||||||
|
ORDER BY {order} |
||||||
|
LIMIT ? |
||||||
|
""" |
||||||
|
params.append(limit) |
||||||
|
|
||||||
|
result = con.execute(sql, params).fetchdf().to_dict("records") |
||||||
|
con.close() |
||||||
|
return result |
||||||
|
except Exception: |
||||||
|
logger.exception("query_motions failed") |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def query_votes( |
||||||
|
db_path: str, |
||||||
|
motion_id: int, |
||||||
|
party: Optional[str] = None, |
||||||
|
) -> List[Dict[str, Any]]: |
||||||
|
"""Query vote counts for a motion, optionally filtered by party.""" |
||||||
|
try: |
||||||
|
con = _connect(db_path) |
||||||
|
if party: |
||||||
|
sql = """ |
||||||
|
SELECT mp_name, vote |
||||||
|
FROM mp_votes |
||||||
|
WHERE motion_id = ? AND mp_name IN ( |
||||||
|
SELECT mp_name FROM mp_metadata WHERE party = ? |
||||||
|
) |
||||||
|
""" |
||||||
|
result = con.execute(sql, (motion_id, party)).fetchdf().to_dict("records") |
||||||
|
else: |
||||||
|
sql = "SELECT mp_name, vote FROM mp_votes WHERE motion_id = ?" |
||||||
|
result = con.execute(sql, (motion_id,)).fetchdf().to_dict("records") |
||||||
|
con.close() |
||||||
|
return result |
||||||
|
except Exception: |
||||||
|
logger.exception("query_votes failed") |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def query_svd_vectors( |
||||||
|
db_path: str, |
||||||
|
window_id: str, |
||||||
|
entity_type: Optional[str] = None, |
||||||
|
) -> List[Dict[str, Any]]: |
||||||
|
"""Query SVD vectors for a window.""" |
||||||
|
try: |
||||||
|
con = _connect(db_path) |
||||||
|
if entity_type: |
||||||
|
sql = """ |
||||||
|
SELECT entity_id, vector, model |
||||||
|
FROM svd_vectors |
||||||
|
WHERE window_id = ? AND entity_type = ? |
||||||
|
""" |
||||||
|
result = con.execute(sql, (window_id, entity_type)).fetchdf().to_dict("records") |
||||||
|
else: |
||||||
|
sql = """ |
||||||
|
SELECT entity_id, entity_type, vector, model |
||||||
|
FROM svd_vectors |
||||||
|
WHERE window_id = ? |
||||||
|
""" |
||||||
|
result = con.execute(sql, (window_id,)).fetchdf().to_dict("records") |
||||||
|
con.close() |
||||||
|
return result |
||||||
|
except Exception: |
||||||
|
logger.exception("query_svd_vectors failed") |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def query_party_positions( |
||||||
|
db_path: str, |
||||||
|
window_id: str, |
||||||
|
) -> List[Dict[str, Any]]: |
||||||
|
"""Query party axis scores for a window.""" |
||||||
|
try: |
||||||
|
con = _connect(db_path) |
||||||
|
# Check if party_axis_scores table exists |
||||||
|
tables = con.execute( |
||||||
|
"SELECT table_name FROM information_schema.tables WHERE table_name = 'party_axis_scores'" |
||||||
|
).fetchall() |
||||||
|
|
||||||
|
if tables: |
||||||
|
result = con.execute( |
||||||
|
""" |
||||||
|
SELECT party, axis, score |
||||||
|
FROM party_axis_scores |
||||||
|
WHERE window_id = ? |
||||||
|
""", |
||||||
|
(window_id,), |
||||||
|
).fetchdf().to_dict("records") |
||||||
|
else: |
||||||
|
# Fallback: compute from vectors |
||||||
|
result = _compute_party_positions_from_vectors(con, window_id) |
||||||
|
con.close() |
||||||
|
return result |
||||||
|
except Exception: |
||||||
|
logger.exception("query_party_positions failed") |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def _compute_party_positions_from_vectors(con, window_id: str) -> List[Dict[str, Any]]: |
||||||
|
"""Compute party positions from MP vectors when party_axis_scores doesn't exist.""" |
||||||
|
rows = con.execute( |
||||||
|
""" |
||||||
|
SELECT sv.entity_id, sv.vector, mm.party |
||||||
|
FROM svd_vectors sv |
||||||
|
JOIN mp_metadata mm ON sv.entity_id = mm.mp_name |
||||||
|
WHERE sv.window_id = ? AND sv.entity_type = 'mp' |
||||||
|
""", |
||||||
|
(window_id,), |
||||||
|
).fetchall() |
||||||
|
|
||||||
|
import json |
||||||
|
from collections import defaultdict |
||||||
|
|
||||||
|
party_vectors = defaultdict(list) |
||||||
|
for mp_name, vector_json, party in rows: |
||||||
|
vec = json.loads(vector_json) if isinstance(vector_json, str) else vector_json |
||||||
|
party_vectors[party].append(vec) |
||||||
|
|
||||||
|
result = [] |
||||||
|
for party, vectors in party_vectors.items(): |
||||||
|
if not vectors: |
||||||
|
continue |
||||||
|
# Compute mean position across first 2 components |
||||||
|
dim = len(vectors[0]) |
||||||
|
mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(min(dim, 2))] |
||||||
|
result.append({ |
||||||
|
"party": party, |
||||||
|
"axis_1": mean[0] if len(mean) > 0 else 0.0, |
||||||
|
"axis_2": mean[1] if len(mean) > 1 else 0.0, |
||||||
|
}) |
||||||
|
|
||||||
|
return result |
||||||
|
|
||||||
|
|
||||||
|
def query_pipeline_status(db_path: str) -> Dict[str, Any]: |
||||||
|
"""Return pipeline freshness metrics.""" |
||||||
|
try: |
||||||
|
con = _connect(db_path) |
||||||
|
|
||||||
|
motion_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0] |
||||||
|
|
||||||
|
latest = con.execute("SELECT MAX(date) FROM motions").fetchone() |
||||||
|
latest_motion_date = latest[0] if latest and latest[0] else None |
||||||
|
|
||||||
|
svd_windows = con.execute( |
||||||
|
"SELECT COUNT(DISTINCT window_id) FROM svd_vectors" |
||||||
|
).fetchone()[0] |
||||||
|
|
||||||
|
embedding_count = con.execute( |
||||||
|
"SELECT COUNT(*) FROM svd_vectors WHERE entity_type = 'motion'" |
||||||
|
).fetchone()[0] |
||||||
|
|
||||||
|
con.close() |
||||||
|
|
||||||
|
return { |
||||||
|
"motion_count": motion_count, |
||||||
|
"latest_motion_date": str(latest_motion_date) if latest_motion_date else None, |
||||||
|
"svd_window_count": svd_windows, |
||||||
|
"embedding_count": embedding_count, |
||||||
|
"healthy": motion_count > 0 and svd_windows > 0, |
||||||
|
} |
||||||
|
except Exception: |
||||||
|
logger.exception("query_pipeline_status failed") |
||||||
|
return { |
||||||
|
"motion_count": 0, |
||||||
|
"latest_motion_date": None, |
||||||
|
"svd_window_count": 0, |
||||||
|
"embedding_count": 0, |
||||||
|
"healthy": False, |
||||||
|
"error": "Failed to query pipeline status", |
||||||
|
} |
||||||
@ -0,0 +1,192 @@ |
|||||||
|
"""Pipeline control primitives for agent operation. |
||||||
|
|
||||||
|
Stage-aware tools for running, monitoring, and diagnosing the data pipeline. |
||||||
|
""" |
||||||
|
|
||||||
|
from __future__ import annotations |
||||||
|
|
||||||
|
import logging |
||||||
|
from typing import Any, Dict, List, Optional |
||||||
|
|
||||||
|
from agent_tools.database import query_pipeline_status |
||||||
|
|
||||||
|
logger = logging.getLogger(__name__) |
||||||
|
|
||||||
|
VALID_STAGES = {"ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"} |
||||||
|
|
||||||
|
|
||||||
|
def pipeline_run_stage( |
||||||
|
db_path: str, |
||||||
|
stage: str, |
||||||
|
window_id: Optional[str] = None, |
||||||
|
dry_run: bool = False, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Run a single pipeline stage. |
||||||
|
|
||||||
|
Args: |
||||||
|
db_path: Path to DuckDB database |
||||||
|
stage: One of VALID_STAGES |
||||||
|
window_id: Optional window identifier (e.g., "2024", "current_parliament") |
||||||
|
dry_run: If True, return planned actions without executing |
||||||
|
|
||||||
|
Returns: |
||||||
|
dict with status and metadata |
||||||
|
""" |
||||||
|
if stage not in VALID_STAGES: |
||||||
|
return { |
||||||
|
"error": f"Invalid stage '{stage}'. Valid stages: {sorted(VALID_STAGES)}", |
||||||
|
} |
||||||
|
|
||||||
|
result = { |
||||||
|
"stage": stage, |
||||||
|
"window_id": window_id, |
||||||
|
"dry_run": dry_run, |
||||||
|
"status": "planned" if dry_run else "not_implemented", |
||||||
|
} |
||||||
|
|
||||||
|
if dry_run: |
||||||
|
return result |
||||||
|
|
||||||
|
# Actual execution would delegate to pipeline/run_pipeline.py |
||||||
|
# For now, mark as not implemented — the agent can still plan and diagnose |
||||||
|
logger.info("pipeline_run_stage: %s (dry_run=%s)", stage, dry_run) |
||||||
|
return result |
||||||
|
|
||||||
|
|
||||||
|
def pipeline_run_full( |
||||||
|
db_path: str, |
||||||
|
dry_run: bool = False, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Run all pipeline stages in dependency order. |
||||||
|
|
||||||
|
Args: |
||||||
|
db_path: Path to DuckDB database |
||||||
|
dry_run: If True, return planned actions without executing |
||||||
|
|
||||||
|
Returns: |
||||||
|
dict with stage statuses |
||||||
|
""" |
||||||
|
stages = ["ingestion", "votes", "svd", "text_embeddings", "fusion", "similarity"] |
||||||
|
results = [] |
||||||
|
|
||||||
|
for stage in stages: |
||||||
|
result = pipeline_run_stage(db_path, stage, dry_run=dry_run) |
||||||
|
results.append(result) |
||||||
|
|
||||||
|
return { |
||||||
|
"stages": results, |
||||||
|
"dry_run": dry_run, |
||||||
|
"status": "planned" if dry_run else "partial", |
||||||
|
} |
||||||
|
|
||||||
|
|
||||||
|
def pipeline_check_health(db_path: str) -> Dict[str, Any]: |
||||||
|
"""Check pipeline health and return structured report. |
||||||
|
|
||||||
|
Reuses the health/ module and database queries. |
||||||
|
""" |
||||||
|
try: |
||||||
|
from health.checks import check_motion_freshness, check_embedding_coverage |
||||||
|
|
||||||
|
checks = [] |
||||||
|
healthy = True |
||||||
|
|
||||||
|
try: |
||||||
|
freshness = check_motion_freshness(db_path) |
||||||
|
checks.append({ |
||||||
|
"name": "motion_freshness", |
||||||
|
"healthy": freshness.get("healthy", False), |
||||||
|
"details": freshness, |
||||||
|
}) |
||||||
|
if not freshness.get("healthy", False): |
||||||
|
healthy = False |
||||||
|
except Exception as e: |
||||||
|
checks.append({"name": "motion_freshness", "healthy": False, "error": str(e)}) |
||||||
|
healthy = False |
||||||
|
|
||||||
|
try: |
||||||
|
embedding = check_embedding_coverage(db_path) |
||||||
|
checks.append({ |
||||||
|
"name": "embedding_coverage", |
||||||
|
"healthy": embedding.get("healthy", False), |
||||||
|
"details": embedding, |
||||||
|
}) |
||||||
|
if not embedding.get("healthy", False): |
||||||
|
healthy = False |
||||||
|
except Exception as e: |
||||||
|
checks.append({"name": "embedding_coverage", "healthy": False, "error": str(e)}) |
||||||
|
healthy = False |
||||||
|
|
||||||
|
status = query_pipeline_status(db_path) |
||||||
|
|
||||||
|
return { |
||||||
|
"healthy": healthy, |
||||||
|
"checks": checks, |
||||||
|
"pipeline_status": status, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("pipeline_check_health failed") |
||||||
|
return { |
||||||
|
"healthy": False, |
||||||
|
"checks": [], |
||||||
|
"error": str(e), |
||||||
|
} |
||||||
|
|
||||||
|
|
||||||
|
def pipeline_get_logs( |
||||||
|
db_path: str, |
||||||
|
stage: Optional[str] = None, |
||||||
|
lines: int = 50, |
||||||
|
) -> List[str]: |
||||||
|
"""Return recent log lines for a stage. |
||||||
|
|
||||||
|
Note: This is a placeholder. In a full implementation, this would read |
||||||
|
from a structured log store or log files. |
||||||
|
""" |
||||||
|
# Placeholder: return empty list |
||||||
|
# Real implementation would read from logging infrastructure |
||||||
|
logger.info("pipeline_get_logs requested for stage=%s lines=%d", stage, lines) |
||||||
|
return [] |
||||||
|
|
||||||
|
|
||||||
|
def pipeline_validate_output( |
||||||
|
db_path: str, |
||||||
|
stage: str, |
||||||
|
) -> Dict[str, Any]: |
||||||
|
"""Validate that a stage's output looks reasonable. |
||||||
|
|
||||||
|
Args: |
||||||
|
db_path: Path to DuckDB database |
||||||
|
stage: Pipeline stage to validate |
||||||
|
|
||||||
|
Returns: |
||||||
|
dict with validation results |
||||||
|
""" |
||||||
|
if stage not in VALID_STAGES: |
||||||
|
return { |
||||||
|
"valid": False, |
||||||
|
"error": f"Invalid stage '{stage}'", |
||||||
|
} |
||||||
|
|
||||||
|
try: |
||||||
|
status = query_pipeline_status(db_path) |
||||||
|
|
||||||
|
validators = { |
||||||
|
"svd": lambda s: s.get("svd_window_count", 0) > 0, |
||||||
|
"similarity": lambda s: s.get("embedding_count", 0) > 0, |
||||||
|
"ingestion": lambda s: s.get("motion_count", 0) > 0, |
||||||
|
"votes": lambda s: s.get("motion_count", 0) > 0, |
||||||
|
"text_embeddings": lambda s: s.get("embedding_count", 0) > 0, |
||||||
|
"fusion": lambda s: s.get("embedding_count", 0) > 0, |
||||||
|
} |
||||||
|
|
||||||
|
is_valid = validators.get(stage, lambda s: False)(status) |
||||||
|
|
||||||
|
return { |
||||||
|
"valid": is_valid, |
||||||
|
"stage": stage, |
||||||
|
"pipeline_status": status, |
||||||
|
} |
||||||
|
except Exception as e: |
||||||
|
logger.exception("pipeline_validate_output failed") |
||||||
|
return {"valid": False, "stage": stage, "error": str(e)} |
||||||
@ -0,0 +1,233 @@ |
|||||||
|
--- |
||||||
|
title: Agent-Native Architecture Plan for Stemwijzer |
||||||
|
type: refactor |
||||||
|
status: active |
||||||
|
date: 2026-05-01 |
||||||
|
origin: STRATEGY.md (agent-native architecture track) |
||||||
|
--- |
||||||
|
|
||||||
|
# Agent-Native Architecture Plan for Stemwijzer |
||||||
|
|
||||||
|
## Overview |
||||||
|
|
||||||
|
Stemwijzer is a data-heavy analytical application with three surfaces: a Streamlit voting UI, a data pipeline (OData ingestion → DuckDB → SVD/embedding computation), and an analytics explorer. The agent-native architecture track aims to make every operation an agent can perform as capable as a human operator—whether that's running the pipeline, diagnosing drift, or answering research questions about parliamentary voting patterns. |
||||||
|
|
||||||
|
**Current state:** The codebase is human-operated. Scripts are run manually, pipeline status is checked by eye, and analysis requires writing Python/DuckDB queries. |
||||||
|
|
||||||
|
**Target state:** An agent with access to atomic primitives can run the pipeline, diagnose issues, generate reports, and answer open-ended questions about the data—operating in a loop until outcomes are achieved. |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Problem Frame |
||||||
|
|
||||||
|
- **Pipeline operators** need to know when data is stale, why SVD vectors look wrong, or whether the similarity cache is healthy. Currently this requires manually running scripts and interpreting output. |
||||||
|
- **Analysts/researchers** want to ask questions like "Which parties shifted most on economic axes between 2020 and 2024?" Currently this requires writing DuckDB queries and Python analysis code. |
||||||
|
- **Developers** need to understand pipeline state, verify data integrity, and troubleshoot ingestion issues. Currently this requires reading logs and running diagnostics manually. |
||||||
|
- **Content maintainers** need to verify SVD labels match actual voting patterns, check motion coverage, and validate layman explanations. Currently ad-hoc. |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Requirements Trace |
||||||
|
|
||||||
|
- R1. The agent can achieve anything a pipeline operator can achieve (parity) |
||||||
|
- R2. The agent can answer open-ended analytical questions about parliamentary data (emergent capability) |
||||||
|
- R3. The agent can diagnose pipeline health and suggest remediation (self-service operations) |
||||||
|
- R4. The agent can generate and validate content (SVD labels, motion summaries) |
||||||
|
- R5. New capabilities can be added by writing prompts, not code (composability) |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Scope Boundaries |
||||||
|
|
||||||
|
- **In scope:** Agent primitives for data operations, pipeline control, analysis, and diagnostics |
||||||
|
- **Deferred:** Real-time agent UI inside Streamlit (future phase—add chat interface to explorer) |
||||||
|
- **Deferred:** Autonomous pipeline scheduling (scheduler.py exists but agent control is v2) |
||||||
|
- **Not working on:** Natural language to SQL for end users (this plan targets agent operators, not voter-facing features) |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Key Technical Decisions |
||||||
|
|
||||||
|
- **Files as universal interface:** DuckDB is already file-based (`data/motions.db`). The agent's workspace is the repo itself. Logs, reports, and analysis outputs are files the agent writes and the human reads. |
||||||
|
- **Database tools over file tools for structured data:** For querying motions, votes, and embeddings, the agent needs `query_database` primitives that wrap DuckDB/SQL, not raw file operations. |
||||||
|
- **Pipeline as state machine:** The pipeline has discrete stages (ingestion → vote extraction → SVD → text embeddings → fusion → similarity). The agent needs stage-aware tools, not just "run everything." |
||||||
|
- **Shared workspace:** Agent and human operate on the same `data/motions.db`, the same `thoughts/explorer/` outputs, the same `docs/solutions/` knowledge base. |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Implementation Units |
||||||
|
|
||||||
|
- [ ] U1. **Database query primitives** |
||||||
|
- **Goal:** Give the agent structured access to the DuckDB database |
||||||
|
- **Requirements:** R1, R2, R4 |
||||||
|
- **Dependencies:** None |
||||||
|
- **Files:** |
||||||
|
- Create: `agent_tools/database.py` |
||||||
|
- Test: `tests/agent_tools/test_database_tools.py` |
||||||
|
- **Approach:** Wrap DuckDB queries as atomic tools: |
||||||
|
- `query_motions(filter, limit, order)` → returns motion rows as JSON |
||||||
|
- `query_votes(motion_id, party)` → returns vote counts |
||||||
|
- `query_svd_vectors(window_id, entity_type)` → returns vectors |
||||||
|
- `query_party_positions(window_id)` → returns party axis scores |
||||||
|
- `query_pipeline_status()` → returns freshness metrics from health checks |
||||||
|
- **Patterns to follow:** `health/checks.py` already has DB query patterns; `analysis/explorer_data.py` has read-only query patterns |
||||||
|
- **Test scenarios:** |
||||||
|
- Happy path: query returns valid JSON for known filters |
||||||
|
- Edge case: empty result set returns `[]` not error |
||||||
|
- Error path: invalid SQL/filter returns structured error with suggestion |
||||||
|
- **Verification:** Agent can answer "How many motions in 2024?" using only the tool |
||||||
|
|
||||||
|
- [ ] U2. **Pipeline control primitives** |
||||||
|
- **Goal:** Let the agent run, monitor, and diagnose pipeline stages |
||||||
|
- **Requirements:** R1, R3 |
||||||
|
- **Dependencies:** U1 |
||||||
|
- **Files:** |
||||||
|
- Create: `agent_tools/pipeline.py` |
||||||
|
- Test: `tests/agent_tools/test_pipeline_tools.py` |
||||||
|
- **Approach:** Stage-aware pipeline tools: |
||||||
|
- `pipeline_run_stage(stage, window_id, dry_run)` → runs one stage, returns status |
||||||
|
- `pipeline_run_full(dry_run)` → orchestrates all stages with dependency ordering |
||||||
|
- `pipeline_check_health()` → returns health report (reuses `health/` module) |
||||||
|
- `pipeline_get_logs(stage, lines)` → returns recent logs for a stage |
||||||
|
- `pipeline_validate_output(stage)` → checks output exists and looks reasonable |
||||||
|
- **Patterns to follow:** `pipeline/run_pipeline.py` has the stage orchestration; `scripts/health_check.py` has the CLI pattern |
||||||
|
- **Test scenarios:** |
||||||
|
- Happy path: dry-run returns planned actions without executing |
||||||
|
- Integration: running `pipeline_run_stage("svd", "2024")` produces expected `svd_vectors` rows |
||||||
|
- Error path: running a stage with missing dependencies returns clear error |
||||||
|
- **Verification:** Agent can diagnose "Why are SVD vectors stale?" by checking health, reading logs, and suggesting which stage to re-run |
||||||
|
|
||||||
|
- [ ] U3. **Analysis and report generation primitives** |
||||||
|
- **Goal:** Let the agent perform analytical tasks and write reports |
||||||
|
- **Requirements:** R2, R4 |
||||||
|
- **Dependencies:** U1 |
||||||
|
- **Files:** |
||||||
|
- Create: `agent_tools/analysis.py` |
||||||
|
- Create: `agent_tools/reports.py` |
||||||
|
- Test: `tests/agent_tools/test_analysis_tools.py` |
||||||
|
- **Approach:** |
||||||
|
- `analyze_party_shift(party, window_start, window_end, metric)` → computes and returns shift data |
||||||
|
- `analyze_axis_stability(component, windows)` → returns stability scores |
||||||
|
- `generate_report(type, parameters, output_path)` → writes markdown report to `reports/` |
||||||
|
- `validate_svd_labels(component)` → compares theme labels to actual party positions |
||||||
|
- **Patterns to follow:** `analysis/political_axis.py`, `scripts/motion_drift.py`, `scripts/validate_svd_themes.py` |
||||||
|
- **Test scenarios:** |
||||||
|
- Happy path: `analyze_party_shift` returns structured data for known party |
||||||
|
- Integration: `generate_report("drift", {windows: ["2020", "2024"]})` produces valid markdown |
||||||
|
- Edge case: requesting analysis for nonexistent window returns empty result |
||||||
|
- **Verification:** Agent can answer "Which parties shifted most on economic axes?" by running analysis and summarizing results |
||||||
|
|
||||||
|
- [ ] U4. **Content validation primitives** |
||||||
|
- **Goal:** Let the agent validate and suggest content improvements |
||||||
|
- **Requirements:** R4 |
||||||
|
- **Dependencies:** U1, U3 |
||||||
|
- **Files:** |
||||||
|
- Create: `agent_tools/content.py` |
||||||
|
- Test: `tests/agent_tools/test_content_tools.py` |
||||||
|
- **Approach:** |
||||||
|
- `validate_motion_coverage(start_date, end_date)` → returns coverage gaps |
||||||
|
- `validate_layman_explanations(sample_size)` → samples motions, checks explanation quality |
||||||
|
- `suggest_svd_label(component, top_n_motions)` → analyzes top motions, suggests label |
||||||
|
- `check_embedding_quality(window_id)` → returns coverage stats for fused embeddings |
||||||
|
- **Patterns to follow:** `summarizer.py` for explanation logic; `scripts/validate_svd_themes.py` for theme validation |
||||||
|
- **Test scenarios:** |
||||||
|
- Happy path: `validate_motion_coverage` returns accurate gap list |
||||||
|
- Edge case: all motions covered returns empty gaps |
||||||
|
- **Verification:** Agent can run weekly content quality checks and produce a report |
||||||
|
|
||||||
|
- [ ] U5. **System prompt and context injection** |
||||||
|
- **Goal:** Define agent behavior and inject runtime context |
||||||
|
- **Requirements:** R1, R2, R3, R4, R5 |
||||||
|
- **Dependencies:** U1-U4 |
||||||
|
- **Files:** |
||||||
|
- Create: `agent_tools/SYSTEM_PROMPT.md` |
||||||
|
- Create: `agent_tools/context.py` |
||||||
|
- **Approach:** |
||||||
|
- `SYSTEM_PROMPT.md`: Defines agent identity ("You are the Stemwijzer pipeline operator"), available tools, decision criteria, and output conventions |
||||||
|
- `context.py`: Injects runtime context—current pipeline status, latest SVD window, known issues from `docs/solutions/`, active party list |
||||||
|
- `context.md` pattern: Agent maintains `agent_tools/context.md` with accumulated learnings about the pipeline |
||||||
|
- **Patterns to follow:** `ce-agent-native-architecture` context.md pattern; `AGENTS.md` for project conventions |
||||||
|
- **Test scenarios:** |
||||||
|
- Context injection produces valid markdown with current DB stats |
||||||
|
- System prompt loads and parses without errors |
||||||
|
- **Verification:** Agent session starts with full context of pipeline state |
||||||
|
|
||||||
|
- [ ] U6. **Agent-native testing and parity verification** |
||||||
|
- **Goal:** Ensure agent can do everything humans can do |
||||||
|
- **Requirements:** R1 |
||||||
|
- **Dependencies:** U1-U5 |
||||||
|
- **Files:** |
||||||
|
- Create: `tests/agent_tools/test_parity.py` |
||||||
|
- Modify: `tests/conftest.py` (add agent tool fixtures) |
||||||
|
- **Approach:** |
||||||
|
- Parity tests: For each human action (run pipeline, check health, generate report), verify the agent tool achieves the same outcome |
||||||
|
- Integration tests: Agent runs a full diagnostic loop (check health → identify issue → run fix → verify) |
||||||
|
- `test_parity.py`: Matrix of human action → agent tool → expected outcome |
||||||
|
- **Test scenarios:** |
||||||
|
- Parity: "Human runs health check CLI" vs "Agent calls pipeline_check_health()" → same result |
||||||
|
- Integration: Agent detects stale data, runs pipeline, verifies freshness |
||||||
|
- **Verification:** All parity tests pass |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Output Structure |
||||||
|
|
||||||
|
``` |
||||||
|
agent_tools/ # New directory |
||||||
|
├── __init__.py |
||||||
|
├── SYSTEM_PROMPT.md # Agent behavior definition |
||||||
|
├── context.py # Runtime context injection |
||||||
|
├── context.md # Accumulated agent knowledge |
||||||
|
├── database.py # DB query primitives |
||||||
|
├── pipeline.py # Pipeline control primitives |
||||||
|
├── analysis.py # Analysis primitives |
||||||
|
├── reports.py # Report generation |
||||||
|
└── content.py # Content validation primitives |
||||||
|
|
||||||
|
tests/agent_tools/ # New test directory |
||||||
|
├── __init__.py |
||||||
|
├── test_database_tools.py |
||||||
|
├── test_pipeline_tools.py |
||||||
|
├── test_analysis_tools.py |
||||||
|
├── test_content_tools.py |
||||||
|
└── test_parity.py |
||||||
|
|
||||||
|
reports/ # Agent-generated reports (gitignored) |
||||||
|
``` |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## System-Wide Impact |
||||||
|
|
||||||
|
- **Interaction graph:** Agent tools call into `database.py`, `pipeline/`, `analysis/`, `health/` modules. These modules are already well-factored and read-only where appropriate. |
||||||
|
- **Error propagation:** Agent tools return structured errors (JSON with `error`, `suggestion`, `retryable` fields) rather than raising exceptions. This lets the agent reason about failures. |
||||||
|
- **State lifecycle:** Agent-generated reports in `reports/` are ephemeral (gitignored). Agent updates to `context.md` are durable and committed. |
||||||
|
- **Unchanged invariants:** The Streamlit UI, the data pipeline logic, and the SVD computation remain unchanged. Agent tools are a new surface, not a refactor. |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Risks & Dependencies |
||||||
|
|
||||||
|
| Risk | Mitigation | |
||||||
|
|------|-----------| |
||||||
|
| DuckDB concurrency (read-only agent + write pipeline) | Agent uses read-only connections; pipeline uses write connections. DuckDB handles this at the file level. | |
||||||
|
| Agent tools become stale as pipeline evolves | Tools are thin wrappers around stable module interfaces. U6 parity tests catch drift. | |
||||||
|
| Context injection grows too large | Context is scoped to the task. `context.py` generates minimal relevant context, not full DB dumps. | |
||||||
|
| Security: agent has DB access | Agent runs in the same trust boundary as the developer. No new security surface. | |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Documentation / Operational Notes |
||||||
|
|
||||||
|
- Add `agent_tools/` to `AGENTS.md` so future agents know the capability surface exists |
||||||
|
- Document the parity test matrix in `tests/agent_tools/README.md` |
||||||
|
- `reports/` should be gitignored; agent reports are ephemeral outputs |
||||||
|
|
||||||
|
--- |
||||||
|
|
||||||
|
## Sources & References |
||||||
|
|
||||||
|
- **Origin:** STRATEGY.md (agent-native architecture track) |
||||||
|
- **Skill:** `ce-agent-native-architecture` (parity, granularity, composability, emergent capability) |
||||||
|
- **Related code:** `health/`, `pipeline/`, `analysis/`, `database.py` |
||||||
|
- **Related docs:** `docs/plans/2026-04-24-ROADMAP-stemwijzer-improvements.md` (P4 tracks) |
||||||
@ -0,0 +1,74 @@ |
|||||||
|
"""Tests for agent analysis and report generation primitives.""" |
||||||
|
|
||||||
|
import pytest |
||||||
|
import os |
||||||
|
|
||||||
|
pytest.importorskip("duckdb") |
||||||
|
|
||||||
|
|
||||||
|
class TestAnalyzePartyShift: |
||||||
|
def test_returns_shift_data(self, tmp_duckdb_path): |
||||||
|
from agent_tools.analysis import analyze_party_shift |
||||||
|
|
||||||
|
result = analyze_party_shift( |
||||||
|
tmp_duckdb_path, party="VVD", window_start="2020", window_end="2024" |
||||||
|
) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "party" in result |
||||||
|
assert "shift" in result or "error" in result |
||||||
|
|
||||||
|
def test_nonexistent_party_returns_error(self, tmp_duckdb_path): |
||||||
|
from agent_tools.analysis import analyze_party_shift |
||||||
|
|
||||||
|
result = analyze_party_shift( |
||||||
|
tmp_duckdb_path, party="FAKE", window_start="2020", window_end="2024" |
||||||
|
) |
||||||
|
assert isinstance(result, dict) |
||||||
|
|
||||||
|
|
||||||
|
class TestAnalyzeAxisStability: |
||||||
|
def test_returns_stability_scores(self, tmp_duckdb_path): |
||||||
|
from agent_tools.analysis import analyze_axis_stability |
||||||
|
|
||||||
|
result = analyze_axis_stability(tmp_duckdb_path, component=1, windows=["2020", "2024"]) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "component" in result |
||||||
|
assert "stability" in result or "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestGenerateReport: |
||||||
|
def test_writes_markdown_file(self, tmp_duckdb_path, tmp_path): |
||||||
|
from agent_tools.reports import generate_report |
||||||
|
|
||||||
|
output_path = str(tmp_path / "report.md") |
||||||
|
result = generate_report( |
||||||
|
tmp_duckdb_path, |
||||||
|
report_type="summary", |
||||||
|
parameters={}, |
||||||
|
output_path=output_path, |
||||||
|
) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert os.path.exists(output_path) |
||||||
|
|
||||||
|
def test_returns_error_for_unknown_type(self, tmp_duckdb_path, tmp_path): |
||||||
|
from agent_tools.reports import generate_report |
||||||
|
|
||||||
|
output_path = str(tmp_path / "report.md") |
||||||
|
result = generate_report( |
||||||
|
tmp_duckdb_path, |
||||||
|
report_type="unknown", |
||||||
|
parameters={}, |
||||||
|
output_path=output_path, |
||||||
|
) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestValidateSvdLabels: |
||||||
|
def test_returns_validation_result(self, tmp_duckdb_path): |
||||||
|
from agent_tools.analysis import validate_svd_labels |
||||||
|
|
||||||
|
result = validate_svd_labels(tmp_duckdb_path, component=1) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "component" in result |
||||||
|
assert "valid" in result or "error" in result |
||||||
@ -0,0 +1,44 @@ |
|||||||
|
"""Tests for agent content validation primitives.""" |
||||||
|
|
||||||
|
import pytest |
||||||
|
|
||||||
|
pytest.importorskip("duckdb") |
||||||
|
|
||||||
|
|
||||||
|
class TestValidateMotionCoverage: |
||||||
|
def test_returns_coverage_gaps(self, tmp_duckdb_path): |
||||||
|
from agent_tools.content import validate_motion_coverage |
||||||
|
|
||||||
|
result = validate_motion_coverage(tmp_duckdb_path, start_date="2024-01-01", end_date="2024-12-31") |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "gaps" in result |
||||||
|
assert "coverage_rate" in result or "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestValidateLaymanExplanations: |
||||||
|
def test_returns_quality_report(self, tmp_duckdb_path): |
||||||
|
from agent_tools.content import validate_layman_explanations |
||||||
|
|
||||||
|
result = validate_layman_explanations(tmp_duckdb_path, sample_size=5) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "sample_size" in result |
||||||
|
assert "coverage" in result or "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestSuggestSvdLabel: |
||||||
|
def test_returns_suggestion(self, tmp_duckdb_path): |
||||||
|
from agent_tools.content import suggest_svd_label |
||||||
|
|
||||||
|
result = suggest_svd_label(tmp_duckdb_path, component=1, top_n=5) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "component" in result |
||||||
|
assert "suggestion" in result or "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestCheckEmbeddingQuality: |
||||||
|
def test_returns_coverage_stats(self, tmp_duckdb_path): |
||||||
|
from agent_tools.content import check_embedding_quality |
||||||
|
|
||||||
|
result = check_embedding_quality(tmp_duckdb_path, window_id="current_parliament") |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "coverage" in result or "error" in result |
||||||
@ -0,0 +1,75 @@ |
|||||||
|
"""Tests for agent database query primitives.""" |
||||||
|
|
||||||
|
import pytest |
||||||
|
import json |
||||||
|
|
||||||
|
pytest.importorskip("duckdb") |
||||||
|
|
||||||
|
|
||||||
|
class TestQueryMotions: |
||||||
|
def test_returns_motion_rows(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_motions |
||||||
|
|
||||||
|
result = query_motions(tmp_duckdb_path) |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
def test_respects_limit(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_motions |
||||||
|
|
||||||
|
result = query_motions(tmp_duckdb_path, limit=5) |
||||||
|
assert len(result) <= 5 |
||||||
|
|
||||||
|
def test_empty_db_returns_empty_list(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_motions |
||||||
|
|
||||||
|
result = query_motions(tmp_duckdb_path) |
||||||
|
assert result == [] |
||||||
|
|
||||||
|
|
||||||
|
class TestQueryVotes: |
||||||
|
def test_returns_vote_counts(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_votes |
||||||
|
|
||||||
|
result = query_votes(tmp_duckdb_path, motion_id=1) |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
def test_filters_by_party(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_votes |
||||||
|
|
||||||
|
result = query_votes(tmp_duckdb_path, motion_id=1, party="VVD") |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
|
||||||
|
class TestQuerySvdVectors: |
||||||
|
def test_returns_vectors(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_svd_vectors |
||||||
|
|
||||||
|
result = query_svd_vectors(tmp_duckdb_path, window_id="current_parliament") |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
def test_filters_by_entity_type(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_svd_vectors |
||||||
|
|
||||||
|
result = query_svd_vectors( |
||||||
|
tmp_duckdb_path, window_id="current_parliament", entity_type="mp" |
||||||
|
) |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
|
||||||
|
class TestQueryPartyPositions: |
||||||
|
def test_returns_party_scores(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_party_positions |
||||||
|
|
||||||
|
result = query_party_positions(tmp_duckdb_path, window_id="current_parliament") |
||||||
|
assert isinstance(result, list) |
||||||
|
|
||||||
|
|
||||||
|
class TestQueryPipelineStatus: |
||||||
|
def test_returns_status_dict(self, tmp_duckdb_path): |
||||||
|
from agent_tools.database import query_pipeline_status |
||||||
|
|
||||||
|
result = query_pipeline_status(tmp_duckdb_path) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "motion_count" in result |
||||||
|
assert "latest_motion_date" in result |
||||||
|
assert "svd_window_count" in result |
||||||
@ -0,0 +1,160 @@ |
|||||||
|
"""Parity tests: verify agent tools can achieve what humans can. |
||||||
|
|
||||||
|
These tests ensure the agent-native architecture satisfies the parity principle: |
||||||
|
"Whatever the user can do through the UI/scripts, the agent can achieve through tools." |
||||||
|
""" |
||||||
|
|
||||||
|
import os |
||||||
|
import pytest |
||||||
|
|
||||||
|
pytest.importorskip("duckdb") |
||||||
|
|
||||||
|
|
||||||
|
class TestDatabaseParity: |
||||||
|
"""Agent database queries vs human SQL queries.""" |
||||||
|
|
||||||
|
def test_agent_query_motions_matches_raw_sql(self, tmp_duckdb_path): |
||||||
|
"""Human: SELECT * FROM motions LIMIT 10 |
||||||
|
Agent: query_motions(db_path, limit=10) |
||||||
|
""" |
||||||
|
import duckdb |
||||||
|
from agent_tools.database import query_motions |
||||||
|
|
||||||
|
# Human approach — handle empty DB gracefully |
||||||
|
con = duckdb.connect(tmp_duckdb_path) |
||||||
|
try: |
||||||
|
human_result = con.execute("SELECT * FROM motions LIMIT 10").fetchdf().to_dict("records") |
||||||
|
except Exception: |
||||||
|
human_result = [] |
||||||
|
con.close() |
||||||
|
|
||||||
|
# Agent approach |
||||||
|
agent_result = query_motions(tmp_duckdb_path, limit=10) |
||||||
|
|
||||||
|
# Both should return lists |
||||||
|
assert isinstance(human_result, list) |
||||||
|
assert isinstance(agent_result, list) |
||||||
|
assert len(agent_result) == len(human_result) |
||||||
|
|
||||||
|
def test_agent_pipeline_status_matches_raw_query(self, tmp_duckdb_path): |
||||||
|
"""Human: SELECT COUNT(*) FROM motions |
||||||
|
Agent: query_pipeline_status(db_path) |
||||||
|
""" |
||||||
|
import duckdb |
||||||
|
from agent_tools.database import query_pipeline_status |
||||||
|
|
||||||
|
con = duckdb.connect(tmp_duckdb_path) |
||||||
|
try: |
||||||
|
human_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0] |
||||||
|
except Exception: |
||||||
|
human_count = 0 |
||||||
|
con.close() |
||||||
|
|
||||||
|
agent_status = query_pipeline_status(tmp_duckdb_path) |
||||||
|
|
||||||
|
assert agent_status["motion_count"] == human_count |
||||||
|
|
||||||
|
|
||||||
|
class TestHealthCheckParity: |
||||||
|
"""Agent health check vs human script execution.""" |
||||||
|
|
||||||
|
def test_agent_health_check_matches_script(self, tmp_duckdb_path): |
||||||
|
"""Human: python scripts/health_check.py |
||||||
|
Agent: pipeline_check_health(db_path) |
||||||
|
""" |
||||||
|
from agent_tools.pipeline import pipeline_check_health |
||||||
|
|
||||||
|
# Agent approach |
||||||
|
agent_result = pipeline_check_health(tmp_duckdb_path) |
||||||
|
|
||||||
|
assert isinstance(agent_result, dict) |
||||||
|
assert "healthy" in agent_result |
||||||
|
assert "checks" in agent_result |
||||||
|
|
||||||
|
|
||||||
|
class TestReportGenerationParity: |
||||||
|
"""Agent report generation vs human manual analysis.""" |
||||||
|
|
||||||
|
def test_agent_generates_summary_report(self, tmp_duckdb_path, tmp_path): |
||||||
|
"""Human: Write a summary of pipeline state |
||||||
|
Agent: generate_report(db_path, "summary", ...) |
||||||
|
""" |
||||||
|
from agent_tools.reports import generate_report |
||||||
|
|
||||||
|
output_path = str(tmp_path / "summary.md") |
||||||
|
result = generate_report( |
||||||
|
tmp_duckdb_path, |
||||||
|
report_type="summary", |
||||||
|
parameters={}, |
||||||
|
output_path=output_path, |
||||||
|
) |
||||||
|
|
||||||
|
assert result["status"] == "written" |
||||||
|
assert os.path.exists(output_path) |
||||||
|
|
||||||
|
# Should contain key sections |
||||||
|
content = open(output_path).read() |
||||||
|
assert "Pipeline Summary" in content |
||||||
|
assert "Motions in database" in content |
||||||
|
|
||||||
|
|
||||||
|
class TestAnalysisParity: |
||||||
|
"""Agent analysis vs human analytical queries.""" |
||||||
|
|
||||||
|
def test_agent_party_shift_analysis(self, tmp_duckdb_path): |
||||||
|
"""Human: Write SQL to compare party positions across windows |
||||||
|
Agent: analyze_party_shift(db_path, ...) |
||||||
|
""" |
||||||
|
from agent_tools.analysis import analyze_party_shift |
||||||
|
|
||||||
|
result = analyze_party_shift( |
||||||
|
tmp_duckdb_path, |
||||||
|
party="VVD", |
||||||
|
window_start="2020", |
||||||
|
window_end="2024", |
||||||
|
) |
||||||
|
|
||||||
|
# Should return structured result (or error if no data) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "party" in result |
||||||
|
# Either shift data or error (empty DB is fine) |
||||||
|
assert "shift" in result or "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestIntegrationAgentDiagnosticLoop: |
||||||
|
"""Integration: Agent performs full diagnostic loop.""" |
||||||
|
|
||||||
|
def test_agent_diagnoses_stale_data(self, tmp_duckdb_path): |
||||||
|
"""Agent loop: |
||||||
|
1. Check health |
||||||
|
2. Query pipeline status |
||||||
|
3. Identify issue (empty DB = no data) |
||||||
|
4. Suggest remediation |
||||||
|
""" |
||||||
|
from agent_tools.pipeline import pipeline_check_health |
||||||
|
from agent_tools.database import query_pipeline_status |
||||||
|
|
||||||
|
# Step 1: Check health |
||||||
|
health = pipeline_check_health(tmp_duckdb_path) |
||||||
|
|
||||||
|
# Step 2: Query status |
||||||
|
status = query_pipeline_status(tmp_duckdb_path) |
||||||
|
|
||||||
|
# Step 3: Agent reasoning (simulated) |
||||||
|
issues = [] |
||||||
|
if status["motion_count"] == 0: |
||||||
|
issues.append("No motions in database") |
||||||
|
if status["svd_window_count"] == 0: |
||||||
|
issues.append("No SVD windows computed") |
||||||
|
|
||||||
|
# Step 4: Suggest remediation |
||||||
|
suggestions = [] |
||||||
|
if "No motions in database" in issues: |
||||||
|
suggestions.append("Run pipeline ingestion stage") |
||||||
|
if "No SVD windows computed" in issues: |
||||||
|
suggestions.append("Run SVD computation after ingestion") |
||||||
|
|
||||||
|
assert isinstance(issues, list) |
||||||
|
assert isinstance(suggestions, list) |
||||||
|
# Empty DB should produce actionable suggestions |
||||||
|
assert len(suggestions) > 0 |
||||||
@ -0,0 +1,59 @@ |
|||||||
|
"""Tests for agent pipeline control primitives.""" |
||||||
|
|
||||||
|
import pytest |
||||||
|
|
||||||
|
pytest.importorskip("duckdb") |
||||||
|
|
||||||
|
|
||||||
|
class TestPipelineRunStage: |
||||||
|
def test_dry_run_returns_planned_actions(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_run_stage |
||||||
|
|
||||||
|
result = pipeline_run_stage(tmp_duckdb_path, stage="svd", window_id="2024", dry_run=True) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "stage" in result |
||||||
|
assert result.get("dry_run") is True |
||||||
|
|
||||||
|
def test_invalid_stage_returns_error(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_run_stage |
||||||
|
|
||||||
|
result = pipeline_run_stage(tmp_duckdb_path, stage="invalid") |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "error" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestPipelineRunFull: |
||||||
|
def test_dry_run_returns_plan(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_run_full |
||||||
|
|
||||||
|
result = pipeline_run_full(tmp_duckdb_path, dry_run=True) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "stages" in result or "dry_run" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestPipelineCheckHealth: |
||||||
|
def test_returns_health_report(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_check_health |
||||||
|
|
||||||
|
result = pipeline_check_health(tmp_duckdb_path) |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "checks" in result |
||||||
|
assert "healthy" in result |
||||||
|
|
||||||
|
|
||||||
|
class TestPipelineGetLogs: |
||||||
|
def test_returns_log_lines(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_get_logs |
||||||
|
|
||||||
|
result = pipeline_get_logs(tmp_duckdb_path, stage="svd", lines=10) |
||||||
|
assert isinstance(result, list) |
||||||
|
assert len(result) <= 10 |
||||||
|
|
||||||
|
|
||||||
|
class TestPipelineValidateOutput: |
||||||
|
def test_validates_stage_output(self, tmp_duckdb_path): |
||||||
|
from agent_tools.pipeline import pipeline_validate_output |
||||||
|
|
||||||
|
result = pipeline_validate_output(tmp_duckdb_path, stage="svd") |
||||||
|
assert isinstance(result, dict) |
||||||
|
assert "valid" in result |
||||||
Loading…
Reference in new issue