chore: convert mindmodel from YAML to markdown and clean up

Delete 17 malformed YAML constraint files and 10 stale numbered
constraint files. Convert domain glossary, patterns, stack, and
anti-patterns to markdown format. Update manifest.yaml to reference
new markdown files.
main
Sven Geboers 3 weeks ago
parent 910ef0dc3b
commit 88595c869b
  1. .mindmodel/anti-patterns/anti-patterns.md (127)
  2. .mindmodel/anti-patterns/anti-patterns.yaml (146)
  3. .mindmodel/constraints/01-naming.yaml (34)
  4. .mindmodel/constraints/10-db-schema.yaml (74)
  5. .mindmodel/constraints/20-domain-glossary.yaml (22)
  6. .mindmodel/constraints/30-clusters.yaml (30)
  7. .mindmodel/constraints/40-patterns.yaml (46)
  8. .mindmodel/constraints/50-anti-patterns.yaml (24)
  9. .mindmodel/constraints/60-examples.yaml (117)
  10. .mindmodel/constraints/99-stack.yaml (43)
  11. .mindmodel/constraints/db_connection.yaml (29)
  12. .mindmodel/constraints/error-handling.md (143)
  13. .mindmodel/constraints/error-handling.yaml (184)
  14. .mindmodel/constraints/error_handling.yaml (36)
  15. .mindmodel/constraints/logging.md (124)
  16. .mindmodel/dependencies/dependencies.md (92)
  17. .mindmodel/dependencies/dependencies.yaml (78)
  18. .mindmodel/domain/domain-glossary.md (146)
  19. .mindmodel/domain/domain-glossary.yaml (107)
  20. .mindmodel/manifest.yaml (96)
  21. .mindmodel/patterns/duckdb-access.md (79)
  22. .mindmodel/patterns/duckdb_access.yaml (70)
  23. .mindmodel/patterns/embeddings-similarity.md (74)
  24. .mindmodel/patterns/embeddings_similarity.yaml (63)
  25. .mindmodel/patterns/error-handling.md (63)
  26. .mindmodel/patterns/error_handling.yaml (54)
  27. .mindmodel/patterns/module-singletons.md (41)
  28. .mindmodel/patterns/module_singletons.yaml (33)
  29. .mindmodel/patterns/requests-http.md (77)
  30. .mindmodel/patterns/requests_http.yaml (65)
  31. .mindmodel/patterns/validation.md (37)
  32. .mindmodel/patterns/validation.yaml (29)
  33. .mindmodel/stack/stack.md (67)
  34. .mindmodel/stack/stack.yaml (41)
  35. .mindmodel/system.md (83)

@ -0,0 +1,127 @@
---
title: Anti-Patterns in Stemwijzer
category: anti-patterns
severity: critical
---
# Anti-Patterns
> **NOTE**: Some anti-patterns below were investigated and found to be resolved or invalid. See individual entries for details.
## CRITICAL: print() Instead of Logging
**File**: `api_client.py`
**Evidence**: 11 instances of `print(f"...")` instead of `_logger.info(...)`
**Broken code**:
```python
def get_motions(self, ...):
    try:
        # ...
        print(f"Fetched {len(voting_records)} voting records from API")  # BAD
        print(f"Processed into {len(motions)} unique motions")  # BAD
    except Exception as e:
        print(f"Error fetching motions from API: {e}")  # BAD - no traceback
```
**Fix**:
```python
import logging

_logger = logging.getLogger(__name__)

def get_motions(self, ...):
    try:
        _logger.info("Fetched %d voting records from API", len(voting_records))
        _logger.info("Processed into %d unique motions", len(motions))
    except Exception as e:
        _logger.exception("Error fetching motions from API: %s", e)
        return []
```
---
## CRITICAL: Global `_DummySt` Replacement
**File**: `explorer.py`
**Evidence**: Lines ~50-70, module-level `st = _DummySt()` global replacement
**Problem**: Creates a module-level variable `st` that shadows `streamlit` module, causing subtle bugs.
**Fix**: Use conditional flags instead of global replacement:
```python
# GOOD: Use conditional logic
try:
    import plotly.express as px
    import plotly.graph_objects as go
    HAS_PLOTLY = True
except ImportError:
    HAS_PLOTLY = False
    px = None
    go = None

def render_chart(data):
    if not HAS_PLOTLY:
        _logger.warning("Plotly not available")
        return
    # ... rest of chart logic
```
---
## WARNING: Logger Naming Inconsistency
**Evidence**: 16 files use `logger`, 17 files use `_logger`
**Files with `logger`** (without underscore):
- api_client.py, ai_provider.py, pipeline files, analysis files
**Files with `_logger`** (with underscore):
- database.py, explorer.py, explorer_helpers.py
**Recommendation**: Standardize on `_logger` for module-level loggers.
---
## WARNING: Bare except with pass
**File**: `database.py`, line 47
```python
# BAD - catches KeyboardInterrupt, SystemExit, MemoryError
try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
except:  # bare except
    pass
```
**Fix**:
```python
try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
except Exception as exc:
    _logger.debug("Sequence creation skipped: %s", exc)
```
---
## INVESTIGATED: Entity-ID / Party-Name Mismatch
**Status**: INVALID - investigated and resolved
**Investigation Summary**: `svd_vectors.entity_id` only contains MP names (not party names). Party centroids are correctly computed via `mp_metadata` lookups. No production bug exists.
---
## Pattern: Three Separate Party Alias Dictionaries
**Problem**: Party name variations exist in 3+ places with no canonical alias mapping.
**Fix**: Create one `PARTY_ALIASES` dict in `config.py`:
```python
PARTY_ALIASES = {
    "GroenLinks-PvdA": ["GL-PvdA", "GroenLinks PvdA", "PvdA-GroenLinks"],
    "PVV": ["Partij voor de Vrijheid"],
    # ...
}
```

@ -1,146 +0,0 @@
# Anti-Patterns
> ⚠ **NOTE**: Section 1 below was **investigated and resolved** — it is NOT a bug (see §1 for details).
---
## 1. ~~CRITICAL: Entity-ID / Party-Name Mismatch in `compute_party_coords`~~ → **INVALID — INVESTIGATED & RESOLVED**
**Investigation Date**: 2026-03-31
**Investigation Summary**: After thorough analysis of the database schema and code, this anti-pattern is **INVALID**. The original concern was based on a false assumption about `svd_vectors.entity_id` containing party names.
**Investigation Findings**:
1. **`svd_vectors` table has NO rows with `entity_type='party'`** — only `mp` and `motion` entity types exist in practice.
2. **`entity_id` values in `svd_vectors` are always MP names** (e.g., `"Van Dijk, I."`), never party names. The party centroids are correctly computed via `mp_metadata` lookups.
3. **The trajectories plot WORKS correctly** — no production bug exists. The code path for party-level visualization does not rely on `svd_vectors.entity_id` containing party names.
**Conclusion**: The original anti-pattern was a false positive caused by incorrect assumptions about data contents. The `party_map` reverse-lookup (`mp_name → party_name`) works correctly because `entity_id` values are always MP names, not party names.
---
## 2. Bare `except: pass`
**File**: `database.py`, line 47
**Problem**: Catches **all** exceptions including `KeyboardInterrupt`, `SystemExit`, `MemoryError`.
Silently swallows errors — no logging, no fallback.
**Broken code**:
```python
try:
    self.conn.execute(sql)
except:  # ← bare except
    pass
```
**Fix**:
```python
try:
    self.conn.execute(sql)
except ibis.errors.IbisError as e:
    st.warning(f"Query failed: {e}")
    raise  # or return a default
```
---
## 3. Nested Exception Handling
**File**: `explorer.py`, lines 244–261
**Problem**: Try/except inside try/except creates opaque error paths. Inner exception silently swallows outer intent.
**Broken code**:
```python
try:
    result = compute_svd(motions)
    # ...
except Exception:
    try:
        # Try fallback approach
        result = fallback_compute(motions)
    except Exception:
        pass  # ← both exceptions silently dropped
```
**Fix**: Flatten — handle each case explicitly, or use a decorator.
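The decorator option mentioned above is not shown elsewhere in these docs; a minimal sketch of such a helper (assuming the module-level `_logger` convention used in this repo, and with `compute_svd` standing in for any wrapped function) could look like:

```python
import functools
import logging

_logger = logging.getLogger(__name__)

def with_fallback(default):
    """Return a decorator that logs any exception and returns `default` instead."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                _logger.warning("%s failed: %s", func.__name__, exc)
                return default
        return wrapper
    return decorator

@with_fallback(default=None)
def compute_svd_safe(motions):
    return compute_svd(motions)  # compute_svd is the existing (undecorated) routine
```

This keeps the fallback behaviour in one place instead of repeating nested try/except blocks at each call site.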
---
## 4. Catch-All `Exception` Used Everywhere
**Problem**: `except Exception:` catches 50+ exception types including `ValueError`, `TypeError`, `KeyError`.
Overly broad — masks real bugs.
**Occurrence**: 850+ instances of bare/generic exception handlers across codebase.
**Fix**: Catch specific exceptions. If you must catch multiple, chain them:
```python
except (KeyError, ValueError) as e:
    logger.warning(f"Missing field: {e}")
```
---
## 5. No `entity_id` Format Validation
**Problem**: `svd_vectors.entity_id` can be either:
- An MP name (e.g., `"Van Dijk, I."`) for individual-level SVD
- A party name (e.g., `"GroenLinks-PvdA"`) for party-level SVD
No validation distinguishes which is which. Code must infer from context. (Note: In practice `svd_vectors.entity_id` only contains MP names — see §1 for investigation findings.)
**Fix**: Add explicit format marker or separate columns:
```python
# Option A: separate columns
svd_vectors = pd.DataFrame({
    'mp_name': [...],     # nullable
    'party_name': [...],  # nullable
    'window': [...],
    'vector_2d': [...]
})

# Option B: format prefix
# "mp:Van Dijk, I." or "party:GroenLinks-PvdA"
```
---
## 6. Silent Fallback When Party Centroids Fail
**Problem**: If `party_map` lookup fails (entity is a party, not MP), the code silently produces
`party_map_count: 0` and empty `parties_with_centroid_counts`. No warning is raised.
**Fix**: Add validation and warning:
```python
if party_map_count == 0:
    st.warning(f"No party mappings found for {len(svd_df)} entities in window '{window}'")
```
---
## 7. Three Separate Party Alias Dictionaries (No Single Source of Truth)
**Problem**: Party name variations exist in 3+ places:
- `PARTY_COLOURS` keys
- `party_map` values (from `mp_party_history`)
- Raw data column values
No canonical alias mapping. Spelling mismatches cause silent failures.
**Fix**: Create one `PARTY_ALIASES` dict in `config.py`:
```python
PARTY_ALIASES = {
    "GroenLinks-PvdA": ["GL-PvdA", "GroenLinks PvdA", "PvdA-GroenLinks"],
    "PVV": ["Partij voor de Vrijheid"],
    ...
}

def resolve_party(name: str) -> str:
    """Normalize any party name variant to canonical form."""
    for canonical, aliases in PARTY_ALIASES.items():
        if name in aliases or name == canonical:
            return canonical
    return name  # no alias found
```

@ -1,34 +0,0 @@
# Naming & Style Conventions
## Rules
- Modules and files: snake_case.py. Evidence: pipeline/run_pipeline.py, database.py, ai_provider.py
- Functions and methods: snake_case. Evidence: compute_svd_for_window (pipeline), _generate_windows (pipeline/run_pipeline.py)
- Classes: PascalCase. Evidence: MotionDatabase (database.py)
- Constants: UPPER_SNAKE_CASE. Evidence: VOTE_MAP, DATABASE_PATH (config inferred)
- Import order: stdlib, third-party, local; prefer absolute imports, grouped by origin.
- Use black, ruff, isort, mypy as the recommended toolchain; repository lacks config files (black, ruff, pyproject sections).
## Examples
### Function example (from pipeline/run_pipeline.py)
```python
def _generate_windows(start: date, end: date, granularity: str) -> List[Tuple[str, str, str]]:
    """Return list of (window_id, start_str, end_str) tuples."""
```
### Class example (from database.py)
```python
class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        ...
```
## Anti-patterns
- Missing formatting configs (black, ruff, isort). Add pyproject.toml sections or dedicated config files.
## Remediations
- Add pyproject.toml tool sections for black/ruff/isort and a pre-commit config. Run ruff/black CI lint step.
## Evidence pointers
- pipeline/run_pipeline.py: function _generate_windows (lines ~1-120)
- database.py: MotionDatabase class and methods (file database.py lines 1-400+)

@ -1,74 +0,0 @@
# Database Schema (DuckDB) — extracted DDL
## Rules
- Use DuckDB for persistent storage when available; fall back to JSON files when duckdb is not installed (database.py).
- Keep schema migrations additive (ALTER TABLE ADD COLUMN IF NOT EXISTS used in database.py).
## Examples (DDL snippets extracted from database.py)
### motions table
```sql
CREATE TABLE IF NOT EXISTS motions (
id INTEGER DEFAULT nextval('motions_id_seq'),
title TEXT NOT NULL,
description TEXT,
date DATE,
policy_area TEXT,
voting_results JSON,
winning_margin FLOAT,
controversy_score FLOAT,
layman_explanation TEXT,
externe_identifier TEXT,
body_text TEXT,
url TEXT UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
```
### mp_votes table
```sql
CREATE TABLE IF NOT EXISTS mp_votes (
id INTEGER DEFAULT nextval('mp_votes_id_seq'),
motion_id INTEGER NOT NULL,
mp_name TEXT NOT NULL,
party TEXT,
vote TEXT NOT NULL,
date DATE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
```
### embeddings / fused_embeddings
```sql
CREATE TABLE IF NOT EXISTS embeddings (
id INTEGER DEFAULT nextval('embeddings_id_seq'),
motion_id INTEGER NOT NULL,
model TEXT,
vector JSON NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
CREATE TABLE IF NOT EXISTS fused_embeddings (
id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
motion_id INTEGER NOT NULL,
window_id TEXT NOT NULL,
vector JSON NOT NULL,
svd_dims INTEGER NOT NULL,
text_dims INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
```
## Anti-patterns
- Broad try/except around the duckdb import (database.py top) is acceptable for an optional dependency, but it should explicitly log the missing dependency and document test behavior.
## Remediations
- Add a simple migration/versioning table (schema_version) to track schema changes and apply migrations deterministically (a minimal sketch follows this list).
- Add tests that exercise both duckdb-backed and JSON-fallback database paths. Evidence: database.py contains JSON fallback logic (lines ~1-80).
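A minimal sketch of the schema_version remediation, assuming DuckDB is available (the `migrations` mapping and function name are illustrative, not existing repo code):

```python
import duckdb

def ensure_schema_version(db_path: str, target_version: int, migrations: dict[int, str]) -> None:
    """Apply numbered SQL migrations until the stored schema version reaches target_version."""
    with duckdb.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
        row = conn.execute("SELECT max(version) FROM schema_version").fetchone()
        current = row[0] or 0
        for version in range(current + 1, target_version + 1):
            conn.execute(migrations[version])  # e.g. {2: "ALTER TABLE motions ADD COLUMN IF NOT EXISTS url TEXT"}
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
```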
## Evidence pointers
- database.py: DDL strings and sequences (file: database.py lines ~1-300 and further). See create table blocks for motions, mp_votes, embeddings, fused_embeddings.

@ -1,22 +0,0 @@
# Domain Glossary
## Rules
- Use consistent domain terms across code and DB: Motion, MP, Party, embedding, window, svd_vector, fused_embedding, similarity_cache, session_id.
## Terms
- Motion: parliamentary motion stored in `motions` table. Evidence: database.py CREATE TABLE motions (file: database.py lines ~40-110)
- MP (Member of Parliament): individual with votes stored in `mp_votes`. Evidence: database.py CREATE TABLE mp_votes
- Embedding: text embedding stored in `embeddings` table; fused vectors in `fused_embeddings`.
- SVD vector: reduced-dimensional vectors stored in `svd_vectors` table.
- Window: time window identifier (e.g., "2024-Q1") used across SVD/fusion pipelines. Evidence: pipeline/run_pipeline.py _generate_windows
- Controversy score: derived field stored on motions as controversy_score. Evidence: database.py insert_motion sets controversy_score
## Examples / Usage
- pipeline.run_pipeline._generate_windows produces window ids used when storing svd_vectors and fused_embeddings. Evidence: pipeline/run_pipeline.py lines ~1-120
## Evidence pointers
- database.py: motions, mp_votes, embeddings, fused_embeddings tables (file: database.py)
- pipeline/run_pipeline.py: window generation and pipeline phases (file: pipeline/run_pipeline.py)
## Anti-patterns
- Inconsistent naming of domain terms across modules (e.g., `mp_vote_parties` vs `mp_votes` usage in database.insert_motion and pipeline extraction). Prefer canonical names matching DB columns and use small adapter functions when transitioning representations.

@ -1,30 +0,0 @@
# Code Clusters / Organization
## Rules
- The repository organizes code into the following clusters (observed):
  - UI / Streamlit: Home.py, pages/, app.py, explorer.py
  - Database & persistence: database.py, config.py
  - ETL / pipeline: pipeline/ (run_pipeline.py, svd_pipeline, text_pipeline, fusion)
  - AI provider & summarization: ai_provider.py, pipeline/..., analysis/
  - Similarity & caching: similarity/*, similarity_cache table in DB
  - API client & scraping: api_client.py, pipeline/fetch_mp_metadata
  - Analysis & visualization: analysis/visualize.py, explorer.py
  - CLI & scheduler: scheduler.py, pipeline/run_pipeline.py
  - Tests & migrations: tests/ (pytest) and database reset helpers
## Examples
### Pipeline orchestrator (cluster: CLI & pipeline)
```python
from database import MotionDatabase
db = MotionDatabase(db_path)
# then phases: fetch_mp_metadata, extract_mp_votes, compute svd, ensure_text_embeddings, fuse_for_window
```
## Remediations
- Add a brief CONTRIBUTING.md describing where to add new pipeline stages and how to run tests locally. Include notes about optional duckdb dependency and JSON fallback for tests.
## Evidence pointers
- pipeline/run_pipeline.py: orchestrator and cluster boundaries (file: pipeline/run_pipeline.py)
- ai_provider.py: AI adapter for embeddings and chat (file: ai_provider.py)
- analysis/visualize.py: visualization cluster (file: analysis/visualize.py)

@ -1,46 +0,0 @@
# Design Patterns & Code Patterns
## Rules
- Use repository-style DB wrapper: MotionDatabase encapsulates DuckDB access and schema management.
- AI provider adapter pattern: ai_provider.py exposes get_embedding(s) and chat_completion with retry/backoff and local fallback.
- Pipeline orchestration: run_pipeline.py uses phases, ThreadPoolExecutor for parallel SVD computation with careful DuckDB connection handling (collect results before writes).
## Examples
### Repository pattern (database.py MotionDatabase)
```python
class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._init_database()

    def insert_motion(self, motion_data: Dict) -> bool:
        """Insert a new motion into database"""
        # uses duckdb.connect and parameterized queries
```
### Provider adapter with retries (ai_provider.py)
```python
def _post_with_retries(path: str, json: dict[str, Any], retries: int = 3) -> requests.Response:
    # Implements retries/backoff, handles 429 with Retry-After and 5xx responses
```
### Pipeline parallelism pattern (run_pipeline)
```python
with ThreadPoolExecutor(max_workers=max_workers) as pool:
    for window_id, w_start, w_end in windows:
        fut = pool.submit(compute_svd_for_window, db.db_path, window_id, w_start, w_end, args.svd_k)
        futures[fut] = window_id
    # wait then write sequentially to DuckDB
```
## Anti-patterns
- Broad excepts used in several places (database.py top-level try/except on duckdb import, many generic excepts around DB operations) — can hide real errors.
## Remediations
- Replace broad except Exception with targeted exceptions and explicit logging. Where fallback is intended (e.g., optional duckdb), log at INFO/DEBUG with clear message and include guidance in CONTRIBUTING.md.
## Evidence pointers
- ai_provider.py: _post_with_retries, get_embedding(s), _local_embedding (file: ai_provider.py lines ~1-300)
- pipeline/run_pipeline.py: ThreadPoolExecutor usage and duckdb connection handling (file: pipeline/run_pipeline.py lines ~120-260)
- database.py: MotionDatabase methods (file: database.py)

@ -1,24 +0,0 @@
# Anti-patterns, Issues and Recommended Fixes
## Rules
- Flagged issues discovered in Phase 1 must be remediated with concrete actions.
## Issues
- pytest is listed as a runtime dependency (pyproject.toml). This increases image size and may pull dev-only transitive deps into production. Evidence: pyproject.toml
- openai is declared but static imports not found; may be unused. Evidence: pyproject.toml, ai_provider.py uses requests and env keys instead of openai imports.
- Many dependencies use permissive ">=" version ranges; no lockfile present. This reduces reproducibility.
- Missing formatting/linting configs (black, ruff, isort, mypy). Recommended to add config and CI steps.
- Broad except Exception used in many places (database.py, ai_provider.py fallback logic, analysis/visualize.py). This can mask bugs and slow debugging.
## Remediations / Recommended fixes
- Move pytest from runtime dependencies to dev-dependencies in pyproject.toml.
- Suggested patch: under [project.optional-dependencies] or [tool.poetry.dev-dependencies] depending on toolchain.
- Audit `openai` usage. If unused, remove it from pyproject.toml. If it is imported dynamically at runtime, add a small shim or an explicit lazy import with a documented env var (see the sketch after this list).
- Pin critical dependencies or add upper bounds; generate lockfile (poetry.lock or pip-tools requirements.txt). Add CI job that fails on permissive ranges.
- Add black/ruff/isort/mypy config blocks to pyproject.toml and enable pre-commit hooks. Add CI lint stage.
- Replace broad except Exception with narrower catches and re-raise or log with traceback when unexpected. Example locations: database.py top import, insert_motion broad except, ai_provider fallback blocks.
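A lazy-import shim along the lines described could look like this (illustrative only; whether the project actually needs `openai`, and which env var it should read, must be confirmed first):

```python
import importlib
import os

def get_openai_client():
    """Import openai only when an API key is configured; fail with a clear message otherwise."""
    api_key = os.environ.get("OPENAI_API_KEY")  # env var name is an assumption
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; the openai provider is disabled")
    openai = importlib.import_module("openai")  # imported lazily so the dependency stays optional
    return openai.OpenAI(api_key=api_key)
```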
## Evidence pointers
- pyproject.toml: dependencies list (file: pyproject.toml lines 1-40)
- database.py: multiple broad except blocks (file: database.py top and methods)
- ai_provider.py: uses requests + env keys (file: ai_provider.py)

@ -1,117 +0,0 @@
# Example Extractions
## Rules
- Include concrete examples extracted from the codebase: function signatures with docstrings, SQL DDL snippets, and pytest stubs following repository conventions.
## (a) Function signatures with docstrings (5 examples)
1) pipeline/run_pipeline.py::_generate_windows
```python
def _generate_windows(start: date, end: date, granularity: str) -> List[Tuple[str, str, str]]:
    """Return list of (window_id, start_str, end_str) tuples.

    window_id format:
        quarterly → "2024-Q1", "2024-Q2", …
        annual    → "2024"
    """
```
2) database.py::append_audit_event
```python
def append_audit_event(
    self,
    actor_id: Optional[str],
    action: str,
    target_type: Optional[str] = None,
    target_id: Optional[str] = None,
    metadata: Optional[Dict] = None,
) -> bool:
    """Record an audit event. Tries DB then falls back to ledger file."""
```
3) ai_provider.py::get_embedding
```python
def get_embedding(text: str, model: str | None = None) -> list[float]:
    """Return an embedding vector for `text` using the configured provider.

    Raises ProviderError for configuration or provider-side failures.
    """
```
4) ai_provider.py::get_embeddings_batch
```python
def get_embeddings_batch(
    texts: list[str], model: str | None = None, batch_size: int = 50
) -> list[list[float]]:
    """Return embedding vectors for multiple texts using batched API calls."""
```
5) analysis/visualize.py::plot_umap_scatter
```python
def plot_umap_scatter(
    motion_ids: List[int],
    coords: List[List[float]],
    labels: Optional[List[int]] = None,
    window_id: Optional[str] = None,
    output_path: str = "analysis_umap.html",
) -> str:
    """Produce a 2D scatter plot of UMAP-reduced fused embeddings."""
```
## (b) SQL / DDL snippets (3 examples inferred from database.py)
1) motions table (see constraints/10-db-schema.yaml) — evidence: database.py CREATE TABLE motions (lines ~40-110)
2) mp_votes table (see constraints/10-db-schema.yaml) — evidence: database.py CREATE TABLE mp_votes
3) fused_embeddings table (see constraints/10-db-schema.yaml) — evidence: database.py CREATE TABLE fused_embeddings
## (c) Pytest stubs (4 sample tests matching conventions)
Create tests under tests/ named test_*.py using fixtures in conftest.py. Examples below are stubs to add.
1) tests/test_database_basic.py
```python
def test_init_database_creates_tables(tmp_path):
    db_path = str(tmp_path / "motions.db")
    from database import MotionDatabase
    db = MotionDatabase(db_path=db_path)
    # If duckdb not available, JSON fallback should create .embeddings.json
    assert db is not None
```
2) tests/test_ai_provider.py
```python
def test_local_embedding_fallback():
    from ai_provider import _local_embedding
    v = _local_embedding("hello world", dim=16)
    assert isinstance(v, list) and len(v) == 16
```
3) tests/test_pipeline_windows.py
```python
from pipeline.run_pipeline import _generate_windows

def test_generate_quarterly_windows():
    from datetime import date
    start = date(2024, 1, 1)
    end = date(2024, 3, 31)
    windows = _generate_windows(start, end, "quarterly")
    assert any(w[0].endswith("Q1") for w in windows)
```
4) tests/test_visualize_plot.py
```python
def test_plot_umap_scatter_no_plotly(monkeypatch, tmp_path):
    # If plotly missing, function should raise ImportError with guidance
    import analysis.visualize as vis
    try:
        vis._require_plotly()
    except ImportError:
        assert True
```
## Evidence pointers
- Function docstrings: pipeline/run_pipeline.py, ai_provider.py, analysis/visualize.py, database.py
- DDL: database.py create table blocks

@ -1,43 +0,0 @@
# Stack and Dependencies
## Rules
- Primary language: Python >=3.13 (evidence: pyproject.toml requires-python = ">=3.13")
- Application: Streamlit app (streamlit >=1.48.0). Entrypoint: Home.py (CMD: streamlit run Home.py). Evidence: Home.py, pages/1_Stemwijzer.py, pyproject.toml, Dockerfile
- Database: DuckDB + Ibis (duckdb>=1.3.2, ibis-framework[duckdb]>=10.8.0). Evidence: pyproject.toml, database.py
- ML: scikit-learn, umap-learn, scipy. Evidence: pyproject.toml, pipeline/svd.py, analysis/
## Examples
### pyproject dependencies (evidence: pyproject.toml)
```toml
dependencies = [
"duckdb>=1.3.2",
"ibis-framework[duckdb]>=10.8.0",
"openai>=1.99.7",
"scipy>=1.11",
"umap-learn>=0.5",
"plotly>=5.0",
"pytest>=9.0.2",
"requests>=2.32.4",
"schedule>=1.2.2",
"streamlit>=1.48.0",
"scikit-learn>=1.8.0",
"beautifulsoup4>=4.14.3",
"lxml>=6.0.2",
]
```
## Anti-patterns / Notes
- pytest is listed under runtime dependencies in pyproject.toml (line: dependencies). Move pytest to dev-dependencies to avoid shipping test runner in production images. Evidence: pyproject.toml
- Many dependencies use permissive ">=" ranges. Recommend pinning or generating lockfile (poetry.lock/requirements.txt) and adding upper bounds for reproducibility.
- openai appears declared but static imports not found; possible unused dependency (evidence: pyproject.toml, ai_provider.py uses requests and environment keys instead of openai).
## Remediations
- Move test-only libs (pytest) to dev-dependencies in pyproject.toml.
- Add lockfile and CI step to check for pinned dependencies.
- Audit declared but unused packages (openai) and remove or confirm dynamic usage.
## Evidence pointers
- pyproject.toml: full dependency list (lines 1-40)
- Home.py: streamlit usage and app entry (file: Home.py)
- database.py: duckdb table creation and connection (file: database.py lines ~1-350)

@ -1,29 +0,0 @@
# DB connection handling constraints
rules:
  - name: use_context_managers_for_connections
    rule: "Prefer using 'with duckdb.connect(path, read_only=...) as conn' for scoped DB interactions where possible."
    rationale: "Ensures proper resource cleanup and avoids connection leaks."
  - name: read_only_for_compute
    rule: "Use read_only=True for compute steps that only read data (SVD, similarity compute)."
    rationale: "Allows safe parallel workers and reduces write contention."
  - name: short_lived_writes
    rule: "When performing database writes, open short-lived connections, commit quickly and close."
    rationale: "Avoids long-lived transactions and reduces lock windows."
examples:
  - path: pipeline/svd_pipeline.py
    snippet: |
      conn = duckdb.connect(db_path, read_only=True)
      try:
          rows = conn.execute(...).fetchall()
      finally:
          conn.close()
anti_patterns_and_remediations:
  - bad: "Creating a global connection at import that performs migrations."
    remediation: "Move migrations to an explicit init function that runs at deployment/upgrade time."
  - bad: "Not closing connections on exceptions."
    remediation: "Wrap connects in `with` or finally: conn.close() blocks."

@ -0,0 +1,143 @@
---
title: Error Handling Patterns
category: constraints
severity: high
---
# Error Handling Patterns
## Core Rules
1. **Catch `Exception`, return safe fallbacks** (False/[]/None)
2. **Log exceptions with traceback** using `_logger.exception()`
3. **Never swallow exceptions silently** - always log or return a sensible default
4. **Avoid nested try/except blocks** - flatten exception handling
## Pattern: Try/Except Safe Fallback
This is the dominant pattern in the codebase (219+ instances).
```python
# Standard pattern from database.py, api_client.py, etc.
try:
    result = risky_operation()
    return process(result)
except Exception as exc:
    _logger.warning("Operation failed: %s", exc)
    return safe_fallback  # False, [], None, {}
```
### Examples from Codebase
**database.py** - DuckDB operations:
```python
def get_svd_vectors(self, window: str):
    try:
        conn = duckdb.connect(self.db_path, read_only=True)
        try:
            result = conn.execute(query, (window,)).fetchall()
            return self._parse_vectors(result)
        finally:
            conn.close()
    except Exception as exc:
        _logger.warning("Failed to get SVD vectors: %s", exc)
        return []
```
**ai_provider.py** - HTTP retries:
```python
try:
    resp = requests.post(url, json=json, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()
except requests.ConnectionError as exc:
    if attempt == retries:
        raise ProviderError(f"Connection error: {exc}") from exc
    # ... retry logic
```
## Pattern: Optional Dependency Fallback
Gracefully degrade when optional packages are unavailable.
```python
# UMAP fallback in explorer_helpers.py
try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False
    _logger.debug("UMAP not available, using SVD vectors directly")

def project_to_2d(vectors):
    if HAS_UMAP:
        return umap.UMAP().fit_transform(vectors)
    return vectors[:, :2]  # Fallback: first 2 SVD dimensions
```
## Anti-Patterns
### 1. Bare except with pass (CRITICAL)
**File**: `database.py`, line 47
```python
# BAD - catches KeyboardInterrupt, SystemExit, MemoryError
try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
except:  # bare except
    pass
```
**Fix**: Catch specific exception or log and continue:
```python
try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
except Exception as exc:
    _logger.debug("Sequence creation skipped (may already exist): %s", exc)
```
### 2. Nested Exception Handling
**File**: `explorer.py`, lines 244-261
```python
# BAD - opaque error paths
try:
    result = compute_svd(motions)
except Exception:
    try:
        result = fallback_compute(motions)
    except Exception:
        pass  # Both exceptions silently dropped
```
**Fix**: Flatten and handle each case explicitly:
```python
# GOOD - explicit handling
try:
    result = compute_svd(motions)
except Exception as exc:
    _logger.warning("SVD failed, trying fallback: %s", exc)
    try:
        result = fallback_compute(motions)
    except Exception as fallback_exc:
        _logger.error("Both SVD approaches failed: %s, %s", exc, fallback_exc)
        raise
```
## Rule Summary
| Pattern | When to Use | Return Value |
|---------|-------------|--------------|
| Safe fallback | Best-effort operations | `[]`, `{}`, `False`, `None` |
| Re-raise | Critical operations that must succeed | raise |
| Log and continue | Optional steps in pipeline | (continue) |
| Graceful degradation | Optional dependencies | Default behavior |
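The "Re-raise" row above is the only pattern not illustrated in this file; a minimal sketch of re-raising with added context (the function name and wrapping exception type are illustrative, not existing repo code) could look like:

```python
def load_required_vectors(db, window: str):
    """Critical path: propagate failures instead of returning a fallback."""
    try:
        return db.get_svd_vectors(window)
    except Exception as exc:
        _logger.exception("Loading SVD vectors for window %s failed", window)
        raise RuntimeError(f"SVD vectors unavailable for window {window}") from exc
```

Chaining with `from exc` keeps the original traceback attached to the new exception.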
## When to Log vs Return
| Scenario | Action |
|----------|--------|
| User action fails | Log warning, return safe default |
| Internal error (corrupt data) | Log error, return safe default |
| Transient failure (network) | Log warning, retry if appropriate |
| Configuration error | Log error, raise with clear message |

@ -1,184 +0,0 @@
# Error Handling Constraints
## Core Rule
**Catch `Exception`, return safe fallbacks (False/[]/None)**
Never let exceptions propagate to user-facing code. Always provide a safe default.
## Patterns
### For Not-Found Operations
Return `None` or falsy value when item not found:
```python
# GOOD: Return None on not found
def get_motion_by_id(self, motion_id: int) -> Optional[Dict]:
    try:
        conn = duckdb.connect(self.db_path)
        result = conn.execute(
            "SELECT * FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()
        conn.close()
        return result
    except Exception:
        conn.close()
        return None
```
### For Collection Operations
Return empty list when no results:
```python
# GOOD: Return empty list on failure
def get_filtered_motions(self, **kwargs) -> List[Dict]:
    try:
        conn = duckdb.connect(self.db_path)
        rows = conn.execute(query, params).fetchall()
        conn.close()
        return rows
    except Exception:
        conn.close()
        return []
```
### For Boolean Operations
Return `False` for failed boolean checks:
```python
# GOOD: Return False on failure
def motion_exists(self, motion_id: int) -> bool:
    try:
        conn = duckdb.connect(self.db_path)
        count = conn.execute(
            "SELECT COUNT(*) FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()[0]
        conn.close()
        return count > 0
    except Exception:
        return False
```
### For Creation Operations
Return `False` or empty string on failure:
```python
# GOOD: Return empty string on failure
def generate_summary(self, title: str, body: str) -> str:
    try:
        return ai_provider.chat_completion(messages)
    except ai_provider.ProviderError:
        logger.exception("AI provider failed")
        return ""
```
## Anti-Patterns to Avoid
### Don't Catch Specific Exceptions Only
```python
# BAD: Catches only FileNotFoundError, misses other issues
try:
    with open(path) as f:
        return json.load(f)
except FileNotFoundError:
    return None
```
### Don't Re-raise Without Context
```python
# BAD: Loses information
try:
    process(data)
except Exception:
    raise  # No context added
```
### Don't Swallow Exceptions Silently
```python
# BAD: No logging, no fallback
try:
    return risky_operation()
except Exception:
    pass  # What happened?
```
## Nested Exception Handling
When calling code that has its own error handling, wrap only if needed:
```python
# Accept result from wrapped function (it handles errors)
def fetch_motions(self, start_date):
    # ai_provider_wrapper handles retries internally
    embeddings = get_embeddings_with_retry(texts)
    # Only wrap if wrapper doesn't handle errors
    if all(e is None for e in embeddings):
        logger.error("All embeddings failed")
        return []
    return process(embeddings)
```
## Context Managers
Use `try/finally` for cleanup:
```python
def process_with_temp_file(self):
    temp = NamedTemporaryFile(delete=False)
    try:
        temp.write(data)
        temp.close()
        return process_file(temp.name)
    finally:
        temp.close()  # safe even if already closed
        os.unlink(temp.name)
```
## When to Log vs Return
| Scenario | Action |
|----------|--------|
| User action fails | Log warning, return safe default |
| Internal error (corrupt data) | Log error, return safe default |
| Transient failure (network) | Log warning, retry if appropriate |
| Configuration error | Log error, raise with clear message |
## Exception Propagation
Only raise exceptions for:
1. Configuration/setup errors (missing required env vars)
2. Programming errors (invalid arguments)
3. Fatal system errors (database corruption)
```python
# GOOD: Raise for configuration errors
def _get_api_key(self) -> str:
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise ProviderError(
            "OPENROUTER_API_KEY environment variable is required"
        )
    return key
```
## Logging Errors
Always include context:
```python
# GOOD: Include relevant context
_logger.error(
    "Failed to fetch motion %d: %s",
    motion_id,
    exc,
)

# BAD: No context
_logger.error("Failed to fetch")
```

@ -1,36 +0,0 @@
# Error handling style rules (YAML constraint example)
rules:
  - name: explicit_exceptions
    rule: "Raise explicit exceptions (ValueError, ProviderError) for known error conditions rather than returning magic values."
    examples:
      - good: |
          if not isinstance(text, str):
              raise ProviderError('text must be a string')
      - bad: |
          if not isinstance(text, str):
              return []
  - name: avoid_broad_except
    rule: "Avoid 'except Exception:' that swallows errors. If broad except is used for best-effort, log the exception with logger.exception and re-raise or convert."
    examples:
      - bad: |
          try:
              do_work()
          except Exception:
              return []
      - remediation: |
          try:
              do_work()
          except SpecificError as exc:
              logger.warning('Handled error: %s', exc)
              raise
  - name: logging_over_print
    rule: "Prefer logger.* over print() for messages and errors."
    examples:
      - bad: "print('Error fetching motions from API: %s' % e)"
      - good: "logger.exception('Error fetching motions from API')"
enforcement_examples:
  - "Add a static code check to flag 'print(' in modules (except in simple scripts) and 'except Exception:' usages without logger.exception."

@ -1,8 +1,47 @@
---
title: Logging Constraints
category: constraints
severity: critical
---
# Logging Constraints
## Core Rule
**Use `logging.getLogger(__name__)` - never use `print()`**
**CRITICAL ANTI-PATTERN**: `api_client.py` uses `print()` instead of logging (11 instances).
## CRITICAL Anti-Pattern: print() Instead of Logging
**File**: `api_client.py`
**Evidence**: Lines with `print(f"...")` instead of `_logger.info(...)`
**Broken code**:
```python
def get_motions(self, ...):
    try:
        # ...
        print(f"Fetched {len(voting_records)} voting records from API")  # BAD
        print(f"Processed into {len(motions)} unique motions")  # BAD
    except Exception as e:
        print(f"Error fetching motions from API: {e}")  # BAD - no traceback
```
**Fix**:
```python
import logging

_logger = logging.getLogger(__name__)

def get_motions(self, ...):
    try:
        _logger.info("Fetched %d voting records from API", len(voting_records))
        _logger.info("Processed into %d unique motions", len(motions))
    except Exception as e:
        _logger.exception("Error fetching motions from API: %s", e)
        return []
```
## Logger Initialization
@ -31,6 +70,10 @@ _logger = logging.getLogger(__name__)
_logger = logging.getLogger(__name__)
```
**INCONSISTENCY WARNING**: 16 files use `logger`, 17 files use `_logger`. Choose one convention.
**Recommendation**: Use `_logger` (with underscore) for module-level loggers to distinguish from class-level loggers.
## Log Levels
| Level | When to Use |
@ -41,30 +84,6 @@ _logger = logging.getLogger(__name__)
| ERROR | Operation failed, may need attention |
| CRITICAL | Fatal error, program may crash |
## Examples
### Good Logging Practice
```python
_logger.info("Pipeline run: %s → %s (%s windows)", start, end, count)
_logger.debug("Batch embedding attempt %d failed: %s", attempt, exc)
_logger.warning("Fallback used for motion %d: %s", motion_id, reason)
_logger.error("Query failed: %s", exc)
```
### Bad: Using print()
```python
# BAD - don't use print
print(f"Fetched {len(voting_records)} voting records from API")
print(f"Error fetching motions from API: {e}")
```
### Good: Using logger
```python
# GOOD - use logger
_logger.info("Fetched %d voting records from API", len(voting_records))
_logger.error("Error fetching motions from API: %s", e)
```
## Exception Logging
Use `_logger.exception()` for caught exceptions (includes traceback):
@ -77,30 +96,6 @@ except Exception as exc:
return fallback_value
```
Use `_logger.error()` with explicit exception for controlled errors:
```python
try:
    result = risky_operation()
except Exception as exc:
    _logger.error("Operation failed: %s", exc)
    return fallback_value
```
## Configuration
Ensure logging is configured in entry points:
```python
# pipeline/run_pipeline.py
def run(args):
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    # ... rest of pipeline
```
## Anti-Patterns
### Debug Prints in Production Code
@ -117,22 +112,6 @@ _logger.debug("Processing window %s", wid)
# BAD - mixing _logger and logger
_logger = logging.getLogger(__name__)
logger = logging.getLogger("other") # Inconsistent
# GOOD - use single consistent pattern
_logger = logging.getLogger(__name__)
```
### Missing Logger Initialization
```python
# BAD - no logger defined
def some_function():
    logging.getLogger(__name__).info("...")  # Redundant calls

# GOOD - define once at module level
_logger = logging.getLogger(__name__)

def some_function():
    _logger.info("...")
```
## Sensitive Data
@ -150,18 +129,3 @@ _logger.info("User %s voted %s", user_id, vote)
# GOOD - log aggregates, not individual votes
_logger.info("Vote recorded for session %s", session_id[:8])
```
## Structured Logging
For complex data, use structured logging:
```python
_logger.info(
    "Motion processed",
    extra={
        "motion_id": motion_id,
        "policy_area": policy_area,
        "processing_time_ms": elapsed_ms,
    },
)
```

@ -0,0 +1,92 @@
---
title: Dependencies and Library Usage
category: dependencies
---
# Dependencies and Library Usage
## Core Dependencies
### duckdb
- **Required**: Yes
- **Fallback**: None (core functionality)
- **Usage**: SQL database for motions, embeddings, SVD vectors
- **Files**: database.py, analysis/*.py, pipeline/*.py
### streamlit
- **Required**: Yes
- **Fallback**: None
- **Usage**: Web UI framework
- **Files**: app.py, pages/*.py, explorer.py
### requests
- **Required**: Yes
- **Fallback**: None
- **Usage**: HTTP client for API calls
- **Files**: api_client.py, ai_provider.py
### plotly
- **Required**: Yes
- **Fallback**: None (raises ImportError)
- **Usage**: Interactive charts for explorer
- **Files**: explorer.py, explorer_helpers.py
## Optional Dependencies
### umap-learn
- **Required**: No
- **Fallback**: Use raw SVD vectors (first 2 dimensions)
- **Usage**: Dimensionality reduction for visualization
- **Files**: analysis/clustering.py
### matplotlib
- **Required**: No
- **Fallback**: Plotly or raw output
- **Usage**: Static charting
- **Files**: Various analysis scripts
## ML Dependencies
### sklearn
- **Required**: Yes
- **Usage**: KMeans clustering, cosine_similarity, StandardScaler
- **Files**: analysis/clustering.py, similarity/compute.py
### scipy
- **Required**: Yes
- **Usage**: SVD (scipy.linalg.svd), spatial.procrustes for alignment
- **Files**: analysis/trajectory.py, pipeline/svd_pipeline.py
### numpy
- **Required**: Yes
- **Usage**: Array operations, linear algebra
- **Files**: Throughout codebase
## Key Imports by File
### explorer.py
- `import streamlit as st`
- `from database import db`
- `from explorer_helpers import *`
### explorer_helpers.py
- `import pandas as pd`
- `import plotly.graph_objects as go`
- `from database import db` (optional, for type hints)
### database.py
- `import ibis`
- `import duckdb`
- `from config import config, PARTY_COLOURS`
### config.py
- `from dataclasses import dataclass, field`
- `import streamlit as st` (optional, for warnings)
## Singleton Instances
| Module | Instance | Type |
|--------|----------|------|
| `database.py` | `db` | `MotionDatabase` |
| `config.py` | `config` | `Config` (dataclass) |
| `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
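The singletons above are created at module import time; a minimal sketch of that wiring (the table's names are kept, the bodies and values are assumptions, not the actual repo code):

```python
# config.py (sketch)
from dataclasses import dataclass

@dataclass
class Config:
    database_path: str = "motions.duckdb"  # illustrative default

config = Config()                      # module-level singleton
PARTY_COLOURS = {"PVV": "#003366"}     # illustrative entry only

# database.py (sketch)
class MotionDatabase:
    def __init__(self, db_path: str):
        self.db_path = db_path

db = MotionDatabase(config.database_path)  # imported elsewhere as `from database import db`
```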

@ -1,78 +0,0 @@
# Dependencies
## Core Library Wiring
### Database Layer
```
ibis → DuckDB → MotionDatabase singleton (database.py)
sqlglot (ibis dependency)
```
### Data Processing
```
pandas → (used throughout for DataFrame operations)
numpy → (used by sklearn, scipy, umap)
scipy → spatial.procrustes for window alignment
```
### ML Pipeline
```
sklearn.cluster → KMeans, Procrustes
sklearn.preprocessing → StandardScaler
umap → UMAP (optional, graceful fallback)
```
### Visualization
```
plotly → explorer_helpers.py chart builders
st.plotly_chart → explorer.py rendering
```
### Streamlit
```
streamlit → all pages, @st.cache_data decorators
```
## Optional Dependencies
| Package | Required | Fallback |
|---------|----------|----------|
| `umap` | No | Use raw SVD vectors (first 2 dims) |
| `plotly` | Yes | Raises ImportError |
| `duckdb` | Yes | — |
| `ibis` | Yes | — |
| `sklearn` | Yes | — |
## Singleton Instances
| Module | Instance | Type |
|--------|----------|------|
| `database.py` | `db` | `MotionDatabase` |
| `config.py` | `config` | `Config` (dataclass) |
| `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
## Key Imports by File
```
explorer.py:
- import streamlit as st
- from database import db
- from explorer_helpers import *
explorer_helpers.py:
- import pandas as pd
- import plotly.graph_objects as go
- from database import db (optional, for type hints)
database.py:
- import ibis
- import duckdb
- from config import config, PARTY_COLOURS
config.py:
- from dataclasses import dataclass, field
- import streamlit as st (optional, for warnings)
```
## Environment
- Python ≥3.13
- Environment variables via `.env` (DB path, API keys)
- No `.env` values in constraint files (security)

@ -0,0 +1,146 @@
---
title: Domain Glossary
category: domain
---
# Domain Glossary - Dutch Political Terms
## CRITICAL INVARIANTS
> **Rule 1**: Centroid of right-wing parties on RIGHT side of ALL axes
> - PVV, FVD, JA21, SGP centroid must appear on the RIGHT
> - Individual right-wing parties may vary slightly from the centroid
> - This is non-negotiable for any compass/axis visualization
> **Rule 2**: SVD labels are empirically derived from voting data
> - Labels represent WHAT THE DATA SHOWS, not party self-identification or public opinion
> - Labels are derived from outliers and 20 representative motions (10 positive, 10 negative)
> - See SVD Label Derivation section below
---
## SVD Label Derivation
### The Process
SVD (Singular Value Decomposition) finds axes that maximize variance in the MP × Motion voting matrix. To label each axis:
1. **Identify outliers**: Find the two MPs with most extreme positions on that axis
2. **Select representative motions**: Pick 20 motions where these outliers disagreed most sharply (10 they voted opposite on, 10 where both voted same direction but with other extremes)
3. **Interpret theme**: Read the motion titles to derive what the axis represents
4. **Assign label**: the label describes the empirical theme; it could be:
   - Left-Right
   - Coalition-Opposition
   - Progressive-Conservative
   - EU-National sovereignty
   - Populist-Establishment
   - Or whatever the voting patterns show
### Example
| Step | Description |
|------|-------------|
| Outlier A | Wilders (PVV) - extreme positive on Dim 1 |
| Outlier B | Marijnissen (SP) - extreme negative on Dim 1 |
| 20 Motions | Immigration, integration, law & order themes dominate |
| Label | "Links-Rechts" (Left-Right) |
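A rough sketch of the outlier and motion-selection steps on a vote matrix (a hypothetical helper, not the pipeline's actual implementation; it shows only the disagreement half of the 10+10 selection described above):

```python
import numpy as np

def representative_motions(vote_matrix: np.ndarray, axis_scores: np.ndarray,
                           motion_ids: list[int], n: int = 20) -> list[int]:
    """Pick the motions on which the two most extreme MPs on an axis disagree most.

    vote_matrix: MPs x motions matrix of votes (+1 / 0 / -1); axis_scores: per-MP SVD coordinate.
    """
    outlier_pos = int(np.argmax(axis_scores))   # most extreme positive MP
    outlier_neg = int(np.argmin(axis_scores))   # most extreme negative MP
    disagreement = np.abs(vote_matrix[outlier_pos] - vote_matrix[outlier_neg])
    top = np.argsort(disagreement)[::-1][:n]    # motions with the sharpest disagreement
    return [motion_ids[i] for i in top]
```

The returned motion titles are then read manually to interpret the axis theme.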
### Labeling Rules
- **Never use party names in labels** (e.g., not "PVV-SP axis")
- **Never use semantic/ideological labels** (e.g., not "progressive-conservative" unless that's what the motions show)
- **Use motion-derived themes** (e.g., "Immigration", "EU", "Economy")
- **Fallback**: If theme is unclear, use "Axis 1", "Axis 2"
---
## Core Entities
### Motion / Motie
- Parliamentary motion submitted by MPs
- Fields: `id`, `title`, `date`, `category`
- MPs vote: **For** (+1), **Against** (-1), **Abstain** (0), **Absent**
### MP / Kamerlid
- Member of Parliament (Tweede Kamerlid)
- Identified by full name (e.g., "Van Dijk, I.")
- Has voting record, party affiliation, SVD position vector
### Party / Fractie
- Political party (e.g., "GroenLinks-PvdA", "PVV", "VVD")
- Party centroids: average SVD position of all MPs in party
### Vote / Stemming
- Individual MP's vote on a motion: +1, 0, -1
- Aggregated to compute SVD vectors
---
## Time & Analysis Concepts
### Window / Tijdsvenster
- Time period for analysis (annual or quarterly)
- Values: "2023", "2023-Q1", "2024", etc.
- SVD vectors computed per window
### Trajectory
- MP's position change across multiple windows
- Computed from `svd_vectors` + window ordering
---
## Mathematical / Algorithmic Terms
### SVD Vector
- 2D vector from Singular Value Decomposition of MP × Motion vote matrix
- Represents MP's position in political space
### SVD Label
- Empirically derived axis label based on outlier MPs and representative motions
- Describes the theme of disagreement on that axis
- NOT based on party ideology or semantic labels
### Political Compass
- 2D visualization with SVD axes mapped to compass quadrants
- X-axis: First SVD dimension (labeled from voting data)
- Y-axis: Second SVD dimension (labeled from voting data)
### Procrustes Alignment
- Algorithm to align SVD vectors across time windows
- Ensures comparable positions across years/quarters
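For reference, the scipy routine can be applied per window roughly like this (a sketch under the assumption that both windows contain the same entities in the same row order; the pipeline's actual alignment code may differ):

```python
import numpy as np
from scipy.spatial import procrustes

def align_windows(reference: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Align the current window's (n_entities, 2) SVD positions to a reference window."""
    _, aligned_current, disparity = procrustes(reference, current)
    return aligned_current  # standardized and rotated to best match the reference
```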
### UMAP
- Uniform Manifold Approximation and Projection
- Dimensionality reduction for visualization
- Optional dependency with graceful SVD fallback
---
## Database Table Reference
| Table | Key Fields |
|-------|-----------|
| `motions` | id, title, date, category |
| `mp_votes` | mp_id, motion_id, vote |
| `svd_vectors` | entity_id, window, vector_2d (list[2]) |
| `mp_party_history` | mp_id, party, start_date, end_date |
| `windows` | window_id, start_date, end_date, period_type |
| `mp_trajectories` | mp_id, window, trajectory_vector |
---
## Dutch Political Parties
### Canonical Right-Wing (centroid on RIGHT of axes)
- PVV (Partij voor de Vrijheid)
- FVD (Forum voor Democratie)
- JA21
- SGP (Staatkundig Gereformeerde Partij)
### Other Major Parties
- VVD (Volkspartij voor Vrijheid en Democratie)
- GL-PvdA (GroenLinks-PvdA)
- NSC (Nieuw Sociaal Contract)
- BBB (BoerBurgerBeweging)
- SP (Socialistische Partij)
- D66 (Democraten 66)

@ -1,107 +0,0 @@
# Domain Glossary - Dutch Political Terms
## Core Entities
### Motion / Motie
- Parliamentary motion submitted by MPs
- Fields: `id`, `title`, `date`, `category`
- MPs vote: **For** (+1), **Against** (-1), **Abstain** (0), **Absent**
### MP / Kamerlid
- Member of Parliament (Tweede Kamerlid)
- Identified by full name (e.g., "Van Dijk, I.")
- Has voting record, party affiliation, SVD position vector
- Historical: `mp_party_history` tracks party changes over time
### Party / Fractie
- Political party (e.g., "GroenLinks-PvdA", "PVV", "VVD")
- Party centroids: average SVD position of all MPs in party
- Aliases: multiple spelling variants exist (see anti-patterns.yaml)
### Vote / Stemming
- Individual MP's vote on a motion: +1, 0, -1
- Aggregated to compute SVD vectors
---
## Time & Analysis Concepts
### Window / Tijdsvenster
- Time period for analysis (annual or quarterly)
- Values: "2023", "2023-Q1", "2024", etc.
- SVD vectors computed per window
- Windows can be aligned across time using Procrustes
### Trajectory
- MP's position change across multiple windows
- Computed from `svd_vectors` + window ordering
- Used for trend analysis in Evolution tab
---
## Mathematical / Algorithmic Terms
### SVD Vector
- 2D vector from Singular Value Decomposition of MP × Motion vote matrix
- Represents MP's position in political space
- `entity_id` in `svd_vectors`: either MP name (when individual MPs) or party name (when party-level)
### Political Compass
- 2D visualization: X-axis = Left↔Right, Y-axis = Progressive↔Conservative
- SVD vectors mapped to compass quadrants
- UMAP used for projection
### Procrustes Alignment
- Algorithm to align SVD vectors across time windows
- Ensures comparable positions across years/quarters
- Implemented via `scipy.spatial.procrustes` or scikit-learn
### Centroid
- Geometric center of a set of points
- Party centroid = average SVD position of all MPs in that party
- Computed from `svd_vectors` filtered by party
### UMAP
- Uniform Manifold Approximation and Projection
- Dimensionality reduction for visualization
- Optional dependency — graceful fallback if unavailable
---
## Visualization
### PARTY_COLOURS
- Dict mapping party names to hex color codes
- Used in all Plotly charts for consistent party coloring
- Source: `config.py` → `PARTY_COLOURS` constant
- **Issue**: 3 separate alias dictionaries exist (no single source of truth)
---
## Application Pages
### Home
- Landing page with app overview
### Stemwijzer (Quiz)
- User answers questions → matched to parties
- Thin wrapper around quiz module
### Explorer (4 tabs)
- **Motion tab**: SVD positions colored by vote on selected motion
- **MP tab**: Individual MP trajectories across windows
- **Party tab**: Party centroids with members as scatter
- **Evolution tab**: How positions change over time
---
## Database Table Reference
| Table | Key Fields |
|-------|-----------|
| `motions` | id, title, date, category |
| `mp_votes` | mp_id, motion_id, vote |
| `svd_vectors` | entity_id, window, vector_2d (list[2]) |
| `party_centroids` | party, window, centroid_2d |
| `mp_party_history` | mp_id, party, start_date, end_date |
| `windows` | window_id, start_date, end_date, period_type |
| `mp_trajectories` | mp_id, window, trajectory_vector |

@ -1,3 +1,7 @@
# stemwijzer Mind Model - Manifest
# Generated: 2026-04-12
# Phase: 2 - Assembly from Phase 1 Analysis
name: stemwijzer
version: 2
description: Dutch political voting compass (Stemwijzer) - Mind Model constraints
@ -7,39 +11,54 @@ categories:
- path: system.md
description: System overview and architecture summary
group: docs
- path: tech-stack.yaml
- path: stack/stack.md
description: Technology stack with versions and purposes
group: docs
- path: conventions.yaml
description: Coding conventions and style guide
group: docs
- path: domain.yaml
description: Domain entities, terms, and relationships
group: docs
group: stack
- path: domain/domain-glossary.md
description: Domain entities, terms, relationships, and CRITICAL INVARIANTS
group: domain
# Design patterns
- path: patterns/architecture.yaml
description: Repository, Facade, Pipeline architectural patterns
- path: patterns/patterns.yaml
description: Code patterns (Singleton, Repository, Pipeline, etc.)
group: patterns
- path: patterns/python.yaml
description: Python-specific patterns (Singleton, dataclass, context manager)
- path: patterns/streamlit.yaml
description: Streamlit-specific patterns (session state, cache)
group: patterns
- path: patterns/api.yaml
description: API client patterns with retry and pagination
group: patterns
- path: patterns/database.yaml
description: DuckDB connection patterns and ORM usage
description: DuckDB patterns and connection management
group: patterns
- path: patterns/api.yaml
description: API client patterns with retry logic and pagination
- path: patterns/python.yaml
description: Python-specific patterns (dataclass, typing)
group: patterns
- path: patterns/streamlit.yaml
description: Streamlit session state and page patterns
- path: patterns/duckdb-access.md
description: DuckDB connection patterns and best practices
group: patterns
- path: patterns/embeddings-similarity.md
description: Embeddings and similarity computation patterns
group: patterns
- path: patterns/error-handling.md
description: Error handling and exception patterns
group: patterns
- path: patterns/module-singletons.md
description: Module-level singleton patterns
group: patterns
- path: patterns/requests-http.md
description: HTTP client patterns with retry
group: patterns
- path: patterns/validation.md
description: Input validation patterns
group: patterns
# Coding constraints
- path: constraints/error-handling.yaml
- path: constraints/error-handling.md
description: Error handling patterns with safe fallbacks
group: constraints
- path: constraints/logging.yaml
description: Logging conventions and best practices
- path: constraints/logging.md
description: Logging conventions
group: constraints
- path: constraints/naming.yaml
description: File, class, function naming rules
@ -50,25 +69,40 @@ categories:
- path: constraints/types.yaml
description: Type hint conventions
group: constraints
- path: constraints/testing.yaml
description: Testing conventions
group: constraints
# Anti-patterns
- path: anti-patterns/anti-patterns.md
description: Known anti-patterns with evidence and fixes
group: anti-patterns
# Dependencies
- path: dependencies/dependencies.md
description: Library usage and singleton instances
group: dependencies
# Code examples
- path: examples/database-example.py
description: MotionDatabase usage example
description: MotionDatabase usage examples
group: examples
- path: examples/api-client-example.py
description: TweedeKamerAPI usage
description: TweedeKamerAPI usage examples
group: examples
- path: examples/pipeline-example.py
description: Pipeline phase example
description: Pipeline orchestration examples
group: examples
- path: examples/streamlit-page-example.py
description: Streamlit page pattern
description: Streamlit page patterns
group: examples
- path: examples/pattern-examples.md
description: Consolidated pattern examples
group: examples
# Anti-patterns and workflows
- path: anti-patterns.yaml
description: Known anti-patterns to avoid
group: meta
- path: workflows.yaml
description: Key workflows (VotingSession, DataIngestion, EmbeddingGeneration)
group: meta
# Phase 1 findings summary:
# - Tech: Python 3.13+, Streamlit, DuckDB, scipy/sklearn/umap, OpenRouter (QWEN)
# - 10 patterns discovered: Module singletons, Repository, Service layer, Pipeline
# - 8 anti-patterns: print() instead of logging, _DummySt global, bare except
# - 6 code clusters: Database, Streamlit UI, API, Analysis/ML, Config, Singletons
# - 3 groups: stdlib, 3rd party, local imports

@ -0,0 +1,79 @@
---
title: DuckDB Access Pattern
category: patterns
---
# DuckDB Access Pattern
## Rules
- Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
- Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
- If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle.
- Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
## Examples
### database.py - Explicit connect/close for schema init
```python
conn = duckdb.connect(self.db_path)
...
conn.execute("""
CREATE TABLE IF NOT EXISTS fused_embeddings (
id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
motion_id INTEGER NOT NULL,
window_id TEXT NOT NULL,
vector JSON NOT NULL,
svd_dims INTEGER NOT NULL,
text_dims INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
""")
conn.close()
```
### pipeline/svd_pipeline.py - Read-only connection
```python
conn = duckdb.connect(db_path, read_only=True)
try:
rows = conn.execute(
"SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
(start_date, end_date),
).fetchall()
finally:
conn.close()
```
### similarity/compute.py - Preferred 'with' context
```python
try:
import duckdb
except Exception:
logger.exception("duckdb import failed; cannot load vectors")
return 0
with duckdb.connect(db.db_path) as conn:
rows = conn.execute(query, params).fetchall()
```
## Anti-Patterns
### Bad: Connection without closure
```python
# BAD: connection may leak if exception occurs before explicit close
conn = duckdb.connect(db_path)
rows = conn.execute("SELECT ...").fetchall()
# missing finally/close
```
**Remediation**: Use "with" context or ensure conn.close() in finally block.
### Bad: Parallel write connections
**Problem**: Opening write connections from many parallel workers without coordination.
**Remediation**: Open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.
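A minimal sketch of that remediation, not code from the repository: worker and results-table names are hypothetical, workers open read-only connections, and only the parent process writes through one short-lived connection.
```python
import duckdb
from concurrent.futures import ProcessPoolExecutor


def compute_window(args: tuple[str, str, str]) -> list[tuple]:
    """Read-only worker; several of these can run concurrently against the same file."""
    db_path, start_date, end_date = args
    with duckdb.connect(db_path, read_only=True) as conn:
        return conn.execute(
            "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
            (start_date, end_date),
        ).fetchall()


def run_windows(db_path: str, windows: list[tuple[str, str]]) -> None:
    with ProcessPoolExecutor() as pool:
        all_rows = list(pool.map(compute_window, [(db_path, s, e) for s, e in windows]))
    # Single writer: only the parent process opens a write connection, and only briefly.
    with duckdb.connect(db_path) as conn:
        for rows in all_rows:
            if rows:
                conn.executemany(
                    "INSERT INTO votes_by_window VALUES (?, ?, ?)",  # hypothetical results table
                    rows,
                )
```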

@ -1,70 +0,0 @@
name: duckdb_access
rules:
- Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
- Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
- If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle.
- Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
examples:
- path: database.py
excerpt: |
```python
conn = duckdb.connect(self.db_path)
...
conn.execute("""
CREATE TABLE IF NOT EXISTS fused_embeddings (
id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
motion_id INTEGER NOT NULL,
window_id TEXT NOT NULL,
vector JSON NOT NULL,
svd_dims INTEGER NOT NULL,
text_dims INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
""")
conn.close()
```
note: explicit connect/close used when initializing schema
- path: pipeline/svd_pipeline.py
excerpt: |
```python
conn = duckdb.connect(db_path, read_only=True)
try:
rows = conn.execute(
"SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
(start_date, end_date),
).fetchall()
finally:
conn.close()
```
note: read_only connection used for compute-heavy worker
- path: similarity/compute.py
excerpt: |
```python
try:
import duckdb
except Exception:
logger.exception("duckdb import failed; cannot load vectors")
return 0
with duckdb.connect(db.db_path) as conn:
rows = conn.execute(query, params).fetchall()
```
note: preferred 'with' context for automatic close
anti_patterns:
- Bad: creating a connection without closure in a long-running process
remediation: use "with" context or ensure conn.close() in finally block
example: |
```python
# BAD: connection may leak if exception occurs before explicit close
conn = duckdb.connect(db_path)
rows = conn.execute("SELECT ...").fetchall()
# missing finally/close
```
- Bad: Opening write connections from many parallel workers without coordination
remediation: open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.

@ -0,0 +1,74 @@
---
title: Embeddings Similarity Pipeline
category: patterns
---
# Embeddings Similarity Pipeline
## Rules
- Keep embedding calls batched where possible; fall back to per-item attempts on persistent batch failure.
- Store raw embeddings, SVD vectors, and fused_embeddings separately; fused_embeddings are typically the concatenation [svd + text].
- Compute similarity as normalized cosine on padded vectors; record top-k neighbors in similarity_cache.
- Use read_only DuckDB connections in compute workers to allow parallel runs.
## Examples
### pipeline/ai_provider_wrapper.py - Batched embed + fallback
```python
for start in range(0, len(texts), batch_size):
chunk = texts[start : start + batch_size]
resp = _post_with_retries("/embeddings", json={"model": model, "input": chunk})
...
for j in range(i, end):
t = texts[j]
single, single_exc = _attempt_batch([t], j)
if single:
results[j] = single[0]
```
### pipeline/fusion.py - Concatenation and storage
```python
try:
svd_vec = json.loads(svd_json)
except Exception:
_logger.exception("Invalid SVD vector JSON for entity %s", entity_id)
skipped_missing_svd += 1
continue
...
fused = list(svd_vec) + list(text_vec)
res = db.store_fused_embedding(
int(entity_id),
window_id,
fused,
svd_dims=len(svd_vec),
text_dims=len(text_vec),
)
```
### similarity/compute.py - Normalized cosine similarity
```python
# Normalize rows
norms = np.linalg.norm(matrix, axis=1, keepdims=True)
norms[norms == 0] = 1.0
normalized = matrix / norms
sim = normalized @ normalized.T
...
# pick top-k neighbors and write to similarity_cache
```
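The elided top-k step can look like the following sketch; it is illustrative, not the actual compute.py code, and assumes the `sim` matrix and motion-id ordering from the excerpt above.
```python
import numpy as np


def top_k_neighbors(sim: np.ndarray, motion_ids: list[int], k: int = 10) -> dict[int, list[tuple[int, float]]]:
    """For each row of the similarity matrix, keep the k most similar other rows (illustrative)."""
    neighbors: dict[int, list[tuple[int, float]]] = {}
    for i, motion_id in enumerate(motion_ids):
        order = np.argsort(sim[i])[::-1]           # most similar first
        order = [j for j in order if j != i][:k]   # drop self-similarity, keep top-k
        neighbors[motion_id] = [(motion_ids[j], float(sim[i, j])) for j in order]
    return neighbors
```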
## Anti-Patterns
### Bad: Assuming consistent vector length
**Problem**: Assuming consistent vector length without checks leads to shape errors.
**Remediation**: Detect inconsistent lengths, pad with zeros, and log a warning (as seen in compute.py).
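A minimal sketch of that padding remediation follows; the function name and signature are illustrative rather than taken from compute.py.
```python
import logging

import numpy as np

_logger = logging.getLogger(__name__)


def pad_to_matrix(vectors: list[list[float]]) -> np.ndarray:
    """Zero-pad vectors of unequal length so they can be stacked into one matrix (illustrative)."""
    max_dim = max(len(v) for v in vectors)
    if any(len(v) != max_dim for v in vectors):
        _logger.warning("Inconsistent vector lengths; padding all vectors to %d dims", max_dim)
    matrix = np.zeros((len(vectors), max_dim))
    for i, vec in enumerate(vectors):
        matrix[i, : len(vec)] = vec
    return matrix
```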
### Bad: Inline heavy computation in UI
**Problem**: Recomputing heavy pipelines inline in UI requests.
**Remediation**: Schedule heavy work in scripts/subprocesses and read precomputed results in UI.

@ -1,63 +0,0 @@
name: embeddings_similarity_pipeline
rules:
- Keep embedding calls batched where possible; fallback to per-item attempts on persistent batch failure.
- Store raw embeddings, SVD vectors, and fused_embeddings separately; fused_embeddings are typically concatenation [svd + text].
- Compute similarity as normalized cosine on padded vectors; record top-k neighbors in similarity_cache.
- Use read_only DuckDB connections in compute workers to allow parallel runs.
examples:
- path: pipeline/ai_provider_wrapper.py
excerpt: |
```python
for start in range(0, len(texts), batch_size):
chunk = texts[start : start + batch_size]
resp = _post_with_retries("/embeddings", json={"model": model, "input": chunk})
...
for j in range(i, end):
t = texts[j]
single, single_exc = _attempt_batch([t], j)
if single:
results[j] = single[0]
```
note: batched embed + fallback per-item retry
- path: pipeline/fusion.py
excerpt: |
```python
try:
svd_vec = json.loads(svd_json)
except Exception:
_logger.exception("Invalid SVD vector JSON for entity %s", entity_id)
skipped_missing_svd += 1
continue
...
fused = list(svd_vec) + list(text_vec)
res = db.store_fused_embedding(
int(entity_id),
window_id,
fused,
svd_dims=len(svd_vec),
text_dims=len(text_vec),
)
```
note: concatenation of vectors and storage via MotionDatabase
- path: similarity/compute.py
excerpt: |
```python
# Normalize rows
norms = np.linalg.norm(matrix, axis=1, keepdims=True)
norms[norms == 0] = 1.0
normalized = matrix / norms
sim = normalized @ normalized.T
...
# pick top-k neighbors and write to similarity_cache
```
note: numeric pipeline and padding to consistent dimensionality
anti_patterns:
- Bad: Assuming consistent vector length without checks (leads to shape errors).
remediation: Detect inconsistent lengths, pad with zeros, and log a warning (as seen in compute.py).
- Bad: Recomputing heavy pipelines inline in UI requests.
remediation: schedule heavy work in scripts/subprocesses and read precomputed results in UI.

@ -0,0 +1,63 @@
---
title: Error Handling Pattern
category: patterns
---
# Error Handling Pattern
## Rules
- Use explicit exceptions for domain/error classification (e.g., ProviderError, ValueError).
- Prefer logging.exception when catching an exception where stack trace is useful.
- Avoid broad except: clauses that swallow exceptions; if broad except is used for "best-effort" fallback, log at warning and include original exception context.
- For public library-like functions, prefer raising typed exceptions instead of returning magic values ([], False) — only return safe defaults where documented.
## Examples
### ai_provider.py - Network error to ProviderError
```python
except requests.ConnectionError as exc:
if attempt == retries:
raise ProviderError(
f"Connection error when calling provider: {exc}"
) from exc
...
```
### pipeline/ai_provider_wrapper.py - Best-effort with logging
```python
except Exception:
_logger.exception("Failed to append audit event for embedding failure")
results[j] = None
```
### similarity/compute.py - Defensive import handling
```python
try:
import duckdb
except Exception:
logger.exception("duckdb import failed; cannot load vectors")
return 0
```
## Anti-Patterns
### Bad: Silent exception swallowing
```python
try:
do_work()
except Exception:
return []
# BAD: hides the root cause and returns an ambiguous default
```
**Remediation**: Narrow exception types or at minimum log.exception() and re-raise or convert to a domain error if truly handled.
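A hedged sketch of the remediated version, reusing the `do_work()` placeholder from the bad example and assuming `ProviderError` is importable from `ai_provider.py`:
```python
import logging

from ai_provider import ProviderError  # assumed import path

_logger = logging.getLogger(__name__)


def fetch_items() -> list:
    try:
        return do_work()
    except ValueError:
        # Expected, narrow failure: log with traceback and return the documented safe default.
        _logger.exception("do_work() rejected its input; returning empty list")
        return []
    except Exception as exc:
        # Unexpected failure: convert to a domain error instead of hiding it.
        raise ProviderError(f"do_work() failed: {exc}") from exc
```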
### Bad: Mixing print() and logging
**Problem**: Mixing print() and logging for errors.
**Remediation**: Replace print() calls with logger.* calls; use structured logging configuration.
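A minimal logging-setup sketch; names and the format string are illustrative, and the project's actual conventions live in `constraints/logging.md`:
```python
import logging


def configure_logging(level: int = logging.INFO) -> None:
    """One-time setup at the application entry point; modules then use logging.getLogger(__name__)."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )


_logger = logging.getLogger(__name__)


def report_progress(count: int) -> None:
    # Instead of: print(f"Fetched {count} voting records")
    _logger.info("Fetched %d voting records", count)
```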

@ -1,54 +0,0 @@
name: error_handling
rules:
- Use explicit exceptions for domain/error classification (e.g., ProviderError, ValueError).
- Prefer logging.exception when catching an exception where stack trace is useful.
- Avoid broad except: clauses that swallow exceptions; if broad except is used for "best-effort" fallback, log at warning and include original exception context.
- For public library-like functions, prefer raising typed exceptions instead of returning magic values ([], False) — only return safe defaults where documented.
examples:
- path: ai_provider.py
excerpt: |
```python
except requests.ConnectionError as exc:
if attempt == retries:
raise ProviderError(
f"Connection error when calling provider: {exc}"
) from exc
...
```
note: mapping network error to ProviderError with re-raise chaining
- path: pipeline/ai_provider_wrapper.py
excerpt: |
```python
except Exception:
_logger.exception("Failed to append audit event for embedding failure")
results[j] = None
```
note: logs and assigns None for failure; fallback behavior documented earlier in wrapper rule
- path: similarity/compute.py
excerpt: |
```python
try:
import duckdb
except Exception:
logger.exception("duckdb import failed; cannot load vectors")
return 0
```
note: defensive import handling and early return on failure
anti_patterns:
- Bad: Broad except without logging and without re-raising (silently hides bugs)
remediation: Narrow exception types or at minimum log.exception() and re-raise or convert to a domain error if truly handled.
example: |
```python
try:
do_work()
except Exception:
return []
# BAD: hides the root cause and returns an ambiguous default
```
- Bad: Mixing print() and logging for errors
remediation: Replace print() calls with logger.* calls; use structured logging configuration.

@ -0,0 +1,41 @@
---
title: Module Singletons Pattern
category: patterns
---
# Module Singletons Pattern
## Rules
- Module-level singletons (e.g., db = MotionDatabase()) are acceptable but should be created carefully:
  - Avoid expensive initialization at import time.
  - Provide a way to construct with a test DB path or to reinitialize in tests.
- If a singleton holds resources (DB connections, sessions), ensure safe shutdown on program exit.
## Examples
### database.py - Safe class initialization
```python
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
# If duckdb is not available, operate in lightweight file-backed mode
self._file_mode = duckdb is None
self._init_database()
```
### similarity/lookup.py - Local instances
```python
db = MotionDatabase(db_path=db_path) if db_path else MotionDatabase()
if hasattr(db, "get_cached_similarities"):
rows = db.get_cached_similarities(...)
```
## Anti-Patterns
### Bad: Heavy initialization at import time
**Problem**: Creating connections and performing heavy schema migrations during import.
**Remediation**: Move heavy init to an explicit initialize() method and keep import fast.
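A hedged sketch of that lazy-initialization shape; method names are illustrative and MotionDatabase's real constructor differs:
```python
class MotionDatabase:
    def __init__(self, db_path: str = "motions.duckdb"):
        # Cheap at import time: only record configuration.
        self.db_path = db_path
        self._initialized = False

    def initialize(self) -> None:
        # Expensive work (connections, schema creation, migrations) happens only on demand.
        if self._initialized:
            return
        self._init_schema()
        self._initialized = True

    def _init_schema(self) -> None:
        ...  # create tables, run migrations


# The module-level singleton stays cheap to import; callers run db.initialize() explicitly.
db = MotionDatabase()
```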

@ -1,33 +0,0 @@
name: module_singletons
rules:
- Module-level singletons (e.g., db = MotionDatabase()) are acceptable but should be created carefully:
- Avoid expensive initialization at import time.
- Provide a way to construct with a test DB path or to reinitialize in tests.
- If a singleton holds resources (DB connections, sessions), ensure safe shutdown on program exit.
examples:
- path: database.py
excerpt: |
```python
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
# If duckdb is not available, operate in lightweight file-backed mode
self._file_mode = duckdb is None
self._init_database()
```
note: class is safe to instantiate and creates DB at init; consider lazy init if heavy
- path: similarity/lookup.py
excerpt: |
```python
db = MotionDatabase(db_path=db_path) if db_path else MotionDatabase()
if hasattr(db, "get_cached_similarities"):
rows = db.get_cached_similarities(...)
```
note: consumers create local MotionDatabase instances, not relying on a single global
anti_patterns:
- Bad: Creating connections and performing heavy schema migrations during import
remediation: Move heavy init to an explicit initialize() method and keep import fast.

@ -0,0 +1,77 @@
---
title: Requests HTTP Pattern
category: patterns
---
# Requests HTTP Pattern
## Rules
- Reuse requests.Session when making multiple calls to the same host to benefit from connection pooling.
- Wrap outbound HTTP calls with retry/backoff logic and respect Retry-After on 429.
- Treat 5xx as transient and retry; surface 4xx as configuration/client errors (do not retry unless 429).
- Raise or wrap non-OK responses into domain ProviderError to make behavior consistent across the codebase.
## Examples
### ai_provider.py - 429 handling with Retry-After
```python
resp = requests.post(url, json=json, headers=headers, timeout=10)
...
if getattr(resp, "status_code", 0) == 429:
if attempt == retries:
raise ProviderError(f"Provider returned HTTP {resp.status_code}")
retry_after = None
raw = resp.headers.get("Retry-After") if getattr(resp, "headers", None) else None
if raw:
try:
retry_after = int(raw)
except Exception:
...
if retry_after is not None:
time.sleep(retry_after)
continue
```
### api_client.py - Session + raise_for_status
```python
response = self.session.get(
base_url, params=params, timeout=config.API_TIMEOUT
)
response.raise_for_status()
data = response.json()
```
### pipeline/ai_provider_wrapper.py - Retry/backoff wrapper
```python
def _attempt_batch(chunk_texts, start_index):
backoff = 0.5
for attempt in range(1, retries + 1):
try:
emb_chunk = _embedder(
chunk_texts, model=model, batch_size=len(chunk_texts)
)
return emb_chunk, None
except Exception as exc:
if attempt == retries:
break
sleep = backoff * (2 ** (attempt - 1))
time.sleep(sleep)
continue
```
## Anti-Patterns
### Bad: Silent exception swallowing
**Problem**: Blindly catching all requests exceptions and returning an empty response.
**Remediation**: Map network exceptions to retryable vs terminal (ProviderError) and log details.
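A minimal sketch of that retryable-versus-terminal split; the import path for ProviderError is an assumption:
```python
import logging

import requests

from ai_provider import ProviderError  # assumed import path

_logger = logging.getLogger(__name__)


def is_retryable(exc: Exception) -> bool:
    """Return True for transient failures, raise ProviderError for terminal ones."""
    if isinstance(exc, (requests.ConnectionError, requests.Timeout)):
        _logger.warning("Transient network failure: %s", exc)
        return True
    if isinstance(exc, requests.HTTPError):
        status = exc.response.status_code if exc.response is not None else 0
        if status == 429 or status >= 500:
            return True
        raise ProviderError(f"Provider returned HTTP {status}") from exc
    raise ProviderError(f"Unexpected HTTP failure: {exc}") from exc
```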
### Bad: Using print() for errors
**Problem**: Using print() for network errors instead of structured logging.
**Remediation**: Use `_logger.exception()` instead (api_client.py still uses print() and needs this fix).

@ -1,65 +0,0 @@
name: requests_http
rules:
- Reuse requests.Session when making multiple calls to the same host to benefit from connection pooling.
- Wrap outbound HTTP calls with retry/backoff logic and respect Retry-After on 429.
- Treat 5xx as transient and retry; surface 4xx as configuration/client errors (do not retry unless 429).
- Raise or wrap non-OK responses into domain ProviderError to make behavior consistent across the codebase.
examples:
- path: ai_provider.py
excerpt: |
```python
resp = requests.post(url, json=json, headers=headers, timeout=10)
...
if getattr(resp, "status_code", 0) == 429:
if attempt == retries:
raise ProviderError(f"Provider returned HTTP {resp.status_code}")
retry_after = None
raw = resp.headers.get("Retry-After") if getattr(resp, "headers", None) else None
if raw:
try:
retry_after = int(raw)
except Exception:
...
if retry_after is not None:
time.sleep(retry_after)
continue
```
note: explicit handling of 429 and Retry-After
- path: api_client.py
excerpt: |
```python
response = self.session.get(
base_url, params=params, timeout=config.API_TIMEOUT
)
response.raise_for_status()
data = response.json()
```
note: uses session + raise_for_status() to surface HTTP errors
- path: pipeline/ai_provider_wrapper.py
excerpt: |
```python
def _attempt_batch(chunk_texts, start_index):
backoff = 0.5
for attempt in range(1, retries + 1):
try:
emb_chunk = _embedder(
chunk_texts, model=model, batch_size=len(chunk_texts)
)
return emb_chunk, None
except Exception as exc:
if attempt == retries:
break
sleep = backoff * (2 ** (attempt - 1))
time.sleep(sleep)
continue
```
note: wrapper adds retry/backoff and per-item fallback
anti_patterns:
- Bad: Blindly catching all requests exceptions and returning empty response
remediation: map network exceptions to retryable vs terminal (ProviderError) and log details.
- Bad: Using print() for network errors instead of structured logging (see api_client.py where print() is used; prefer logging).

@ -0,0 +1,37 @@
---
title: Validation Pattern
category: patterns
---
# Validation Pattern
## Rules
- Validate inputs early and raise ValueError or domain-specific exceptions (ProviderError) for invalid contract inputs.
- Tests should assert that invalid inputs raise the expected exceptions.
- Use explicit checks for types and shapes on public APIs (e.g., ensure text is str before embedding).
## Examples
### ai_provider.py - Type validation
```python
if not isinstance(text, str):
raise ProviderError("text must be a string")
```
### pipeline/ai_provider_wrapper.py - Defensive empty handling
```python
if not texts:
return []
if motion_ids is None:
motion_ids = [None for _ in texts]
```
## Anti-Patterns
### Bad: Invalid values propagating into computation
**Problem**: Allowing invalid values to propagate into heavy computation (e.g., non-string into embedding pipeline).
**Remediation**: Fail fast with a typed exception and add unit tests to cover validations.
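A hedged test sketch using pytest; the function under test and the import path are assumptions, since the excerpt above does not show the function name:
```python
import pytest

from ai_provider import ProviderError, get_embedding  # assumed names and import path


def test_embedding_rejects_non_string_input():
    # Must fail fast with the typed exception, never reach the network.
    with pytest.raises(ProviderError):
        get_embedding(12345)
```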

@ -1,29 +0,0 @@
name: validation
rules:
- Validate inputs early and raise ValueError or domain-specific exceptions (ProviderError) for invalid contract inputs.
- Tests should assert that invalid inputs raise the expected exceptions.
- Use explicit checks for types and shapes on public APIs (e.g., ensure text is str before embedding).
examples:
- path: ai_provider.py
excerpt: |
```python
if not isinstance(text, str):
raise ProviderError("text must be a string")
```
note: explicit type validation before network call
- path: pipeline/ai_provider_wrapper.py
excerpt: |
```python
if not texts:
return []
if motion_ids is None:
motion_ids = [None for _ in texts]
```
note: defensive handling of empty inputs
anti_patterns:
- Bad: Allowing invalid values to propagate into heavy computation (e.g., non-string into embedding pipeline).
remediation: Fail fast with a typed exception and add unit tests to cover validations.

@ -0,0 +1,67 @@
---
title: Tech Stack
category: stack
---
# Tech Stack
## Runtime & Language
- **Python >=3.13**
## Web Framework
- **Streamlit** - Multi-page app with Home, Stemwijzer, Explorer pages
## Data Layer
- **DuckDB** - Embedded OLAP database
- Tables: motions, mp_votes, svd_vectors, fused_embeddings, embeddings, user_sessions, party_results, mp_metadata
- **ibis** - ORM (listed as a dependency, but the DuckDB-native API is used in practice)
## AI / LLM
- **OpenRouter** - API abstraction for AI providers
- **QWEN** - Primary model
- Embeddings: `qwen/qwen3-embedding-4b`
- Chat: `qwen/qwen-2.5-72b-instruct`
- **requests** - HTTP client (not raw openai)
## ML / Analytics
- **scikit-learn** - KMeans clustering, cosine_similarity, StandardScaler
- **scipy** - SVD (scipy.linalg.svd), spatial.procrustes
- **umap-learn** - Dimensionality reduction (optional, graceful fallback to SVD)
- **numpy** - Numerical computing
## Visualization
- **Plotly** - Interactive charts (go.Figure, _DummyTrace fallback)
- **matplotlib** - Static plotting (optional)
## HTTP & Parsing
- **requests** - Session pooling, retry with backoff
- **beautifulsoup4** - HTML parsing
- **lxml** - XML/HTML processing
## Key Source Files
| File | Purpose |
|------|---------|
| `database.py` | MotionDatabase singleton, DuckDB connection, 9-table schema |
| `explorer.py` | Explorer page with 4 tabs (Motion, MP, Party, Evolution) |
| `explorer_helpers.py` | Pure helper functions, Plotly chart builders |
| `analysis/` | SVD pipeline, UMAP projection, clustering |
| `pipeline/` | Data fetch, transform, store pipeline |
| `pages/1_Stemwijzer.py` | Quiz page |
| `pages/2_Explorer.py` | Explorer page |
| `config.py` | Dataclass Config pattern |
| `ai_provider.py` | OpenRouter API wrapper with retry |
| `api_client.py` | TweedeKamer OData API client |
## Singleton Instances
| Module | Instance | Type |
|--------|----------|------|
| `database.py` | `db` | `MotionDatabase` |
| `config.py` | `config` | `Config` (dataclass) |
| `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
## Environment
- Python >=3.13
- Environment variables via `.env` (DB path, API keys)
- No `.env` values in constraint files (security)
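A minimal sketch of the dataclass-config idea above, assuming `.env` has already been loaded (for example via python-dotenv); field names and defaults are illustrative, not the project's actual `config.py`:
```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Config:
    database_path: str = field(default_factory=lambda: os.getenv("DATABASE_PATH", "motions.duckdb"))
    api_timeout: int = field(default_factory=lambda: int(os.getenv("API_TIMEOUT", "30")))
    openrouter_api_key: str = field(default_factory=lambda: os.getenv("OPENROUTER_API_KEY", ""))


config = Config()  # module-level singleton, mirroring `from config import config`
```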

@ -1,41 +0,0 @@
# Tech Stack
## Runtime & Language
- **Python ≥3.13** (type: runtime)
- Streamlit (type: web framework) - multi-page app: Home, Stemwijzer, Explorer (4 tabs)
## Data Layer
- **DuckDB** (type: database) - 9 tables: motions, mp_votes, svd_vectors, mp_party_history, etc.
- **ibis** (type: ORM) - DuckDB backend for Pythonic SQL
- Query mode: duckdb:// path or :memory: (see database.py:50-51)
## ML / Analytics
- **scikit-learn** (type: ML) - clustering, Procrustes alignment
- **UMAP** (type: dimensionality reduction) - 2D political compass projection
- **scipy** (type: scientific computing) - spatial/alignment algorithms
- **numpy** (type: numerical computing) - array operations
## Visualization
- **Plotly** (type: charting) - dual-layer interactive charts (scatter + annotations)
## Key Source Files
| File | Purpose |
|------|---------|
| `database.py` | MotionDatabase singleton, DuckDB connection, 9-table schema |
| `explorer.py` | Explorer page with 4 tabs (Motion, MP, Party, Evolution) |
| `explorer_helpers.py` | Pure helper functions, Plotly chart builders, coordinate computation |
| `analysis/` | SVD pipeline, UMAP projection, clustering algorithms |
| `pipeline/` | Data fetch → transform → store pipeline |
| `pages/1_🗳_Stemwijzer.py` | Quiz page (thin wrapper) |
| `pages/2_🔍_Explorer.py` | Explorer page (thin wrapper) |
| `config.py` | Dataclass Config pattern |
## Database Tables
- `motions` - parliamentary motions with id, title, date, category
- `mp_votes` - individual MP votes on motions (1/0/-1)
- `svd_vectors` - SVD-computed political positions (entity_id, window, vector_2d)
- `mp_party_history` - MP-to-party mappings over time
- `party_centroids` - aggregated party positions
- `windows` - time period definitions
- `mp_trajectories` - MP position changes across windows
- Plus 2 additional tables (exact names vary)

@ -21,7 +21,7 @@ TweedeKamer OData API
├── text_pipeline # AI embeddings via OpenRouter
└── fusion # Combine SVD + text vectors
Streamlit Web App (app.py, pages/)
Streamlit Web App (Home.py, pages/)
├── Home.py # Landing page
├── 1_Stemwijzer.py # Voting quiz
└── 2_Explorer.py # Political compass explorer
@ -36,34 +36,53 @@ TweedeKamer OData API
| **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
| **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
| **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
| **Similarity** | Motion similarity search | `similarity/*.py` |
| **Web App** | Streamlit UI | `app.py`, `pages/*.py` |
### Data Models
**Core Entities**:
- `Motion`: Parliamentary motion with voting results
- `MP` / `MPMetadata`: Member of Parliament with party/tenure
- `MPVote`: Individual vote record (Voor/Tegen/Onthouden/Geen stem/Afwezig)
- `Party`: Political party
- `UserSession` / `UserVote`: Voting session tracking
- `SVDVector`: Dimensionality-reduced vote vectors
- `FusedEmbedding`: Combined SVD + text embedding
- `SimilarityCache`: Pre-computed motion similarities
### Technical Decisions
1. **DuckDB over SQLite**: Chosen for OLAP performance with complex analytical queries
2. **ibis ORM**: Database-agnostic query building (currently using DuckDB backend)
3. **SVD + Procrustes**: Aligns voting vectors across time windows
4. **UMAP for visualization**: Non-linear dimensionality reduction for compass display
5. **OpenRouter API**: Abstraction layer for AI embeddings (currently using Qwen)
6. **Module-level singletons**: `db = MotionDatabase()` pattern for shared state
### Key Conventions
- **DuckDB connections**: Short-lived per method, always close
- **Error handling**: Catch `Exception`, return safe fallbacks (False/[]/None)
- **Logging**: Use `logging.getLogger(__name__)` - avoid print()
- **Type hints**: Required on public functions with typing module imports
- **Config**: Dataclass `Config` in `config.py`, accessed as `from config import config`
| **Explorer Helpers** | Pure functions, chart builders | `explorer_helpers.py` |
| **Web App** | Streamlit UI | `Home.py`, `pages/*.py` |
### Tech Stack
- **Language**: Python 3.13+
- **Web Framework**: Streamlit (multi-page app)
- **Database**: DuckDB with ibis ORM (DuckDB-native implementation)
- **ML/Analytics**: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
- **AI/LLM**: OpenRouter-compatible API (QWEN embeddings + chat)
- **Visualization**: Plotly (interactive charts), matplotlib (optional)
- **HTTP**: requests with Session pooling and retry
- **Parsing**: beautifulsoup4, lxml
### Key Patterns
1. **Module-Level Singletons**: `db = MotionDatabase()`, `config = Config()`
2. **Repository Pattern**: MotionDatabase class with method-per-query
3. **Service Layer**: TweedeKamerAPI, ai_provider with retry/backoff
4. **Pipeline Orchestration**: ThreadPoolExecutor for parallel SVD
5. **Short-Lived Connections**: DuckDB connections in try/finally blocks
6. **Graceful Degradation**: try/except around optional dependencies (see the sketch below)
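A minimal sketch of that graceful-degradation pattern for one optional dependency (umap-learn); the fallback projection is illustrative:
```python
import logging

import numpy as np

_logger = logging.getLogger(__name__)

try:
    import umap  # optional dependency
except Exception:
    umap = None
    _logger.warning("umap-learn not available; falling back to SVD projection")


def project_2d(matrix: np.ndarray) -> np.ndarray:
    """Project rows to 2D, preferring UMAP and degrading to plain SVD."""
    if umap is not None:
        return umap.UMAP(n_components=2).fit_transform(matrix)
    u, s, _ = np.linalg.svd(matrix - matrix.mean(axis=0), full_matrices=False)
    return u[:, :2] * s[:2]
```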
### Domain Invariants
**CRITICAL RULES** (from AGENTS.md):
1. **Right-wing parties on RIGHT**: PVV, FVD, JA21, SGP must appear on RIGHT side of all axes in visualizations
2. **SVD labels = voting patterns**: SVD labels reflect voting patterns, NOT semantic content
### Database Tables
| Table | Purpose |
|-------|---------|
| `motions` | Parliamentary motions with id, title, date, category |
| `mp_votes` | Individual MP votes on motions (Voor/Tegen/Onthouden) |
| `mp_metadata` | MP names, parties, tenure info |
| `svd_vectors` | 2D SVD-computed political positions per entity |
| `fused_embeddings` | Combined SVD + text embeddings |
| `embeddings` | Text embeddings for motions |
| `user_sessions` | Voting session tracking |
| `party_results` | Party match results per session |
### Conventions
- **Error Handling**: Catch `Exception`, return safe fallbacks (False/[]/None)
- **Logging**: Use `logging.getLogger(__name__)` — **never use print()**
- **Imports**: stdlib → 3rd party → local (3 groups)
- **Type Hints**: Required on public functions with typing module imports
- **DuckDB**: Short-lived connections with try/finally conn.close()
