cleanup: remove stale .mindmodel, old venvs, orphaned code, and transient artifacts

Removes:
- .mindmodel/ directory and related CI workflows (mindmodel-schedule.yml, mindmodel-validation.yml)
- scripts/mindmodel/ and scripts/validate_mindmodel.py
- src/types/ and src/validators/ (orphaned type modules, only used by mindmodel)
- tests/ci/, tests/scripts/mindmodel/, tests/types/, tests/validators/ (mindmodel-only tests)
- thoughts/ledgers/ and thoughts/shared/ (stale transient directories)
- .venv_axis and .venv_plotly (orphaned virtual environments, ~1.1 GB)
- outputs/blog-charts/ (stale generated HTML files)
- data/*.json sidecars (empty cache artifacts)
- __pycache__ and *.pyc files across repo

Updates:
- .gitignore: remove thoughts/shared/analyses/ entry

Space reclaimed: ~1.1 GB+
3 months ago · 07dd393533
parent 6e36fa2604
commit 07dd393533
58 changed files with 11 additions and 5622 deletions
--- a/.github/workflows/mindmodel-schedule.yml
+++ b/.github/workflows/mindmodel-schedule.yml
@@ -1,37 +0,0 @@
 name: mindmodel scheduled validate
 on:
  schedule:
    - cron: '0 0 * * 0' # weekly
 jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "0.6.x"
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.13"
      - name: Install dependencies
        run: uv sync --locked
      - name: Run tests
        run: uv run pytest tests/ -q
      - name: Run mindmodel validator if manifest exists
        if: ${{ always() }}
        run: |
          if [ -f .mindmodel/manifest.yaml ]; then
            uv run python -m scripts.mindmodel.cli || true
          else
            echo "No .mindmodel/manifest.yaml present — skipping validator"
          fi
--- a/.github/workflows/mindmodel-validation.yml
+++ b/.github/workflows/mindmodel-validation.yml
@@ -1,47 +0,0 @@
 name: mindmodel validation
 on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
 jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install development dependencies (if present)
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements-dev.txt ]; then
            pip install -r requirements-dev.txt
          else
            echo "requirements-dev.txt not found, skipping"
          fi
      - name: Run mindmodel validator (report-only)
        if: ${{ always() }}
        run: |
          # Make this step report-only: run the validator but always exit 0 so PRs are not blocked
          set +e
          if [ -f .mindmodel/manifest.yaml ]; then
            python scripts/validate_mindmodel.py --manifest .mindmodel/manifest.yaml --report reports/out.json || true
          else
            echo "No .mindmodel/manifest.yaml present — skipping validator"
          fi
          exit 0
      - name: Upload mindmodel reports
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: mindmodel-reports
          path: reports/mindmodel-report-*.json
--- a/.gitignore
+++ b/.gitignore
@@ -29,7 +29,6 @@ dummy
 # Generated analysis files
 thoughts/explorer/*.json
 thoughts/explorer/*_report.md
 thoughts/shared/analyses/
 # Compound Engineering local config
 .compound-engineering/*.local.yaml
--- a/.mindmodel/README.md
+++ b/.mindmodel/README.md
@@ -1,11 +0,0 @@
 # .mindmodel
 This directory contains a generated, read-only snapshot of the repository's "mind model" — structured metadata and evidence used by tooling to reason about repository intent, patterns, and decisions.
 Guidelines
 - Read-only: Treat files in this directory as generated artifacts. Local tooling or CI may regenerate or validate them; avoid manual edits unless you are intentionally updating the generator.
 - No secrets: Do not place any credentials, tokens, or sensitive data here. The validator that consumes this folder is designed to detect common secret patterns and will fail if secrets are found.
 - Safe to read: Tools and CI may read these files. They must avoid opening or parsing arbitrary repository secrets and should operate in read-only mode.
 - Validation: CI workflows will run a validator against this folder (if present) to ensure manifest shape, evidence snippets, and referenced files meet project rules.
 If you need to propose a change to the mind model, open a PR describing the intent and the generator changes. The CI validator will validate the submitted artifact before merge.
--- a/.mindmodel/anti-patterns/anti-patterns.md
+++ b/.mindmodel/anti-patterns/anti-patterns.md
@@ -1,127 +0,0 @@
 ---
 title: Anti-Patterns in Stemwijzer
 category: anti-patterns
 severity: critical
 ---
 # Anti-Patterns
 > **NOTE**: Some anti-patterns below were investigated and found to be resolved or invalid. See individual entries for details.
 ## CRITICAL: print() Instead of Logging
 **File**: `api_client.py`
 **Evidence**: 11 instances of `print(f"...")` instead of `_logger.info(...)`
 **Broken code**:
 ```python
 def get_motions(self, ...):
    try:
        # ...
        print(f"Fetched {len(voting_records)} voting records from API")  # BAD
        print(f"Processed into {len(motions)} unique motions")  # BAD
    except Exception as e:
        print(f"Error fetching motions from API: {e}")  # BAD - no traceback
 ```
 **Fix**:
 ```python
 import logging
 _logger = logging.getLogger(__name__)
 def get_motions(self, ...):
    try:
        _logger.info("Fetched %d voting records from API", len(voting_records))
        _logger.info("Processed into %d unique motions", len(motions))
    except Exception as e:
        _logger.exception("Error fetching motions from API: %s", e)
        return []
 ```
 ---
 ## CRITICAL: Global `_DummySt` Replacement
 **File**: `explorer.py`
 **Evidence**: Lines ~50-70, module-level `st = _DummySt()` global replacement
 **Problem**: Creates a module-level variable `st` that shadows `streamlit` module, causing subtle bugs.
 **Fix**: Use conditional flags instead of global replacement:
 ```python
 # GOOD: Use conditional logic
 try:
    import plotly.express as px
    import plotly.graph_objects as go
    HAS_PLOTLY = True
 except ImportError:
    HAS_PLOTLY = False
    px = None
    go = None
 def render_chart(data):
    if not HAS_PLOTLY:
        _logger.warning("Plotly not available")
        return
    # ... rest of chart logic
 ```
 ---
 ## WARNING: Logger Naming Inconsistency
 **Evidence**: 16 files use `logger`, 17 files use `_logger`
 **Files with `logger`** (without underscore):
 - api_client.py, ai_provider.py, pipeline files, analysis files
 **Files with `_logger`** (with underscore):
 - database.py, explorer.py, explorer_helpers.py
 **Recommendation**: Standardize on `_logger` for module-level loggers.
 ---
 ## WARNING: Bare except with pass
 **File**: `database.py`, line 47
 ```python
 # BAD - catches KeyboardInterrupt, SystemExit, MemoryError
 try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
 except:  # bare except
    pass
 ```
 **Fix**:
 ```python
 try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
 except Exception as exc:
    _logger.debug("Sequence creation skipped: %s", exc)
 ```
 ---
 ## INVESTIGATED: Entity-ID / Party-Name Mismatch
 **Status**: INVALID - investigated and resolved
 **Investigation Summary**: `svd_vectors.entity_id` only contains MP names (not party names). Party centroids are correctly computed via `mp_metadata` lookups. No production bug exists.
 ---
 ## Pattern: Three Separate Party Alias Dictionaries
 **Problem**: Party name variations exist in 3+ places with no canonical alias mapping.
 **Fix**: Create one `PARTY_ALIASES` dict in `config.py`:
 ```python
 PARTY_ALIASES = {
    "GroenLinks-PvdA": ["GL-PvdA", "GroenLinks PvdA", "PvdA-GroenLinks"],
    "PVV": ["Partij voor de Vrijheid"],
    # ...
 }
 ```
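One way such a single alias table could be consumed is to invert it into a lookup from every known spelling to its canonical name. The sketch below is illustrative only: `canonical_party` and the case-insensitive matching are assumptions rather than code from the repository, and the dict holds just the two entries shown above.

```python
from typing import Dict, List

# Canonical name -> known variants (only the two entries shown above).
PARTY_ALIASES: Dict[str, List[str]] = {
    "GroenLinks-PvdA": ["GL-PvdA", "GroenLinks PvdA", "PvdA-GroenLinks"],
    "PVV": ["Partij voor de Vrijheid"],
}

# Inverted index: lower-cased spelling -> canonical name, built once at import.
_ALIAS_LOOKUP: Dict[str, str] = {}
for _canonical, _variants in PARTY_ALIASES.items():
    _ALIAS_LOOKUP[_canonical.lower()] = _canonical
    for _variant in _variants:
        _ALIAS_LOOKUP[_variant.lower()] = _canonical


def canonical_party(name: str) -> str:
    """Return the canonical party name, or the stripped input if unknown."""
    cleaned = name.strip()
    return _ALIAS_LOOKUP.get(cleaned.lower(), cleaned)
```

Unknown names pass through unchanged, so callers can adopt the helper incrementally without first listing every party.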
--- a/.mindmodel/architecture/architecture.yaml
+++ b/.mindmodel/architecture/architecture.yaml
@@ -1,55 +0,0 @@
 # Architecture
 ## Page Routing
 - `Home.py` → thin wrapper, minimal logic
 - `pages/1_🗳️_Stemwijzer.py` → thin wrapper delegating to quiz module
 - `pages/2_🔍_Explorer.py` → thin wrapper delegating to `explorer.py`
 - **Pattern**: thin Streamlit page files that import and call into core modules
 ## Core Modules
 ```
 database.py          → MotionDatabase singleton (shared across all pages)
 explorer.py          → Explorer page logic, tab routing
 explorer_helpers.py  → Pure functions, chart builders, coordinate computation
 analysis/            → SVD, UMAP, clustering algorithms
 pipeline/            → Data ingestion pipeline
 config.py            → Dataclass Config, PARTY_COLOURS dict
 ```
 ## Data Flow
 ```
 DuckDB → MotionDatabase (singleton)
              ↓
        st.cache_data loaders
              ↓
        explorer_helpers (pure functions)
              ↓
        Plotly charts → Streamlit
 ```
 ## Key Patterns
 1. **Singleton per module**: `database.py` exports one `db` instance; `config.py` exports config + PARTY_COLOURS
 2. **Graceful degradation**: try/except around optional dependencies (UMAP, Plotly)
 3. **Pipeline**: fetch → transform → store (see `pipeline/` directory)
 4. **API client**: with retry/backoff for external data sources
 5. **Dummy fallbacks**: if optional dep unavailable, use dummy stub
 ## Database Schema (key relationships)
 ```
 motions (id, title, date, category)
  ↓
 mp_votes (mp_id, motion_id, vote: -1/0/1)
  ↓
 svd_vectors (entity_id, window, vector_2d)  ← entity_id = mp_name OR party_name
  ↓
 party_centroids (party, window, centroid_2d)
  ↓
 mp_party_history (mp_id, party, start_date, end_date)
 ```
 ## SVD Computation Pipeline
 1. Build MP × Motion vote matrix from `mp_votes`
 2. Run SVD to get 2D embeddings per MP
 3. Optionally aggregate to party centroids
 4. Align across windows using Procrustes
 5. Store in `svd_vectors` table
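Step 4 can be illustrated in two dimensions without third-party packages. The sketch below is a simplification under stated assumptions: it solves only the rotation part of the orthogonal Procrustes problem (no reflection, scaling, or translation), and the names `best_rotation_angle` and `align_to_reference` are hypothetical rather than taken from the deleted pipeline code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def best_rotation_angle(source: List[Point], reference: List[Point]) -> float:
    """Angle (radians) of the rotation that best maps source onto reference.

    Minimises the summed squared distance between the rotated source points
    and the matching reference points (closed form for the 2D case).
    """
    if len(source) != len(reference):
        raise ValueError("source and reference must have the same length")
    pairs = list(zip(source, reference))
    dot = sum(sx * rx + sy * ry for (sx, sy), (rx, ry) in pairs)
    cross = sum(sx * ry - sy * rx for (sx, sy), (rx, ry) in pairs)
    return math.atan2(cross, dot)


def align_to_reference(source: List[Point], reference: List[Point]) -> List[Point]:
    """Rotate source about the origin so it lines up with reference."""
    theta = best_rotation_angle(source, reference)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(x * cos_t - y * sin_t, x * sin_t + y * cos_t) for x, y in source]
```

Here each window's MP coordinates would be the `source`, matched point-for-point against the same MPs in a reference window. A full Procrustes alignment would also consider reflections, which this sketch deliberately leaves out.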
--- a/.mindmodel/constraints/README.md
+++ b/.mindmodel/constraints/README.md
@@ -1,51 +0,0 @@
 # Constraint Files Index
 This directory contains all constraint files for the Stemwijzer codebase.
 ## Quick Navigation
 | Category | File | Purpose |
 |----------|------|---------|
 | **Stack** | `../stack/stack.yaml` | Tech stack overview |
 | **Architecture** | `../architecture/architecture.yaml` | Data flow, page routing, component relationships |
 | **Conventions** | `../conventions/conventions.yaml` | Naming, error handling, code organization |
 | **Domain** | `../domain/domain-glossary.yaml` | Dutch political terms, algorithm concepts |
 | **Patterns** | `../patterns/patterns.yaml` | 10 code patterns (page wrapper, pipeline, etc.) |
 | **Anti-Patterns** | `../anti-patterns/anti-patterns.yaml` | ⚠️ 7 issues including CRITICAL BUG |
 | **Dependencies** | `../dependencies/dependencies.yaml` | Library wiring, singletons, imports |
 ## How to Use
 1. **Before writing code**: Check `patterns/patterns.yaml` for how similar features are implemented
 2. **When naming things**: Follow `conventions/conventions.yaml` (snake_case functions, PascalCase classes)
 3. **When handling errors**: Avoid patterns in `anti-patterns/anti-patterns.yaml`
 4. **When working with domain terms**: Reference `domain/domain-glossary.yaml`
 5. **When connecting components**: See `dependencies/dependencies.yaml` for wiring
 ## Key Conventions Summary
 - **Files**: snake_case (`explorer_helpers.py`)
 - **Functions**: snake_case (`compute_party_coords`)
 - **Classes**: PascalCase (`MotionDatabase`)
 - **Constants**: UPPER_SNAKE_CASE (`PARTY_COLOURS`)
 - **No bare `except:`** — always specify exception type
 - **Pure functions** in helpers — no IO, no Streamlit calls
 - **One singleton per module** — `db`, `config`, `PARTY_COLOURS`
 ## ⚠️ Critical Bug
 **Read `../anti-patterns/anti-patterns.yaml` first.** Section 1 documents a critical bug in
 `explorer_helpers.py:compute_party_coords` where party names in `svd_vectors` entity_id are
 not recognized because `party_map` only contains MP-name keys.
 ## Files Generated
 - `manifest.yaml` — lists all constraint files with group mappings
 - `stack/stack.yaml` — tech stack
 - `architecture/architecture.yaml` — data flow & components
 - `conventions/conventions.yaml` — coding conventions
 - `domain/domain-glossary.yaml` — domain terminology
 - `patterns/patterns.yaml` — 10 code patterns with examples
 - `anti-patterns/anti-patterns.yaml` — 7 anti-patterns including CRITICAL BUG
 - `dependencies/dependencies.yaml` — library wiring
 - `README.md` — this index
--- a/.mindmodel/constraints/error-handling.md
+++ b/.mindmodel/constraints/error-handling.md
@@ -1,143 +0,0 @@
 ---
 title: Error Handling Patterns
 category: constraints
 severity: high
 ---
 # Error Handling Patterns
 ## Core Rules
 1. **Catch `Exception`, return safe fallbacks** (False/[]/None)
 2. **Log exceptions with traceback** using `_logger.exception()`
 3. **Never swallow exceptions silently** - always log or return sensible default
 4. **Avoid nested try/except blocks** - flatten exception handling
 ## Pattern: Try/Except Safe Fallback
 This is the dominant pattern in the codebase (219+ instances).
 ```python
 # Standard pattern from database.py, api_client.py, etc.
 try:
    result = risky_operation()
    return process(result)
 except Exception as exc:
    _logger.warning("Operation failed: %s", exc)
    return safe_fallback  # False, [], None, {}
 ```
 ### Examples from Codebase
 **database.py** - DuckDB operations:
 ```python
 def get_svd_vectors(self, window: str):
    try:
        conn = duckdb.connect(self.db_path, read_only=True)
        try:
            result = conn.execute(query, (window,)).fetchall()
            return self._parse_vectors(result)
        finally:
            conn.close()
    except Exception as exc:
        _logger.warning("Failed to get SVD vectors: %s", exc)
        return []
 ```
 **ai_provider.py** - HTTP retries:
 ```python
 try:
    resp = requests.post(url, json=json, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp.json()
 except requests.ConnectionError as exc:
    if attempt == retries:
        raise ProviderError(f"Connection error: {exc}") from exc
    # ... retry logic
 ```
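The retry logic itself is elided in the snippet above. As a rough illustration of the retry/backoff idea only, the following sketch retries a callable with exponentially growing delays. `call_with_retries`, its parameters, and the use of the built-in `ConnectionError` (in place of `requests.ConnectionError` and `ProviderError`) are assumptions made so the example runs with the standard library alone.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff.

    Makes at most retries + 1 attempts and re-raises the last error.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # base, 2x base, 4x base, ...
    raise AssertionError("unreachable when retries >= 0")
```

Passing `sleep` as a parameter keeps the helper testable: a test can record the requested delays instead of actually waiting.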
 ## Pattern: Optional Dependency Fallback
 Gracefully degrade when optional packages are unavailable.
 ```python
 # UMAP fallback in explorer_helpers.py
 try:
    import umap
    HAS_UMAP = True
 except ImportError:
    HAS_UMAP = False
    _logger.debug("UMAP not available, using SVD vectors directly")
 def project_to_2d(vectors):
    if HAS_UMAP:
        return umap.UMAP().fit_transform(vectors)
    return vectors[:, :2]  # Fallback: first 2 SVD dimensions
 ```
 ## Anti-Patterns
 ### 1. Bare except with pass (CRITICAL)
 **File**: `database.py`, line 47
 ```python
 # BAD - catches KeyboardInterrupt, SystemExit, MemoryError
 try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
 except:  # bare except
    pass
 ```
 **Fix**: Catch specific exception or log and continue:
 ```python
 try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
 except Exception as exc:
    _logger.debug("Sequence creation skipped (may already exist): %s", exc)
 ```
 ### 2. Nested Exception Handling
 **File**: `explorer.py`, lines 244-261
 ```python
 # BAD - opaque error paths
 try:
    result = compute_svd(motions)
 except Exception:
    try:
        result = fallback_compute(motions)
    except Exception:
        pass  # Both exceptions silently dropped
 ```
 **Fix**: Flatten and handle each case explicitly:
 ```python
 # GOOD - explicit handling
 try:
    result = compute_svd(motions)
 except Exception as exc:
    _logger.warning("SVD failed, trying fallback: %s", exc)
    try:
        result = fallback_compute(motions)
    except Exception as fallback_exc:
        _logger.error("Both SVD approaches failed: %s, %s", exc, fallback_exc)
        raise
 ```
 ## Rule Summary
 | Pattern | When to Use | Return Value |
 |---------|-------------|--------------|
 | Safe fallback | Best-effort operations | `[]`, `{}`, `False`, `None` |
 | Re-raise | Critical operations that must succeed | raise |
 | Log and continue | Optional steps in pipeline | (continue) |
 | Graceful degradation | Optional dependencies | Default behavior |
 ## When to Log vs Return
 | Scenario | Action |
 |----------|--------|
 | User action fails | Log warning, return safe default |
 | Internal error (corrupt data) | Log error, return safe default |
 | Transient failure (network) | Log warning, retry if appropriate |
 | Configuration error | Log error, raise with clear message |
--- a/.mindmodel/constraints/imports.yaml
+++ b/.mindmodel/constraints/imports.yaml
@@ -1,205 +0,0 @@
 # Import Organization Constraints
 ## Standard Order
 Organize imports in three groups with blank lines between:
 ```python
 # 1. Standard library imports (alphabetical within group)
 import json
 import logging
 import os
 from datetime import datetime, timedelta
 from typing import Dict, List, Optional, Tuple
 # 2. Third-party packages (alphabetical within group)
 import duckdb
 import requests
 # 3. Local application modules (can use relative imports)
 from config import config
 from database import db
 from summarizer import summarizer
 ```
 ## Alphabetical Ordering
 Within each group, sort imports alphabetically:
 ```python
 # GOOD - alphabetical
 import json
 import logging
 from datetime import datetime
 from typing import Dict, List, Optional
 # BAD - random order
 from typing import Optional
 import json
 from datetime import datetime
 import logging
 from typing import Dict, List
 ```
 ## Grouping Rules
 ### Standard Library
 - `json`, `logging`, `os`, `sys`, `time`
 - `datetime`, `timedelta` from `datetime`
 - `Dict`, `List`, `Optional`, etc. from `typing`
 - `argparse`, `pathlib`, `re`, `uuid`
 ### Third-Party
 - `duckdb`, `requests`, `streamlit`
 - `numpy`, `scipy`, `sklearn`
 - `plotly`, `beautifulsoup4`
 - `pytest`
 ### Local Application
 - Modules from same package
 - Relative imports when appropriate
 ## When to Use `from X import Y`
 ### Prefer `from module import specific_items` for:
 - Constants and config
 - Single classes or functions used frequently
 - Type annotations
 ```python
 # GOOD - clear about what we're using
 from config import config
 from database import db
 # GOOD - type hints
 from typing import Dict, List, Optional
 ```
 ### Use `import module` when:
 - You need multiple items from the module
 - Using module.namespace is clearer
 ```python
 # GOOD - duckdb used for types and module access
 import duckdb
 conn = duckdb.connect(...)
 result = conn.execute(...)
 # Also acceptable for types
 from typing import Dict
 ```
 ## Relative Imports
 In package modules, prefer relative imports:
 ```python
 # pipeline/svd_pipeline.py
 from ..database import MotionDatabase  # relative import
 from .text_pipeline import process_text  # relative import
 ```
 ## Circular Imports
 Avoid circular imports by:
 1. Moving shared code to a third module
 2. Using TYPE_CHECKING for type hints only
 ```python
 # types.py - shared type definitions
 from typing import TypedDict
 class MotionDict(TypedDict):
    id: int
    title: str
    ...
 # module_a.py
 from .types import MotionDict
 # module_b.py - if needed here too
 from .types import MotionDict
 ```
 ## Import Patterns to Avoid
 ### Wildcard Imports
 ```python
 # BAD
 from database import *
 # GOOD
 from database import db, MotionDatabase
 ```
 ### Import in Function Scope (unless necessary)
 ```python
 # AVOID - delays import, makes dependencies unclear
 def some_function():
    import pandas as pd  # Late import
    return pd.DataFrame(...)
 # PREFER - import at module level
 import pandas as pd
 def some_function():
    return pd.DataFrame(...)
 ```
 ### Reassigning Imported Names
 ```python
 # BAD - confusing
 from module import process
 process = something_else  # Reassigning
 # GOOD - clear naming
 from module import process as process_data
 ```
 ## Type Checking Imports
 For type hints only, use TYPE_CHECKING:
 ```python
 from typing import TYPE_CHECKING
 if TYPE_CHECKING:
    from .models import Motion
 def get_motion(motion_id: int) -> "Motion":  # String quote for forward ref
    ...
 ```
 ## Optional Dependency Imports
 Handle optional dependencies gracefully:
 ```python
 try:
    import duckdb
 except Exception:
    duckdb = None  # Will be checked later
 class MotionDatabase:
    def __init__(self):
        if duckdb is None:
            self._file_mode = True  # Fallback mode
 ```
 ## Example: Complete Import Block
 ```python
 # Complete example from database.py
 import json
 import logging
 import uuid
 from datetime import datetime, timedelta
 from typing import Dict, List, Optional, Tuple
 import duckdb
 from config import config
 from database import db
 ```
--- a/.mindmodel/constraints/logging.md
+++ b/.mindmodel/constraints/logging.md
@@ -1,131 +0,0 @@
 ---
 title: Logging Constraints
 category: constraints
 severity: critical
 ---
 # Logging Constraints
 ## Core Rule
 Use `logging.getLogger(__name__)` - never use `print()`
 **CRITICAL ANTI-PATTERN**: `api_client.py` uses `print()` instead of logging (11 instances).
 ## CRITICAL Anti-Pattern: print() Instead of Logging
 **File**: `api_client.py`
 **Evidence**: Lines with `print(f"...")` instead of `_logger.info(...)`
 **Broken code**:
 ```python
 def get_motions(self, ...):
    try:
        # ...
        print(f"Fetched {len(voting_records)} voting records from API")  # BAD
        print(f"Processed into {len(motions)} unique motions")  # BAD
    except Exception as e:
        print(f"Error fetching motions from API: {e}")  # BAD - no traceback
 ```
 **Fix**:
 ```python
 import logging
 _logger = logging.getLogger(__name__)
 def get_motions(self, ...):
    try:
        _logger.info("Fetched %d voting records from API", len(voting_records))
        _logger.info("Processed into %d unique motions", len(motions))
    except Exception as e:
        _logger.exception("Error fetching motions from API: %s", e)
        return []
 ```
 ## Logger Initialization
 Get logger at module level:
 ```python
 # GOOD: Use logging.getLogger(__name__)
 import logging
 _logger = logging.getLogger(__name__)
 def some_function():
    _logger.info("Processing started")
    _logger.debug("Detail: %s", detail)
 ```
 ## Logger Naming
 Use `__name__` for automatic module path:
 ```python
 # In database.py - logger will be "database"
 _logger = logging.getLogger(__name__)
 # In pipeline/svd_pipeline.py - logger will be "pipeline.svd_pipeline"
 _logger = logging.getLogger(__name__)
 ```
 **INCONSISTENCY WARNING**: 16 files use `logger`, 17 files use `_logger`. Choose one convention.
 **Recommendation**: Use `_logger` (with underscore) for module-level loggers to distinguish from class-level loggers.
 ## Log Levels
 | Level | When to Use |
 |-------|-------------|
 | DEBUG | Detailed diagnostic info (dev only) |
 | INFO | Normal operation milestones |
 | WARNING | Unexpected but handled (fallbacks) |
 | ERROR | Operation failed, may need attention |
 | CRITICAL | Fatal error, program may crash |
 ## Exception Logging
 Use `_logger.exception()` for caught exceptions (includes traceback):
 ```python
 try:
    result = risky_operation()
 except Exception as exc:
    _logger.exception("Operation failed: %s", exc)
    return fallback_value
 ```
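To see what the traceback adds, the sketch below routes a logger to an in-memory stream and captures what `exception()` writes. The logger name `logging_demo` and the failing `risky_operation` are illustrative stand-ins, not names from the codebase.

```python
import io
import logging

# Send log records to an in-memory buffer so the output can be inspected.
stream = io.StringIO()
_logger = logging.getLogger("logging_demo")
_logger.addHandler(logging.StreamHandler(stream))
_logger.propagate = False  # keep the demo output out of the root logger


def risky_operation() -> int:
    raise ValueError("boom")


def run_safely():
    try:
        return risky_operation()
    except Exception:
        # exception() logs at ERROR level and appends the active traceback.
        _logger.exception("Operation failed")
        return None


result = run_safely()
output = stream.getvalue()
```

After running, `output` holds the message followed by the full traceback, including the exception type and text, so formatting the exception into the message is optional.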
 ## Anti-Patterns
 ### Debug Prints in Production Code
 ```python
 # BAD
 print(f"[TRAJ DEBUG] processing window {wid}")
 # GOOD
 _logger.debug("Processing window %s", wid)
 ```
 ### Inconsistent Logger Names
 ```python
 # BAD - mixing _logger and logger
 _logger = logging.getLogger(__name__)
 logger = logging.getLogger("other")  # Inconsistent
 ```
 ## Sensitive Data
 Never log sensitive information:
 - API keys
 - User votes
 - Session IDs (if tied to user data)
 - Personal information
 ```python
 # BAD
 _logger.info("User %s voted %s", user_id, vote)
 # GOOD - log aggregates, not individual votes
 _logger.info("Vote recorded for session %s", session_id[:8])
 ```
--- a/.mindmodel/constraints/naming.yaml
+++ b/.mindmodel/constraints/naming.yaml
@@ -1,141 +0,0 @@
 # Naming Constraints
 ## File Names
 ### Python Modules
 - **Convention**: `snake_case.py`
 - **Examples**: `motion_database.py`, `api_client.py`, `text_pipeline.py`
 ### Test Files
 - **Convention**: `test_<module_name>.py`
 - **Examples**: `test_database.py`, `test_api_client.py`
 ### Config Files
 - **Convention**: `snake_case`
 - **Examples**: `config.py`, `.env.example`, `pyproject.toml`
 ### Directories
 - **Convention**: `snake_case/`
 - **Examples**: `pipeline/`, `tests/integration/`, `src/validators/`
 ## Class Names
 - **Convention**: `PascalCase`
 - **Examples**: `MotionDatabase`, `TweedeKamerAPI`, `MotionSummarizer`
 ### Naming Patterns
 | Pattern | Example |
 |---------|---------|
 | Database wrapper | `MotionDatabase` |
 | API client | `TweedeKamerAPI` |
 | Service/Helpers | `MotionScraper`, `MotionAnalyzer` |
 | Exceptions | `ProviderError` |
 ## Function Names
 - **Convention**: `snake_case`
 - **Examples**: `get_motions`, `compute_similarity`, `process_voting_records`
 ### Private Methods
 - **Convention**: `_snake_case` (single underscore prefix)
 - **Examples**: `_get_voting_records`, `_parse_response`
 ## Variable Names
 ### Regular Variables
 - **Convention**: `snake_case`
 - **Examples**: `motion_id`, `party_name`, `voting_results`
 ### Constants (Module-Level)
 - **Convention**: `UPPER_SNAKE_CASE`
 - **Examples**: `DATABASE_PATH`, `API_TIMEOUT`, `MAX_RETRIES`
 ### Config Variables (in dataclass)
 - **Convention**: `UPPER_SNAKE_CASE`
 - **Examples**: `QWEN_MODEL`, `POLICY_AREAS`
 ### Booleans
 - **Convention**: `is_`, `has_`, `can_` prefixes or `_flag` suffix
 - **Examples**: `is_active`, `has_votes`, `skip_extract`
 ### Private Variables
 - **Convention**: `_underscore_prefix`
 - **Examples**: `_conn`, `_cache`, `_session`
 ## Singleton Instances
 - **Convention**: `lower_snake_case` at module level
 - **Examples**: `db = MotionDatabase()`, `summarizer = MotionSummarizer()`
 ```python
 # database.py
 class MotionDatabase:
    ...
 # Singleton instance
 db = MotionDatabase()
 # Usage
 from database import db
 motions = db.get_motions()
 ```
 ## Type Variables
 - **Convention**: `PascalCase`
 - **Examples**: `T = TypeVar('T')`, `MotionDict = Dict[str, Any]`
 ## Anti-Patterns
 ### Inconsistent Naming
 ```python
 # BAD - mixing styles
 get_motions()      # snake_case
 GetMotionById()    # PascalCase
 processData()      # camelCase
 # GOOD - consistent snake_case
 get_motions()
 get_motion_by_id()
 process_voting_data()
 ```
 ### Abbreviations
 ```python
 # AVOID - unclear abbreviations
 calc_similarity()      # calculate_*
 proc_votes()          # process_*
 get_mp_data()          # get_mp_metadata()
 # PREFER - full words
 calculate_similarity()
 process_votes()
 get_mp_metadata()
 ```
 ### Hungarian Notation
 ```python
 # BAD - Hungarian notation
 str_title = "..."
 int_count = 0
 b_is_active = True
 # GOOD - clear types via naming
 title = "..."
 count = 0
 is_active = True
 ```
 ## Special Cases
 ### Window IDs
 - **Format**: `"YYYY-QN"` or `"YYYY"`
 - **Examples**: `"2024-Q1"`, `"2024-Q2"`, `"2024"`
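A format like this is easy to check mechanically. The sketch below assumes that quarters run from 1 to 4 and that years have exactly four digits; the helper name `is_valid_window_id` is hypothetical.

```python
import re

# "YYYY" or "YYYY-QN", with N in 1..4 (assumed range for quarters).
_WINDOW_ID_RE = re.compile(r"[0-9]{4}(-Q[1-4])?")


def is_valid_window_id(window_id: str) -> bool:
    """Return True if window_id looks like "2024" or "2024-Q1"."""
    return _WINDOW_ID_RE.fullmatch(window_id) is not None
```

Using `fullmatch` rather than `match` rejects trailing junk such as `"2024-Q1x"`.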
 ### Policy Areas
 - **Convention**: PascalCase with spaces
 - **Examples**: `"Economie"`, `"Sociale Zaken"`, `"Klimaat"`
 ### Vote Values
 - **Convention**: PascalCase Dutch terms
 - **Values**: `"Voor"`, `"Tegen"`, `"Onthouden"`, `"Geen stem"`, `"Afwezig"`
--- a/.mindmodel/constraints/testing.yaml
+++ b/.mindmodel/constraints/testing.yaml
@@ -1,26 +0,0 @@
 # Testing conventions constraint (YAML)
 rules:
  - name: test_naming
    rule: "Use pytest and name tests test_*.py and test_* functions."
    examples:
      - good: "tests/test_text_pipeline.py"
      - bad: "tests/text_pipeline_test.py"
  - name: fixtures_and_conftest
    rule: "Place shared fixtures in tests/conftest.py or tests/fixtures/ for reuse."
    examples:
      - good: "use fixtures declared in tests/conftest.py"
  - name: assert_raises
    rule: "Explicitly assert expected exceptions with pytest.raises for invalid input."
    examples:
      - good: |
          import pytest
          def test_invalid_input():
              with pytest.raises(ValueError):
                  function_under_test('bad')
 enforcement_examples:
  - "Run pytest in CI; fail if tests don't run or if there are regressions."
--- a/.mindmodel/constraints/types.yaml
+++ b/.mindmodel/constraints/types.yaml
@@ -1,233 +0,0 @@
 # Type Hint Constraints
 ## Core Rule
 **Use type hints on all public functions and methods**
 ## Function Type Hints
 ### Required on Public APIs
 ```python
 # GOOD - complete type hints
 def get_motion(self, motion_id: int) -> Optional[Dict]:
    ...
 def get_filtered_motions(
    self,
    policy_area: str = "Alle",
    limit: int = 10
 ) -> List[Dict]:
    ...
 def calculate_similarity(self, motion_a: int, motion_b: int) -> float:
    ...
 ```
 ### Optional Parameters
 Use `Optional[X]` or `X | None`:
 ```python
 # Both forms are acceptable
 def get_motion(self, motion_id: Optional[int] = None) -> Optional[Dict]:
    ...
 def get_motion(self, motion_id: int | None = None) -> dict | None:
    ...
 ```
 ### Multiple Return Types
 Use `Union[X, Y]` or `|` operator:
 ```python
 # Acceptable forms
 def parse_value(self, value: str) -> Union[bool, str, None]:
    ...
 def parse_value(self, value: str) -> bool | str | None:
    ...
 ```
 ### Generic Types
 Use `List[X]`, `Dict[K, V]`, `Tuple[X, Y]`:
 ```python
 from typing import Dict, List, Optional, Tuple
 def get_motions(self, ids: List[int]) -> Dict[int, Dict]:
    """Map motion_id -> motion data."""
    ...
 def process_batch(self, items: List[str]) -> Tuple[List[str], List[str]]:
    """Returns (successes, failures)."""
    ...
 ```
 ## Collection Types
 Prefer specific types over bare `list`/`dict`:
 ```python
 # GOOD - specific types
 def get_votes(self) -> List[str]:
    ...
 def get_metadata(self) -> Dict[str, Any]:
    ...
 # ACCEPTABLE - for truly generic collections
 def merge_dicts(*dicts: dict) -> dict:
    ...
 ```
 ## DuckDB Result Types
 DuckDB returns tuples/lists - document expected structure:
 ```python
 def get_motion(self, motion_id: int) -> Optional[Tuple]:
    """Returns (id, title, description, date, ...) or None."""
    conn = duckdb.connect(self.db_path)
    try:
        result = conn.execute(
            "SELECT * FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()
        return result
    finally:
        conn.close()
 # Or use Dict for clarity
 def get_motion_as_dict(self, motion_id: int) -> Optional[Dict]:
    """Returns motion dict or None."""
    conn = duckdb.connect(self.db_path)
    try:
        row = conn.execute(
            "SELECT * FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()
        if row:
            return {
                "id": row[0],
                "title": row[1],
                "description": row[2],
                ...
            }
        return None
    finally:
        conn.close()
 ```
 ## Class/Instance Types
 Use `Self` for methods returning instance type:
 ```python
 from typing import Self
 class MotionDatabase:
    def with_connection(self, path: str) -> Self:
        """Return new instance with different path."""
         return type(self)(db_path=path)  # type(self) keeps the Self annotation valid for subclasses
 ```
 ## Callback/Function Types
 Use `Callable` for function parameters:
 ```python
 from typing import Callable
 def process_motions(
    motions: List[Dict],
    processor: Callable[[Dict], Any]
 ) -> List[Any]:
    return [processor(m) for m in motions]
 ```
 ## Type Aliases
 Define clear type aliases for domain concepts:
 ```python
 from typing import Dict, List, TypedDict, Literal
 # Vote values
 VoteValue = Literal["Voor", "Tegen", "Onthouden", "Geen stem", "Afwezig"]
 # Policy areas
 PolicyArea = Literal["Alle", "Economie", "Klimaat", "Immigratie", ...]
 # Motion dict
 class MotionDict(TypedDict):
    id: int
    title: str
    description: Optional[str]
    date: Optional[str]
    policy_area: Optional[str]
    voting_results: Optional[str]  # JSON string
    winning_margin: Optional[float]
 def get_motion(self, motion_id: int) -> Optional[MotionDict]:
    ...
 ```
 ## Avoid `Any`
 Use `Any` sparingly - prefer specific types:
 ```python
 # AVOID - too vague
 def process(data: Any) -> Any:
    ...
 # PREFER - specific types
 def process(motion: MotionDict) -> Optional[SimilarityResult]:
    ...
 ```
 ## Inline Type Hints
 For simple cases, inline hints are fine:
 ```python
 def get_count(self) -> int:
    ...
 def is_empty(self) -> bool:
    ...
 ```
 ## Docstring Type Hints
 For complex types, include in docstrings:
 ```python
 def get_party_positions(self, window_id: str) -> Dict[str, List[float]]:
    """Get party positions in political space.
    Args:
        window_id: Time window (e.g., "2024-Q1")
    Returns:
        Dict mapping party_name -> [x, y] coordinates
    Example:
        >>> positions = db.get_party_positions("2024-Q1")
        >>> positions["VVD"]
        [0.5, -0.3]
    """
    ...
 ```
 ## Type Checking
 Type hints are not enforced at runtime; where a value must be validated, add an explicit `isinstance` check:
 ```python
 def set_count(self, count: int) -> None:
    if not isinstance(count, int):
        raise TypeError(f"Expected int, got {type(count).__name__}")
    self._count = count
 ```
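 One caveat with the check above: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`. A minimal sketch of a stricter check, assuming booleans should be rejected; the `VoteTally` class is hypothetical and exists only to make the example self-contained:

```python
class VoteTally:
    """Hypothetical holder used only to illustrate the check."""

    def __init__(self) -> None:
        self._count = 0

    def set_count(self, count: int) -> None:
        # Reject bool explicitly: isinstance(True, int) is True in Python.
        if isinstance(count, bool) or not isinstance(count, int):
            raise TypeError(f"Expected int, got {type(count).__name__}")
        self._count = count
```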
--- a/.mindmodel/conventions/conventions.yaml
+++ b/.mindmodel/conventions/conventions.yaml
@ -1,124 +0,0 @@
 # Naming Conventions
 ## Files
 - **snake_case** for all Python files: `database.py`, `explorer_helpers.py`, `motion_cache.py`
 - **PascalCase** NOT used for files
 ## Functions
 - **snake_case**: `get_svd_vectors()`, `compute_party_coords()`, `build_scatter_trace()`
 - Private helpers prefixed with `_`: `_get_window_data()`
 ## Classes
 - **PascalCase**: `MotionDatabase`, `Config`
 - **Dataclass pattern** for Config: `@dataclass` decorator with typed fields
 ## Variables
 - **snake_case**: `party_map`, `mp_name`, `svd_vectors`, `party_centroids`
 - **CONSTANT_SNAKE_CASE** for module-level constants: `PARTY_COLOURS`, `DEFAULT_WINDOW`
 ## Module-Level Exports
 - **Singleton instance**: `db = MotionDatabase()` at module bottom (not class-level)
 - **Config instance**: `config = Config(...)` at module bottom
 - **Dicts**: `PARTY_COLOURS` exported from `config.py`
 ---
 # Error Handling
 ## Known Patterns
 1. **Bare except with pass** (ANTI-PATTERN - see anti-patterns.yaml)
   ```python
   except:
       pass  # database.py:47
   ```
 2. **Graceful degradation**: catch specific exceptions, fall back to default
   ```python
   try:
       result = compute_svd()
   except ImportError:
       result = DEFAULT_SVD
   ```
 3. **Optional dependency fallbacks**:
   ```python
   try:
       import umap
       use_umap = True
   except ImportError:
       use_umap = False
   ```
 4. **Nested exception handling** (ANTI-PATTERN - see anti-patterns.yaml):
   ```python
   try:
       ...
   except Exception:
       try:
           ...
       except Exception:
           pass
   ```
 ## Rules
 - Never use bare `except:` — always specify exception type
 - Never swallow exceptions silently — log or return a sensible default
 - For optional deps, use `ImportError` or `ModuleNotFoundError` explicitly
 - Avoid nested try/except blocks
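 The rules above can be sketched as follows: a specific exception type, a logged fallback, and no nesting. `import_optional` is a hypothetical helper name, not something the codebase is known to define:

```python
import importlib
import logging
from types import ModuleType
from typing import Optional

logger = logging.getLogger(__name__)


def import_optional(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also covers ModuleNotFoundError, its subclass
        logger.info("Optional dependency %r not available; using fallback", name)
        return None
```

 Usage would look like `umap = import_optional("umap")` followed by `use_umap = umap is not None`.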
 ---
 # Code Organization
 ## Singleton Pattern
 Each module owns one shared instance:
 ```python
 # database.py
 db = MotionDatabase()
 # config.py
 config = Config(...)
 PARTY_COLOURS = {...}
 ```
 ## Pure Functions in Helpers
 `explorer_helpers.py` contains only pure functions (no IO, no Streamlit calls):
 ```python
 def compute_party_coords(svd_vectors, party_map):
    """Pure: no side effects, no imports from this module"""
    ...
 def build_scatter_trace(df, color_col):
    """Pure: returns Plotly trace dict"""
    ...
 ```
 ## Cached Data Loaders
 Use `@st.cache_data` for expensive data loading:
 ```python
@st.cache_data
 def load_svd_vectors(window: str) -> pd.DataFrame:
    return db.get_svd_vectors(window)
 ```
 ## Dataclass Config
 ```python
@dataclass
 class Config:
    db_path: str = "data/stemwijzer.duckdb"
    default_window: str = "2023"
    party_colours: dict = field(default_factory=lambda: PARTY_COLOURS)
 ```
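 Note that `default_factory=lambda: PARTY_COLOURS` hands every `Config` the same dict object, so a mutation through one instance is visible everywhere. A minimal sketch, assuming per-instance copies are wanted; the colour values here are placeholders:

```python
from dataclasses import dataclass, field
from typing import Dict

# Placeholder values for illustration only.
PARTY_COLOURS: Dict[str, str] = {"VVD": "#000001", "SP": "#000002"}


@dataclass
class Config:
    db_path: str = "data/stemwijzer.duckdb"
    default_window: str = "2023"
    # Copy the module-level dict so each instance owns its own mapping.
    party_colours: Dict[str, str] = field(
        default_factory=lambda: dict(PARTY_COLOURS)
    )


a, b = Config(), Config()
a.party_colours["VVD"] = "#ffffff"  # does not affect b or PARTY_COLOURS
```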
 ---
 # Imports
 ## Ordering (convention)
 1. Standard library
 2. Third-party (streamlit, ibis, plotly, sklearn, umap)
 3. Local/relative imports
 ## Avoid
 - Wildcard imports (`from module import *`)
 - Circular imports (ensure dependency direction: helpers → database → config)
--- a/.mindmodel/dependencies/dependencies.md
+++ b/.mindmodel/dependencies/dependencies.md
@ -1,92 +0,0 @@
 ---
 title: Dependencies and Library Usage
 category: dependencies
 ---
 # Dependencies and Library Usage
 ## Core Dependencies
 ### duckdb
 - **Required**: Yes
 - **Fallback**: None (core functionality)
 - **Usage**: SQL database for motions, embeddings, SVD vectors
 - **Files**: database.py, analysis/*.py, pipeline/*.py
 ### streamlit
 - **Required**: Yes
 - **Fallback**: None
 - **Usage**: Web UI framework
 - **Files**: app.py, pages/*.py, explorer.py
 ### requests
 - **Required**: Yes
 - **Fallback**: None
 - **Usage**: HTTP client for API calls
 - **Files**: api_client.py, ai_provider.py
 ### plotly
 - **Required**: Yes
 - **Fallback**: None (raises ImportError)
 - **Usage**: Interactive charts for explorer
 - **Files**: explorer.py, explorer_helpers.py
 ## Optional Dependencies
 ### umap-learn
 - **Required**: No
 - **Fallback**: Use raw SVD vectors (first 2 dimensions)
 - **Usage**: Dimensionality reduction for visualization
 - **Files**: analysis/clustering.py
 ### matplotlib
 - **Required**: No
 - **Fallback**: Plotly or raw output
 - **Usage**: Static charting
 - **Files**: Various analysis scripts
 ## ML Dependencies
 ### sklearn
 - **Required**: Yes
 - **Usage**: KMeans clustering, cosine_similarity, StandardScaler
 - **Files**: analysis/clustering.py, similarity/compute.py
 ### scipy
 - **Required**: Yes
 - **Usage**: SVD (scipy.linalg.svd), spatial.procrustes for alignment
 - **Files**: analysis/trajectory.py, pipeline/svd_pipeline.py
 ### numpy
 - **Required**: Yes
 - **Usage**: Array operations, linear algebra
 - **Files**: Throughout codebase
 ## Key Imports by File
 ### explorer.py
 - `import streamlit as st`
 - `from database import db`
 - `from explorer_helpers import *`
 ### explorer_helpers.py
 - `import pandas as pd`
 - `import plotly.graph_objects as go`
 - `from database import db` (optional, for type hints)
 ### database.py
 - `import ibis`
 - `import duckdb`
 - `from config import config, PARTY_COLOURS`
 ### config.py
 - `from dataclasses import dataclass, field`
 - `import streamlit as st` (optional, for warnings)
 ## Singleton Instances
 | Module | Instance | Type |
 |--------|----------|------|
 | `database.py` | `db` | `MotionDatabase` |
 | `config.py` | `config` | `Config` (dataclass) |
 | `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
--- a/.mindmodel/domain/domain-glossary.md
+++ b/.mindmodel/domain/domain-glossary.md
@ -1,146 +0,0 @@
 ---
 title: Domain Glossary
 category: domain
 ---
 # Domain Glossary - Dutch Political Terms
 ## CRITICAL INVARIANTS
 > **Rule 1**: The centroid of the right-wing parties must lie on the RIGHT side of ALL axes
 > - PVV, FVD, JA21, SGP centroid must appear on the RIGHT
 > - Individual right-wing parties may vary slightly from the centroid
 > - This is non-negotiable for any compass/axis visualization
 > **Rule 2**: SVD labels are empirically derived from voting data
 > - Labels represent WHAT THE DATA SHOWS, not party self-identification or public opinion
 > - Labels are derived from outliers and 20 representative motions (10 positive, 10 negative)
 > - See SVD Label Derivation section below
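
 Rule 1 can be enforced by flipping the sign of any axis on which the right-wing centroid is negative, since an SVD axis is only defined up to sign. A minimal sketch, assuming "right side" means the positive direction of each axis and that party positions are plain coordinate lists; `orient_axes` is a hypothetical helper and the coordinates are made up:

```python
from typing import Dict, List, Sequence

RIGHT_WING = ("PVV", "FVD", "JA21", "SGP")


def orient_axes(
    positions: Dict[str, Sequence[float]],
    right_wing: Sequence[str] = RIGHT_WING,
) -> Dict[str, List[float]]:
    """Flip each axis whose right-wing centroid is negative."""
    anchors = [positions[p] for p in right_wing if p in positions]
    if not anchors:
        raise ValueError("No right-wing parties present to orient by")
    n_dims = len(anchors[0])
    # Mean coordinate of the right-wing parties on each axis.
    centroid = [sum(a[d] for a in anchors) / len(anchors) for d in range(n_dims)]
    signs = [-1.0 if c < 0 else 1.0 for c in centroid]
    return {
        party: [s * v for s, v in zip(signs, vec)]
        for party, vec in positions.items()
    }


# Made-up coordinates: the right-wing centroid starts out negative on axis 0.
raw = {"PVV": [-0.8, 0.2], "SGP": [-0.4, 0.4], "SP": [0.6, -0.1]}
oriented = orient_axes(raw)
```

 Applying the same sign vector to every entity preserves all distances; only the orientation of the plot changes.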
 ---
 ## SVD Label Derivation
 ### The Process
 SVD (Singular Value Decomposition) finds axes that maximize variance in the MP × Motion voting matrix. To label each axis:
 1. **Identify outliers**: Find the two MPs with most extreme positions on that axis
 2. **Select representative motions**: Pick 20 motions where these outliers disagreed most sharply (10 they voted opposite on, 10 where both voted same direction but with other extremes)
 3. **Interpret theme**: Read the motion titles to derive what the axis represents
 4. **Assign label**: Label describes the empirical theme, could be:
   - Left-Right
   - Coalition-Opposition
   - Progressive-Conservative
   - EU-National sovereignty
   - Populist-Establishment
   - Or whatever the voting patterns show
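 Step 1 above can be sketched as follows. This is an illustration under the assumption that MP positions are stored as a mapping from name to coordinate list; `find_axis_outliers` and the sample values are hypothetical:

```python
from typing import Dict, Sequence, Tuple


def find_axis_outliers(
    positions: Dict[str, Sequence[float]], axis: int
) -> Tuple[str, str]:
    """Return (most_negative_mp, most_positive_mp) on the given axis."""
    if len(positions) < 2:
        raise ValueError("Need at least two MPs to find outliers")
    lowest = min(positions, key=lambda mp: positions[mp][axis])
    highest = max(positions, key=lambda mp: positions[mp][axis])
    return lowest, highest


# Made-up positions for three MPs on two axes.
sample = {"MP A": [0.9, 0.1], "MP B": [-0.7, 0.3], "MP C": [0.1, -0.5]}
negative, positive = find_axis_outliers(sample, axis=0)
```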
 ### Example
 | Step | Description |
 |------|-------------|
 | Outlier A | Wilders (PVV) - extreme positive on Dim 1 |
 | Outlier B | Marijnissen (SP) - extreme negative on Dim 1 |
 | 20 Motions | Immigration, integration, law & order themes dominate |
 | Label | "Links-Rechts" (Left-Right) |
 ### Labeling Rules
 - **Never use party names in labels** (e.g., not "PVV-SP axis")
 - **Never use semantic/ideological labels** (e.g., not "progressive-conservative" unless that's what the motions show)
 - **Use motion-derived themes** (e.g., "Immigration", "EU", "Economy")
 - **Fallback**: If theme is unclear, use "Axis 1", "Axis 2"
 ---
 ## Core Entities
 ### Motion / Motie
 - Parliamentary motion submitted by MPs
 - Fields: `id`, `title`, `date`, `category`
 - MPs vote: **For** (+1), **Against** (-1), **Abstain** (0), **Absent**
 ### MP / Kamerlid
 - Member of Parliament (Tweede Kamerlid)
 - Identified by full name (e.g., "Van Dijk, I.")
 - Has voting record, party affiliation, SVD position vector
 ### Party / Fractie
 - Political party (e.g., "GroenLinks-PvdA", "PVV", "VVD")
 - Party centroids: average SVD position of all MPs in party
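 A minimal sketch of the centroid computation, assuming MP positions and party membership are available as plain dicts; the function name and data are illustrative only:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def party_centroids(
    mp_positions: Dict[str, Sequence[float]], mp_party: Dict[str, str]
) -> Dict[str, List[float]]:
    """Average the SVD positions of all MPs belonging to each party."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for mp, vec in mp_positions.items():
        party = mp_party.get(mp)
        if party is not None:  # skip MPs without a known party
            grouped[party].append(vec)
    return {
        party: [sum(axis) / len(vecs) for axis in zip(*vecs)]
        for party, vecs in grouped.items()
    }


positions = {"MP A": [1.0, 0.0], "MP B": [3.0, 2.0], "MP C": [-1.0, 0.5]}
parties = {"MP A": "X", "MP B": "X", "MP C": "Y"}
centroids = party_centroids(positions, parties)
```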
 ### Vote / Stemming
 - Individual MP's vote on a motion: +1, 0, -1
 - Aggregated to compute SVD vectors
 ---
 ## Time & Analysis Concepts
 ### Window / Tijdsvenster
 - Time period for analysis (annual or quarterly)
 - Values: "2023", "2023-Q1", "2024", etc.
 - SVD vectors computed per window
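 A window ID can be turned into a date range as sketched below, assuming the two formats shown above (`YYYY` and `YYYY-Qn`) are the only ones in use; `window_bounds` is a hypothetical helper:

```python
import calendar
import re
from datetime import date
from typing import Tuple

_WINDOW_RE = re.compile(r"^(\d{4})(?:-Q([1-4]))?$")


def window_bounds(window_id: str) -> Tuple[date, date]:
    """Return the inclusive (start, end) dates for a window ID."""
    match = _WINDOW_RE.match(window_id)
    if match is None:
        raise ValueError(f"Unrecognised window ID: {window_id!r}")
    year = int(match.group(1))
    if match.group(2) is None:  # annual window
        return date(year, 1, 1), date(year, 12, 31)
    quarter = int(match.group(2))
    first_month = 3 * (quarter - 1) + 1
    last_month = first_month + 2
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)
```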
 ### Trajectory
 - MP's position change across multiple windows
 - Computed from `svd_vectors` + window ordering
 ---
 ## Mathematical / Algorithmic Terms
 ### SVD Vector
 - 2D vector from Singular Value Decomposition of MP × Motion vote matrix
 - Represents MP's position in political space
 ### SVD Label
 - Empirically derived axis label based on outlier MPs and representative motions
 - Describes the theme of disagreement on that axis
 - NOT based on party ideology or semantic labels
 ### Political Compass
 - 2D visualization with SVD axes mapped to compass quadrants
 - X-axis: First SVD dimension (labeled from voting data)
 - Y-axis: Second SVD dimension (labeled from voting data)
 ### Procrustes Alignment
 - Algorithm to align SVD vectors across time windows
 - Ensures comparable positions across years/quarters
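 For 2D vectors the rotation part of the alignment has a closed form. The sketch below is a simplified illustration, not the project's implementation: it solves for rotation only (no reflection, scaling, or translation), whereas a full Procrustes routine such as the one in SciPy handles more. It assumes both windows list the same entities in the same order, and the data is made up:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def best_rotation(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Angle (radians) that rotates source closest to target in least squares."""
    if len(source) != len(target) or not source:
        raise ValueError("Need two equally sized, non-empty point lists")
    # Sum of dot products and of 2D cross products over paired points.
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate points counter-clockwise by angle around the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Made-up data: the second window is the first rotated by 90 degrees.
window_a = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
window_b = rotate(window_a, math.pi / 2)
aligned = rotate(window_a, best_rotation(window_a, window_b))
```

 Because a rotation cannot mirror an axis, a window whose axes came out sign-flipped would still need a separate sign correction.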
 ### UMAP
 - Uniform Manifold Approximation and Projection
 - Dimensionality reduction for visualization
 - Optional dependency with graceful SVD fallback
 ---
 ## Database Table Reference
 | Table | Key Fields |
 |-------|-----------|
 | `motions` | id, title, date, category |
 | `mp_votes` | mp_id, motion_id, vote |
 | `svd_vectors` | entity_id, window, vector_2d (list[2]) |
 | `mp_party_history` | mp_id, party, start_date, end_date |
 | `windows` | window_id, start_date, end_date, period_type |
 | `mp_trajectories` | mp_id, window, trajectory_vector |
 ---
 ## Dutch Political Parties
 ### Canonical Right-Wing (centroid on RIGHT of axes)
 - PVV (Partij voor de Vrijheid)
 - FVD (Forum voor Democratie)
 - JA21
 - SGP (Staatkundig Gereformeerde Partij)
 ### Other Major Parties
 - VVD (Volkspartij voor Vrijheid en Democratie)
 - GL-PvdA (GroenLinks-PvdA)
 - NSC (Nieuw Sociaal Contract)
 - BBB (BoerBurgerBeweging)
 - SP (Socialistische Partij)
 - D66 (Democraten 66)
--- a/.mindmodel/examples/api-client-example.py
+++ b/.mindmodel/examples/api-client-example.py
@ -1,196 +0,0 @@
 """Example: TweedeKamerAPI usage - from api_client.py and actual codebase."""
 from datetime import datetime, timedelta
 from typing import Dict, List
 # Import the API client
 from api_client import TweedeKamerAPI
 # =============================================================================
 # Example 1: Basic API usage
 # =============================================================================
 def example_fetch_motions():
    """Fetch recent parliamentary motions from TweedeKamer API."""
    api = TweedeKamerAPI()
    # Fetch motions from last 30 days
    start_date = datetime.now() - timedelta(days=30)
    try:
        motions = api.get_motions(start_date=start_date, limit=100)
        print(f"Fetched {len(motions)} motions")
        for motion in motions[:5]:  # Show first 5
            print(f"  - {motion.get('title', 'N/A')}")
        return motions
    finally:
        api.close()
 # =============================================================================
 # Example 2: Fetching with date range
 # =============================================================================
 def example_date_range():
    """Fetch motions from a specific date range."""
    api = TweedeKamerAPI()
    start = datetime(2024, 1, 1)
    end = datetime(2024, 3, 31)  # Q1 2024
    try:
        motions = api.get_motions(start_date=start, end_date=end, limit=500)
        # Group by policy area
        by_area = {}
        for m in motions:
            area = m.get("policy_area", "Onbekend")
            by_area.setdefault(area, []).append(m)
        for area, area_motions in sorted(by_area.items()):
            print(f"{area}: {len(area_motions)} motions")
        return motions
    finally:
        api.close()
 # =============================================================================
 # Example 3: Context manager usage
 # =============================================================================
 def example_context_manager():
    """Use API client as context manager."""
    with TweedeKamerAPI() as api:
        motions = api.get_motions(
            start_date=datetime.now() - timedelta(days=7), limit=50
        )
        print(f"Fetched {len(motions)} motions this week")
        return motions
 # =============================================================================
 # Example 4: Processing voting records
 # =============================================================================
 def example_process_votes():
    """Process individual voting records from API."""
    api = TweedeKamerAPI()
    start_date = datetime.now() - timedelta(days=7)
    try:
        # Get voting records directly
        voting_records, besluit_meta = api._get_voting_records(
            start_date=start_date, limit=1000
        )
        print(f"Fetched {len(voting_records)} voting records")
        print(f"From {len(besluit_meta)} unique decisions")
        # Count votes by party
        party_votes = {}
        for record in voting_records:
            party = record.get("Fractie", "Onbekend")
            vote = record.get("Soort", "Onbekend")
            party_votes.setdefault(party, {})[vote] = (
                party_votes.get(party, {}).get(vote, 0) + 1
            )
        for party, votes in sorted(party_votes.items()):
            total = sum(votes.values())
            voor = votes.get("Voor", 0)
            print(f"{party}: {total} votes ({voor} voor)")
        return voting_records
    finally:
        api.close()
 # =============================================================================
 # Example 5: Safe API call with fallback
 # =============================================================================
 def example_safe_call():
    """Make API call with safe fallback on failure."""
    api = TweedeKamerAPI()
    try:
        # This will return [] on any error
        motions = api.get_motions(
            start_date=datetime.now() - timedelta(days=30), limit=100
        )
        if not motions:
            print("No motions returned - using cached data")
            # Fallback to cached/local data
            from database import db
            return db.get_filtered_motions(limit=10)
        return motions
    finally:
        api.close()
 # =============================================================================
 # Example 6: Pagination handling
 # =============================================================================
 def example_pagination():
    """Understand how pagination works in the API."""
    api = TweedeKamerAPI()
    start_date = datetime.now() - timedelta(days=365)
    # Simulate pagination
    page_size = 250
    total_limit = 500
    all_motions = []
    skip = 0
    while len(all_motions) < total_limit:
        print(f"Fetching page with skip={skip}...")
        # In real usage, get_motions handles pagination internally
        # This demonstrates what's happening under the hood
        page_motions = api._fetch_page(start_date=start_date, skip=skip, top=page_size)
        if not page_motions:
            break
        all_motions.extend(page_motions)
        skip += page_size
        if len(page_motions) < page_size:
            break  # Last page
    print(f"Total fetched: {len(all_motions)} motions")
    return all_motions
 if __name__ == "__main__":
    print("=== Basic Fetch ===")
    example_fetch_motions()
    print("\n=== Process Votes ===")
    example_process_votes()
--- a/.mindmodel/examples/database-example.py
+++ b/.mindmodel/examples/database-example.py
@ -1,191 +0,0 @@
 """Example: MotionDatabase usage - from database.py and actual codebase."""
 from typing import Dict, List, Optional
 import duckdb
 import json
 from config import config
 # Import the singleton instance
 from database import db
 # =============================================================================
 # Example 1: Getting filtered motions
 # =============================================================================
 def example_get_filtered_motions():
    """Get controversial motions from a specific policy area."""
    motions = db.get_filtered_motions(
        policy_area="Klimaat",
        min_margin=0.0,
        max_margin=0.3,  # Controversial: close margin
        limit=10,
    )
    for motion in motions:
        print(f"{motion['title']}: {motion['winning_margin']:.1%} margin")
    return motions
 # =============================================================================
 # Example 2: Creating a voting session
 # =============================================================================
 def example_voting_session():
    """Create a new user session and record votes."""
    # Create session for 10 motions
    session_id = db.create_session(total_motions=10)
    print(f"Created session: {session_id}")
    # Get motions for the session
    motions = db.get_filtered_motions(policy_area="Alle", limit=10)
    # Record votes
    for motion in motions:
        # In real app, user would choose vote
        vote = "Voor"  # Example vote
        db.record_vote(session_id=session_id, motion_id=motion["id"], vote=vote)
    # Get results
    results = db.get_party_results(session_id)
    for party, result in sorted(results.items(), key=lambda x: -x[1]["agreement"]):
        print(f"{party}: {result['agreement']:.1%} agreement")
    return results
 # =============================================================================
 # Example 3: Working with DuckDB connections directly
 # =============================================================================
 def example_direct_duckdb():
    """Example of proper DuckDB connection handling."""
    conn = duckdb.connect(config.DATABASE_PATH)
    try:
        # Get motion with votes
        result = conn.execute(
            """
            SELECT m.*, 
                   JSON_EXTRACT(voting_results, '$.total_votes') as total_votes
            FROM motions m
            WHERE m.id = ?
        """,
            (123,),
        ).fetchone()
        if result:
            print(f"Motion: {result[1]}")  # title is index 1
        return result
    finally:
        conn.close()
 # =============================================================================
 # Example 4: Bulk operations
 # =============================================================================
 def example_bulk_insert():
    """Example of bulk inserting motions."""
    # Sample data
    motions = [
        {
            "title": "Motion about climate policy",
            "description": "Proposal to reduce emissions",
            "date": "2024-01-15",
            "policy_area": "Klimaat",
            "voting_results": json.dumps({"Voor": 75, "Tegen": 65}),
            "winning_margin": 0.07,
            "controversy_score": 0.85,
        },
        {
            "title": "Motion about healthcare",
            "description": "Increase healthcare budget",
            "date": "2024-01-20",
            "policy_area": "Zorg",
            "voting_results": json.dumps({"Voor": 90, "Tegen": 50}),
            "winning_margin": 0.29,
            "controversy_score": 0.42,
        },
    ]
    conn = duckdb.connect(config.DATABASE_PATH)
    try:
        for motion in motions:
            conn.execute(
                """
                INSERT INTO motions 
                (title, description, date, policy_area, voting_results, 
                 winning_margin, controversy_score)
                VALUES (?, ?, ?, ?, ?, ?, ?)
            """,
                (
                    motion["title"],
                    motion["description"],
                    motion["date"],
                    motion["policy_area"],
                    motion["voting_results"],
                    motion["winning_margin"],
                    motion["controversy_score"],
                ),
            )
        print(f"Inserted {len(motions)} motions")
    except Exception as e:
        print(f"Error inserting motions: {e}")
    finally:
        # Always release the connection, whether or not the insert succeeded
        conn.close()
 # =============================================================================
 # Example 5: Query with aggregation
 # =============================================================================
 def example_aggregation():
    """Example of aggregate queries."""
    conn = duckdb.connect(config.DATABASE_PATH)
    try:
        # Get statistics by policy area
        results = conn.execute("""
            SELECT 
                policy_area,
                COUNT(*) as motion_count,
                AVG(winning_margin) as avg_margin,
                AVG(controversy_score) as avg_controversy
            FROM motions
            WHERE policy_area IS NOT NULL
            GROUP BY policy_area
            ORDER BY motion_count DESC
        """).fetchall()
        for row in results:
            print(
                f"{row[0]}: {row[1]} motions, "
                f"avg margin {row[2]:.1%}, "
                f"controversy {row[3]:.2f}"
            )
        return results
    except Exception as e:
        # Report the failure instead of swallowing it silently
        print(f"Error running aggregation: {e}")
        return []
    finally:
        conn.close()
 if __name__ == "__main__":
    print("=== Filtered Motions ===")
    example_get_filtered_motions()
    print("\n=== Aggregation ===")
    example_aggregation()
--- a/.mindmodel/examples/pattern-examples.md
+++ b/.mindmodel/examples/pattern-examples.md
@ -1,116 +0,0 @@
 # Extracted pattern examples (representative snippets)
 Note: snippets are verbatim extracts from repository files (Phase 1). Paths shown.
 ## DuckDB connect + schema init (database.py)
 ```python
 conn = duckdb.connect(self.db_path)
 # Create sequence for auto-incrementing IDs
 try:
    conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
 except:
    pass
 # Create tables with proper ID handling
 conn.execute("""
    CREATE TABLE IF NOT EXISTS motions (
        id INTEGER DEFAULT nextval('motions_id_seq'),
        title TEXT NOT NULL,
        description TEXT,
        date DATE,
        policy_area TEXT,
        voting_results JSON,
        winning_margin FLOAT,
        controversy_score FLOAT,
        layman_explanation TEXT,
        externe_identifier TEXT,
        body_text TEXT,
        url TEXT UNIQUE,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    )
 """)
 conn.close()
 ```
 ## Read-only compute worker (svd_pipeline.py)
 ```python
 conn = duckdb.connect(db_path, read_only=True)
 try:
    rows = conn.execute(
        "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
        (start_date, end_date),
    ).fetchall()
 finally:
    conn.close()
 ```
 ## Requests with retry/backoff (ai_provider.py)
 ```python
 resp = requests.post(url, json=json, headers=headers, timeout=10)
 ...
 if getattr(resp, "status_code", 0) == 429:
    if attempt == retries:
        raise ProviderError(f"Provider returned HTTP {resp.status_code}")
    retry_after = None
    raw = resp.headers.get("Retry-After") if getattr(resp, "headers", None) else None
    if raw:
        try:
            retry_after = int(raw)
        except Exception:
            try:
                dt = parsedate_to_datetime(raw)
                now = datetime.now(tz=dt.tzinfo or timezone.utc)
                secs = (dt - now).total_seconds()
                retry_after = max(0, int(secs))
            except Exception:
                retry_after = None
    if retry_after is not None:
        time.sleep(retry_after)
        continue
 ```
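 The nested `try`/`except` in the snippet above can be flattened into a helper. A minimal sketch, assuming the header is either an integer number of seconds or an HTTP date; `parse_retry_after` is a hypothetical name and is not taken from `ai_provider.py`:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    raw: Optional[str], now: Optional[datetime] = None
) -> Optional[int]:
    """Return the delay in whole seconds, or None if the header is unusable."""
    if not raw:
        return None
    value = raw.strip()
    if value.isascii() and value.isdigit():  # delta-seconds form, e.g. "120"
        return int(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(tz=timezone.utc)
    return max(0, int((when - current).total_seconds()))
```

 The optional `now` parameter exists so the date branch can be tested deterministically.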
 ## Embedding batch + per-item fallback (pipeline/ai_provider_wrapper.py)
 ```python
 for start in range(0, len(texts), batch_size):
    # i and end bound the current chunk
    i, end = start, min(start + batch_size, len(texts))
    chunk = texts[i:end]
    emb_chunk, emb_exc = _attempt_batch(chunk, i)
    if emb_chunk is not None:
        for j, emb in enumerate(emb_chunk):
            results[i + j] = emb
        i = end
        continue
    # batch failed -> fallback to per-item attempts
    for j in range(i, end):
        t = texts[j]
        single, single_exc = _attempt_batch([t], j)
        if single:
            results[j] = single[0]
            continue
        results[j] = None
 ```
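 A self-contained sketch of the same batch-then-per-item idea, assuming an embedding callable that raises on failure and returns one vector per input text; `embed_with_fallback`, `_try_embed`, and the fake embedder are hypothetical and only illustrate the control flow:

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger(__name__)

Vector = List[float]
EmbedFn = Callable[[Sequence[str]], List[Vector]]


def _try_embed(embed: EmbedFn, chunk: Sequence[str]) -> Optional[List[Vector]]:
    """Call the embedder; log and return None on any failure."""
    try:
        return embed(chunk)
    except Exception as exc:  # broad on purpose: provider errors are unknown here
        logger.warning("Embedding failed for %d text(s): %s", len(chunk), exc)
        return None


def embed_with_fallback(
    texts: Sequence[str], embed: EmbedFn, batch_size: int
) -> List[Optional[Vector]]:
    """Embed in batches; retry a failed batch one item at a time."""
    results: List[Optional[Vector]] = [None] * len(texts)
    for start in range(0, len(texts), batch_size):
        end = min(start + batch_size, len(texts))
        batch = _try_embed(embed, texts[start:end])
        if batch is not None:
            results[start:end] = batch
            continue
        # Batch failed: try each item alone so one bad text does not sink the rest.
        for j in range(start, end):
            single = _try_embed(embed, [texts[j]])
            results[j] = single[0] if single else None
    return results


def fake_embed(chunk: Sequence[str]) -> List[Vector]:
    """Stand-in embedder that fails whenever the chunk contains 'bad'."""
    if any(t == "bad" for t in chunk):
        raise RuntimeError("cannot embed")
    return [[float(len(t))] for t in chunk]


vectors = embed_with_fallback(["aa", "bad", "cccc", "d"], fake_embed, batch_size=2)
```

 Moving the `try` into `_try_embed` keeps the loop flat and ensures every failure is logged rather than silently dropped.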
 ## Similarity compute (similarity/compute.py)
 ```python
 # Ensure consistent dimensionality: pad shorter vectors with zeros
 lengths = [len(v) for v in vecs]
 max_dim = max(lengths)
 if len(set(lengths)) != 1:
    logger.warning(
        "Inconsistent vector dimensions detected (max=%d). Padding shorter vectors with zeros.",
        max_dim,
    )
 matrix = np.zeros((len(vecs), max_dim), dtype=np.float32)
 for i, v in enumerate(vecs):
    matrix[i, : len(v)] = v
 # Normalize rows and compute cosine similarity
 norms = np.linalg.norm(matrix, axis=1, keepdims=True)
 norms[norms == 0] = 1.0
 normalized = matrix / norms
 sim = normalized @ normalized.T
 ```
--- a/.mindmodel/examples/pipeline-example.py
+++ b/.mindmodel/examples/pipeline-example.py
@ -1,217 +0,0 @@
 """Example: Pipeline phase execution - from pipeline/run_pipeline.py and actual codebase."""
 import argparse
 from datetime import date, timedelta
 from typing import List, Tuple
 # Import pipeline modules
 from pipeline.fetch_mp_metadata import fetch_mp_metadata
 from pipeline.extract_mp_votes import extract_mp_votes
 from pipeline.svd_pipeline import run_svd_pipeline
 from pipeline.text_pipeline import run_text_pipeline
 from pipeline.fusion import run_fusion
 from database import MotionDatabase
 # =============================================================================
 # Example 1: Running full pipeline
 # =============================================================================
 def example_full_pipeline():
    """Run the complete data ingestion pipeline."""
    # Parse arguments like CLI would
    parser = argparse.ArgumentParser(description="Pipeline runner")
    parser.add_argument("--db-path", default="data/motions.db")
    parser.add_argument("--start-date", default=None)
    parser.add_argument("--end-date", default=None)
    parser.add_argument(
        "--window-size", choices=["quarterly", "annual"], default="quarterly"
    )
    parser.add_argument("--svd-k", type=int, default=50)
    args = parser.parse_args([])
    # Resolve dates
    end_date = date.fromisoformat(args.end_date) if args.end_date else date.today()
    start_date = (
        date.fromisoformat(args.start_date)
        if args.start_date
        else end_date - timedelta(days=730)
    )
    print(f"Running pipeline: {start_date} → {end_date}")
    print(f"Window size: {args.window_size}")
    print(f"DB path: {args.db_path}")
    # Initialize database
    db = MotionDatabase(args.db_path)
    # Phase 1: Fetch MP metadata
    print("\n=== Phase 1: MP Metadata ===")
    n_mp = fetch_mp_metadata(db_path=args.db_path)
    print(f"Processed {n_mp} MPs")
    # Phase 2: Extract MP votes
    print("\n=== Phase 2: Extract Votes ===")
    n_votes = extract_mp_votes(db_path=args.db_path)
    print(f"Extracted {n_votes} vote records")
    # Phase 3: Generate time windows, then run SVD per window
    print("\n=== Phase 3: SVD Pipeline ===")
    windows = generate_windows(start_date, end_date, args.window_size)
    print(f"Generated {len(windows)} windows: {windows}")
    run_svd_pipeline(db, windows, args.svd_k)
    print(f"Computed SVD for {len(windows)} windows")
    # Phase 4: Text embeddings
    print("\n=== Phase 4: Text Embeddings ===")
    run_text_pipeline(args.db_path, batch_size=50)
    print("Text embeddings completed")
    # Phase 5: Fusion
    print("\n=== Phase 5: Fusion ===")
    run_fusion(args.db_path, windows)
    print("Fusion completed")
    print("\n=== Pipeline Complete ===")
 # =============================================================================
 # Example 2: Generate time windows
 # =============================================================================
 def generate_windows(
    start: date, end: date, granularity: str
 ) -> List[Tuple[str, str, str]]:
    """Generate time windows for pipeline processing."""
    windows = []
    cursor = date(start.year, start.month, 1)
    if granularity == "annual":
        cursor = date(start.year, 1, 1)
        while cursor <= end:
            year_end = date(cursor.year, 12, 31)
            w_end = min(year_end, end)
            windows.append((str(cursor.year), cursor.isoformat(), w_end.isoformat()))
            cursor = date(cursor.year + 1, 1, 1)
    else:
        # quarterly
        import calendar  # imported once, outside the loop
        quarter_starts = {1: 1, 2: 4, 3: 7, 4: 10}
        quarter_ends = {1: 3, 2: 6, 3: 9, 4: 12}
        q = (cursor.month - 1) // 3 + 1
        cursor = date(cursor.year, quarter_starts[q], 1)
        while cursor <= end:
            q = (cursor.month - 1) // 3 + 1
            q_end_month = quarter_ends[q]
            last_day = calendar.monthrange(cursor.year, q_end_month)[1]
            q_end = date(cursor.year, q_end_month, last_day)
            w_end = min(q_end, end)
            window_id = f"{cursor.year}-Q{q}"
            windows.append((window_id, cursor.isoformat(), w_end.isoformat()))
            cursor = q_end + timedelta(days=1)
    return windows
 def example_window_generation():
    """Example of window generation."""
    start = date(2023, 1, 1)
    end = date(2024, 6, 30)
    print("Quarterly windows:")
    quarterly = generate_windows(start, end, "quarterly")
    for wid, s, e in quarterly:
        print(f"  {wid}: {s} to {e}")
    print("\nAnnual windows:")
    annual = generate_windows(start, end, "annual")
    for wid, s, e in annual:
        print(f"  {wid}: {s} to {e}")
 # =============================================================================
 # Example 3: Running individual phases
 # =============================================================================
 def example_individual_phases():
    """Run pipeline phases individually for debugging."""
    db_path = "data/motions.db"
    db = MotionDatabase(db_path)
    # Only run MP metadata fetch
    print("Fetching MP metadata...")
    n = fetch_mp_metadata(db_path=db_path)
    print(f"  {n} MPs processed")
    # Only run vote extraction
    print("Extracting votes...")
    n = extract_mp_votes(db_path=db_path)
    print(f"  {n} votes extracted")
    # Only run SVD for specific window
    print("Computing SVD...")
    windows = [("2024-Q1", "2024-01-01", "2024-03-31")]
    run_svd_pipeline(db, windows, k=50)
    print("  SVD computed")
    # Only run text embeddings
    print("Computing embeddings...")
    run_text_pipeline(db_path, batch_size=25)  # Smaller batch for testing
    print("  Embeddings computed")
 # =============================================================================
 # Example 4: Dry run
 # =============================================================================
 def example_dry_run():
    """Show what pipeline would do without making changes."""
    print("DRY RUN - no writes will be made")
    start_date = date(2024, 1, 1)
    end_date = date(2024, 6, 30)
    # Generate and show windows
    windows = generate_windows(start_date, end_date, "quarterly")
    print(f"Would process {len(windows)} windows:")
    for wid, s, e in windows:
        print(f"  {wid}: {s} to {e}")
    print("\nWould run phases:")
    print("  1. fetch_mp_metadata")
    print("  2. extract_mp_votes")
    print("  3. svd_pipeline")
    print("  4. text_pipeline")
    print("  5. fusion")
 if __name__ == "__main__":
    import logging
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    print("=== Window Generation ===")
    example_window_generation()
    print("\n=== Dry Run ===")
    example_dry_run()
--- a/.mindmodel/examples/streamlit-page-example.py
+++ b/.mindmodel/examples/streamlit-page-example.py
@ -1,316 +0,0 @@
 """Example: Streamlit page patterns - from actual pages/ files."""
 import streamlit as st
 # =============================================================================
 # Example 1: Home page (Home.py)
 # =============================================================================
 def render_home_page():
    """Simplified version of Home.py."""
    st.set_page_config(
        page_title="Motief: de stematlas",
        page_icon="🗺️",
        layout="centered",
        initial_sidebar_state="expanded",
    )
    st.title("🗺️ Motief: de stematlas")
    st.markdown(
        "**Motief** brengt de Nederlandse Tweede Kamer in kaart op basis van "
        "echte stemmingen over moties. Gebruik de Stemwijzer om te ontdekken welke "
        "partij het beste bij jouw standpunten past, of verken de politieke ruimte "
        "zelf in de Explorer."
    )
    st.divider()
    col1, col2 = st.columns(2)
    with col1:
        st.subheader("🗳️ Stemwijzer")
        st.markdown(
            "Stem op echte Tweede Kamer moties en zie welke partij het "
            "dichtst bij jouw keuzes staat."
        )
        st.page_link("pages/1_Stemwijzer.py", label="Open Stemwijzer", icon="🗳️")
    with col2:
        st.subheader("🔭 Politiek Explorer")
        st.markdown(
            "Verken het politieke kompas, partijtrajecten door de tijd, "
            "en zoek vergelijkbare moties op in het archief."
        )
        st.page_link("pages/2_Explorer.py", label="Open Explorer", icon="🔭")
    st.divider()
    st.caption("Data: Tweede Kamer API · Embeddings: QWEN (via OpenRouter)")
 # =============================================================================
 # Example 2: Thin page wrapper (pages/1_Stemwijzer.py)
 # =============================================================================
 def render_stemwijzer_page():
    """Pattern: thin page that delegates to module function."""
    st.set_page_config(
        page_title="Stemwijzer",
        page_icon="🗳️",
        layout="centered",
    )
    # Delegate to main module
    from explorer import build_mp_quiz_tab
    build_mp_quiz_tab("data/motions.db")
 # =============================================================================
 # Example 3: Session state initialization
 # =============================================================================
 def init_session_state():
    """Pattern: Initialize all session state at start."""
    defaults = {
        "session_id": None,
        "current_motion_index": 0,
        "motions": [],
        "show_results": False,
        "user_votes": {},
    }
    for key, default in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = default
 # =============================================================================
 # Example 4: Sidebar configuration
 # =============================================================================
 def render_sidebar():
    """Pattern: Sidebar for configuration."""
    with st.sidebar:
        st.header("Instellingen")
        motion_count = st.slider(
            "Aantal moties",
            min_value=5,
            max_value=25,
            value=10,
            help="Hoeveel moties wilt u beantwoorden?",
        )
        policy_area = st.selectbox(
            "Beleidsgebied",
            [
                "Alle",
                "Economie",
                "Klimaat",
                "Immigratie",
                "Zorg",
                "Onderwijs",
                "Defensie",
                "Sociale Zaken",
                "Algemeen",
            ],
        )
        margin_range = st.slider(
            "Controversiële moties (%)",
            min_value=0,
            max_value=100,
            value=(0, 100),
            help="Filter op hoe omstreden de moties zijn",
        )
        st.divider()
        if st.button("Start Nieuwe Sessie", type="primary"):
            return {
                "motion_count": motion_count,
                "policy_area": policy_area,
                "margin_range": margin_range,
            }
    return None
 # =============================================================================
 # Example 5: Motion voting interface
 # =============================================================================
 def render_motion_vote(motion: dict, index: int, total: int):
    """Pattern: Display motion and voting buttons."""
    st.subheader(f"Motie {index + 1} van {total}")
    # Motion content
    st.markdown(f"### {motion['title']}")
    col1, col2 = st.columns([3, 1])
    with col1:
        if motion.get("layman_explanation"):
            st.info(motion["layman_explanation"])
        with st.expander("Meer details"):
            st.markdown(f"**Datum:** {motion.get('date', 'Onbekend')}")
            st.markdown(f"**Beleidsgebied:** {motion.get('policy_area', 'Onbekend')}")
            if motion.get("description"):
                st.markdown(f"**Beschrijving:** {motion['description']}")
    with col2:
        st.metric(
            label="Winstmarge",
            value=f"{motion.get('winning_margin', 0):.0%}",
            delta="Omstreden" if motion.get("controversy_score", 0) > 0.5 else "Helder",
        )
    st.divider()
    # Voting buttons
    col1, col2, col3 = st.columns(3)
    with col1:
        st.button(
            "👍 **Voor**",
            on_click=on_vote,
            args=(motion["id"], "Voor"),
            use_container_width=True,
        )
    with col2:
        st.button(
            "👎 **Tegen**",
            on_click=on_vote,
            args=(motion["id"], "Tegen"),
            use_container_width=True,
        )
    with col3:
        st.button(
            "🤔 **Onthouden**",
            on_click=on_vote,
            args=(motion["id"], "Onthouden"),
            use_container_width=True,
        )
 def on_vote(motion_id: int, vote: str):
    """Callback when user votes."""
    # Record vote
    from database import db
    db.record_vote(
        session_id=st.session_state.session_id, motion_id=motion_id, vote=vote
    )
    # Update session state
    st.session_state.user_votes[motion_id] = vote
    # Move to next or show results
    if st.session_state.current_motion_index < len(st.session_state.motions) - 1:
        st.session_state.current_motion_index += 1
    else:
        st.session_state.show_results = True
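    # No st.rerun() here: Streamlit reruns the script once a callback returns,
    # and calling st.rerun() inside a callback is a no-op.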
 # =============================================================================
 # Example 6: Results display
 # =============================================================================
 def render_results():
    """Pattern: Display voting results."""
    from database import db
    st.header("📊 Uw Resultaten")
    # Get party results
    results = db.get_party_results(st.session_state.session_id)
    if not results:
        st.warning("Geen resultaten beschikbaar")
        return
    # Sort by agreement
    sorted_results = sorted(
        results.items(), key=lambda x: x[1].get("agreement_percentage", 0), reverse=True
    )
    # Display top match
    if sorted_results:
        top_party, top_data = sorted_results[0]
        st.success(
            f"**Uw beste match:** {top_party} ({top_data.get('agreement_percentage', 0):.0%} overeenstemming)"
        )
    st.divider()
    # Show all parties
    for party, data in sorted_results:
        agreement = data.get("agreement_percentage", 0)
        col1, col2 = st.columns([3, 1])
        with col1:
            st.markdown(f"**{party}**")
            st.progress(agreement, text=f"{agreement:.0%}")
        with col2:
            st.metric("Overeenstemming", f"{agreement:.0%}")
    # Detailed breakdown
    with st.expander("Details per motie"):
        for motion in st.session_state.motions:
            user_vote = st.session_state.user_votes.get(motion["id"], "?")
            st.markdown(f"- **{motion['title']}**: U={user_vote}")
 # =============================================================================
 # Example 7: Tabs layout
 # =============================================================================
 def render_tabs_example():
    """Pattern: Use tabs for organizing content."""
    tab1, tab2, tab3 = st.tabs(["Compass", "Trajectories", "Zoeken"])
    with tab1:
        st.subheader("Politiek Kompas")
        st.write("Visualiseer partijposities in 2D ruimte")
        # Add compass chart...
    with tab2:
        st.subheader("Partij Trajectories")
        st.write("Bekijk hoe partijen door de tijd bewegen")
        # Add trajectory chart...
    with tab3:
        st.subheader("Zoek Moties")
        query = st.text_input("Zoekterm")
        if query:
            # Search functionality...
            st.write(f"Zoeken naar: {query}")
 if __name__ == "__main__":
    # Demo rendering
    init_session_state()
    st.write("Streamlit page structure example")
--- a/.mindmodel/manifest.yaml
+++ b/.mindmodel/manifest.yaml
@ -1,108 +0,0 @@
 # stemwijzer Mind Model - Manifest
 # Generated: 2026-04-12
 # Phase: 2 - Assembly from Phase 1 Analysis
 name: stemwijzer
 version: 2
 description: Dutch political voting compass (Stemwijzer) - Mind Model constraints
 categories:
  # Core documentation
  - path: system.md
    description: System overview and architecture summary
    group: docs
  - path: stack/stack.md
    description: Technology stack with versions and purposes
    group: stack
  - path: domain/domain-glossary.md
    description: Domain entities, terms, relationships, and CRITICAL INVARIANTS
    group: domain
  # Design patterns
  - path: patterns/patterns.yaml
    description: Code patterns (Singleton, Repository, Pipeline, etc.)
    group: patterns
  - path: patterns/streamlit.yaml
    description: Streamlit-specific patterns (session state, cache)
    group: patterns
  - path: patterns/api.yaml
    description: API client patterns with retry and pagination
    group: patterns
  - path: patterns/database.yaml
    description: DuckDB patterns and connection management
    group: patterns
  - path: patterns/python.yaml
    description: Python-specific patterns (dataclass, typing)
    group: patterns
  - path: patterns/duckdb-access.md
    description: DuckDB connection patterns and best practices
    group: patterns
  - path: patterns/embeddings-similarity.md
    description: Embeddings and similarity computation patterns
    group: patterns
  - path: patterns/error-handling.md
    description: Error handling and exception patterns
    group: patterns
  - path: patterns/module-singletons.md
    description: Module-level singleton patterns
    group: patterns
  - path: patterns/requests-http.md
    description: HTTP client patterns with retry
    group: patterns
  - path: patterns/validation.md
    description: Input validation patterns
    group: patterns
  # Coding constraints
  - path: constraints/error-handling.md
    description: Error handling patterns with safe fallbacks
    group: constraints
  - path: constraints/logging.md
    description: Logging conventions
    group: constraints
  - path: constraints/naming.yaml
    description: File, class, function naming rules
    group: constraints
  - path: constraints/imports.yaml
    description: Import organization and module structure
    group: constraints
  - path: constraints/types.yaml
    description: Type hint conventions
    group: constraints
  - path: constraints/testing.yaml
    description: Testing conventions
    group: constraints
  # Anti-patterns
  - path: anti-patterns/anti-patterns.md
    description: Known anti-patterns with evidence and fixes
    group: anti-patterns
  # Dependencies
  - path: dependencies/dependencies.md
    description: Library usage and singleton instances
    group: dependencies
  # Code examples
  - path: examples/database-example.py
    description: MotionDatabase usage examples
    group: examples
  - path: examples/api-client-example.py
    description: TweedeKamerAPI usage examples
    group: examples
  - path: examples/pipeline-example.py
    description: Pipeline orchestration examples
    group: examples
  - path: examples/streamlit-page-example.py
    description: Streamlit page patterns
    group: examples
  - path: examples/pattern-examples.md
    description: Consolidated pattern examples
    group: examples
 # Phase 1 findings summary:
 # - Tech: Python 3.13+, Streamlit, DuckDB, scipy/sklearn/umap, OpenRouter (QWEN)
 # - 10 patterns discovered: Module singletons, Repository, Service layer, Pipeline
 # - 8 anti-patterns: print() instead of logging, _DummySt global, bare except
 # - 6 code clusters: Database, Streamlit UI, API, Analysis/ML, Config, Singletons
 # - 3 groups: stdlib, 3rd party, local imports
--- a/.mindmodel/patterns/api.yaml
+++ b/.mindmodel/patterns/api.yaml
@ -1,265 +0,0 @@
 # API Client Patterns
 ## Base API Client Pattern
 Using requests.Session for connection pooling:
 ```python
 # api_client.py
 from datetime import datetime, timedelta
 import requests
 from typing import Dict, List, Optional
 from config import config
 class TweedeKamerAPI:
    def __init__(self):
        self.odata_base_url = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0"
        self.session = requests.Session()
        self.session.headers.update({
            "Accept": "application/json",
            "User-Agent": "Dutch-Political-Compass-Tool/1.0",
        })
    def get_motions(
        self,
        start_date: datetime = None,
        end_date: datetime = None,
        limit: int = 500,
    ) -> List[Dict]:
        """Get motions with voting results using OData API."""
        if not start_date:
            start_date = datetime.now() - timedelta(days=730)
        try:
            voting_records, besluit_meta = self._get_voting_records(
                start_date, end_date, limit
            )
            return self._process_voting_records(voting_records, besluit_meta)
        except Exception as e:
            print(f"Error fetching motions from API: {e}")
            return []
 ```
 ## OData Pagination Pattern
 Handle server-side pagination with $skip:
 ```python
 def _get_voting_records(
    self, 
    start_date: datetime, 
    end_date: datetime = None, 
    limit: int = 50000
 ) -> tuple:
    """Fetch with automatic pagination."""
    filter_query = (
        f"GewijzigdOp ge {start_date.strftime('%Y-%m-%d')}T00:00:00Z"
        " and StemmingsSoort ne null"
        " and Verwijderd eq false"
    )
    page_size = 250  # API caps $top at 250
    base_url = f"{self.odata_base_url}/Besluit"
    base_params = {
        "$filter": filter_query,
        "$top": page_size,
        "$expand": "Stemming",
        "$orderby": "GewijzigdOp desc",
    }
    all_records = []
    skip = 0
    while len(all_records) < limit:
        params = {**base_params, "$skip": skip}
        response = self.session.get(
            base_url, 
            params=params, 
            timeout=config.API_TIMEOUT
        )
        response.raise_for_status()
        data = response.json()
        besluit_page = data.get("value", [])
        if not besluit_page:
            break
        # Process page
        for besluit in besluit_page:
            all_records.extend(self._extract_votes(besluit))
        skip += page_size
    return all_records
 ```
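The `$skip` loop above can be exercised without network access. The following is a minimal sketch, not the project's implementation: `paginate`, `fetch_page`, and `fake_fetch` are hypothetical names introduced here, with `fetch_page(skip, top)` standing in for the `session.get` call. Unlike the loop above, the sketch trims the result to `limit`, since the last page fetched can overshoot it.

```python
from typing import Callable, Dict, List


def paginate(
    fetch_page: Callable[[int, int], List[Dict]],
    page_size: int = 250,
    limit: int = 1000,
) -> List[Dict]:
    """Collect rows page by page until a page is empty or `limit` is reached."""
    rows: List[Dict] = []
    skip = 0
    while len(rows) < limit:
        page = fetch_page(skip, page_size)
        if not page:
            break  # the server has no more rows
        rows.extend(page)
        skip += page_size
    return rows[:limit]  # trim any overshoot from the final page


# Hypothetical in-memory "server" holding 600 rows, for illustration only.
_FAKE_ROWS = [{"Id": i} for i in range(600)]


def fake_fetch(skip: int, top: int) -> List[Dict]:
    """Return the slice of rows that a $skip/$top request would select."""
    return _FAKE_ROWS[skip:skip + top]


all_rows = paginate(fake_fetch, page_size=250, limit=1000)
capped_rows = paginate(fake_fetch, page_size=250, limit=300)
```

Passing the fetch function in as a parameter keeps the paging logic testable on its own; a real caller would wrap the HTTP request in a function with the same shape.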
 ## Retry with Backoff Pattern
 For transient failures:
 ```python
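 # ai_provider.py
 # Note: `url`, `headers`, and `ProviderError` are assumed to be defined
 # elsewhere in this module; they are not shown in this excerpt.
 import time
 import random
 import requests
 from requests.exceptions import ConnectionError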
 def _post_with_retries(
    path: str, 
    json: dict, 
    retries: int = 3
 ) -> requests.Response:
    """POST with exponential backoff retry."""
    backoff = 0.5
    for attempt in range(1, retries + 1):
        try:
            resp = requests.post(url, json=json, headers=headers, timeout=10)
            # Handle rate limiting
            if resp.status_code == 429:
                if attempt == retries:
                    raise ProviderError("Rate limited")
                retry_after = resp.headers.get("Retry-After")
                if retry_after:
                    time.sleep(int(retry_after))
                else:
                    sleep = backoff * (2 ** (attempt - 1))
                    sleep += random.uniform(0, sleep * 0.1)
                    time.sleep(sleep)
                continue
            # Handle server errors
            if 500 <= resp.status_code < 600:
                if attempt == retries:
                    raise ProviderError(f"Server error: {resp.status_code}")
                time.sleep(backoff * (2 ** (attempt - 1)))
                continue
            return resp
        except ConnectionError as exc:
            if attempt == retries:
                raise ProviderError(f"Connection error: {exc}")
            time.sleep(backoff * (2 ** (attempt - 1)))
    raise ProviderError("Failed after retries")
 ```
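The delay schedule used above (a base delay doubled per attempt, plus up to 10% jitter) can be isolated into a pure function, which makes it easy to inspect without sleeping. This is a sketch under stated assumptions: `backoff_delays` and its parameters are hypothetical names, and jitter is applied to every delay here, whereas the code above adds it only on the rate-limit path.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int = 3,
    base: float = 0.5,
    jitter: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the sleep before each retry: base * 2**(attempt - 1) plus jitter."""
    rng = rng or random.Random()
    delays = []
    # The final attempt raises instead of sleeping, so there are retries - 1 delays.
    for attempt in range(1, retries):
        delay = base * (2 ** (attempt - 1))
        delay += rng.uniform(0, delay * jitter)  # spread out simultaneous retries
        delays.append(delay)
    return delays


# A seeded generator keeps the jitter reproducible for the example.
delays = backoff_delays(retries=4, base=0.5, jitter=0.1, rng=random.Random(0))
```

With `retries=4` the function yields three delays, falling in the ranges 0.5 to 0.55, 1.0 to 1.1, and 2.0 to 2.2 seconds.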
 ## Batch Processing Pattern
 Process items in batches to manage API limits:
 ```python
 def get_embeddings_with_retry(
    texts: List[str],
    batch_size: int = 50,
    retries: int = 3,
 ) -> List[Optional[List[float]]]:
    """Process embeddings in batches with fallback to single items."""
    results = [None] * len(texts)
    i = 0
    while i < len(texts):
        end = min(len(texts), i + batch_size)
        chunk = texts[i:end]
        # Try batch first
        try:
            emb_chunk = get_embeddings_batch(chunk)
            for j, emb in enumerate(emb_chunk):
                results[i + j] = emb
            i = end
            continue
        except Exception:
            pass
        # Fallback: single items
        for j, text in enumerate(chunk):
            try:
                results[i + j] = get_embedding(text)
            except Exception:
                results[i + j] = None
        i = end
    return results
 ```
 ## Response Validation Pattern
 Validate API responses before processing:
 ```python
 def _process_response(self, response: requests.Response) -> Dict:
    """Validate and parse API response."""
    response.raise_for_status()
    data = response.json()
    if "value" not in data:
        raise ValueError("Unexpected response format: missing 'value' key")
    return data
 def _validate_besluit(self, besluit: Dict) -> bool:
    """Check required fields exist."""
    required = ["Id", "GewijzigdOp"]
    return all(field in besluit for field in required)
 ```
 ## Error Handling Patterns
 Always provide safe fallbacks:
 ```python
 def safe_api_call(self, endpoint: str, params: Dict = None) -> List[Dict]:
    """Call API with error handling and fallback."""
    try:
        response = self.session.get(
            endpoint, 
            params=params, 
            timeout=config.API_TIMEOUT
        )
        response.raise_for_status()
        data = response.json()
        return data.get("value", [])
    except requests.Timeout:
        _logger.warning(f"API timeout for {endpoint}")
        return []
    except requests.HTTPError as e:
        _logger.error(f"HTTP error: {e}")
        return []
    except Exception as e:
        _logger.error(f"API call failed: {e}")
        return []
 ```
 ## Session Management
 Reuse session for connection pooling:
 ```python
 class TweedeKamerAPI:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "Accept": "application/json",
            "User-Agent": "Dutch-Political-Compass-Tool/1.0",
        })
    def close(self):
        """Clean up session when done."""
        self.session.close()
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.close()
 # Usage
 with TweedeKamerAPI() as api:
    motions = api.get_motions(start_date)
 ```
--- a/.mindmodel/patterns/architecture.yaml
+++ b/.mindmodel/patterns/architecture.yaml
@ -1,230 +0,0 @@
 # Architectural Patterns
 ## Repository Pattern
 The `MotionDatabase` class acts as a repository, encapsulating all database operations behind a clean interface.
 ```python
 # database.py
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._init_database()
    def get_motion(self, motion_id: int) -> Optional[Dict]:
        """Get a single motion by ID."""
        conn = duckdb.connect(self.db_path)
        try:
            result = conn.execute(
                "SELECT * FROM motions WHERE id = ?", (motion_id,)
            ).fetchone()
            return result
        finally:
            conn.close()
    def get_filtered_motions(
        self,
        policy_area: str = "Alle",
        min_margin: float = 0.0,
        max_margin: float = 1.0,
        limit: int = 10
    ) -> List[Dict]:
        """Get filtered list of motions."""
        ...
 ```
 **Usage**: Import the singleton instance for all DB operations.
 ```python
 from database import db
 motions = db.get_filtered_motions(policy_area="Klimaat", limit=20)
 ```
 ## Facade Pattern
 Simplified interfaces over complex subsystems.
 ### MotionDatabase Facade
 ```python
 # Single entry point for all database operations
 db = MotionDatabase()  # Singleton instance
 # Operations are abstracted:
 db.create_session(total_motions)
 db.record_vote(session_id, motion_id, vote)
 db.get_party_results(session_id)
 ```
 ### API Client Facade
 ```python
 # api_client.py
 class TweedeKamerAPI:
    def __init__(self):
        self.session = requests.Session()  # Connection pooling
    def get_motions(self, start_date, end_date) -> List[Dict]:
        """Simple interface hiding OData pagination details."""
        voting_records, besluit_meta = self._get_voting_records(start_date, end_date)
        return self._process_voting_records(voting_records, besluit_meta)
 ```
 ### MotionScraper Facade
 ```python
 # scraper.py (if used)
 class MotionScraper:
    def get_motion_content(self, url: str) -> Optional[str]:
        """Extract body text from official website."""
        ...
 ```
 ## Pipeline Pattern
 Sequential phases with explicit dependencies:
 ```
 pipeline/run_pipeline.py
 ├── Phase 1: fetch_mp_metadata
 │   └── pipeline/fetch_mp_metadata.py
 ├── Phase 2: extract_mp_votes
 │   └── pipeline/extract_mp_votes.py
 ├── Phase 3: svd_pipeline
 │   └── pipeline/svd_pipeline.py
 ├── Phase 4: text_pipeline (gap-fill)
 │   └── pipeline/text_pipeline.py
 └── Phase 5: fusion (combine SVD + text)
    └── pipeline/fusion.py
 ```
 ### Phase Orchestration
 ```python
 # pipeline/run_pipeline.py
 def run(args: argparse.Namespace) -> int:
    db = MotionDatabase(args.db_path)
    # Phase 1: MP metadata
    if not args.skip_metadata:
        from pipeline.fetch_mp_metadata import fetch_mp_metadata
        fetch_mp_metadata(db_path=db.db_path)
    # Phase 2: Extract votes
    if not args.skip_extract:
        from pipeline.extract_mp_votes import extract_mp_votes
        extract_mp_votes(db_path=db.db_path)
    # Phase 3: SVD per window
    if not args.skip_svd:
        from pipeline.svd_pipeline import run_svd_pipeline
        run_svd_pipeline(db, windows, args.svd_k)
    # ... additional phases
 ```
 ## Strategy Pattern
 Interchangeable algorithms for axis computation:
 ```python
 # analysis/political_axis.py
 def compute_political_axis(
    vectors: Dict[str, np.ndarray],
    method: str = "pca"  # or "anchor"
 ) -> Tuple[np.ndarray, np.ndarray]:
    """Compute political axis using specified method.
    Methods:
    - 'pca': Use first principal component
    - 'anchor': Use predefined anchor motions
    """
    if method == "pca":
        return _compute_pca_axis(vectors)
    elif method == "anchor":
        return _compute_anchor_axis(vectors)
 ```
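One thing to note about the `if`/`elif` chain above is that an unrecognized `method` falls through and returns `None`. A dispatch table is one possible way to make the set of strategies explicit and fail loudly instead. The sketch below is illustrative only: `compute_axis` and the two stand-in strategies are hypothetical and use plain lists rather than the project's numpy-based functions.

```python
from typing import Callable, Dict, List

Vector = List[float]


def _mean_strategy(vectors: Dict[str, Vector]) -> Vector:
    """Stand-in strategy: component-wise mean of all vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors.values())]


def _first_strategy(vectors: Dict[str, Vector]) -> Vector:
    """Stand-in strategy: return the first vector unchanged."""
    return next(iter(vectors.values()))


# Each strategy shares one signature, so callers can swap them by name.
_STRATEGIES: Dict[str, Callable[[Dict[str, Vector]], Vector]] = {
    "mean": _mean_strategy,
    "first": _first_strategy,
}


def compute_axis(vectors: Dict[str, Vector], method: str = "mean") -> Vector:
    """Look up the named strategy and apply it, rejecting unknown names."""
    try:
        strategy = _STRATEGIES[method]
    except KeyError:
        known = ", ".join(sorted(_STRATEGIES))
        raise ValueError(f"Unknown method {method!r}; expected one of: {known}") from None
    return strategy(vectors)


example = {"A": [1.0, 3.0], "B": [3.0, 5.0]}
mean_axis = compute_axis(example, "mean")
```

Adding a strategy then means adding one entry to `_STRATEGIES`, with no change to the dispatching function.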
 ## Visitor Pattern
 External operations on data structures:
 ```python
 # analysis/trajectory.py
 def _procrustes_align_windows(
    window_vecs: Dict[str, Dict[str, np.ndarray]],
    min_overlap: int = 5,
 ) -> Dict[str, Dict[str, np.ndarray]]:
    """Align SVD vectors across windows using Procrustes rotations.
    Takes the first window as reference and aligns each subsequent window
    to it via orthogonal Procrustes on the set of common entities.
    """
 ```
 ## Builder Pattern
 Configuration assembled step by step through successive `add_argument` calls:
 ```python
 # CLI argument parsing
 parser = argparse.ArgumentParser(description="Pipeline runner")
 parser.add_argument("--db-path", default="data/motions.db")
 parser.add_argument("--start-date", default=None)
 parser.add_argument("--end-date", default=None)
 parser.add_argument("--window-size", choices=["quarterly", "annual"], default="quarterly")
 parser.add_argument("--svd-k", type=int, default=50)
 ```
 ## Decorator Pattern
 Retry logic for transient failures:
 ```python
 # pipeline/ai_provider_wrapper.py
 def get_embeddings_with_retry(
    texts: List[str],
    retries: int = 3,
    batch_size: int = 50,
 ) -> List[Optional[List[float]]]:
    """Return embeddings with automatic retry on failure."""
    backoff = 0.5  # base delay in seconds (assumed; same value as the retry pattern in api.yaml)
    for attempt in range(1, retries + 1):
        try:
            return _embedder(texts, batch_size=len(texts))
        except Exception as exc:
            if attempt == retries:
                break
            time.sleep(backoff * (2 ** (attempt - 1)))
    return [None] * len(texts)  # Safe fallback
 ```
 ## Data Patterns
 ### Batch Processing
 Process items in chunks to manage memory and API limits:
 ```python
 for i in range(0, len(items), batch_size):
    chunk = items[i:i + batch_size]
    process_batch(chunk)
 ```
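The slicing loop above can be wrapped in a small generator so that callers iterate over batches directly. A minimal sketch, where `chunked` is a hypothetical helper name and not something defined in the project:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Seven items in batches of three: the last batch is shorter.
batches = [list(chunk) for chunk in chunked(list(range(7)), 3)]
```

The final batch simply holds whatever remains, so no padding or special casing is needed at the call site.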
 ### Caching
 Pre-compute and store expensive results:
 ```python
 # SimilarityCache table stores computed similarities
 db.get_similarity(motion_a, motion_b)
 ```
 ### Lazy Loading
 Load data only when needed:
 ```python
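 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._conn = None  # opened on first access
    @property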
    def _connection(self):
        if self._conn is None:
            self._conn = duckdb.connect(self.db_path)
        return self._conn
 ```
 ### Vectorization
 Use numpy for batch operations:
 ```python
 vectors = np.array(list(entity_vectors.values()))
 normalized = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
 ```
--- a/.mindmodel/patterns/database.yaml
+++ b/.mindmodel/patterns/database.yaml
@ -1,239 +0,0 @@
 # DuckDB Database Patterns
 ## Connection Management
 ### Pattern 1: Short-lived per Method (Most Common)
 Always create a new connection, use try/finally for cleanup:
 ```python
 # database.py
 class MotionDatabase:
    def get_motion(self, motion_id: int) -> Optional[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            result = conn.execute(
                "SELECT * FROM motions WHERE id = ?",
                (motion_id,)
            ).fetchone()
            return result
        except Exception:
            return None
        finally:
            conn.close()  # runs on both the success and the error path
    def get_filtered_motions(
        self, 
        policy_area: str = "Alle",
        min_margin: float = 0.0,
        max_margin: float = 1.0,
        limit: int = 10
    ) -> List[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            query = """
                SELECT * FROM motions
                WHERE (? = 'Alle' OR policy_area = ?)
                AND winning_margin BETWEEN ? AND ?
                ORDER BY RANDOM()
                LIMIT ?
            """
            rows = conn.execute(
                query, (policy_area, policy_area, min_margin, max_margin, limit)
            ).fetchall()
            return rows
        except Exception:
            return []
        finally:
            conn.close()  # runs on both the success and the error path
 ```
 ### Pattern 2: With Statement (Cleaner)
 ```python
 def execute_query(self, query: str, params: tuple = ()):
    with duckdb.connect(self.db_path) as conn:
        return conn.execute(query, params).fetchall()
 ```
 ### Pattern 3: Lazy Connection Caching
 For frequently accessed connections:
 ```python
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._conn = None
    @property
    def connection(self):
        if self._conn is None:
            self._conn = duckdb.connect(self.db_path)
        return self._conn
    def close(self):
        if self._conn:
            self._conn.close()
            self._conn = None
 ```
 ## Table Initialization
 Create tables with proper constraints and sequences:
 ```python
 def _init_database(self):
    conn = duckdb.connect(self.db_path)
    # Create sequence for auto-incrementing IDs
    try:
        conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
    except Exception:
        pass
    # Create tables
    conn.execute("""
        CREATE TABLE IF NOT EXISTS motions (
            id INTEGER DEFAULT nextval('motions_id_seq'),
            title TEXT NOT NULL,
            description TEXT,
            date DATE,
            policy_area TEXT,
            voting_results JSON,
            winning_margin FLOAT,
            controversy_score FLOAT,
            layman_explanation TEXT,
            externe_identifier TEXT,
            body_text TEXT,
            url TEXT UNIQUE,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (id)
        )
    """)
    # Add columns to existing tables safely
    try:
        conn.execute("ALTER TABLE motions ADD COLUMN IF NOT EXISTS body_text TEXT")
    except Exception:
        pass  # Column may already exist
    conn.close()
 ```
 ## JSON Column Handling
 Store and retrieve JSON data:
 ```python
 # Insert JSON
 def store_motion(self, motion: Dict):
    conn = duckdb.connect(self.db_path)
    try:
        conn.execute(
            "INSERT INTO motions (title, voting_results) VALUES (?, ?)",
            (motion["title"], json.dumps(motion["voting_results"]))
        )
        conn.close()
    except Exception:
        conn.close()
 # Query JSON
 def get_motions_with_votes(self, party: str) -> List[Dict]:
    conn = duckdb.connect(self.db_path)
    try:
        rows = conn.execute("""
            SELECT title, voting_results 
            FROM motions 
            WHERE JSON_EXTRACT(voting_results, '$.party') = ?
        """, (party,)).fetchall()
        conn.close()
        return rows
    except Exception:
        conn.close()
        return []
 ```
 ## Query Patterns
 ### Parameterized Queries (Always!)
 ```python
 # SAFE - uses parameterized query
 conn.execute("SELECT * FROM motions WHERE id = ?", (motion_id,))
 # AVOID - SQL injection risk
 # conn.execute(f"SELECT * FROM motions WHERE id = {motion_id}")  # BAD!
 ```
 ### Batch Inserts
 ```python
 def bulk_insert_motions(self, motions: List[Dict]):
    if not motions:
        return  # Nothing to insert
    conn = duckdb.connect(self.db_path)
    try:
        # executemany runs the statement once per parameter tuple
        conn.executemany(
            """INSERT OR IGNORE INTO motions 
               (title, date, policy_area) VALUES (?, ?, ?)""",
            [(m["title"], m["date"], m["policy_area"]) for m in motions],
        )
    finally:
        conn.close()
 ```
 ### Aggregation Queries
 ```python
 def get_party_vote_stats(self, party: str) -> Dict:
    conn = duckdb.connect(self.db_path)
    try:
        result = conn.execute("""
            SELECT 
                COUNT(*) as total_votes,
                SUM(CASE WHEN vote = 'Voor' THEN 1 ELSE 0 END) as voor,
                SUM(CASE WHEN vote = 'Tegen' THEN 1 ELSE 0 END) as tegen
            FROM mp_votes
            WHERE party = ?
        """, (party,)).fetchone()
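        # SUM yields NULL when no rows match, so fall back to 0
        return {"total": result[0], "voor": result[1] or 0, "tegen": result[2] or 0}
    except Exception:
        _logger.exception("Failed to compute vote stats")
        return {"total": 0, "voor": 0, "tegen": 0}
    finally:
        conn.close()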
 ```
 ## Error Handling
 Always close connections in a `finally` block or with a context manager:
 ```python
 def safe_query(self, query: str, params: tuple = ()):
    conn = None
    try:
        conn = duckdb.connect(self.db_path)
        result = conn.execute(query, params).fetchall()
        return result
    except Exception as e:
        _logger.error(f"Query failed: {e}")
        return []
    finally:
        if conn:
            conn.close()
 ```
 ## Testing with Mock
 For unit tests without DuckDB:
 ```python
 # In MotionDatabase.__init__
 def __init__(self, db_path: str = config.DATABASE_PATH):
    self.db_path = db_path
    self._file_mode = duckdb is None
    if duckdb is None:
        # Create JSON fallback files
        for p in (f"{db_path}.embeddings.json", f"{db_path}.similarity_cache.json"):
            if not os.path.exists(p):
                with open(p, "w") as fh:
                    fh.write("[]")
    else:
        self._init_database()
 ```
--- a/.mindmodel/patterns/duckdb-access.md
+++ b/.mindmodel/patterns/duckdb-access.md
@ -1,79 +0,0 @@
 ---
 title: DuckDB Access Pattern
 category: patterns
 ---
 # DuckDB Access Pattern
 ## Rules
 - Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
 - Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
 - If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle.
 - Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
 ## Examples
 ### database.py - Explicit connect/close for schema init
 ```python
 conn = duckdb.connect(self.db_path)
 ...
 conn.execute("""
    CREATE TABLE IF NOT EXISTS fused_embeddings (
        id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
        motion_id INTEGER NOT NULL,
        window_id TEXT NOT NULL,
        vector JSON NOT NULL,
        svd_dims INTEGER NOT NULL,
        text_dims INTEGER NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    )
 """)
 conn.close()
 ```
 ### pipeline/svd_pipeline.py - Read-only connection
 ```python
 conn = duckdb.connect(db_path, read_only=True)
 try:
    rows = conn.execute(
        "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
        (start_date, end_date),
    ).fetchall()
 finally:
    conn.close()
 ```
 ### similarity/compute.py - Preferred 'with' context
 ```python
 try:
    import duckdb
 except Exception:
    logger.exception("duckdb import failed; cannot load vectors")
    return 0
 with duckdb.connect(db.db_path) as conn:
    rows = conn.execute(query, params).fetchall()
 ```
 ## Anti-Patterns
 ### Bad: Connection without closure
 ```python
 # BAD: connection may leak if exception occurs before explicit close
 conn = duckdb.connect(db_path)
 rows = conn.execute("SELECT ...").fetchall()
 # missing finally/close
 ```
 **Remediation**: Use "with" context or ensure conn.close() in finally block.
 ### Bad: Parallel write connections
 **Problem**: Opening write connections from many parallel workers without coordination.
 **Remediation**: Open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.
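
The single-writer remediation above can be sketched with a queue and one worker thread. This is a minimal illustration using only the standard library: `SingleWriter`, `WriteJob`, and `produce` are hypothetical names, and in real use each job would presumably open a short-lived DuckDB write connection rather than append to a list.

```python
import queue
import threading
from typing import Callable, List, Optional

# Hypothetical names for illustration; not part of the project's code.
WriteJob = Callable[[], None]


class SingleWriter:
    """Serialize writes from many producers through one worker thread."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[Optional[WriteJob]]" = queue.Queue()
        self.errors: List[Exception] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, job: WriteJob) -> None:
        """Queue a write; the worker thread runs it later."""
        self._jobs.put(job)

    def close(self) -> None:
        """Drain all queued jobs, then stop the worker."""
        self._jobs.put(None)  # sentinel: no more jobs
        self._thread.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:
                break
            try:
                job()  # assumption: a real job would do one short-lived DB write
            except Exception as exc:
                self.errors.append(exc)  # keep the worker alive, record the failure


written: List[int] = []
writer = SingleWriter()


def produce(n: int) -> None:
    # Producers never write directly; they only enqueue work.
    writer.submit(lambda: written.append(n))


producers = [threading.Thread(target=produce, args=(n,)) for n in range(5)]
for t in producers:
    t.start()
for t in producers:
    t.join()
writer.close()
```

Because only the worker thread executes jobs, writes never run concurrently, while compute workers remain free to open their own read-only connections.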
--- a/.mindmodel/patterns/embeddings-similarity.md
+++ b/.mindmodel/patterns/embeddings-similarity.md
@ -1,74 +0,0 @@
 ---
 title: Embeddings Similarity Pipeline
 category: patterns
 ---
 # Embeddings Similarity Pipeline
 ## Rules
 - Keep embedding calls batched where possible; fall back to per-item attempts on persistent batch failure.
 - Store raw embeddings, SVD vectors, and fused_embeddings separately; fused_embeddings are typically concatenation [svd + text].
 - Compute similarity as normalized cosine on padded vectors; record top-k neighbors in similarity_cache.
 - Use read_only DuckDB connections in compute workers to allow parallel runs.
 ## Examples
 ### pipeline/ai_provider_wrapper.py - Batched embed + fallback
 ```python
 for start in range(0, len(texts), batch_size):
    chunk = texts[start : start + batch_size]
    resp = _post_with_retries("/embeddings", json={"model": model, "input": chunk})
 ...
 for j in range(i, end):
    t = texts[j]
    single, single_exc = _attempt_batch([t], j)
    if single:
        results[j] = single[0]
 ```
 ### pipeline/fusion.py - Concatenation and storage
 ```python
 try:
    svd_vec = json.loads(svd_json)
 except Exception:
    _logger.exception("Invalid SVD vector JSON for entity %s", entity_id)
    skipped_missing_svd += 1
    continue
 ...
 fused = list(svd_vec) + list(text_vec)
 res = db.store_fused_embedding(
    int(entity_id),
    window_id,
    fused,
    svd_dims=len(svd_vec),
    text_dims=len(text_vec),
 )
 ```
 ### similarity/compute.py - Normalized cosine similarity
 ```python
 # Normalize rows
 norms = np.linalg.norm(matrix, axis=1, keepdims=True)
 norms[norms == 0] = 1.0
 normalized = matrix / norms
 sim = normalized @ normalized.T
 ...
 # pick top-k neighbors and write to similarity_cache
 ```
 ## Anti-Patterns
 ### Bad: Assuming consistent vector length
 **Problem**: Assuming consistent vector length without checks leads to shape errors.
 **Remediation**: Detect inconsistent lengths, pad with zeros, and log a warning (as seen in compute.py).
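
A minimal sketch of that remediation, written in plain Python rather than NumPy so it stays self-contained. The function names `pad_vectors` and `cosine_similarity` are assumptions for illustration; the actual code in compute.py is not shown in this document and may differ.

```python
import logging
import math
from typing import List, Sequence

logger = logging.getLogger(__name__)


def pad_vectors(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Zero-pad every vector to the longest length, warning on mismatches."""
    if not vectors:
        return []
    lengths = {len(v) for v in vectors}
    target = max(lengths)
    if len(lengths) > 1:
        logger.warning(
            "Inconsistent vector lengths %s; zero-padding to %d",
            sorted(lengths),
            target,
        )
    return [list(v) + [0.0] * (target - len(v)) for v in vectors]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of two equal-length vectors; a zero norm is treated as 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)


padded = pad_vectors([[1.0, 0.0, 0.0], [1.0]])
similarity = cosine_similarity(padded[0], padded[1])
```

Treating a zero norm as 1.0 mirrors the `norms[norms == 0] = 1.0` step in the example above, so all-zero vectors yield a similarity of 0 instead of a division error.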
 ### Bad: Inline heavy computation in UI
 **Problem**: Recomputing heavy pipelines inline in UI requests.
 **Remediation**: Schedule heavy work in scripts/subprocesses and read precomputed results in UI.
--- a/.mindmodel/patterns/error-handling.md
+++ b/.mindmodel/patterns/error-handling.md
@ -1,63 +0,0 @@
 ---
 title: Error Handling Pattern
 category: patterns
 ---
 # Error Handling Pattern
 ## Rules
 - Use explicit exceptions for domain/error classification (e.g., ProviderError, ValueError).
 - Prefer logging.exception when catching an exception where stack trace is useful.
 - Avoid broad except: clauses that swallow exceptions; if broad except is used for "best-effort" fallback, log at warning and include original exception context.
 - For public library-like functions, prefer raising typed exceptions instead of returning magic values ([], False) — only return safe defaults where documented.
 ## Examples
 ### ai_provider.py - Network error to ProviderError
 ```python
 except requests.ConnectionError as exc:
    if attempt == retries:
        raise ProviderError(
            f"Connection error when calling provider: {exc}"
        ) from exc
    ...
 ```
 ### pipeline/ai_provider_wrapper.py - Best-effort with logging
 ```python
 except Exception:
    _logger.exception("Failed to append audit event for embedding failure")
 results[j] = None
 ```
 ### similarity/compute.py - Defensive import handling
 ```python
 try:
    import duckdb
 except Exception:
    logger.exception("duckdb import failed; cannot load vectors")
    return 0
 ```
 ## Anti-Patterns
 ### Bad: Silent exception swallowing
 ```python
 try:
    do_work()
 except Exception:
    return []
 # BAD: hides the root cause and returns an ambiguous default
 ```
 **Remediation**: Narrow exception types or at minimum log.exception() and re-raise or convert to a domain error if truly handled.
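
One way the remediation could look, as a hedged sketch: catch only the exception the operation is expected to raise, log it with its stack trace, and convert it to a domain error. `VectorParseError` and `parse_vector` are hypothetical names, not taken from the codebase.

```python
import json
import logging
from typing import List

logger = logging.getLogger(__name__)


class VectorParseError(ValueError):
    """Hypothetical domain error for malformed stored vectors."""


def parse_vector(raw: str) -> List[float]:
    """Parse a JSON-encoded vector, raising a typed error instead of returning []."""
    try:
        values = json.loads(raw)
    except json.JSONDecodeError as exc:  # narrow: only the failure we expect
        logger.exception("Invalid vector JSON")
        raise VectorParseError(f"invalid vector JSON: {exc}") from exc
    if not isinstance(values, list):
        raise VectorParseError("vector JSON must be a list")
    return [float(v) for v in values]
```

Callers that really want a best-effort default can still catch `VectorParseError` explicitly, which keeps the fallback visible at the call site.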
 ### Bad: Mixing print() and logging
 **Problem**: Mixing print() and logging for errors.
 **Remediation**: Replace print() calls with logger.* calls; use structured logging configuration.
--- a/.mindmodel/patterns/module-singletons.md
+++ b/.mindmodel/patterns/module-singletons.md
@ -1,41 +0,0 @@
 ---
 title: Module Singletons Pattern
 category: patterns
 ---
 # Module Singletons Pattern
 ## Rules
 - Module-level singletons (e.g., db = MotionDatabase()) are acceptable but should be created carefully:
  - Avoid expensive initialization at import time.
  - Provide a way to construct with a test DB path or to reinitialize in tests.
 - If a singleton holds resources (DB connections, sessions), ensure safe shutdown on program exit.
 ## Examples
 ### database.py - Safe class initialization
 ```python
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        # If duckdb is not available, operate in lightweight file-backed mode
        self._file_mode = duckdb is None
        self._init_database()
 ```
 ### similarity/lookup.py - Local instances
 ```python
 db = MotionDatabase(db_path=db_path) if db_path else MotionDatabase()
 if hasattr(db, "get_cached_similarities"):
    rows = db.get_cached_similarities(...)
 ```
 ## Anti-Patterns
 ### Bad: Heavy initialization at import time
 **Problem**: Creating connections and performing heavy schema migrations during import.
 **Remediation**: Move heavy init to an explicit initialize() method and keep import fast.
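
A sketch of that remediation under stated assumptions: `LazyDatabase` is a hypothetical stand-in for `MotionDatabase`, and `get_db` / `reset_db` are illustrative helper names. Nothing heavy runs at import time; the first `get_db()` call does the work, and tests can reset or redirect the instance.

```python
from typing import Dict

DEFAULT_DB_PATH = "data/motions.db"  # assumed default, for illustration only


class LazyDatabase:
    """Hypothetical stand-in for MotionDatabase with explicit initialization."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path  # cheap: just remember the path
        self.initialized = False

    def initialize(self) -> None:
        # Heavy work (connections, schema migrations) would go here.
        self.initialized = True


_instances: Dict[str, LazyDatabase] = {}


def get_db(db_path: str = DEFAULT_DB_PATH) -> LazyDatabase:
    """Return the shared instance for db_path, creating it on first use."""
    db = _instances.get(db_path)
    if db is None:
        db = LazyDatabase(db_path)
        db.initialize()
        _instances[db_path] = db
    return db


def reset_db() -> None:
    """Drop cached instances so tests can start from a clean state."""
    _instances.clear()
```

Keying the cache by path lets a test call `get_db("test.db")` without touching the default instance.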
--- a/.mindmodel/patterns/patterns.yaml
+++ b/.mindmodel/patterns/patterns.yaml
@ -1,228 +0,0 @@
 # Code Patterns
 ## 1. Page Wrapper Pattern
 Thin Streamlit page files delegate to core modules. Pages contain only route logic, not business logic.
 **Example** (pages/1_🗳️_Stemwijzer.py):
 ```python
 import streamlit as st
 from quiz_module import render_quiz_page
 st.set_page_config(...)
 render_quiz_page()
 ```
 **Example** (pages/2_🔍_Explorer.py):
 ```python
 import streamlit as st
 from explorer import render_explorer
 st.set_page_config(...)
 render_explorer()
 ```
 **Rule**: Pages should have <20 lines of logic. All complexity lives in modules.
 ---
 ## 2. Pipeline Pattern
 Data flows: fetch → transform → store
 **Location**: `pipeline/` directory
 **Pattern**:
 ```python
 def run_pipeline():
    raw_data = fetch_from_source()
    transformed = transform(raw_data)
    store(transformed)
 def fetch_from_source():
    # API call or DB query
    ...
 def transform(raw):
    # Clean, normalize, compute derived fields
    ...
 ```
 **Usage**: SVD computation pipeline, data ingestion, motion processing
 ---
 ## 3. API Client Pattern
 HTTP client with retry/backoff for external data sources.
 **Pattern**:
 ```python
 import time
 import requests
 def fetch_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)  # a timeout avoids hanging forever
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise
 ```
 ---
 ## 4. Pure Helper Functions
 Functions in `explorer_helpers.py` have no side effects, no IO.
 **Pattern**:
 ```python
 def compute_party_coords(svd_df, party_map, window):
    """Pure function: same inputs → same outputs, no side effects."""
    # Filter, compute, return
    return result_df
 def build_scatter_trace(df, color_col, marker_size=8):
    """Pure: returns Plotly trace dict, no rendering."""
    trace = go.Scatter(x=df.x, y=df.y, mode='markers', ...)
    return trace
 ```
 **Rule**: No `import streamlit` in helper modules. No file I/O. No global state.
 ---
 ## 5. Dummy Fallbacks for Optional Dependencies
 Gracefully degrade when optional packages are unavailable.
 **Pattern**:
 ```python
 try:
    import umap
    HAS_UMAP = True
 except ImportError:
    HAS_UMAP = False
    # or provide dummy stub
 def project_to_2d(vectors):
    if HAS_UMAP:
        return umap.UMAP().fit_transform(vectors)
    else:
        return vectors[:, :2]  # fallback: just take first 2 dims
 ```
 **Used for**: UMAP, Plotly (with fallback to altair or text-only)
 ---
 ## 6. Cached Data Loaders
 Expensive DB queries wrapped with `@st.cache_data`.
 **Pattern**:
 ```python
 @st.cache_data
 def load_svd_vectors(window: str) -> pd.DataFrame:
    return db.query("SELECT * FROM svd_vectors WHERE window = ?", window)
 @st.cache_data
 def load_party_centroids(window: str) -> pd.DataFrame:
    return db.query("SELECT * FROM party_centroids WHERE window = ?", window)
 # Expire cached entries after an hour so data updates are picked up
 @st.cache_data(ttl=3600)
 def load_motions(category: str | None = None) -> pd.DataFrame:
    ...
 ```
 **Rule**: Use `ttl=3600` for large datasets. Use `show_spinner=False` where appropriate.
 ---
 ## 7. Plotly Dual-Layer Charts
 Charts built with two traces: scatter points + text annotations.
 **Pattern**:
 ```python
 def build_dual_layer_chart(df, x_col, y_col, label_col):
    # Layer 1: markers
    scatter = go.Scatter(
        x=df[x_col], y=df[y_col],
        mode='markers',
        marker=dict(size=10, color=df['color']),
        name='Parties'
    )
    # Layer 2: labels (smaller, non-hoverable)
    labels = go.Scatter(
        x=df[x_col], y=df[y_col],
        mode='text',
        text=df[label_col],
        textposition='top center',
        showlegend=False
    )
    return [scatter, labels]
 ```
 **Used in**: Explorer tab charts, party position plots
 ---
 ## 8. Singleton Module Instances
 One shared instance per module, created at import time.
 **Pattern**:
 ```python
 # database.py
 class MotionDatabase:
    def __init__(self, db_path=None):
        self.conn = ibis.duckdb.connect(db_path)
        self._load_schema()
 _db = None
 def get_db():
    global _db
    if _db is None:
        _db = MotionDatabase()
    return _db
 # At module bottom:
 db = MotionDatabase()  # singleton instance
 ```
 **Also used in**: `config.py` exports `config` and `PARTY_COLOURS`
 ---
 ## 9. Dataclass Config Pattern
 Configuration centralized in a `@dataclass`.
 **Pattern**:
 ```python
 from dataclasses import dataclass, field
 from pathlib import Path
 @dataclass
 class Config:
    db_path: str = "data/stemwijzer.duckdb"
    default_window: str = "2023"
    cache_ttl: int = 3600
    party_colours: dict = field(default_factory=lambda: PARTY_COLOURS)
    def __post_init__(self):
        if not Path(self.db_path).exists():
            raise FileNotFoundError(f"Database not found: {self.db_path}")
 ```
 ---
 ## 10. Graceful Degradation with try/except
 Core pattern throughout: attempt operation, fall back gracefully.
 **Pattern**:
 ```python
 def get_political_position(mp_name, window):
    try:
        vectors = load_svd_vectors(window)
        return vectors[vectors['mp_name'] == mp_name]['vector_2d'].iloc[0]
    except (KeyError, IndexError):
        return [0.0, 0.0]  # neutral fallback
 ```
--- a/.mindmodel/patterns/python.yaml
+++ b/.mindmodel/patterns/python.yaml
@ -1,196 +0,0 @@
 # Python-Specific Patterns
 ## Singleton Pattern
 Use module-level instances for shared resources:
 ```python
 # database.py
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._init_database()
    def _init_database(self):
        # Initialize tables on first instantiation
        ...
 # Bottom of file - the singleton
 db = MotionDatabase()
 ```
 **Usage across the codebase:**
 ```python
 # In other modules
 from database import db
 def some_function():
    motions = db.get_filtered_motions(limit=10)
    return motions
 ```
 Similarly for other singletons:
 ```python
 # summarizer.py
 class MotionSummarizer:
    def __init__(self):
        pass  # Stateless
    def generate_layman_explanation(self, title: str, body: str) -> str:
        ...
 summarizer = MotionSummarizer()
 ```
 ## Dataclass Config Pattern
 Use dataclass for configuration with environment variable support:
 ```python
 # config.py
 from dataclasses import dataclass
 from typing import List
 import os
 @dataclass
 class Config:
    # Database settings
    DATABASE_PATH = "data/motions.db"
    # API settings
    TWEEDE_KAMER_ODATA_API = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0"
    API_TIMEOUT = 30
    API_BATCH_SIZE = 250
    # AI settings
    OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
    OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
    QWEN_MODEL = "qwen/qwen-2.5-72b-instruct"
    # App settings
    DEFAULT_MOTION_COUNT = 10
    SESSION_TIMEOUT_DAYS = 30
    # Policy areas
    POLICY_AREAS: List[str] = None
    def __post_init__(self):
        self.POLICY_AREAS = [
            "Alle", "Economie", "Klimaat", "Immigratie", 
            "Zorg", "Onderwijs", "Defensie", "Sociale Zaken", "Algemeen"
        ]
 config = Config()
 ```
 **Usage:**
 ```python
 from config import config
 # Access as attributes
 timeout = config.API_TIMEOUT
 areas = config.POLICY_AREAS
 ```
 ## DuckDB Connection Pattern
 Short-lived connections with explicit cleanup:
 ```python
 class MotionDatabase:
    def get_motion(self, motion_id: int) -> Optional[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            result = conn.execute(
                "SELECT * FROM motions WHERE id = ?", 
                (motion_id,)
            ).fetchone()
            return result
        finally:
            conn.close()
    def get_filtered_motions(self, **kwargs) -> List[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            rows = conn.execute(query, params).fetchall()
            return rows
        except Exception:
            return []  # Safe fallback
        finally:
            conn.close()
 ```
 **Context manager alternative (preferred when applicable):**
 ```python
 def some_operation(self):
    with duckdb.connect(self.db_path) as conn:
        result = conn.execute("SELECT ...").fetchall()
    return result
 ```
 ## Try/Except with Fallback Pattern
 Always provide safe fallbacks:
 ```python
 def get_motion_or_default(self, motion_id: int) -> Dict:
    try:
        # The context manager closes the connection even if the query raises
        with duckdb.connect(self.db_path) as conn:
            result = conn.execute("SELECT * FROM motions WHERE id = ?", (motion_id,)).fetchone()
        return result if result else {}
    except Exception:
        return {}
 ## Optional Import Pattern
 Handle optional dependencies gracefully:
 ```python
 try:
    import duckdb
 except Exception:  # pragma: no cover
    duckdb = None
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self._file_mode = duckdb is None
        ...
 ```
 ## Property Pattern
 Lazy initialization of expensive resources:
 ```python
 class MotionDatabase:
    def __init__(self, db_path: str = config.DATABASE_PATH):
        self.db_path = db_path
        self._session_cache = None
    @property
    def session(self):
        """Lazy-load expensive resources."""
        if self._session_cache is None:
            self._session_cache = self._create_session()
        return self._session_cache
 ```
 ## Type Annotation Patterns
 ```python
 from typing import Dict, List, Optional, Tuple, Any
 # Optional with None default
 def get_motion(self, motion_id: Optional[int] = None) -> Optional[Dict]:
    ...
 # Tuple return (multiple values)
 def parse_vote(self, vote_str: str) -> Tuple[bool, str]:
    """Returns (success, error_message)"""
    ...
 # Generic types
 def get_batch(self, ids: List[int]) -> Dict[str, Any]:
    ...
 ```
--- a/.mindmodel/patterns/requests-http.md
+++ b/.mindmodel/patterns/requests-http.md
@ -1,77 +0,0 @@
 ---
 title: Requests HTTP Pattern
 category: patterns
 ---
 # Requests HTTP Pattern
 ## Rules
 - Reuse requests.Session when making multiple calls to the same host to benefit from connection pooling.
 - Wrap outbound HTTP calls with retry/backoff logic and respect Retry-After on 429.
 - Treat 5xx as transient and retry; surface 4xx as configuration/client errors (do not retry unless 429).
 - Raise or wrap non-OK responses into domain ProviderError to make behavior consistent across the codebase.
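
The rules above can be sketched as one retry loop. This is an illustration under stated assumptions, not the project's implementation: `Response` is a minimal hypothetical stand-in for a `requests` response, `ProviderError` is redefined locally because its real definition is not shown here, and `send` is an injected callable so the sketch runs without network access. Only the integer-seconds form of `Retry-After` is handled.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


class ProviderError(Exception):
    """Local stand-in for the project's ProviderError."""


@dataclass
class Response:
    """Minimal hypothetical response shape: status code plus headers."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


def call_with_retries(
    send: Callable[[], Response],
    retries: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry 429 and 5xx responses; surface other 4xx errors immediately."""
    for attempt in range(1, retries + 1):
        resp = send()
        status = resp.status_code
        if status < 400:
            return resp
        retryable = status == 429 or status >= 500
        if not retryable:
            # Client/configuration error: retrying would not help.
            raise ProviderError(f"Provider returned HTTP {status}")
        if attempt == retries:
            raise ProviderError(
                f"Provider returned HTTP {status} after {retries} attempts"
            )
        delay = backoff * (2 ** (attempt - 1))  # exponential backoff
        if status == 429:
            raw = resp.headers.get("Retry-After")
            if raw is not None and raw.isdigit():
                delay = float(raw)  # honour the server's hint
        sleep(delay)
    raise ProviderError("retries must be at least 1")
```

Injecting `sleep` keeps tests fast: a test can pass `delays.append` and assert on the recorded delays instead of actually waiting.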
 ## Examples
 ### ai_provider.py - 429 handling with Retry-After
 ```python
 resp = requests.post(url, json=json, headers=headers, timeout=10)
 ...
 if getattr(resp, "status_code", 0) == 429:
    if attempt == retries:
        raise ProviderError(f"Provider returned HTTP {resp.status_code}")
    retry_after = None
    raw = resp.headers.get("Retry-After") if getattr(resp, "headers", None) else None
    if raw:
        try:
            retry_after = int(raw)
        except Exception:
            ...
    if retry_after is not None:
        time.sleep(retry_after)
        continue
 ```
 ### api_client.py - Session + raise_for_status
 ```python
 response = self.session.get(
    base_url, params=params, timeout=config.API_TIMEOUT
 )
 response.raise_for_status()
 data = response.json()
 ```
 ### pipeline/ai_provider_wrapper.py - Retry/backoff wrapper
 ```python
 def _attempt_batch(chunk_texts, start_index):
    backoff = 0.5
    for attempt in range(1, retries + 1):
        try:
            emb_chunk = _embedder(
                chunk_texts, model=model, batch_size=len(chunk_texts)
            )
            return emb_chunk, None
        except Exception as exc:
            if attempt == retries:
                break
            sleep = backoff * (2 ** (attempt - 1))
            time.sleep(sleep)
            continue
 ```
 ## Anti-Patterns
 ### Bad: Silent exception swallowing
 **Problem**: Blindly catching all requests exceptions and returning empty response.
 **Remediation**: Map network exceptions to retryable vs terminal (ProviderError) and log details.
 ### Bad: Using print() for errors
 **Problem**: Using print() for network errors instead of structured logging.
 **Remediation**: Use `_logger.exception()` instead (api_client.py still needs this fix).
--- a/.mindmodel/patterns/streamlit.yaml
+++ b/.mindmodel/patterns/streamlit.yaml
@ -1,225 +0,0 @@
 # Streamlit Patterns
 ## Session State Initialization
 Always initialize session state at the start of the main function:
 ```python
 # app.py
 import streamlit as st
 def main():
    # Initialize all session state variables
    if "session_id" not in st.session_state:
        st.session_state.session_id = None
    if "current_motion_index" not in st.session_state:
        st.session_state.current_motion_index = 0
    if "motions" not in st.session_state:
        st.session_state.motions = []
    if "show_results" not in st.session_state:
        st.session_state.show_results = False
    # Rest of app...
 ```
 ## Page Configuration
 Set page config at the top of each page file:
 ```python
 # pages/1_Stemwijzer.py
 import streamlit as st
 st.set_page_config(
    page_title="Stemwijzer",
    page_icon="🗳️",
    layout="centered",
 )
 from explorer import build_mp_quiz_tab
 build_mp_quiz_tab("data/motions.db")
 ```
 ## Thin Page Wrapper Pattern
 Pages delegate to shared functions in main modules:
 ```python
 # pages/2_Explorer.py
 import streamlit as st
 st.set_page_config(
    page_title="Explorer",
    page_icon="🔭",
    layout="wide",
 )
 from explorer import build_explorer_tab
 build_explorer_tab()
 ```
 ```python
 # explorer.py
 def build_explorer_tab():
    st.header("🔭 Politiek Explorer")
    tab1, tab2, tab3 = st.tabs([
        "Compass", 
        "Trajectories", 
        "Zoeken"
    ])
    with tab1:
        render_compass()
    with tab2:
        render_trajectories()
    with tab3:
        render_search()
 ```
 ## Sidebar Pattern
 Use sidebar for configuration and navigation:
 ```python
 # app.py
 def main():
    with st.sidebar:
        st.header("Instellingen")
        motion_count = st.slider(
            "Aantal moties",
            min_value=5,
            max_value=25,
            value=10,
        )
        policy_area = st.selectbox("Beleidsgebied", config.POLICY_AREAS)
        if st.button("Start Nieuwe Sessie"):
            start_new_session(motion_count, policy_area)
 ```
 ## Callback Pattern for State Updates
 Use callbacks to handle user interactions:
 ```python
 def on_motion_vote(motion_id: int, vote: str):
    """Callback when user votes on a motion."""
    st.session_state.user_votes[motion_id] = vote
    # Move to next motion
    if st.session_state.current_motion_index < len(st.session_state.motions) - 1:
        st.session_state.current_motion_index += 1
    else:
        st.session_state.show_results = True
    # No st.rerun() here: Streamlit reruns the script automatically after a callback
 # In UI
 col1, col2, col3 = st.columns(3)
 with col1:
    st.button("👍 Voor", on_click=on_motion_vote, args=(motion_id, "Voor"))
 with col2:
    st.button("👎 Tegen", on_click=on_motion_vote, args=(motion_id, "Tegen"))
 with col3:
    st.button("❓ Onthouden", on_click=on_motion_vote, args=(motion_id, "Onthouden"))
 ```
 ## Container Pattern for Dynamic Content
 Use containers for dynamic rendering:
 ```python
 def show_motion_interface():
    if not st.session_state.motions:
        st.warning("Geen moties geladen")
        return
    current_idx = st.session_state.current_motion_index
    motion = st.session_state.motions[current_idx]
    with st.container():
        st.subheader(f"Motie {current_idx + 1} van {len(st.session_state.motions)}")
        st.markdown(f"**{motion['title']}**")
        st.caption(f"📅 {motion['date']} | 🏷️ {motion['policy_area']}")
        if motion.get("layman_explanation"):
            st.info(motion["layman_explanation"])
        # Voting buttons...
 ```
 ## Expander Pattern for Details
 Use expanders for collapsible content:
 ```python
 with st.expander("Meer details"):
    st.markdown(f"**Beschrijving:** {motion.get('description', 'N/A')}")
    if motion.get("voting_results"):
        results = json.loads(motion["voting_results"])
        st.json(results)
 ```
 ## Form Pattern for Batch Updates
 Use forms for multiple related inputs:
 ```python
 with st.form("session_settings"):
    st.subheader("Sessie Instellingen")
    col1, col2 = st.columns(2)
    with col1:
        count = st.number_input("Aantal moties", min_value=5, max_value=25)
    with col2:
        area = st.selectbox("Beleidsgebied", config.POLICY_AREAS)
    submitted = st.form_submit_button("Start Sessie")
    if submitted:
        start_session(count, area)
 ```
 ## Caching Pattern
 Cache expensive computations:
 ```python
 @st.cache_data(ttl=3600)  # Cache for 1 hour
 def load_party_positions(window_id: str) -> Dict:
    """Load party positions from database."""
    return db.get_party_positions(window_id)
 @st.cache_resource
 def init_database():
    """Initialize database connection."""
    return MotionDatabase(config.DATABASE_PATH)
 ```
 ## Home Page Pattern
 Landing page with navigation:
 ```python
 # Home.py
 import streamlit as st
 st.set_page_config(
    page_title="Motief: de stematlas",
    page_icon="🗺️",
    layout="centered",
 )
 def main():
    st.title("🗺️ Motief: de stematlas")
    st.markdown("**Motief** brengt de Nederlandse Tweede Kamer in kaart...")
    col1, col2 = st.columns(2)
    with col1:
        st.page_link("pages/1_Stemwijzer.py", label="Open Stemwijzer", icon="🗳️")
    with col2:
        st.page_link("pages/2_Explorer.py", label="Open Explorer", icon="🔭")
 ```
--- a/.mindmodel/patterns/validation.md
+++ b/.mindmodel/patterns/validation.md
@ -1,37 +0,0 @@
 ---
 title: Validation Pattern
 category: patterns
 ---
 # Validation Pattern
 ## Rules
 - Validate inputs early and raise ValueError or domain-specific exceptions (ProviderError) for invalid contract inputs.
 - Tests should assert that invalid inputs raise the expected exceptions.
 - Use explicit checks for types and shapes on public APIs (e.g., ensure text is str before embedding).
 ## Examples
 ### ai_provider.py - Type validation
 ```python
 if not isinstance(text, str):
    raise ProviderError("text must be a string")
 ```
 ### pipeline/ai_provider_wrapper.py - Defensive empty handling
 ```python
 if not texts:
    return []
 if motion_ids is None:
    motion_ids = [None for _ in texts]
 ```
 ## Anti-Patterns
 ### Bad: Invalid values into computation
 **Problem**: Allowing invalid values to propagate into heavy computation (e.g., non-string into embedding pipeline).
 **Remediation**: Fail fast with a typed exception and add unit tests to cover validations.
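
A small sketch of failing fast, paired with a test that asserts the exception. `validate_texts` and the test function are hypothetical names, and `ProviderError` is redefined locally because its real definition is not shown in this document.

```python
from typing import List, Sequence


class ProviderError(Exception):
    """Local stand-in for the project's ProviderError."""


def validate_texts(texts: Sequence[object]) -> List[str]:
    """Reject invalid inputs before any expensive embedding work starts."""
    if isinstance(texts, str):
        # A bare string is a sequence of characters, which is almost never intended.
        raise ProviderError("texts must be a sequence of strings, not a single string")
    validated: List[str] = []
    for index, text in enumerate(texts):
        if not isinstance(text, str):
            raise ProviderError(
                f"texts[{index}] must be a string, got {type(text).__name__}"
            )
        validated.append(text)
    return validated


def test_rejects_non_string() -> None:
    """Invalid input must raise the typed exception, not pass through."""
    try:
        validate_texts(["ok", 42])
    except ProviderError as exc:
        assert "texts[1]" in str(exc)
    else:
        raise AssertionError("expected ProviderError")
```

Including the offending index in the message makes a failed batch easier to trace back to its source record.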
--- a/.mindmodel/stack/stack.md
+++ b/.mindmodel/stack/stack.md
@ -1,67 +0,0 @@
 ---
 title: Tech Stack
 category: stack
 ---
 # Tech Stack
 ## Runtime & Language
 - **Python >=3.13**
 ## Web Framework
 - **Streamlit** - Multi-page app with Home, Stemwijzer, Explorer pages
 ## Data Layer
 - **DuckDB** - Embedded OLAP database
  - Tables: motions, mp_votes, svd_vectors, fused_embeddings, embeddings, user_sessions, party_results, mp_metadata
 - **ibis** - ORM (referenced, but a DuckDB-native implementation is used)
 ## AI / LLM
 - **OpenRouter** - API abstraction for AI providers
 - **QWEN** - Primary model
  - Embeddings: `qwen/qwen3-embedding-4b`
  - Chat: `qwen/qwen-2.5-72b-instruct`
 - **requests** - HTTP client (not raw openai)
 ## ML / Analytics
 - **scikit-learn** - KMeans clustering, cosine_similarity, StandardScaler
 - **scipy** - SVD (scipy.linalg.svd), spatial.procrustes
 - **umap-learn** - Dimensionality reduction (optional, graceful fallback to SVD)
 - **numpy** - Numerical computing
 ## Visualization
 - **Plotly** - Interactive charts (go.Figure, _DummyTrace fallback)
 - **matplotlib** - Static plotting (optional)
 ## HTTP & Parsing
 - **requests** - Session pooling, retry with backoff
 - **beautifulsoup4** - HTML parsing
 - **lxml** - XML/HTML processing
 ## Key Source Files
 | File | Purpose |
 |------|---------|
 | `database.py` | MotionDatabase singleton, DuckDB connection, 9-table schema |
 | `explorer.py` | Explorer page with 4 tabs (Motion, MP, Party, Evolution) |
 | `explorer_helpers.py` | Pure helper functions, Plotly chart builders |
 | `analysis/` | SVD pipeline, UMAP projection, clustering |
 | `pipeline/` | Data fetch, transform, store pipeline |
 | `pages/1_Stemwijzer.py` | Quiz page |
 | `pages/2_Explorer.py` | Explorer page |
 | `config.py` | Dataclass Config pattern |
 | `ai_provider.py` | OpenRouter API wrapper with retry |
 | `api_client.py` | TweedeKamer OData API client |
 ## Singleton Instances
 | Module | Instance | Type |
 |--------|----------|------|
 | `database.py` | `db` | `MotionDatabase` |
 | `config.py` | `config` | `Config` (dataclass) |
 | `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
 ## Environment
 - Python >=3.13
 - Environment variables via `.env` (DB path, API keys)
 - No `.env` values in constraint files (security)
--- a/.mindmodel/system.md
+++ b/.mindmodel/system.md
@ -1,88 +0,0 @@
 # System Overview
 ## Project: Stemwijzer (Dutch Political Voting Compass)
 **Purpose**: A web application that maps the Dutch Tweede Kamer (House of Representatives) based on real parliamentary votes, helping citizens discover which political party aligns best with their views.
 ## Architecture Summary
 ### Data Flow
 ```
 TweedeKamer OData API
        ↓
  API Client (api_client.py)
        ↓
  DuckDB Database (database.py)
        ↓
  Pipeline Processing (pipeline/)
        ├── fetch_mp_metadata     # MP party + tenure
        ├── extract_mp_votes     # voting_results → mp_votes
        ├── svd_pipeline          # SVD on vote matrix + Procrustes
        ├── text_pipeline         # AI embeddings via OpenRouter
        └── fusion                # Combine SVD + text vectors
        ↓
  Streamlit Web App (Home.py, pages/)
        ├── Home.py               # Landing page
        ├── 1_Stemwijzer.py       # Voting quiz
        └── 2_Explorer.py        # Political compass explorer
 ```
 ### Key Components
 | Component | Purpose | File(s) |
 |-----------|---------|---------|
 | **Database** | Motion storage, MP votes, embeddings | `database.py` |
 | **API Client** | TweedeKamer OData API integration | `api_client.py` |
 | **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
 | **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
 | **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
 | **Explorer Helpers** | Pure functions, chart builders | `explorer_helpers.py` |
 | **Web App** | Streamlit UI | `Home.py`, `pages/*.py` |
 ### Tech Stack
 - **Language**: Python 3.13+
 - **Web Framework**: Streamlit (multi-page app)
 - **Database**: DuckDB with ibis ORM (DuckDB-native implementation)
 - **ML/Analytics**: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
 - **AI/LLM**: OpenRouter-compatible API (QWEN embeddings + chat)
 - **Visualization**: Plotly (interactive charts), matplotlib (optional)
 - **HTTP**: requests with Session pooling and retry
 - **Parsing**: beautifulsoup4, lxml
 ### Key Patterns
 1. **Module-Level Singletons**: `db = MotionDatabase()`, `config = Config()`
 2. **Repository Pattern**: MotionDatabase class with method-per-query
 3. **Service Layer**: TweedeKamerAPI, ai_provider with retry/backoff
 4. **Pipeline Orchestration**: ThreadPoolExecutor for parallel SVD
 5. **Short-Lived Connections**: DuckDB connections in try/finally blocks
 6. **Graceful Degradation**: try/except around optional dependencies
 ### Domain Invariants
 ⚠️ **CRITICAL RULES** (from AGENTS.md):
 1. **Right-wing parties on RIGHT**: PVV, FVD, JA21, SGP must appear on the RIGHT side of all axes in visualizations
 2. **SVD labels = voting patterns**: SVD labels reflect voting patterns, NOT semantic content
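
Because SVD components are only defined up to sign, invariant 1 is usually enforced by flipping an axis after decomposition. A minimal sketch, assuming one axis is held as a plain mapping from party name to coordinate and that "right" means the positive direction. The function name `orient_axis`, the data layout, and the mean-based comparison are all assumptions, not the project's implementation:

```python
from statistics import mean

# Parties named in the invariant above.
RIGHT_WING = ("PVV", "FVD", "JA21", "SGP")


def orient_axis(scores: dict[str, float]) -> dict[str, float]:
    """Flip the axis, if needed, so right-wing parties average higher than the rest.

    Hypothetical helper: `scores` maps party name -> coordinate on one axis.
    """
    right = [v for k, v in scores.items() if k in RIGHT_WING]
    others = [v for k, v in scores.items() if k not in RIGHT_WING]
    if not right or not others:
        return dict(scores)  # nothing to compare against; leave the axis as-is
    if mean(right) < mean(others):
        return {k: -v for k, v in scores.items()}  # mirror the whole axis
    return dict(scores)
```

Mirroring every coordinate at once preserves all relative distances, so only the orientation of the plot changes.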
 ### Database Tables
 | Table | Purpose |
 |-------|---------|
 | `motions` | Parliamentary motions with id, title, date, category |
 | `mp_votes` | Individual MP votes on motions (Voor/Tegen/Onthouden) |
 | `mp_metadata` | MP names, parties, tenure info |
 | `svd_vectors` | 2D SVD-computed political positions per entity |
 | `fused_embeddings` | Combined SVD + text embeddings |
 | `embeddings` | Text embeddings for motions |
 | `user_sessions` | Voting session tracking |
 | `party_results` | Party match results per session |
 ### Conventions
 - **Error Handling**: Catch `Exception`, return safe fallbacks (False/[]/None)
 - **Logging**: Use `logging.getLogger(__name__)` — **never use print()**
 - **Imports**: stdlib → 3rd party → local (3 groups)
 - **Type Hints**: Required on public functions, using imports from the `typing` module where needed
 - **DuckDB**: Short-lived connections, closed with `conn.close()` in a try/finally block
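
Taken together, the error-handling, logging, and connection conventions suggest a query helper shaped like the sketch below. It uses the standard-library `sqlite3` module as a stand-in so that it runs without extra dependencies; the same open/try/finally/close shape would apply to `duckdb.connect`. The function name `fetch_titles` and the query are hypothetical.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def fetch_titles(db_path: str) -> list[str]:
    """Return motion titles, or an empty list if anything goes wrong."""
    try:
        conn = sqlite3.connect(db_path)  # stand-in for duckdb.connect(db_path)
        try:
            rows = conn.execute("SELECT title FROM motions").fetchall()
            return [row[0] for row in rows]
        finally:
            conn.close()  # always release the connection, even on error
    except Exception:
        # Log with traceback instead of printing, then return a safe fallback.
        logger.exception("Failed to fetch motion titles")
        return []
```

The inner `try/finally` handles cleanup only, while the outer `except` turns any failure into the documented fallback value.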
--- a/analysis/explorer_data.py
+++ b/analysis/explorer_data.py
@ -346,7 +346,12 @@ def load_party_mp_vectors(db_path: str) -> Dict[str, List[np.ndarray]]:
 def load_scree_data(db_path: str) -> List[float]:
-    """Load scree plot data (explained variance) for current_parliament."""
+    """Load scree plot data (explained variance) for current_parliament.
    First tries to read the cached metadata row from svd_vectors.
    Falls back to on-the-fly computation via compute_svd_spectrum for
    backward compatibility with databases that haven't stored it yet.
    """
    try:
        con = duckdb.connect(database=db_path, read_only=True)
        row = con.execute(
@ -364,7 +369,11 @@ def load_scree_data(db_path: str) -> List[float]:
            import json
            return json.loads(row[0])
-        return []
+
        # Fallback: compute dynamically for backward compatibility
        from analysis.political_axis import compute_svd_spectrum
        return compute_svd_spectrum(db_path)
    except Exception:
        logger.exception("Failed to load scree data")
        return []
--- a/scripts/mindmodel/checks.py
+++ b/scripts/mindmodel/checks.py
@ -1,72 +0,0 @@
 import os
 import re
 from typing import List
 def file_exists(base_dir: str, path: str) -> bool:
    """Check whether a path exists under base_dir without opening the file.
    This resolves the path relative to base_dir and returns True if the
    resolved path exists on the filesystem (file or directory).
    """
    base = base_dir or ""
    full = os.path.join(base, path)
    return os.path.exists(full)
 def detect_truncated(snippet: str) -> bool:
    """Heuristic detection whether a snippet is truncated.
    Returns True if the snippet ends with an ellipsis '...' (after
    trimming whitespace) or contains a common truncation marker like
    the substring 'truncat' (case-insensitive).
    """
    if snippet is None:
        return False
    s = snippet.strip()
    if s.endswith("..."):
        return True
    if "truncat" in s.lower():
        return True
    return False
 def find_potential_secrets(text: str) -> List[str]:
    """Scan the provided text and return a list of potential secret-like
    strings. This uses a few common heuristics and regex patterns and only
    scans the provided text (no external resources).
    The function returns a list of found token strings (values when
    capture groups are available, otherwise the matched substring).
    """
    if not text:
        return []
    candidates: List[str] = []
    # AWS access key id pattern (common): AKIA followed by 16 alphanumeric
    aws_pattern = re.compile(r"AKIA[0-9A-Z]{16}")
    candidates.extend(aws_pattern.findall(text))
    # Common key/value patterns like api_key = "..." or "api-key: ..."
    # allow shorter secret values (down to 4 chars) to catch short test values
    kv_pattern = re.compile(
        r"(?i)(?:api[_-]?key|secret[_-]?key|access[_-]?token|access[_-]?key|token|password|passwd|pwd)\s*[=:]+\s*['\"]?([A-Za-z0-9\-_=+/\.]{4,128})['\"]?"
    )
    candidates.extend(m.group(1) for m in kv_pattern.finditer(text))
    # Generic long hex or base64-like strings (heuristic)
    long_hex = re.compile(r"\b([a-f0-9]{32,128})\b", re.IGNORECASE)
    candidates.extend(long_hex.findall(text))
    # Deduplicate while preserving order
    seen = set()
    result: List[str] = []
    for c in candidates:
        if c and c not in seen:
            seen.add(c)
            result.append(c)
    return result
--- a/scripts/mindmodel/cli.py
+++ b/scripts/mindmodel/cli.py
@ -1,32 +0,0 @@
 from typing import List, Optional
 def main(argv: Optional[List[str]] = None) -> int:
    """CLI wrapper that delegates to scripts.mindmodel.validator.main.
    Returns the integer exit code from the delegated main. If the
    validator module is not available or raises, return a non-zero
    exit code.
    """
    try:
        # Import here to avoid side-effects on module import
        from scripts.mindmodel import validator
        # Call the validator.main if present
        if hasattr(validator, "main"):
            result = validator.main(argv)
            # Ensure we return an int
            try:
                return int(result)  # type: ignore
            except Exception:
                return 1
        else:
            return 2
    except Exception:
        # Import error or runtime error — return non-zero so callers
        # can detect failure (tests expect non-zero on missing manifest)
        return 2
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/scripts/mindmodel/loader.py
+++ b/scripts/mindmodel/loader.py
@ -1,67 +0,0 @@
 """Simple manifest loader for mindmodel manifests.
 Provides `load_manifest(path: str) -> dict` and `ManifestLoadError`.
 Behavior:
 - If PyYAML is installed, uses yaml.safe_load to parse the file.
 - Otherwise falls back to the stdlib json parser.
 - If the top-level document is a list it will be normalized to {"constraints": <list>}.
 - Raises ManifestLoadError for missing file or parse errors.
 """
 from typing import Any, Dict
 import json
 from pathlib import Path
 class ManifestLoadError(Exception):
    """Raised when a manifest cannot be loaded or parsed."""
 try:
    import yaml  # type: ignore
 except Exception:  # YAML not available
    yaml = None  # type: ignore
 def _parse_with_yaml(text: str) -> Any:
    # yaml.safe_load may return any Python structure
    try:
        return yaml.safe_load(text)
    except Exception as exc:  # pragma: no cover - defensive
        raise ManifestLoadError(f"YAML parse error: {exc}") from exc
 def _parse_with_json(text: str) -> Any:
    try:
        return json.loads(text)
    except Exception as exc:
        raise ManifestLoadError(f"JSON parse error: {exc}") from exc
 def load_manifest(path: str) -> Dict[str, Any]:
    """Load a manifest from the given file path and normalize it to a dict.
    If the top-level document is a list, it will be returned as {"constraints": list}.
    Raises ManifestLoadError if the file does not exist or if parsing fails.
    """
    p = Path(path)
    if not p.exists():
        raise ManifestLoadError(f"Manifest file not found: {path}")
    text = p.read_text(encoding="utf-8")
    if yaml is not None:
        data = _parse_with_yaml(text)
    else:
        data = _parse_with_json(text)
    # Normalize
    if isinstance(data, list):
        return {"constraints": data}
    if isinstance(data, dict):
        return data
    # Unexpected top-level type, wrap it
    return {"manifest": data}
--- a/scripts/mindmodel/validator.py
+++ b/scripts/mindmodel/validator.py
@ -1,108 +0,0 @@
 from typing import Dict, Tuple, List, Any
 import json
 from pathlib import Path
 from scripts.mindmodel import loader
 from scripts.mindmodel import checks
 def validate_manifest(path: str, base_dir: str | None = None) -> Tuple[int, Dict[str, Any]]:
    """Validate a manifest file at `path`.
    Returns a tuple (exit_code, report).
    exit codes:
      0 - ok (no issues)
      1 - warnings (only truncated snippets found)
      2 - critical (missing files, secrets, or parse error)
    """
    report: Dict[str, Any] = {
        "path": path,
        "secrets": [],
        "missing_files": [],
        "truncated": 0,
        "constraints": [],
    }
    p = Path(path)
    try:
        raw_text = p.read_text(encoding="utf-8")
    except Exception as exc:
        report["load_error"] = f"Manifest file not readable: {exc}"
        return 2, report
    # scan for secrets in the manifest text
    secrets = checks.find_potential_secrets(raw_text)
    report["secrets"] = secrets
    try:
        manifest = loader.load_manifest(path)
    except loader.ManifestLoadError as exc:
        report["load_error"] = str(exc)
        # treat parse/load errors as critical
        return 2, report
    constraints = manifest.get("constraints") or []
    for constraint in constraints:
        c_rep: Dict[str, Any] = {"constraint": constraint, "evidence": []}
        for ev in (
            constraint.get("evidence", [])
            if isinstance(constraint.get("evidence", []), list)
            else []
        ):
            text = ev.get("text") if isinstance(ev, dict) else None
            file_ref = ev.get("file") if isinstance(ev, dict) else None
            exists = True
            if file_ref:
                if not checks.file_exists(base_dir or "", file_ref):
                    exists = False
                    report["missing_files"].append(file_ref)
            truncated = False
            if text:
                truncated = checks.detect_truncated(text)
                if truncated:
                    report["truncated"] += 1
            c_rep["evidence"].append(
                {
                    "text": text,
                    "file": file_ref,
                    "exists": exists,
                    "truncated": truncated,
                }
            )
        report["constraints"].append(c_rep)
    # decide exit code
    if report["secrets"]:
        return 2, report
    if report["missing_files"]:
        return 2, report
    if report["truncated"] > 0:
        return 1, report
    return 0, report
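 def main(argv: List[str] | None = None) -> int:
    import sys
    # Fall back to the process arguments when no explicit argv is given
    if argv is None:
        argv = sys.argv
    if len(argv) < 2: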
        print(json.dumps({"error": "manifest path required"}))
        return 2
    path = argv[1]
    base_dir = argv[2] if len(argv) > 2 else None
    code, report = validate_manifest(path, base_dir=base_dir)
    print(json.dumps(report))
    return code
 # no execution at import time
--- a/scripts/validate_mindmodel.py
+++ b/scripts/validate_mindmodel.py
@ -1,56 +0,0 @@
 """Command-line wrapper around src.validators.mindmodel_validator.validate_manifest
 This tiny CLI loads a manifest and writes a structured JSON report to stdout
 and optionally to a file path. It is report-only: it never raises an error or
 changes exit code based on findings.
 """
 from __future__ import annotations
 import argparse
 import json
 import os
 from pathlib import Path
 from typing import Any
 def _write_report(report: dict[str, Any], path: Path | None) -> None:
    text = json.dumps(report, indent=2, ensure_ascii=False)
    print(text)
    if path:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
 def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser("validate_mindmodel")
    parser.add_argument("manifest", nargs="?", help="path to manifest file")
    parser.add_argument("--manifest", dest="manifest_opt", help="path to manifest file")
    parser.add_argument("--report", help="optional output report path")
    args = parser.parse_args(argv)
    manifest = args.manifest_opt or args.manifest
    if not manifest:
        parser.error("manifest path is required (positional or --manifest)")
    # import here to keep CLI tiny when unused
    try:
        from src.validators.mindmodel_validator import validate_manifest
    except Exception as e:  # pragma: no cover - defensive
        print(f"Failed to import validator: {e}")
        return 0
    try:
        report = validate_manifest(manifest, report_only=True)
    except Exception as e:  # never fail the process
        report = {"error": str(e)}
    report_path = Path(args.report) if args.report else None
    _write_report(report, report_path)
    # always exit zero for report-only operation
    return 0
 if __name__ == "__main__":
    raise SystemExit(main())
--- a/src/types/motion_types.py
+++ b/src/types/motion_types.py
@ -1,35 +0,0 @@
 """Motion-related simple types and JSON helpers.
 Decision: MotionId is an alias for str for simplicity.
 """
 from dataclasses import dataclass, asdict
 from typing import List
 import json
 MotionId = str
 Embedding = List[float]
@dataclass
 class SimilarityNeighbor:
    motion_id: MotionId
    score: float
 def to_json(neighbors: List[SimilarityNeighbor]) -> str:
    """Serialize a list of SimilarityNeighbor to a JSON string.
    The format is a JSON list of objects with keys 'motion_id' and 'score'.
    """
    list_of_dicts = [asdict(n) for n in neighbors]
    return json.dumps(list_of_dicts)
 def from_json(json_str: str) -> List[SimilarityNeighbor]:
    """Deserialize a JSON string (list of dicts) into SimilarityNeighbor list."""
    parsed = json.loads(json_str)
    return [
        SimilarityNeighbor(motion_id=item["motion_id"], score=float(item["score"]))
        for item in parsed
    ]
--- a/src/validators/mindmodel_validator.py
+++ b/src/validators/mindmodel_validator.py
@ -1,142 +0,0 @@
 """Conservative, report-only mindmodel/manifest validator.
 This module provides a small validator that reads a manifest (YAML if
 PyYAML is available, otherwise a tiny fallback parser) and reports
 potential issues without making changes.
 The returned report contains the keys:
 - missing_files: list of file paths referenced in the manifest that don't exist
 - truncated_evidence: list of items (dicts) where evidence_excerpt appears truncated
 - potential_secrets: list of items (dicts) where evidence_excerpt looks like it may contain secrets
 The manifest is expected to contain a top-level `files` list with
 entries that are mappings and have at least a `path` (or `file_path`)
 and optionally `evidence_excerpt`.
 """
 from __future__ import annotations
 import os
 from typing import List, Dict, Any
 def _load_yaml_native(path: str) -> Dict[str, Any]:
    try:
        import yaml  # type: ignore
        with open(path, "r", encoding="utf-8") as f:
            return yaml.safe_load(f) or {}
    except Exception:
        raise
 def _load_yaml_fallback(path: str) -> Dict[str, Any]:
    """Tiny YAML-ish fallback parser that understands a minimal manifest.
    It only supports a top-level `files:` key and a sequence of simple
    mappings with `-` list items and `key: value` pairs indented.
    This is intentionally conservative and fragile; it's only used when
    PyYAML is not available.
    """
    result: Dict[str, Any] = {}
    files: List[Dict[str, Any]] = []
    current: Dict[str, Any] | None = None
    with open(path, "r", encoding="utf-8") as f:
        for raw in f:
            line = raw.rstrip("\n")
            stripped = line.lstrip()
            if not stripped or stripped.startswith("#"):
                continue
            if stripped.startswith("files:") and line.startswith(stripped):
                # top-level marker, skip
                continue
            if stripped.startswith("- "):
                # start new item
                if current is not None:
                    files.append(current)
                current = {}
                # possible inline key: - path: something
                rest = stripped[2:].strip()
                if rest:
                    if ":" in rest:
                        k, v = rest.split(":", 1)
                        current[k.strip()] = v.strip()
                continue
            # key: value lines (indented)
            if ":" in stripped and current is not None:
                k, v = stripped.split(":", 1)
                current[k.strip()] = v.strip()
    if current is not None:
        files.append(current)
    if files:
        result["files"] = files
    return result
 def _normalize_entry(entry: Any) -> Dict[str, Any]:
    if not isinstance(entry, dict):
        return {"path": str(entry)}
    # prefer path or file_path
    if "file_path" in entry and "path" not in entry:
        entry = dict(entry)
        entry["path"] = entry.pop("file_path")
    return entry
 def validate_manifest(manifest_path: str, report_only: bool = True) -> dict:
    """Validate a minimal mindmodel manifest and return a report.
    Parameters
    - manifest_path: path to the YAML manifest file
    - report_only: unused flag for now; kept to emphasise this is report-only
    Returns a dict with keys: missing_files, truncated_evidence, potential_secrets
    """
    if not os.path.exists(manifest_path):
        raise FileNotFoundError(manifest_path)
    # attempt to use PyYAML if available, otherwise fallback
    try:
        manifest = _load_yaml_native(manifest_path)
    except Exception:
        manifest = _load_yaml_fallback(manifest_path)
    files = manifest.get("files") or []
    report = {"missing_files": [], "truncated_evidence": [], "potential_secrets": []}
    def _strip_surrounding_quotes(s: str) -> str:
        s = s.strip()
        if len(s) >= 2 and s[0] == s[-1] and s[0] in ('"', "'"):
            return s[1:-1]
        return s
    for raw in files:
        entry = _normalize_entry(raw)
        path = entry.get("path")
        evidence = entry.get("evidence_excerpt") or entry.get("evidence") or ""
        # Remove surrounding quotes if the fallback YAML parser left them in place
        if isinstance(evidence, str):
            evidence = _strip_surrounding_quotes(evidence)
        # missing files
        if path:
            if not os.path.exists(path):
                report["missing_files"].append(path)
        # truncated evidence heuristics
        if isinstance(evidence, str):
            if len(evidence) > 1000 or evidence.strip().endswith("..."):
                report["truncated_evidence"].append(
                    {"path": path, "evidence_excerpt": evidence}
                )
            # potential secrets heuristics
            up = evidence.upper()
            if "PASSWORD" in up or "SECRET" in up or "BEGIN PRIVATE KEY" in evidence:
                report["potential_secrets"].append(
                    {"path": path, "evidence_excerpt": evidence}
                )
    return report
--- a/tests/ci/test_schedule_exists.py
+++ b/tests/ci/test_schedule_exists.py
@ -1,11 +0,0 @@
 import pathlib
 def test_schedule_workflow_exists():
    path = pathlib.Path(".github/workflows/mindmodel-schedule.yml")
    assert path.exists(), f"Expected {path} to exist"
    text = path.read_text(encoding="utf-8")
    # ensure the file is a GitHub Actions workflow that declares a schedule
    assert "on:" in text
    assert "schedule" in text
--- a/tests/ci/test_workflow_exists.py
+++ b/tests/ci/test_workflow_exists.py
@ -1,26 +0,0 @@
 import os
 try:
    import yaml
    _HAS_YAML = True
 except Exception:
    _HAS_YAML = False
 def test_mindmodel_workflow_exists_and_parses():
    path = os.path.join(".github", "workflows", "mindmodel-validation.yml")
    assert os.path.exists(path), f"Workflow file {path} does not exist"
    # Minimal parse: if PyYAML is available, try safe_load; otherwise do a token check
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    if _HAS_YAML:
        data = yaml.safe_load(content)
        assert data is not None and isinstance(data, dict)
        assert "on" in data or "name" in data
    else:
        # fall back to simple checks to avoid introducing new deps
        assert "name:" in content
        assert "on:" in content
--- a/tests/scripts/mindmodel/test_checks.py
+++ b/tests/scripts/mindmodel/test_checks.py
@ -1,43 +0,0 @@
 import os
 import tempfile
 from scripts.mindmodel import checks
 def test_file_exists(tmp_path):
    # create a file under tmp_path
    base = str(tmp_path)
    p = tmp_path / "subdir"
    p.mkdir()
    f = p / "file.txt"
    f.write_text("hello")
    # path relative to base
    assert checks.file_exists(base, "subdir/file.txt")
    # non-existing
    assert not checks.file_exists(base, "subdir/missing.txt")
 def test_detect_truncated():
    assert checks.detect_truncated("This is a truncated snippet...")
    assert checks.detect_truncated("Truncation marker: [truncated]")
    assert checks.detect_truncated("contains truncatED word")
    assert not checks.detect_truncated("This is complete")
    assert not checks.detect_truncated("")
 def test_find_potential_secrets():
    text = """
    api_key = "abcdEFGH1234ijklMNOP"
    password: 'hunter2'
    aws = AKIA1234567890ABCD12
    random_hex = deadbeefdeadbeefdeadbeefdeadbeef
    not_a_secret = short
    """
    found = checks.find_potential_secrets(text)
    # should find api_key value, password, aws and long hex
    assert "abcdEFGH1234ijklMNOP" in found
    assert "hunter2" in found
    assert any(item.startswith("AKIA") for item in found)
    assert any("deadbeef" in item for item in found)
--- a/tests/scripts/mindmodel/test_cli.py
+++ b/tests/scripts/mindmodel/test_cli.py
@ -1,14 +0,0 @@
 import os
 def test_cli_with_nonexistent_manifest():
    """Calling cli.main with a non-existent manifest should return non-zero."""
    from scripts.mindmodel import cli
    # Provide a path that is extremely unlikely to exist
    fake_manifest = "/this/path/does/not/exist/manifest.json"
    code = cli.main([fake_manifest])
    assert isinstance(code, int)
    assert code != 0
--- a/tests/scripts/mindmodel/test_loader.py
+++ b/tests/scripts/mindmodel/test_loader.py
@ -1,21 +0,0 @@
 import json
 import pytest
 from scripts.mindmodel import loader
 def test_load_json_manifest(tmp_path):
    data = [{"id": "c1", "description": "a constraint"}]
    p = tmp_path / "manifest.json"
    p.write_text(json.dumps(data), encoding="utf-8")
    loaded = loader.load_manifest(str(p))
    assert isinstance(loaded, dict)
    assert "constraints" in loaded
    assert any(c.get("id") == "c1" for c in loaded["constraints"])
 def test_missing_manifest_raises():
    with pytest.raises(loader.ManifestLoadError):
        loader.load_manifest("nonexistent-file-manifest.json")
--- a/tests/scripts/mindmodel/test_validator.py
+++ b/tests/scripts/mindmodel/test_validator.py
@ -1,70 +0,0 @@
 import json
 import os
 from scripts.mindmodel import validator
 def write_manifest(path, data: str):
    p = path
    p.write_text(data, encoding="utf-8")
    return str(p)
 def test_validate_ok(tmp_path):
    # manifest with one constraint and evidence pointing to an existing file
    evidence_file = tmp_path / "file.txt"
    evidence_file.write_text("hello")
    manifest = {
        "constraints": [
            {"id": "c1", "evidence": [{"file": "file.txt", "text": "complete content"}]}
        ]
    }
    manifest_path = tmp_path / "manifest.json"
    manifest_path.write_text(json.dumps(manifest))
    code, report = validator.validate_manifest(
        str(manifest_path), base_dir=str(tmp_path)
    )
    assert code == 0
    assert report["missing_files"] == []
    assert report["secrets"] == []
 def test_missing_file_flags_failure(tmp_path):
    # manifest refers to missing file
    manifest = {
        "constraints": [{"id": "c2", "evidence": [{"file": "nope.txt", "text": "foo"}]}]
    }
    manifest_path = tmp_path / "manifest.json"
    manifest_path.write_text(json.dumps(manifest))
    code, report = validator.validate_manifest(
        str(manifest_path), base_dir=str(tmp_path)
    )
    assert code == 2
    assert "nope.txt" in report["missing_files"]
 def test_truncated_produces_warning(tmp_path):
    # evidence text is truncated -> warning
    f = tmp_path / "manifest.json"
    manifest = {
        "constraints": [{"id": "c3", "evidence": [{"text": "This is truncated..."}]}]
    }
    f.write_text(json.dumps(manifest))
    code, report = validator.validate_manifest(str(f), base_dir=str(tmp_path))
    assert code == 1
    assert report["truncated"] >= 1
 def test_manifest_scanned_for_secrets(tmp_path):
    # manifest text contains an api_key pattern
    f = tmp_path / "manifest.json"
    f.write_text('api_key = "secretVALUE1234"')
    code, report = validator.validate_manifest(str(f), base_dir=str(tmp_path))
    assert code == 2
    assert any("secretVALUE1234" in s for s in report["secrets"]) or report["secrets"]
--- a/tests/scripts/test_validate_cli.py
+++ b/tests/scripts/test_validate_cli.py
@ -1,52 +0,0 @@
 import json
 import subprocess
 import sys
 from pathlib import Path
 def test_cli_runs(tmp_path):
    manifest = Path(".mindmodel/manifest.yaml")
    assert manifest.exists(), "expected .mindmodel/manifest.yaml to exist in repo"
    report_path = tmp_path / "report.json"
    # Try module mode first, fallback to direct script invocation
    cmds = [
        [
            sys.executable,
            "-m",
            "scripts.validate_mindmodel",
            str(manifest),
            "--report",
            str(report_path),
        ],
        [
            sys.executable,
            "scripts/validate_mindmodel.py",
            str(manifest),
            "--report",
            str(report_path),
        ],
    ]
    result = None
    for cmd in cmds:
        try:
            result = subprocess.run(cmd, check=False, capture_output=True, text=True)
            # if process ran (any exit code), break and use this result
            break
        except FileNotFoundError:
            continue
    assert result is not None, "Failed to run script (no suitable invocation)"
    # CLI should exit with 0 (report-only)
    assert result.returncode == 0, (
        f"CLI exited non-zero: {result.returncode}\nstderr: {result.stderr}"
    )
    assert report_path.exists(), f"Report file was not created at {report_path}"
    data = json.loads(report_path.read_text(encoding="utf-8"))
    # top-level keys expected from validator
    for key in ("missing_files", "truncated_evidence", "potential_secrets"):
        assert key in data, f"Report JSON missing key: {key}"
--- a/tests/types/test_motion_types.py
+++ b/tests/types/test_motion_types.py
@ -1,22 +0,0 @@
 import json
 from src.types.motion_types import SimilarityNeighbor, to_json, from_json
 def test_similarity_neighbor_json_roundtrip():
    neighbors = [
        SimilarityNeighbor(motion_id="m1", score=0.9),
        SimilarityNeighbor(motion_id="m2", score=0.75),
    ]
    # Serialize to JSON string
    json_str = to_json(neighbors)
    assert isinstance(json_str, str)
    # Ensure it's valid JSON
    parsed = json.loads(json_str)
    assert isinstance(parsed, list)
    # Deserialize back to objects
    recovered = from_json(json_str)
    assert recovered == neighbors
--- a/tests/validators/test_mindmodel_validator.py
+++ b/tests/validators/test_mindmodel_validator.py
@ -1,45 +0,0 @@
 import os
 import tempfile
 from pathlib import Path
 import pytest
 from src.validators.mindmodel_validator import validate_manifest
 def _write_temp_manifest(contents: str) -> str:
    fd, path = tempfile.mkstemp(prefix="manifest_", suffix=".yaml")
    os.close(fd)
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents)
    return path
 def test_validator_reports_missing_file(tmp_path):
    # manifest referencing a non-existent file
    missing = str(tmp_path / "no_such_file.txt")
    manifest = f"""
 files:
  - path: {missing}
 """
    mpath = _write_temp_manifest(manifest)
    try:
        report = validate_manifest(mpath)
        assert "missing_files" in report
        assert missing in report["missing_files"]
    finally:
        Path(mpath).unlink()
 def test_validator_detects_potential_secret(tmp_path):
    # manifest with evidence_excerpt containing PASSWORD
    evidence = "This shows a PASSWORD=hunter2 in the output"
    manifest = f'files:\n  - path: some_file.txt\n    evidence_excerpt: "{evidence}"\n'
    mpath = _write_temp_manifest(manifest)
    try:
        report = validate_manifest(mpath)
        assert "potential_secrets" in report
        items = report["potential_secrets"]
        assert any(evidence in (item.get("evidence_excerpt") or "") for item in items)
    finally:
        Path(mpath).unlink()
--- a/tests/validators/test_types.py
+++ b/tests/validators/test_types.py
@ -1,24 +0,0 @@
 import os
 from pathlib import Path
 import pytest
 from src.validators.types import parse_manifest, Manifest
 def test_manifest_model_parses_sample(tmp_path: Path):
    sample = """
 files:
  - path: data/file1.txt
    evidence_excerpt: "some evidence"
  - file_path: data/file2.txt
    evidence_excerpt: "other evidence"
 """
    p = tmp_path / "manifest.yaml"
    p.write_text(sample, encoding="utf-8")
    manifest = parse_manifest(str(p))
    assert isinstance(manifest, Manifest)
    assert len(manifest.files) == 2
    assert manifest.files[0]["path"] == "data/file1.txt"
    assert manifest.files[1]["path"] == "data/file2.txt"
--- a/tests/validators/test_validator_edgecases.py
+++ b/tests/validators/test_validator_edgecases.py
@ -1,56 +0,0 @@
 import os
 from pathlib import Path
 from src.validators.mindmodel_validator import validate_manifest
 def test_missing_files_reported(tmp_path):
    # create two paths that do not exist
    p1 = str(tmp_path / "missing_one.txt")
    p2 = str(tmp_path / "missing_two.txt")
    manifest = f"""
 files:
  - path: {p1}
  - path: {p2}
 """
    mpath = tmp_path / "manifest_missing.yaml"
    mpath.write_text(manifest, encoding="utf-8")
    report = validate_manifest(str(mpath))
    assert "missing_files" in report
    # both missing paths should be reported
    assert p1 in report["missing_files"]
    assert p2 in report["missing_files"]
 def test_truncated_evidence_and_secrets_reported(tmp_path):
    # entry with truncated evidence (ends with ...)
    trunc_path = str(tmp_path / "trunc.txt")
    trunc_evidence = "This output was cut off..."
    # entry with potential secret (contains PASSWORD)
    secret_path = str(tmp_path / "secret.txt")
    secret_evidence = "Found PASSWORD=sekret123 in the logs"
    manifest = f"""
 files:
  - path: {trunc_path}
    evidence_excerpt: "{trunc_evidence}"
  - path: {secret_path}
    evidence_excerpt: "{secret_evidence}"
 """
    mpath = tmp_path / "manifest_edgecases.yaml"
    mpath.write_text(manifest, encoding="utf-8")
    report = validate_manifest(str(mpath))
    # truncated evidence should report the trunc_path
    assert "truncated_evidence" in report
    assert any(item.get("path") == trunc_path for item in report["truncated_evidence"])
    # potential secrets should report the secret_path
    assert "potential_secrets" in report
    assert any(item.get("path") == secret_path for item in report["potential_secrets"])
--- a/thoughts/shared/changes/2026-03-28-ansible-package-implementation.md
+++ b/thoughts/shared/changes/2026-03-28-ansible-package-implementation.md
@ -1,40 +0,0 @@
 # 2026-03-28 Ansible package implementation
 Summary of changes added to repository:
 - packages/@ansible/example/
  - package.json (scoped package @ansible/example)
  - README.md
  - src/index.js
  - tests/ (test_package_json.js, test_pack_inspect.js, _pack_helpers.js, run.js)
 - .github/workflows/publish-ansible-example.yml
 - .github/workflows/deploy-motief.yml
 - docs/deployment/ansible-package-deploy.md
 - docs/embeddings.md
 - README.md (top-level)
 - thoughts/shared/changes/2026-03-28-ansible-package-implementation.md (this file)
 Verification commands (run from repo root):
 1. Run package tests:
   cd packages/@ansible/example && npm test
 2. Run pack inspection:
   cd packages/@ansible/example && node tests/test_pack_inspect.js
 3. Simulate pack locally:
   cd packages/@ansible/example && npm pack && tar -tzf <produced-tgz> | head -n 20
 4. Check workflows syntax locally (optional):
   - Use `act` or `nektos/act` to run workflow_dispatch triggers in a container; ensure secrets are not printed.
 5. Verify docs updated for embeddings and deployment: open docs/embeddings.md and docs/deployment/ansible-package-deploy.md
 Notes:
 - Do NOT add secrets to repo. Secrets: NPM_TOKEN, DEPLOY_SSH_KEY, DEPLOY_HOST, DEPLOY_USER, DEPLOY_SSH_PORT, OPENROUTER_API_KEY
 Contact: Sven Geboers
 End of changelog.
 Write the file with neutral tone and concise steps for verification.
--- a/thoughts/shared/changes/2026-03-28-env-removal-report.md
+++ b/thoughts/shared/changes/2026-03-28-env-removal-report.md
@ -1,36 +0,0 @@
 ---
 date: 2026-03-28
 title: "Remove .env from tracking — report"
 ---
 Summary
 -------
 I removed `.env` from the repository index and added it to `.gitignore` to prevent accidental future commits. This was a non-destructive, forward-facing change — the repository history still contains prior commits that touched `.env`.
 What I ran
 -----------
 - git rm --cached .env
 - ensured `.gitignore` contains `.env`
 - committed the change: chore(secrets): stop tracking .env and add to .gitignore
 Commits that referenced .env
 ----------------------------
 These commits touched `.env` in the repository history (from git log --all -- .env):
 - 35f4667 2026-03-28 Sven Geboers chore(secrets): stop tracking .env and add to .gitignore
 - 3551a82 2026-03-21 Sven Geboers feat(analysis): add 2D political compass and 2D trajectories
 Notes
 -----
 - The `.env` file was removed from the index but remains in historical commits. If you need to remove it from history, we can perform a history rewrite (git-filter-repo or BFG) and force-push; this is destructive and requires coordination.
 - I created a CI guard to fail builds if a `.env` file is present in the repository root (see .github/workflows/forbid-env.yml). This prevents accidental re-adding via pushes/PRs.
 Next steps (recommended)
 ------------------------
 1. Rotate secrets that might have been in `.env` (see the secrets-rotation checklist next). This is mandatory if those keys were used anywhere publicly or in shared CI.
 2. If you require history purge, reply confirming and I'll prepare a filter-repo run and the exact force-push sequence.
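 The CI guard mentioned in the notes is not shown in this diff. As a minimal sketch of the idea only, a check that fails when a `.env` file exists at the repository root could look like the Python below. The names `find_forbidden_env` and `main` are hypothetical and are not taken from .github/workflows/forbid-env.yml.

```python
import sys
from pathlib import Path


def find_forbidden_env(repo_root: Path) -> list[Path]:
    """Return the `.env` file at the repository root, if one exists.

    Hypothetical helper for illustration; the real guard lives in
    .github/workflows/forbid-env.yml, which this diff does not show.
    """
    candidate = repo_root / ".env"
    return [candidate] if candidate.is_file() else []


def main(repo_root: str = ".") -> int:
    """Return a non-zero exit code when a forbidden file is present."""
    offenders = find_forbidden_env(Path(repo_root))
    for path in offenders:
        # Report only the path, never the file contents.
        print(f"forbidden file present: {path}", file=sys.stderr)
    return 1 if offenders else 0
```

 A CI step could run this from the checkout directory and pass the result of `main()` to `sys.exit`, so that a stray `.env` fails the build.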
--- a/thoughts/shared/changes/2026-03-28-secrets-rotation-checklist.md
+++ b/thoughts/shared/changes/2026-03-28-secrets-rotation-checklist.md
@ -1,25 +0,0 @@
 ---
 date: 2026-03-28
 title: "Secrets rotation checklist"
 ---
 Rotate these secrets if they were stored in `.env` or otherwise exposed:
 - OPENROUTER_API_KEY / OPENAI_API_KEY
 - NPM_TOKEN
 - DEPLOY SSH keys or passwords (DEPLOY_SSH_KEY, DEPLOY_PASSWORD)
 - Any database credentials, API keys, or third-party service tokens
 Steps
 -----
 1. Revoke the current tokens in each provider's dashboard.
 2. Create new tokens/keys and store them in the repository secrets (GitHub Settings → Secrets).
 3. Update any running services / CI variables to use the new tokens.
 4. If you used SSH keys and replaced them, update the authorized_keys on the VPS and remove the old key.
 Verification
 ------------
 - Use CI dry-run jobs that check connectivity and token validity.
 - Run local commands that use the new tokens.
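 As a minimal sketch of a local verification step, the Python below checks that the rotated secrets are present in the environment without printing their values. The helper name `missing_secrets` is hypothetical, and the names listed are taken from the checklist above. The check covers presence only; validity still has to be confirmed against each provider.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Names taken from the checklist; adjust to the secrets actually in use.
REQUIRED_SECRETS = ("OPENROUTER_API_KEY", "NPM_TOKEN", "DEPLOY_SSH_KEY")


def missing_secrets(
    names: tuple[str, ...] = REQUIRED_SECRETS,
    environ: Mapping[str, str] | None = None,
) -> list[str]:
    """Return the names that are unset or blank in the environment.

    Only names are returned, never values, so the result is safe to log.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]
```

 Passing an explicit `environ` mapping keeps the check easy to exercise in a dry-run job without touching the real process environment.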