refactor: delete stale files and consolidate .mindmodel structure

Deleted stale root-level Python files:
- main.py (unused 'Hello world' script)
- verify.py (unused table info script)
- scraper.py (unused MotionScraper class)
- scheduler.py (unused DataUpdateScheduler class)

Deleted duplicate .mindmodel root YAML files (subdirectory versions are more comprehensive):
- anti-patterns.yaml, architecture.yaml, conventions.yaml
- dependencies.yaml, domain.yaml, domain-glossary.yaml
- stack.yaml, tech-stack.yaml, workflows.yaml

Added comprehensive .mindmodel subdirectories:
- constraints/ (naming, db-schema, error-handling, types, etc.)
- patterns/ (api, architecture, database, python, streamlit, etc.)
- examples/ (code examples for each pattern)
- anti-patterns/, architecture/, conventions/, dependencies/, domain/, stack/

Updated ARCHITECTURE.md to reflect current codebase:
- Removed references to non-existent files
- Added missing files (explorer.py, explorer_helpers.py, pipeline/)
- Added directory structure documentation
- Updated tech stack to include scipy, sklearn, umap

Updated .gitignore:
- Added patterns for generated analysis files
- Added .worktrees/ pattern (was already in gitignore but dir was deleted)

Removed empty .worktrees/ directory
Branch: main · Sven Geboers, 4 weeks ago · parent 0308d20f12 · commit f376300804
Changed files (lines added):
- .mindmodel/anti-patterns/anti-patterns.yaml (+146)
- .mindmodel/architecture/architecture.yaml (+55)
- .mindmodel/constraints/README.md (+54)
- .mindmodel/constraints/error-handling.yaml (+184)
- .mindmodel/constraints/imports.yaml (+207)
- .mindmodel/constraints/logging.yaml (+167)
- .mindmodel/constraints/naming.yaml (+171)
- .mindmodel/constraints/types.yaml (+233)
- .mindmodel/conventions/conventions.yaml (+124)
- .mindmodel/dependencies/dependencies.yaml (+78)
- .mindmodel/domain/domain-glossary.yaml (+107)
- .mindmodel/examples/api-client-example.py (+196)
- .mindmodel/examples/database-example.py (+191)
- .mindmodel/examples/pipeline-example.py (+217)
- .mindmodel/examples/streamlit-page-example.py (+316)
- .mindmodel/patterns/api.yaml (+265)
- .mindmodel/patterns/architecture.yaml (+230)
- .mindmodel/patterns/database.yaml (+239)
- .mindmodel/patterns/patterns.yaml (+228)
- .mindmodel/patterns/python.yaml (+196)
- .mindmodel/patterns/streamlit.yaml (+225)
- .mindmodel/stack/stack.yaml (+41)

@@ -0,0 +1,146 @@
# Anti-Patterns
> ⚠ **NOTE**: Section 1 below was **investigated and resolved** — it is NOT a bug (see §1 for details).
---
## 1. ~~CRITICAL: Entity-ID / Party-Name Mismatch in `compute_party_coords`~~ → **INVALID — INVESTIGATED & RESOLVED**
**Investigation Date**: 2026-03-31
**Investigation Summary**: After thorough analysis of the database schema and code, this anti-pattern is **INVALID**. The original concern was based on a false assumption about `svd_vectors.entity_id` containing party names.
**Investigation Findings**:
1. **`svd_vectors` table has NO rows with `entity_type='party'`** — only `mp` and `motion` entity types exist in practice.
2. **`entity_id` values in `svd_vectors` are always MP names** (e.g., `"Van Dijk, I."`), never party names. The party centroids are correctly computed via `mp_metadata` lookups.
3. **The trajectories plot WORKS correctly** — no production bug exists. The code path for party-level visualization does not rely on `svd_vectors.entity_id` containing party names.
**Conclusion**: The original anti-pattern was a false positive caused by incorrect assumptions about data contents. The `party_map` reverse-lookup (`mp_name → party_name`) works correctly because `entity_id` values are always MP names, not party names.
---
## 2. Bare `except: pass`
**File**: `database.py`, line 47
**Problem**: Catches **all** exceptions including `KeyboardInterrupt`, `SystemExit`, `MemoryError`.
Silently swallows errors — no logging, no fallback.
**Broken code**:
```python
try:
    self.conn.execute(sql)
except:  # ← bare except
    pass
```
**Fix**:
```python
try:
    self.conn.execute(sql)
except ibis.errors.IbisError as e:
    st.warning(f"Query failed: {e}")
    raise  # or return a default
```
---
## 3. Nested Exception Handling
**File**: `explorer.py`, lines 244–261
**Problem**: Try/except inside try/except creates opaque error paths. Inner exception silently swallows outer intent.
**Broken code**:
```python
try:
    result = compute_svd(motions)
    # ...
except Exception:
    try:
        # Try fallback approach
        result = fallback_compute(motions)
    except Exception:
        pass  # ← both exceptions silently dropped
```
**Fix**: Flatten — handle each case explicitly, or use a decorator.
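A flattened version could look like the sketch below; `compute_svd` and `fallback_compute` are stand-in stubs here, not the project's real functions:

```python
import logging

_logger = logging.getLogger(__name__)

def compute_svd(motions):
    raise ValueError("primary failed")  # stand-in for the real computation

def fallback_compute(motions):
    return "fallback-result"  # stand-in for the real fallback

def compute_with_fallback(motions):
    """Try the primary computation, then the fallback; fail loudly if both break."""
    try:
        return compute_svd(motions)
    except ValueError as exc:
        _logger.warning("Primary SVD failed, using fallback: %s", exc)
    try:
        return fallback_compute(motions)
    except ValueError as exc:
        _logger.error("Fallback also failed: %s", exc)
        return None
```

Each handler is now at one level of nesting, and every failure path is logged.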
---
## 4. Catch-All `Exception` Used Everywhere
**Problem**: `except Exception:` catches 50+ exception types including `ValueError`, `TypeError`, `KeyError`.
Overly broad — masks real bugs.
**Occurrence**: 850+ instances of bare/generic exception handlers across codebase.
**Fix**: Catch specific exceptions. If you must catch multiple, chain them:
```python
except (KeyError, ValueError) as e:
    logger.warning(f"Missing field: {e}")
```
---
## 5. No `entity_id` Format Validation
**Problem**: `svd_vectors.entity_id` can be either:
- An MP name (e.g., `"Van Dijk, I."`) for individual-level SVD
- A party name (e.g., `"GroenLinks-PvdA"`) for party-level SVD
No validation distinguishes which is which. Code must infer from context. (Note: In practice `svd_vectors.entity_id` only contains MP names — see §1 for investigation findings.)
**Fix**: Add explicit format marker or separate columns:
```python
# Option A: separate columns
svd_vectors = pd.DataFrame({
    'mp_name': [...],     # nullable
    'party_name': [...],  # nullable
    'window': [...],
    'vector_2d': [...]
})

# Option B: format prefix
# "mp:Van Dijk, I." or "party:GroenLinks-PvdA"
```
---
## 6. Silent Fallback When Party Centroids Fail
**Problem**: If `party_map` lookup fails (entity is a party, not MP), the code silently produces
`party_map_count: 0` and empty `parties_with_centroid_counts`. No warning is raised.
**Fix**: Add validation and warning:
```python
if party_map_count == 0:
    st.warning(f"No party mappings found for {len(svd_df)} entities in window '{window}'")
```
---
## 7. Three Separate Party Alias Dictionaries (No Single Source of Truth)
**Problem**: Party name variations exist in 3+ places:
- `PARTY_COLOURS` keys
- `party_map` values (from `mp_party_history`)
- Raw data column values
No canonical alias mapping. Spelling mismatches cause silent failures.
**Fix**: Create one `PARTY_ALIASES` dict in `config.py`:
```python
PARTY_ALIASES = {
    "GroenLinks-PvdA": ["GL-PvdA", "GroenLinks PvdA", "PvdA-GroenLinks"],
    "PVV": ["Partij voor de Vrijheid"],
    ...
}

def resolve_party(name: str) -> str:
    """Normalize any party name variant to canonical form."""
    for canonical, aliases in PARTY_ALIASES.items():
        if name in aliases or name == canonical:
            return canonical
    return name  # no alias found
```

@@ -0,0 +1,55 @@
# Architecture
## Page Routing
- `Home.py` → thin wrapper, minimal logic
- `pages/1_🗳_Stemwijzer.py` → thin wrapper delegating to quiz module
- `pages/2_🔍_Explorer.py` → thin wrapper delegating to `explorer.py`
- **Pattern**: thin Streamlit page files that import and call into core modules
## Core Modules
```
database.py → MotionDatabase singleton (shared across all pages)
explorer.py → Explorer page logic, tab routing
explorer_helpers.py → Pure functions, chart builders, coordinate computation
analysis/ → SVD, UMAP, clustering algorithms
pipeline/ → Data ingestion pipeline
config.py → Dataclass Config, PARTY_COLOURS dict
```
## Data Flow
```
DuckDB → MotionDatabase (singleton)
        ↓
st.cache_data loaders
        ↓
explorer_helpers (pure functions)
        ↓
Plotly charts → Streamlit
```
## Key Patterns
1. **Singleton per module**: `database.py` exports one `db` instance; `config.py` exports config + PARTY_COLOURS
2. **Graceful degradation**: try/except around optional dependencies (UMAP, Plotly)
3. **Pipeline**: fetch → transform → store (see `pipeline/` directory)
4. **API client**: with retry/backoff for external data sources
5. **Dummy fallbacks**: if optional dep unavailable, use dummy stub
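
Patterns 2 and 5 (graceful degradation with a dummy stub) can be sketched like this. The bogus module name deliberately forces the `ImportError` branch in this sketch; the real code imports `umap`, and `_DummyUMAP` is illustrative, not the project's actual stub:

```python
import types

try:
    import nonexistent_umap_package as umap  # stand-in for: import umap
except ImportError:
    class _DummyUMAP:
        """Stub with the same interface; performs no reduction."""
        def __init__(self, **kwargs):
            self.kwargs = kwargs

        def fit_transform(self, X):
            # No dimensionality reduction available; pass data through.
            return X

    # Shadow the missing module with a namespace exposing the stub.
    umap = types.SimpleNamespace(UMAP=_DummyUMAP)

reducer = umap.UMAP(n_components=2)
print(reducer.fit_transform([[1, 2], [3, 4]]))  # [[1, 2], [3, 4]]
```

Callers never branch on availability; they always get an object with the expected interface.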
## Database Schema (key relationships)
```
motions (id, title, date, category)
mp_votes (mp_id, motion_id, vote: -1/0/1)
svd_vectors (entity_id, window, vector_2d) ← entity_id = mp_name in practice (schema would also admit party_name)
party_centroids (party, window, centroid_2d)
mp_party_history (mp_id, party, start_date, end_date)
```
## SVD Computation Pipeline
1. Build MP × Motion vote matrix from `mp_votes`
2. Run SVD to get 2D embeddings per MP
3. Optionally aggregate to party centroids
4. Align across windows using Procrustes
5. Store in `svd_vectors` table
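
Steps 1–3 above can be sketched with numpy; the tiny vote matrix below is made up for illustration, and the real pipeline builds it from `mp_votes`:

```python
import numpy as np

# Toy MP × motion vote matrix (-1 = against, 0 = abstain/absent, 1 = for).
votes = np.array([
    [ 1,  1, -1, -1],   # MP A
    [ 1,  1, -1,  1],   # MP B
    [-1, -1,  1,  1],   # MP C
])

# Rank-2 truncated SVD: each MP's 2D embedding is its row of U scaled by S.
U, S, Vt = np.linalg.svd(votes, full_matrices=False)
embedding_2d = U[:, :2] * S[:2]          # shape (n_mps, 2)

# Step 3: aggregate MPs to a party centroid by averaging their embeddings.
party_members = [0, 1]                   # indices of MPs in one party
centroid = embedding_2d[party_members].mean(axis=0)
print(embedding_2d.shape)  # (3, 2)
```

The Procrustes alignment of step 4 is applied per window before storing, so embeddings stay comparable across windows.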

@@ -1,5 +1,51 @@
# Constraint Files Index
This directory contains all constraint files for the Stemwijzer codebase.
## Quick Navigation
| Category | File | Purpose |
|----------|------|---------|
| **Stack** | `../stack/stack.yaml` | Tech stack overview |
| **Architecture** | `../architecture/architecture.yaml` | Data flow, page routing, component relationships |
| **Conventions** | `../conventions/conventions.yaml` | Naming, error handling, code organization |
| **Domain** | `../domain/domain-glossary.yaml` | Dutch political terms, algorithm concepts |
| **Patterns** | `../patterns/patterns.yaml` | 10 code patterns (page wrapper, pipeline, etc.) |
| **Anti-Patterns** | `../anti-patterns/anti-patterns.yaml` | ⚠ 7 documented issues (§1 investigated and found invalid) |
| **Dependencies** | `../dependencies/dependencies.yaml` | Library wiring, singletons, imports |
## How to Use
1. **Before writing code**: Check `patterns/patterns.yaml` for how similar features are implemented
2. **When naming things**: Follow `conventions/conventions.yaml` (snake_case functions, PascalCase classes)
3. **When handling errors**: Avoid patterns in `anti-patterns/anti-patterns.yaml`
4. **When working with domain terms**: Reference `domain/domain-glossary.yaml`
5. **When connecting components**: See `dependencies/dependencies.yaml` for wiring
## Key Conventions Summary
- **Files**: snake_case (`explorer_helpers.py`)
- **Functions**: snake_case (`compute_party_coords`)
- **Classes**: PascalCase (`MotionDatabase`)
- **Constants**: UPPER_SNAKE_CASE (`PARTY_COLOURS`)
- **No bare `except:`** — always specify exception type
- **Pure functions** in helpers — no IO, no Streamlit calls
- **One singleton per module**: `db`, `config`, `PARTY_COLOURS`
## ⚠ Critical Bug (resolved)
**Read `../anti-patterns/anti-patterns.yaml` first.** Section 1 originally documented a suspected
critical bug in `explorer_helpers.py:compute_party_coords` (party names in `svd_vectors.entity_id`
not recognized because `party_map` only contains MP-name keys). The investigation recorded there
concluded it was a false positive: `entity_id` values are always MP names.
## Files Generated
- `manifest.yaml` — lists all constraint files with group mappings
- `stack/stack.yaml` — tech stack
- `architecture/architecture.yaml` — data flow & components
- `conventions/conventions.yaml` — coding conventions
- `domain/domain-glossary.yaml` — domain terminology
- `patterns/patterns.yaml` — 10 code patterns with examples
- `anti-patterns/anti-patterns.yaml` — 7 anti-patterns (§1 investigated and found invalid)
- `dependencies/dependencies.yaml` — library wiring
- `README.md` — this index

@@ -0,0 +1,184 @@
# Error Handling Constraints
## Core Rule
**Catch `Exception`, return safe fallbacks (False/[]/None)**
Never let exceptions propagate to user-facing code. Always provide a safe default.
## Patterns
### For Not-Found Operations
Return `None` or falsy value when item not found:
```python
# GOOD: Return None on not found
def get_motion_by_id(self, motion_id: int) -> Optional[Dict]:
    try:
        conn = duckdb.connect(self.db_path)
        try:
            return conn.execute(
                "SELECT * FROM motions WHERE id = ?", (motion_id,)
            ).fetchone()
        finally:
            conn.close()
    except Exception:
        return None
```
### For Collection Operations
Return empty list when no results:
```python
# GOOD: Return empty list on failure
def get_filtered_motions(self, **kwargs) -> List[Dict]:
    try:
        conn = duckdb.connect(self.db_path)
        try:
            return conn.execute(query, params).fetchall()
        finally:
            conn.close()
    except Exception:
        return []
```
### For Boolean Operations
Return `False` for failed boolean checks:
```python
# GOOD: Return False on failure
def motion_exists(self, motion_id: int) -> bool:
    try:
        conn = duckdb.connect(self.db_path)
        try:
            count = conn.execute(
                "SELECT COUNT(*) FROM motions WHERE id = ?", (motion_id,)
            ).fetchone()[0]
            return count > 0
        finally:
            conn.close()
    except Exception:
        return False
```
### For Creation Operations
Return `False` or empty string on failure:
```python
# GOOD: Return empty string on failure
def generate_summary(self, title: str, body: str) -> str:
    try:
        return ai_provider.chat_completion(messages)
    except ai_provider.ProviderError:
        logger.exception("AI provider failed")
        return ""
```
## Anti-Patterns to Avoid
### Don't Catch Specific Exceptions Only
```python
# BAD: Catches only FileNotFoundError, misses other issues
try:
    with open(path) as f:
        return json.load(f)
except FileNotFoundError:
    return None
```
### Don't Re-raise Without Context
```python
# BAD: Loses information
try:
    process(data)
except Exception:
    raise  # No context added
```
### Don't Swallow Exceptions Silently
```python
# BAD: No logging, no fallback
try:
    return risky_operation()
except Exception:
    pass  # What happened?
```
## Nested Exception Handling
When calling code that has its own error handling, wrap only if needed:
```python
# Accept result from wrapped function (it handles errors)
def fetch_motions(self, start_date):
    # ai_provider_wrapper handles retries internally
    embeddings = get_embeddings_with_retry(texts)
    # Only wrap if wrapper doesn't handle errors
    if all(e is None for e in embeddings):
        logger.error("All embeddings failed")
        return []
    return process(embeddings)
```
## Context Managers
Use `try/finally` for cleanup:
```python
def process_with_temp_file(self):
    temp = NamedTemporaryFile(delete=False)
    try:
        temp.write(data)
        temp.close()
        return process_file(temp.name)
    finally:
        temp.close()          # no-op if already closed
        os.unlink(temp.name)  # remove the file last
```
## When to Log vs Return
| Scenario | Action |
|----------|--------|
| User action fails | Log warning, return safe default |
| Internal error (corrupt data) | Log error, return safe default |
| Transient failure (network) | Log warning, retry if appropriate |
| Configuration error | Log error, raise with clear message |
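For the "transient failure" row, a minimal retry-with-backoff helper can be sketched as below; `with_retry` and the flaky callable are illustrative names, not the project's API client:

```python
import logging
import time

_logger = logging.getLogger(__name__)

def with_retry(fn, attempts=3, base_delay=0.01, exceptions=(ConnectionError,)):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions as exc:
            if attempt == attempts:
                raise  # out of attempts: propagate
            delay = base_delay * 2 ** (attempt - 1)
            _logger.warning("Attempt %d failed (%s); retrying in %.2fs",
                            attempt, exc, delay)
            time.sleep(delay)

# Usage: a function that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retry(flaky))  # ok
```

Only the exception types passed in are retried; anything else propagates immediately, matching the "log warning, retry if appropriate" rule.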
## Exception Propagation
Only raise exceptions for:
1. Configuration/setup errors (missing required env vars)
2. Programming errors (invalid arguments)
3. Fatal system errors (database corruption)
```python
# GOOD: Raise for configuration errors
def _get_api_key(self) -> str:
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise ProviderError(
            "OPENROUTER_API_KEY environment variable is required"
        )
    return key
```
## Logging Errors
Always include context:
```python
# GOOD: Include relevant context
_logger.error(
    "Failed to fetch motion %d: %s",
    motion_id,
    exc
)

# BAD: No context
_logger.error("Failed to fetch")
```

@@ -1,24 +1,205 @@
# Import Organization Constraints
## Standard Order
Organize imports in three groups with blank lines between:
```python
# 1. Standard library imports (alphabetical within group)
import json
import logging
import os
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# 2. Third-party packages (alphabetical within group)
import duckdb
import requests

# 3. Local application modules (can use relative imports)
from config import config
from database import db
from summarizer import summarizer
```
## Alphabetical Ordering
Within each group, sort imports alphabetically:
```python
# GOOD - alphabetical
import json
import logging
from datetime import datetime
from typing import Dict, List, Optional
# BAD - random order
from typing import Optional
import json
from datetime import datetime
import logging
from typing import Dict, List
```
## Grouping Rules
### Standard Library
- `json`, `logging`, `os`, `sys`, `time`
- `datetime`, `timedelta` from `datetime`
- `Dict`, `List`, `Optional`, etc. from `typing`
- `argparse`, `pathlib`, `re`, `uuid`
### Third-Party
- `duckdb`, `requests`, `streamlit`
- `numpy`, `scipy`, `sklearn`
- `plotly`, `beautifulsoup4`
- `pytest`
### Local Application
- Modules from same package
- Relative imports when appropriate
## When to Use `from X import Y`
### Prefer `from module import specific_items` for:
- Constants and config
- Single classes or functions used frequently
- Type annotations
```python
# GOOD - clear about what we're using
from config import config
from database import db
# GOOD - type hints
from typing import Dict, List, Optional
```
### Use `import module` when:
- You need multiple items from the module
- Using module.namespace is clearer
```python
# GOOD - duckdb used for types and module access
import duckdb

conn = duckdb.connect(...)
result = conn.execute(...)

# Also acceptable for types
from typing import Dict
```
## Relative Imports
In package modules, prefer relative imports:
```python
# pipeline/svd_pipeline.py
from ..database import MotionDatabase # relative import
from .text_pipeline import process_text # relative import
```
## Circular Imports
Avoid circular imports by:
1. Moving shared code to a third module
2. Using TYPE_CHECKING for type hints only
```python
# types.py - shared type definitions
from typing import TypedDict

class MotionDict(TypedDict):
    id: int
    title: str
    ...

# module_a.py
from .types import MotionDict

# module_b.py - if needed here too
from .types import MotionDict
```
## Import Patterns to Avoid
### Wildcard Imports
```python
# BAD
from database import *
# GOOD
from database import db, MotionDatabase
```
### Import in Function Scope (unless necessary)
```python
# AVOID - delays import, makes dependencies unclear
def some_function():
    import pandas as pd  # Late import
    return pd.DataFrame(...)

# PREFER - import at module level
import pandas as pd

def some_function():
    return pd.DataFrame(...)
```
### Reassigning Imported Names
```python
# BAD - confusing
from module import process
process = something_else # Reassigning
# GOOD - clear naming
from module import process as process_data
```
## Type Checking Imports
For type hints only, use TYPE_CHECKING:
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from .models import Motion

def get_motion(motion_id: int) -> "Motion":  # String quote for forward ref
    ...
```
## Optional Dependency Imports
Handle optional dependencies gracefully:
```python
try:
    import duckdb
except ImportError:
    duckdb = None  # Will be checked later

class MotionDatabase:
    def __init__(self):
        if duckdb is None:
            self._file_mode = True  # Fallback mode
```
## Example: Complete Import Block
```python
# Complete example from database.py
import json
import logging
import uuid
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

import duckdb

from config import config
from database import db
from pipeline import text_pipeline
```
Run isort or ruff import sorting in pre-commit or CI to enforce this ordering.

@@ -0,0 +1,167 @@
# Logging Constraints
## Core Rule
**Use `logging.getLogger(__name__)` - never use `print()`**
## Logger Initialization
Get logger at module level:
```python
# GOOD: Use logging.getLogger(__name__)
import logging

_logger = logging.getLogger(__name__)

def some_function():
    _logger.info("Processing started")
    _logger.debug("Detail: %s", detail)
```
## Logger Naming
Use `__name__` for automatic module path:
```python
# In database.py - logger will be "database"
_logger = logging.getLogger(__name__)
# In pipeline/svd_pipeline.py - logger will be "pipeline.svd_pipeline"
_logger = logging.getLogger(__name__)
```
## Log Levels
| Level | When to Use |
|-------|-------------|
| DEBUG | Detailed diagnostic info (dev only) |
| INFO | Normal operation milestones |
| WARNING | Unexpected but handled (fallbacks) |
| ERROR | Operation failed, may need attention |
| CRITICAL | Fatal error, program may crash |
## Examples
### Good Logging Practice
```python
_logger.info("Pipeline run: %s → %s (%s windows)", start, end, count)
_logger.debug("Batch embedding attempt %d failed: %s", attempt, exc)
_logger.warning("Fallback used for motion %d: %s", motion_id, reason)
_logger.error("Query failed: %s", exc)
```
### Bad: Using print()
```python
# BAD - don't use print
print(f"Fetched {len(voting_records)} voting records from API")
print(f"Error fetching motions from API: {e}")
```
### Good: Using logger
```python
# GOOD - use logger
_logger.info("Fetched %d voting records from API", len(voting_records))
_logger.error("Error fetching motions from API: %s", e)
```
## Exception Logging
Use `_logger.exception()` for caught exceptions (includes traceback):
```python
try:
    result = risky_operation()
except Exception as exc:
    _logger.exception("Operation failed: %s", exc)
    return fallback_value
```
Use `_logger.error()` with explicit exception for controlled errors:
```python
try:
    result = risky_operation()
except Exception as exc:
    _logger.error("Operation failed: %s", exc)
    return fallback_value
```
## Configuration
Ensure logging is configured in entry points:
```python
# pipeline/run_pipeline.py
def run(args):
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    # ... rest of pipeline
```
## Anti-Patterns
### Debug Prints in Production Code
```python
# BAD
print(f"[TRAJ DEBUG] processing window {wid}")
# GOOD
_logger.debug("Processing window %s", wid)
```
### Inconsistent Logger Names
```python
# BAD - mixing _logger and logger
_logger = logging.getLogger(__name__)
logger = logging.getLogger("other") # Inconsistent
# GOOD - use single consistent pattern
_logger = logging.getLogger(__name__)
```
### Missing Logger Initialization
```python
# BAD - no logger defined
def some_function():
    logging.getLogger(__name__).info("...")  # Redundant calls

# GOOD - define once at module level
_logger = logging.getLogger(__name__)

def some_function():
    _logger.info("...")
```
## Sensitive Data
Never log sensitive information:
- API keys
- User votes
- Session IDs (if tied to user data)
- Personal information
```python
# BAD
_logger.info("User %s voted %s", user_id, vote)
# GOOD - log aggregates, not individual votes
_logger.info("Vote recorded for session %s", session_id[:8])
```
## Structured Logging
For complex data, use structured logging:
```python
_logger.info(
    "Motion processed",
    extra={
        "motion_id": motion_id,
        "policy_area": policy_area,
        "processing_time_ms": elapsed_ms,
    }
)
```

@@ -1,30 +1,141 @@
# Naming Constraints
## File Names
### Python Modules
- **Convention**: `snake_case.py`
- **Examples**: `motion_database.py`, `api_client.py`, `text_pipeline.py`
### Test Files
- **Convention**: `test_<module_name>.py`
- **Examples**: `test_database.py`, `test_api_client.py`
### Config Files
- **Convention**: `snake_case`
- **Examples**: `config.py`, `.env.example`, `pyproject.toml`
### Directories
- **Convention**: `snake_case/`
- **Examples**: `pipeline/`, `tests/integration/`, `src/validators/`
## Class Names
- **Convention**: `PascalCase`
- **Examples**: `MotionDatabase`, `TweedeKamerAPI`, `MotionSummarizer`
### Naming Patterns
| Pattern | Example |
|---------|---------|
| Database wrapper | `MotionDatabase` |
| API client | `TweedeKamerAPI` |
| Service/Helpers | `MotionScraper`, `MotionAnalyzer` |
| Exceptions | `ProviderError` |
## Function Names
- **Convention**: `snake_case`
- **Examples**: `get_motions`, `compute_similarity`, `process_voting_records`
### Private Methods
- **Convention**: `_snake_case` (single underscore prefix)
- **Examples**: `_get_voting_records`, `_parse_response`
## Variable Names
### Regular Variables
- **Convention**: `snake_case`
- **Examples**: `motion_id`, `party_name`, `voting_results`
### Constants (Module-Level)
- **Convention**: `UPPER_SNAKE_CASE`
- **Examples**: `DATABASE_PATH`, `API_TIMEOUT`, `MAX_RETRIES`
### Config Variables (in dataclass)
- **Convention**: `UPPER_SNAKE_CASE`
- **Examples**: `QWEN_MODEL`, `POLICY_AREAS`
### Booleans
- **Convention**: `is_`, `has_`, `can_` prefixes or `_flag` suffix
- **Examples**: `is_active`, `has_votes`, `skip_extract`
### Private Variables
- **Convention**: `_underscore_prefix`
- **Examples**: `_conn`, `_cache`, `_session`
## Singleton Instances
- **Convention**: `lower_snake_case` at module level
- **Examples**: `db = MotionDatabase()`, `summarizer = MotionSummarizer()`
```python
# database.py
class MotionDatabase:
    ...

# Singleton instance
db = MotionDatabase()

# Usage
from database import db
motions = db.get_motions()
```
## Type Variables
- **Convention**: `PascalCase`
- **Examples**: `T = TypeVar('T')`, `MotionDict = Dict[str, Any]`
## Anti-Patterns
### Inconsistent Naming
```python
# BAD - mixing styles
get_motions() # snake_case
GetMotionById() # PascalCase
processData() # camelCase
# GOOD - consistent snake_case
get_motions()
get_motion_by_id()
process_voting_data()
```
### Abbreviations
```python
# AVOID - unclear abbreviations
calc_similarity() # calculate_*
proc_votes() # process_*
get_mp_data() # get_mp_metadata()
# PREFER - full words
calculate_similarity()
process_votes()
get_mp_metadata()
```
### Hungarian Notation
```python
# BAD - Hungarian notation
str_title = "..."
int_count = 0
b_is_active = True
# GOOD - clear types via naming
title = "..."
count = 0
is_active = True
```
## Special Cases
### Window IDs
- **Format**: `"YYYY-QN"` or `"YYYY"`
- **Examples**: `"2024-Q1"`, `"2024-Q2"`, `"2024"`
### Policy Areas
- **Convention**: PascalCase with spaces
- **Examples**: `"Economie"`, `"Sociale Zaken"`, `"Klimaat"`
### Vote Values
- **Convention**: PascalCase Dutch terms
- **Values**: `"Voor"`, `"Tegen"`, `"Onthouden"`, `"Geen stem"`, `"Afwezig"`
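
Since the schema stores votes numerically (`mp_votes.vote: -1/0/1`), these Dutch strings need a mapping somewhere; a hypothetical `VOTE_MAP` sketch follows, where treating the non-"Voor"/"Tegen" values as 0 is an assumption:

```python
# Hypothetical mapping from Dutch vote values to the -1/0/1 encoding
# used in mp_votes; mapping abstentions/absences to 0 is an assumption.
VOTE_MAP = {
    "Voor": 1,
    "Tegen": -1,
    "Onthouden": 0,
    "Geen stem": 0,
    "Afwezig": 0,
}

def encode_vote(value: str) -> int:
    """Map a raw vote string to its numeric encoding (0 for unknown values)."""
    return VOTE_MAP.get(value, 0)

print(encode_vote("Voor"))   # 1
print(encode_vote("Tegen"))  # -1
```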

@@ -0,0 +1,233 @@
# Type Hint Constraints
## Core Rule
**Use type hints on all public functions and methods**
## Function Type Hints
### Required on Public APIs
```python
# GOOD - complete type hints
def get_motion(self, motion_id: int) -> Optional[Dict]:
    ...

def get_filtered_motions(
    self,
    policy_area: str = "Alle",
    limit: int = 10
) -> List[Dict]:
    ...

def calculate_similarity(self, motion_a: int, motion_b: int) -> float:
    ...
```
### Optional Parameters
Use `Optional[X]` or `X | None`:
```python
# Both forms are acceptable
def get_motion(self, motion_id: Optional[int] = None) -> Optional[Dict]:
    ...

def get_motion(self, motion_id: int | None = None) -> dict | None:
    ...
```
### Multiple Return Types
Use `Union[X, Y]` or `|` operator:
```python
# Acceptable forms
def parse_value(self, value: str) -> Union[bool, str, None]:
    ...

def parse_value(self, value: str) -> bool | str | None:
    ...
```
### Generic Types
Use `List[X]`, `Dict[K, V]`, `Tuple[X, Y]`:
```python
from typing import Dict, List, Optional, Tuple

def get_motions(self, ids: List[int]) -> Dict[int, Dict]:
    """Map motion_id -> motion data."""
    ...

def process_batch(self, items: List[str]) -> Tuple[List[str], List[str]]:
    """Returns (successes, failures)."""
    ...
```
## Collection Types
Prefer specific types over bare `list`/`dict`:
```python
# GOOD - specific types
def get_votes(self) -> List[str]:
    ...

def get_metadata(self) -> Dict[str, Any]:
    ...

# ACCEPTABLE - for truly generic collections
def merge_dicts(*dicts: dict) -> dict:
    ...
```
## DuckDB Result Types
DuckDB returns tuples/lists - document expected structure:
```python
def get_motion(self, motion_id: int) -> Optional[Tuple]:
    """Returns (id, title, description, date, ...) or None."""
    conn = duckdb.connect(self.db_path)
    try:
        result = conn.execute(
            "SELECT * FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()
        return result
    finally:
        conn.close()

# Or use Dict for clarity
def get_motion_as_dict(self, motion_id: int) -> Optional[Dict]:
    """Returns motion dict or None."""
    conn = duckdb.connect(self.db_path)
    try:
        row = conn.execute(
            "SELECT * FROM motions WHERE id = ?", (motion_id,)
        ).fetchone()
        if row:
            return {
                "id": row[0],
                "title": row[1],
                "description": row[2],
                # ...
            }
        return None
    finally:
        conn.close()
```
## Class/Instance Types
Use `Self` for methods returning instance type:
```python
from typing import Self

class MotionDatabase:
    def with_connection(self, path: str) -> Self:
        """Return new instance with different path."""
        return MotionDatabase(db_path=path)
```
## Callback/Function Types
Use `Callable` for function parameters:
```python
from typing import Callable

def process_motions(
    motions: List[Dict],
    processor: Callable[[Dict], Any]
) -> List[Any]:
    return [processor(m) for m in motions]
```
## Type Aliases
Define clear type aliases for domain concepts:
```python
from typing import Dict, List, Literal, Optional, TypedDict

# Vote values
VoteValue = Literal["Voor", "Tegen", "Onthouden", "Geen stem", "Afwezig"]

# Policy areas
PolicyArea = Literal["Alle", "Economie", "Klimaat", "Immigratie", ...]

# Motion dict
class MotionDict(TypedDict):
    id: int
    title: str
    description: Optional[str]
    date: Optional[str]
    policy_area: Optional[str]
    voting_results: Optional[str]  # JSON string
    winning_margin: Optional[float]

def get_motion(self, motion_id: int) -> Optional[MotionDict]:
    ...
```
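A `Literal` alias like `VoteValue` can also back a runtime check via `typing.get_args`; the validator below is a sketch, not part of the codebase:

```python
from typing import Literal, get_args

VoteValue = Literal["Voor", "Tegen", "Onthouden", "Geen stem", "Afwezig"]
VALID_VOTES = set(get_args(VoteValue))  # the literal values as a set

def validate_vote(value: str) -> str:
    """Raise ValueError if value is not one of the VoteValue literals."""
    if value not in VALID_VOTES:
        raise ValueError(f"Unknown vote value: {value!r}")
    return value

print(validate_vote("Voor"))  # Voor
```

This keeps the static type and the runtime check in sync with a single definition.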
## Avoid `Any`
Use `Any` sparingly - prefer specific types:
```python
# AVOID - too vague
def process(data: Any) -> Any:
    ...

# PREFER - specific types
def process(motion: MotionDict) -> Optional[SimilarityResult]:
    ...
```
## Inline Type Hints
For simple cases, inline hints are fine:
```python
def get_count(self) -> int:
    ...

def is_empty(self) -> bool:
    ...
```
## Docstring Type Hints
For complex types, include in docstrings:
```python
def get_party_positions(self, window_id: str) -> Dict[str, List[float]]:
    """Get party positions in political space.

    Args:
        window_id: Time window (e.g., "2024-Q1")

    Returns:
        Dict mapping party_name -> [x, y] coordinates

    Example:
        >>> positions = db.get_party_positions("2024-Q1")
        >>> positions["VVD"]
        [0.5, -0.3]
    """
    ...
```
## Type Checking
When a runtime guarantee is needed, add an explicit `isinstance` check:
```python
def set_count(self, count: int) -> None:
    if not isinstance(count, int):
        raise TypeError(f"Expected int, got {type(count).__name__}")
    self._count = count
```

@@ -0,0 +1,124 @@
# Naming Conventions
## Files
- **snake_case** for all Python files: `database.py`, `explorer_helpers.py`, `motion_cache.py`
- **PascalCase** NOT used for files
## Functions
- **snake_case**: `get_svd_vectors()`, `compute_party_coords()`, `build_scatter_trace()`
- Private helpers prefixed with `_`: `_get_window_data()`
## Classes
- **PascalCase**: `MotionDatabase`, `Config`
- **Dataclass pattern** for Config: `@dataclass` decorator with typed fields
## Variables
- **snake_case**: `party_map`, `mp_name`, `svd_vectors`, `party_centroids`
- **CONSTANT_SNAKE_CASE** for module-level constants: `PARTY_COLOURS`, `DEFAULT_WINDOW`
## Module-Level Exports
- **Singleton instance**: `db = MotionDatabase()` at module bottom (not class-level)
- **Config instance**: `config = Config(...)` at module bottom
- **Dicts**: `PARTY_COLOURS` exported from `config.py`
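The conventions above, combined in one minimal module sketch (the module and all names are hypothetical, chosen only to illustrate each rule):

```python
# party_stats.py (hypothetical module illustrating the naming conventions)
from dataclasses import dataclass, field

# CONSTANT_SNAKE_CASE for module-level constants
DEFAULT_WINDOW = "2023"
PARTY_COLOURS = {"VVD": "#1f77b4", "PVV": "#d62728"}


def _normalise_name(raw: str) -> str:
    """Private helper: prefixed with an underscore."""
    return raw.strip().title()


def compute_party_counts(mp_names: list[str]) -> dict[str, int]:
    """Public function: snake_case, with snake_case locals."""
    counts: dict[str, int] = {}
    for mp_name in mp_names:
        key = _normalise_name(mp_name)
        counts[key] = counts.get(key, 0) + 1
    return counts


@dataclass
class PartyStats:  # PascalCase class with typed fields
    window: str = DEFAULT_WINDOW
    colours: dict = field(default_factory=lambda: PARTY_COLOURS)


# Singleton instance at module bottom
stats = PartyStats()
```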
---
# Error Handling
## Known Patterns
1. **Bare except with pass** (ANTI-PATTERN - see anti-patterns.yaml)
```python
except:
pass # database.py:47
```
2. **Graceful degradation**: catch specific exceptions, fall back to default
```python
try:
result = compute_svd()
except ImportError:
result = DEFAULT_SVD
```
3. **Optional dependency fallbacks**:
```python
try:
import umap
use_umap = True
except ImportError:
use_umap = False
```
4. **Nested exception handling** (ANTI-PATTERN - see anti-patterns.yaml):
```python
try:
...
except Exception:
try:
...
except Exception:
pass
```
## Rules
- Never use bare `except:` — always specify exception type
- Never swallow exceptions silently — log or return a sensible default
- For optional deps, use `ImportError` or `ModuleNotFoundError` explicitly
- Avoid nested try/except blocks
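A minimal sketch that follows all four rules (the function, logger setup, and default value are illustrative, not taken from the codebase):

```python
import json
import logging

logger = logging.getLogger(__name__)

DEFAULT_COORDS: dict = {}


def load_party_coords(path: str) -> dict:
    """Catch specific exceptions, log them, and fall back to a sensible default."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        logger.warning("coords file missing: %s, using default", path)
        return DEFAULT_COORDS
    except json.JSONDecodeError as exc:
        logger.error("coords file corrupt: %s", exc)
        return DEFAULT_COORDS
```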
---
# Code Organization
## Singleton Pattern
Each module owns one shared instance:
```python
# database.py
db = MotionDatabase()
# config.py
config = Config(...)
PARTY_COLOURS = {...}
```
## Pure Functions in Helpers
`explorer_helpers.py` contains only pure functions (no IO, no Streamlit calls):
```python
def compute_party_coords(svd_vectors, party_map):
"""Pure: no side effects, no imports from this module"""
...
def build_scatter_trace(df, color_col):
"""Pure: returns Plotly trace dict"""
...
```
## Cached Data Loaders
Use `@st.cache_data` for expensive data loading:
```python
@st.cache_data
def load_svd_vectors(window: str) -> pd.DataFrame:
return db.get_svd_vectors(window)
```
## Dataclass Config
```python
@dataclass
class Config:
db_path: str = "data/stemwijzer.duckdb"
default_window: str = "2023"
party_colours: dict = field(default_factory=lambda: PARTY_COLOURS)
```
---
# Imports
## Ordering (convention)
1. Standard library
2. Third-party (streamlit, ibis, plotly, sklearn, umap)
3. Local/relative imports
## Avoid
- Wildcard imports (`from module import *`)
- Circular imports (ensure dependency direction: helpers → database → config)
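The ordering as it looks at the top of a typical module (third-party and local imports are kept as comments so the sketch has no hard dependencies; the commented names match the files listed elsewhere in these docs):

```python
# 1. Standard library
import json
import logging
from dataclasses import dataclass
from datetime import date

# 2. Third-party
# import streamlit as st
# import plotly.graph_objects as go
# import ibis

# 3. Local imports last; dependency direction: helpers -> database -> config
# from config import config, PARTY_COLOURS
# from database import db
```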

@ -0,0 +1,78 @@
# Dependencies
## Core Library Wiring
### Database Layer
```
ibis → DuckDB → MotionDatabase singleton (database.py)
sqlglot (ibis dependency)
```
### Data Processing
```
pandas → (used throughout for DataFrame operations)
numpy → (used by sklearn, scipy, umap)
scipy → spatial.procrustes for window alignment
```
### ML Pipeline
```
sklearn.cluster → KMeans  (Procrustes alignment comes from scipy.spatial, above)
sklearn.preprocessing → StandardScaler
umap → UMAP (optional, graceful fallback)
```
### Visualization
```
plotly → explorer_helpers.py chart builders
st.plotly_chart → explorer.py rendering
```
### Streamlit
```
streamlit → all pages, @st.cache_data decorators
```
## Optional Dependencies
| Package | Required | Fallback |
|---------|----------|----------|
| `umap` | No | Use raw SVD vectors (first 2 dims) |
| `plotly` | Yes | Raises ImportError |
| `duckdb` | Yes | — |
| `ibis` | Yes | — |
| `sklearn` | Yes | — |
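How the `umap` fallback is typically wired (a sketch; `project_2d` is an illustrative name, not the real projection function):

```python
try:
    import umap  # optional dependency
    HAVE_UMAP = True
except ImportError:
    HAVE_UMAP = False


def project_2d(vectors):
    """Project SVD row vectors to 2D: UMAP when available, else first two dims."""
    if HAVE_UMAP:
        import numpy as np
        return umap.UMAP(n_components=2).fit_transform(np.asarray(vectors)).tolist()
    # Fallback: keep the first two SVD dimensions
    return [list(row[:2]) for row in vectors]
```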
## Singleton Instances
| Module | Instance | Type |
|--------|----------|------|
| `database.py` | `db` | `MotionDatabase` |
| `config.py` | `config` | `Config` (dataclass) |
| `config.py` | `PARTY_COLOURS` | `dict[str, str]` |
## Key Imports by File
```
explorer.py:
- import streamlit as st
- from database import db
- from explorer_helpers import *
explorer_helpers.py:
- import pandas as pd
- import plotly.graph_objects as go
- from database import db (optional, for type hints)
database.py:
- import ibis
- import duckdb
- from config import config, PARTY_COLOURS
config.py:
- from dataclasses import dataclass, field
- import streamlit as st (optional, for warnings)
```
## Environment
- Python ≥3.13
- Environment variables via `.env` (DB path, API keys)
- No `.env` values in constraint files (security)

@ -0,0 +1,107 @@
# Domain Glossary - Dutch Political Terms
## Core Entities
### Motion / Motie
- Parliamentary motion submitted by MPs
- Fields: `id`, `title`, `date`, `category`
- MPs vote: **For** (+1), **Against** (-1), **Abstain** (0), **Absent**
### MP / Kamerlid
- Member of Parliament (Tweede Kamerlid)
- Identified by full name (e.g., "Van Dijk, I.")
- Has voting record, party affiliation, SVD position vector
- Historical: `mp_party_history` tracks party changes over time
### Party / Fractie
- Political party (e.g., "GroenLinks-PvdA", "PVV", "VVD")
- Party centroids: average SVD position of all MPs in party
- Aliases: multiple spelling variants exist (see anti-patterns.yaml)
### Vote / Stemming
- Individual MP's vote on a motion: +1, 0, -1
- Aggregated to compute SVD vectors
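A sketch of the implied label-to-number encoding (treating "Geen stem" and "Afwezig" as missing values is an assumption here, not confirmed by the schema):

```python
# Hypothetical mapping from Dutch vote labels to matrix values;
# "Geen stem" and "Afwezig" are treated as missing (None).
VOTE_ENCODING = {
    "Voor": 1,       # For
    "Tegen": -1,     # Against
    "Onthouden": 0,  # Abstain
}


def encode_vote(vote: str):
    """Return +1/0/-1 for substantive votes, None for absence or no vote."""
    return VOTE_ENCODING.get(vote)
```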
---
## Time & Analysis Concepts
### Window / Tijdsvenster
- Time period for analysis (annual or quarterly)
- Values: "2023", "2023-Q1", "2024", etc.
- SVD vectors computed per window
- Windows can be aligned across time using Procrustes
### Trajectory
- MP's position change across multiple windows
- Computed from `svd_vectors` + window ordering
- Used for trend analysis in Evolution tab
---
## Mathematical / Algorithmic Terms
### SVD Vector
- 2D vector from Singular Value Decomposition of MP × Motion vote matrix
- Represents MP's position in political space
- `entity_id` in `svd_vectors`: either MP name (when individual MPs) or party name (when party-level)
### Political Compass
- 2D visualization: X-axis = Left↔Right, Y-axis = Progressive↔Conservative
- SVD vectors mapped to compass quadrants
- UMAP used for projection when available (optional dependency with fallback)
### Procrustes Alignment
- Algorithm to align SVD vectors across time windows
- Ensures comparable positions across years/quarters
- Implemented via `scipy.spatial.procrustes`
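A toy demonstration of the alignment (illustrative data only; the pipeline's actual pre/post-processing is not shown):

```python
import numpy as np
from scipy.spatial import procrustes

# The same three entities in two windows; window B is window A rotated
# 90 degrees, so alignment should recover a near-zero disparity.
window_a = np.array([[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]])
window_b = np.array([[-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

# procrustes standardises both matrices, then rotates/scales the second
# to best fit the first; disparity is the remaining sum of squared error.
aligned_a, aligned_b, disparity = procrustes(window_a, window_b)
print(f"disparity after alignment: {disparity:.6f}")
```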
### Centroid
- Geometric center of a set of points
- Party centroid = average SVD position of all MPs in that party
- Computed from `svd_vectors` filtered by party
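A sketch of the centroid computation with toy data (in the real pipeline the positions come from `svd_vectors` joined with party membership):

```python
from collections import defaultdict

# Toy (party, [x, y]) rows
mp_positions = [
    ("VVD", [0.5, -0.3]),
    ("VVD", [0.7, -0.1]),
    ("PVV", [0.9, 0.4]),
]


def party_centroids(rows):
    """Centroid per party: arithmetic mean of member positions, axis by axis."""
    grouped = defaultdict(list)
    for party, pos in rows:
        grouped[party].append(pos)
    return {
        party: [sum(axis) / len(axis) for axis in zip(*positions)]
        for party, positions in grouped.items()
    }
```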
### UMAP
- Uniform Manifold Approximation and Projection
- Dimensionality reduction for visualization
- Optional dependency — graceful fallback if unavailable
---
## Visualization
### PARTY_COLOURS
- Dict mapping party names to hex color codes
- Used in all Plotly charts for consistent party coloring
- Source: `config.py` → `PARTY_COLOURS` constant
- **Issue**: 3 separate alias dictionaries exist (no single source of truth)
---
## Application Pages
### Home
- Landing page with app overview
### Stemwijzer (Quiz)
- User answers questions → matched to parties
- Thin wrapper around quiz module
### Explorer (4 tabs)
- **Motion tab**: SVD positions colored by vote on selected motion
- **MP tab**: Individual MP trajectories across windows
- **Party tab**: Party centroids with members as scatter
- **Evolution tab**: How positions change over time
---
## Database Table Reference
| Table | Key Fields |
|-------|-----------|
| `motions` | id, title, date, category |
| `mp_votes` | mp_id, motion_id, vote |
| `svd_vectors` | entity_id, window, vector_2d (list[2]) |
| `party_centroids` | party, window, centroid_2d |
| `mp_party_history` | mp_id, party, start_date, end_date |
| `windows` | window_id, start_date, end_date, period_type |
| `mp_trajectories` | mp_id, window, trajectory_vector |

@ -0,0 +1,196 @@
"""Example: TweedeKamerAPI usage - from api_client.py and actual codebase."""
from datetime import datetime, timedelta
from typing import Dict, List
# Import the API client
from api_client import TweedeKamerAPI
# =============================================================================
# Example 1: Basic API usage
# =============================================================================
def example_fetch_motions():
"""Fetch recent parliamentary motions from TweedeKamer API."""
api = TweedeKamerAPI()
# Fetch motions from last 30 days
start_date = datetime.now() - timedelta(days=30)
try:
motions = api.get_motions(start_date=start_date, limit=100)
print(f"Fetched {len(motions)} motions")
for motion in motions[:5]: # Show first 5
print(f" - {motion.get('title', 'N/A')}")
return motions
finally:
api.close()
# =============================================================================
# Example 2: Fetching with date range
# =============================================================================
def example_date_range():
"""Fetch motions from a specific date range."""
api = TweedeKamerAPI()
start = datetime(2024, 1, 1)
end = datetime(2024, 3, 31) # Q1 2024
try:
motions = api.get_motions(start_date=start, end_date=end, limit=500)
# Group by policy area
by_area = {}
for m in motions:
area = m.get("policy_area", "Onbekend")
by_area.setdefault(area, []).append(m)
for area, area_motions in sorted(by_area.items()):
print(f"{area}: {len(area_motions)} motions")
return motions
finally:
api.close()
# =============================================================================
# Example 3: Context manager usage
# =============================================================================
def example_context_manager():
"""Use API client as context manager."""
with TweedeKamerAPI() as api:
motions = api.get_motions(
start_date=datetime.now() - timedelta(days=7), limit=50
)
print(f"Fetched {len(motions)} motions this week")
return motions
# =============================================================================
# Example 4: Processing voting records
# =============================================================================
def example_process_votes():
"""Process individual voting records from API."""
api = TweedeKamerAPI()
start_date = datetime.now() - timedelta(days=7)
try:
# Get voting records directly
voting_records, besluit_meta = api._get_voting_records(
start_date=start_date, limit=1000
)
print(f"Fetched {len(voting_records)} voting records")
print(f"From {len(besluit_meta)} unique decisions")
# Count votes by party
party_votes = {}
for record in voting_records:
party = record.get("Fractie", "Onbekend")
vote = record.get("Soort", "Onbekend")
            counts = party_votes.setdefault(party, {})
            counts[vote] = counts.get(vote, 0) + 1
for party, votes in sorted(party_votes.items()):
total = sum(votes.values())
voor = votes.get("Voor", 0)
print(f"{party}: {total} votes ({voor} voor)")
return voting_records
finally:
api.close()
# =============================================================================
# Example 5: Safe API call with fallback
# =============================================================================
def example_safe_call():
"""Make API call with safe fallback on failure."""
api = TweedeKamerAPI()
try:
# This will return [] on any error
motions = api.get_motions(
start_date=datetime.now() - timedelta(days=30), limit=100
)
if not motions:
print("No motions returned - using cached data")
# Fallback to cached/local data
from database import db
return db.get_filtered_motions(limit=10)
return motions
finally:
api.close()
# =============================================================================
# Example 6: Pagination handling
# =============================================================================
def example_pagination():
"""Understand how pagination works in the API."""
api = TweedeKamerAPI()
start_date = datetime.now() - timedelta(days=365)
# Simulate pagination
page_size = 250
total_limit = 500
all_motions = []
skip = 0
while len(all_motions) < total_limit:
print(f"Fetching page with skip={skip}...")
# In real usage, get_motions handles pagination internally
# This demonstrates what's happening under the hood
page_motions = api._fetch_page(start_date=start_date, skip=skip, top=page_size)
if not page_motions:
break
all_motions.extend(page_motions)
skip += page_size
if len(page_motions) < page_size:
break # Last page
print(f"Total fetched: {len(all_motions)} motions")
return all_motions
if __name__ == "__main__":
print("=== Basic Fetch ===")
example_fetch_motions()
print("\n=== Process Votes ===")
example_process_votes()

@ -0,0 +1,191 @@
"""Example: MotionDatabase usage - from database.py and actual codebase."""
from typing import Dict, List, Optional
import duckdb
import json
from config import config
# Import the singleton instance
from database import db
# =============================================================================
# Example 1: Getting filtered motions
# =============================================================================
def example_get_filtered_motions():
"""Get controversial motions from a specific policy area."""
motions = db.get_filtered_motions(
policy_area="Klimaat",
min_margin=0.0,
max_margin=0.3, # Controversial: close margin
limit=10,
)
for motion in motions:
print(f"{motion['title']}: {motion['winning_margin']:.1%} margin")
return motions
# =============================================================================
# Example 2: Creating a voting session
# =============================================================================
def example_voting_session():
"""Create a new user session and record votes."""
# Create session for 10 motions
session_id = db.create_session(total_motions=10)
print(f"Created session: {session_id}")
# Get motions for the session
motions = db.get_filtered_motions(policy_area="Alle", limit=10)
# Record votes
for motion in motions:
# In real app, user would choose vote
vote = "Voor" # Example vote
db.record_vote(session_id=session_id, motion_id=motion["id"], vote=vote)
# Get results
results = db.get_party_results(session_id)
for party, result in sorted(results.items(), key=lambda x: -x[1]["agreement"]):
print(f"{party}: {result['agreement']:.1%} agreement")
return results
# =============================================================================
# Example 3: Working with DuckDB connections directly
# =============================================================================
def example_direct_duckdb():
"""Example of proper DuckDB connection handling."""
conn = duckdb.connect(config.DATABASE_PATH)
try:
# Get motion with votes
result = conn.execute(
"""
SELECT m.*,
JSON_EXTRACT(voting_results, '$.total_votes') as total_votes
FROM motions m
WHERE m.id = ?
""",
(123,),
).fetchone()
if result:
print(f"Motion: {result[1]}") # title is index 1
return result
finally:
conn.close()
# =============================================================================
# Example 4: Bulk operations
# =============================================================================
def example_bulk_insert():
"""Example of bulk inserting motions."""
# Sample data
motions = [
{
"title": "Motion about climate policy",
"description": "Proposal to reduce emissions",
"date": "2024-01-15",
"policy_area": "Klimaat",
"voting_results": json.dumps({"Voor": 75, "Tegen": 65}),
"winning_margin": 0.07,
"controversy_score": 0.85,
},
{
"title": "Motion about healthcare",
"description": "Increase healthcare budget",
"date": "2024-01-20",
"policy_area": "Zorg",
"voting_results": json.dumps({"Voor": 90, "Tegen": 50}),
"winning_margin": 0.29,
"controversy_score": 0.42,
},
]
    conn = duckdb.connect(config.DATABASE_PATH)
    try:
        for motion in motions:
            conn.execute(
                """
                INSERT INTO motions
                (title, description, date, policy_area, voting_results,
                 winning_margin, controversy_score)
                VALUES (?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    motion["title"],
                    motion["description"],
                    motion["date"],
                    motion["policy_area"],
                    motion["voting_results"],
                    motion["winning_margin"],
                    motion["controversy_score"],
                ),
            )
        print(f"Inserted {len(motions)} motions")
    except duckdb.Error as e:
        print(f"Error inserting motions: {e}")
    finally:
        conn.close()
# =============================================================================
# Example 5: Query with aggregation
# =============================================================================
def example_aggregation():
"""Example of aggregate queries."""
    conn = duckdb.connect(config.DATABASE_PATH)
    try:
        # Get statistics by policy area
        results = conn.execute("""
            SELECT
                policy_area,
                COUNT(*) as motion_count,
                AVG(winning_margin) as avg_margin,
                AVG(controversy_score) as avg_controversy
            FROM motions
            WHERE policy_area IS NOT NULL
            GROUP BY policy_area
            ORDER BY motion_count DESC
        """).fetchall()
        for row in results:
            print(
                f"{row[0]}: {row[1]} motions, "
                f"avg margin {row[2]:.1%}, "
                f"controversy {row[3]:.2f}"
            )
        return results
    except duckdb.Error as e:
        print(f"Error running aggregation: {e}")
        return []
    finally:
        conn.close()
if __name__ == "__main__":
print("=== Filtered Motions ===")
example_get_filtered_motions()
print("\n=== Aggregation ===")
example_aggregation()

@ -0,0 +1,217 @@
"""Example: Pipeline phase execution - from pipeline/run_pipeline.py and actual codebase."""
import argparse
from datetime import date, timedelta
from typing import List, Tuple
# Import pipeline modules
from pipeline.fetch_mp_metadata import fetch_mp_metadata
from pipeline.extract_mp_votes import extract_mp_votes
from pipeline.svd_pipeline import run_svd_pipeline
from pipeline.text_pipeline import run_text_pipeline
from pipeline.fusion import run_fusion
from database import MotionDatabase
# =============================================================================
# Example 1: Running full pipeline
# =============================================================================
def example_full_pipeline():
"""Run the complete data ingestion pipeline."""
# Parse arguments like CLI would
parser = argparse.ArgumentParser(description="Pipeline runner")
parser.add_argument("--db-path", default="data/motions.db")
parser.add_argument("--start-date", default=None)
parser.add_argument("--end-date", default=None)
parser.add_argument(
"--window-size", choices=["quarterly", "annual"], default="quarterly"
)
parser.add_argument("--svd-k", type=int, default=50)
    args = parser.parse_args([])  # empty argv, so the defaults above are used
# Resolve dates
end_date = date.fromisoformat(args.end_date) if args.end_date else date.today()
start_date = (
date.fromisoformat(args.start_date)
if args.start_date
else end_date - timedelta(days=730)
)
print(f"Running pipeline: {start_date}{end_date}")
print(f"Window size: {args.window_size}")
print(f"DB path: {args.db_path}")
# Initialize database
db = MotionDatabase(args.db_path)
# Phase 1: Fetch MP metadata
print("\n=== Phase 1: MP Metadata ===")
n_mp = fetch_mp_metadata(db_path=args.db_path)
print(f"Processed {n_mp} MPs")
# Phase 2: Extract MP votes
print("\n=== Phase 2: Extract Votes ===")
n_votes = extract_mp_votes(db_path=args.db_path)
print(f"Extracted {n_votes} vote records")
    # Phase 3: Generate time windows and compute SVD per window
    print("\n=== Phase 3: SVD Pipeline ===")
    windows = generate_windows(start_date, end_date, args.window_size)
    print(f"Generated {len(windows)} windows: {windows}")
    run_svd_pipeline(db, windows, args.svd_k)
    print(f"Computed SVD for {len(windows)} windows")
    # Phase 4: Text embeddings
    print("\n=== Phase 4: Text Embeddings ===")
    run_text_pipeline(args.db_path, batch_size=50)
    print("Text embeddings completed")
    # Phase 5: Fusion
    print("\n=== Phase 5: Fusion ===")
    run_fusion(args.db_path, windows)
    print("Fusion completed")
print("\n=== Pipeline Complete ===")
# =============================================================================
# Example 2: Generate time windows
# =============================================================================
def generate_windows(
start: date, end: date, granularity: str
) -> List[Tuple[str, str, str]]:
"""Generate time windows for pipeline processing."""
windows = []
cursor = date(start.year, start.month, 1)
if granularity == "annual":
cursor = date(start.year, 1, 1)
while cursor <= end:
year_end = date(cursor.year, 12, 31)
w_end = min(year_end, end)
windows.append((str(cursor.year), cursor.isoformat(), w_end.isoformat()))
cursor = date(cursor.year + 1, 1, 1)
else:
# quarterly
quarter_starts = {1: 1, 2: 4, 3: 7, 4: 10}
quarter_ends = {1: 3, 2: 6, 3: 9, 4: 12}
q = (cursor.month - 1) // 3 + 1
cursor = date(cursor.year, quarter_starts[q], 1)
        import calendar

        while cursor <= end:
            q = (cursor.month - 1) // 3 + 1
            q_end_month = quarter_ends[q]
            last_day = calendar.monthrange(cursor.year, q_end_month)[1]
            q_end = date(cursor.year, q_end_month, last_day)
w_end = min(q_end, end)
window_id = f"{cursor.year}-Q{q}"
windows.append((window_id, cursor.isoformat(), w_end.isoformat()))
cursor = q_end + timedelta(days=1)
return windows
def example_window_generation():
"""Example of window generation."""
start = date(2023, 1, 1)
end = date(2024, 6, 30)
print("Quarterly windows:")
quarterly = generate_windows(start, end, "quarterly")
for wid, s, e in quarterly:
print(f" {wid}: {s} to {e}")
print("\nAnnual windows:")
annual = generate_windows(start, end, "annual")
for wid, s, e in annual:
print(f" {wid}: {s} to {e}")
# =============================================================================
# Example 3: Running individual phases
# =============================================================================
def example_individual_phases():
"""Run pipeline phases individually for debugging."""
db_path = "data/motions.db"
db = MotionDatabase(db_path)
# Only run MP metadata fetch
print("Fetching MP metadata...")
n = fetch_mp_metadata(db_path=db_path)
print(f" {n} MPs processed")
# Only run vote extraction
print("Extracting votes...")
n = extract_mp_votes(db_path=db_path)
print(f" {n} votes extracted")
# Only run SVD for specific window
print("Computing SVD...")
windows = [("2024-Q1", "2024-01-01", "2024-03-31")]
run_svd_pipeline(db, windows, k=50)
print(" SVD computed")
# Only run text embeddings
print("Computing embeddings...")
run_text_pipeline(db_path, batch_size=25) # Smaller batch for testing
print(" Embeddings computed")
# =============================================================================
# Example 4: Dry run
# =============================================================================
def example_dry_run():
"""Show what pipeline would do without making changes."""
print("DRY RUN - no writes will be made")
start_date = date(2024, 1, 1)
end_date = date(2024, 6, 30)
# Generate and show windows
windows = generate_windows(start_date, end_date, "quarterly")
print(f"Would process {len(windows)} windows:")
for wid, s, e in windows:
print(f" {wid}: {s} to {e}")
print("\nWould run phases:")
print(" 1. fetch_mp_metadata")
print(" 2. extract_mp_votes")
print(" 3. svd_pipeline")
print(" 4. text_pipeline")
print(" 5. fusion")
if __name__ == "__main__":
import logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
print("=== Window Generation ===")
example_window_generation()
print("\n=== Dry Run ===")
example_dry_run()

@ -0,0 +1,316 @@
"""Example: Streamlit page patterns - from actual pages/ files."""
import streamlit as st
# =============================================================================
# Example 1: Home page (Home.py)
# =============================================================================
def render_home_page():
"""Simplified version of Home.py."""
st.set_page_config(
page_title="Motief: de stematlas",
page_icon="🗺",
layout="centered",
initial_sidebar_state="expanded",
)
st.title("🗺 Motief: de stematlas")
st.markdown(
"**Motief** brengt de Nederlandse Tweede Kamer in kaart op basis van "
"echte stemmingen over moties. Gebruik de Stemwijzer om te ontdekken welke "
"partij het beste bij jouw standpunten past, of verken de politieke ruimte "
"zelf in de Explorer."
)
st.divider()
col1, col2 = st.columns(2)
with col1:
st.subheader("🗳 Stemwijzer")
st.markdown(
"Stem op echte Tweede Kamer moties en zie welke partij het "
"dichtst bij jouw keuzes staat."
)
st.page_link("pages/1_Stemwijzer.py", label="Open Stemwijzer", icon="🗳")
with col2:
st.subheader("🔭 Politiek Explorer")
st.markdown(
"Verken het politieke kompas, partijtrajecten door de tijd, "
"en zoek vergelijkbare moties op in het archief."
)
st.page_link("pages/2_Explorer.py", label="Open Explorer", icon="🔭")
st.divider()
st.caption("Data: Tweede Kamer API · Embeddings: QWEN (via OpenRouter)")
# =============================================================================
# Example 2: Thin page wrapper (pages/1_Stemwijzer.py)
# =============================================================================
def render_stemwijzer_page():
"""Pattern: thin page that delegates to module function."""
st.set_page_config(
page_title="Stemwijzer",
page_icon="🗳",
layout="centered",
)
# Delegate to main module
from explorer import build_mp_quiz_tab
build_mp_quiz_tab("data/motions.db")
# =============================================================================
# Example 3: Session state initialization
# =============================================================================
def init_session_state():
"""Pattern: Initialize all session state at start."""
defaults = {
"session_id": None,
"current_motion_index": 0,
"motions": [],
"show_results": False,
"user_votes": {},
}
for key, default in defaults.items():
if key not in st.session_state:
st.session_state[key] = default
# =============================================================================
# Example 4: Sidebar configuration
# =============================================================================
def render_sidebar():
"""Pattern: Sidebar for configuration."""
with st.sidebar:
st.header("Instellingen")
motion_count = st.slider(
"Aantal moties",
min_value=5,
max_value=25,
value=10,
help="Hoeveel moties wilt u beantwoorden?",
)
policy_area = st.selectbox(
"Beleidsgebied",
[
"Alle",
"Economie",
"Klimaat",
"Immigratie",
"Zorg",
"Onderwijs",
"Defensie",
"Sociale Zaken",
"Algemeen",
],
)
margin_range = st.slider(
"Controversiële moties (%)",
min_value=0,
max_value=100,
value=(0, 100),
help="Filter op hoe omstreden de moties zijn",
)
st.divider()
if st.button("Start Nieuwe Sessie", type="primary"):
return {
"motion_count": motion_count,
"policy_area": policy_area,
"margin_range": margin_range,
}
return None
# =============================================================================
# Example 5: Motion voting interface
# =============================================================================
def render_motion_vote(motion: dict, index: int, total: int):
"""Pattern: Display motion and voting buttons."""
st.subheader(f"Motie {index + 1} van {total}")
# Motion content
st.markdown(f"### {motion['title']}")
col1, col2 = st.columns([3, 1])
with col1:
if motion.get("layman_explanation"):
st.info(motion["layman_explanation"])
with st.expander("Meer details"):
st.markdown(f"**Datum:** {motion.get('date', 'Onbekend')}")
st.markdown(f"**Beleidsgebied:** {motion.get('policy_area', 'Onbekend')}")
if motion.get("description"):
st.markdown(f"**Beschrijving:** {motion['description']}")
with col2:
st.metric(
label="Winstmarge",
value=f"{motion.get('winning_margin', 0):.0%}",
delta="Omstreden" if motion.get("controversy_score", 0) > 0.5 else "Helder",
)
st.divider()
# Voting buttons
col1, col2, col3 = st.columns(3)
with col1:
st.button(
"👍 **Voor**",
on_click=on_vote,
args=(motion["id"], "Voor"),
use_container_width=True,
)
with col2:
st.button(
"👎 **Tegen**",
on_click=on_vote,
args=(motion["id"], "Tegen"),
use_container_width=True,
)
with col3:
st.button(
"🤔 **Onthouden**",
on_click=on_vote,
args=(motion["id"], "Onthouden"),
use_container_width=True,
)
def on_vote(motion_id: int, vote: str):
"""Callback when user votes."""
# Record vote
from database import db
db.record_vote(
session_id=st.session_state.session_id, motion_id=motion_id, vote=vote
)
# Update session state
st.session_state.user_votes[motion_id] = vote
# Move to next or show results
if st.session_state.current_motion_index < len(st.session_state.motions) - 1:
st.session_state.current_motion_index += 1
else:
st.session_state.show_results = True
st.rerun()
# =============================================================================
# Example 6: Results display
# =============================================================================
def render_results():
"""Pattern: Display voting results."""
from database import db
st.header("📊 Uw Resultaten")
# Get party results
results = db.get_party_results(st.session_state.session_id)
if not results:
st.warning("Geen resultaten beschikbaar")
return
# Sort by agreement
sorted_results = sorted(
results.items(), key=lambda x: x[1].get("agreement_percentage", 0), reverse=True
)
# Display top match
if sorted_results:
top_party, top_data = sorted_results[0]
st.success(
f"**Uw beste match:** {top_party} ({top_data.get('agreement_percentage', 0):.0%} overeenstemming)"
)
st.divider()
# Show all parties
for party, data in sorted_results:
agreement = data.get("agreement_percentage", 0)
col1, col2 = st.columns([3, 1])
with col1:
st.markdown(f"**{party}**")
st.progress(agreement, text=f"{agreement:.0%}")
with col2:
st.metric("Overeenstemming", f"{agreement:.0%}")
# Detailed breakdown
with st.expander("Details per motie"):
for motion in st.session_state.motions:
user_vote = st.session_state.user_votes.get(motion["id"], "?")
st.markdown(f"- **{motion['title']}**: U={user_vote}")
# =============================================================================
# Example 7: Tabs layout
# =============================================================================
def render_tabs_example():
"""Pattern: Use tabs for organizing content."""
tab1, tab2, tab3 = st.tabs(["Compass", "Trajectories", "Zoeken"])
with tab1:
st.subheader("Politiek Kompas")
st.write("Visualiseer partijposities in 2D ruimte")
# Add compass chart...
with tab2:
st.subheader("Partij Trajectories")
st.write("Bekijk hoe partijen door de tijd bewegen")
# Add trajectory chart...
with tab3:
st.subheader("Zoek Moties")
query = st.text_input("Zoekterm")
if query:
# Search functionality...
st.write(f"Zoeken naar: {query}")
if __name__ == "__main__":
# Demo rendering
init_session_state()
st.write("Streamlit page structure example")

@ -0,0 +1,265 @@
# API Client Patterns
## Base API Client Pattern
Using requests.Session for connection pooling:
```python
# api_client.py
import requests
from datetime import datetime, timedelta
from typing import Dict, List, Optional
from config import config
class TweedeKamerAPI:
def __init__(self):
self.odata_base_url = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0"
self.session = requests.Session()
self.session.headers.update({
"Accept": "application/json",
"User-Agent": "Dutch-Political-Compass-Tool/1.0",
})
def get_motions(
self,
        start_date: Optional[datetime] = None,
        end_date: Optional[datetime] = None,
        limit: int = 500,
) -> List[Dict]:
"""Get motions with voting results using OData API."""
if not start_date:
start_date = datetime.now() - timedelta(days=730)
try:
voting_records, besluit_meta = self._get_voting_records(
start_date, end_date, limit
)
return self._process_voting_records(voting_records, besluit_meta)
except Exception as e:
print(f"Error fetching motions from API: {e}")
return []
```
## OData Pagination Pattern
Handle server-side pagination with $skip:
```python
def _get_voting_records(
self,
start_date: datetime,
        end_date: Optional[datetime] = None,
        limit: int = 50000,
) -> tuple:
"""Fetch with automatic pagination."""
filter_query = (
f"GewijzigdOp ge {start_date.strftime('%Y-%m-%d')}T00:00:00Z"
" and StemmingsSoort ne null"
" and Verwijderd eq false"
)
page_size = 250 # API caps $top at 250
base_url = f"{self.odata_base_url}/Besluit"
base_params = {
"$filter": filter_query,
"$top": page_size,
"$expand": "Stemming",
"$orderby": "GewijzigdOp desc",
}
all_records = []
skip = 0
while len(all_records) < limit:
params = {**base_params, "$skip": skip}
response = self.session.get(
base_url,
params=params,
timeout=config.API_TIMEOUT
)
response.raise_for_status()
data = response.json()
besluit_page = data.get("value", [])
if not besluit_page:
break
# Process page
for besluit in besluit_page:
all_records.extend(self._extract_votes(besluit))
skip += page_size
        return all_records  # the real method also returns besluit metadata as a second element
```
## Retry with Backoff Pattern
For transient failures:
```python
# ai_provider.py
import time
import random

import requests
from requests.exceptions import ConnectionError


class ProviderError(RuntimeError):
    """Raised when the provider request ultimately fails."""

def _post_with_retries(
path: str,
json: dict,
retries: int = 3
) -> requests.Response:
"""POST with exponential backoff retry."""
backoff = 0.5
for attempt in range(1, retries + 1):
try:
resp = requests.post(url, json=json, headers=headers, timeout=10)
# Handle rate limiting
if resp.status_code == 429:
if attempt == retries:
raise ProviderError("Rate limited")
retry_after = resp.headers.get("Retry-After")
if retry_after:
time.sleep(int(retry_after))
else:
sleep = backoff * (2 ** (attempt - 1))
sleep += random.uniform(0, sleep * 0.1)
time.sleep(sleep)
continue
# Handle server errors
if 500 <= resp.status_code < 600:
if attempt == retries:
raise ProviderError(f"Server error: {resp.status_code}")
time.sleep(backoff * (2 ** (attempt - 1)))
continue
return resp
except ConnectionError as exc:
if attempt == retries:
raise ProviderError(f"Connection error: {exc}")
time.sleep(backoff * (2 ** (attempt - 1)))
raise ProviderError("Failed after retries")
```
## Batch Processing Pattern
Process items in batches to manage API limits:
```python
def get_embeddings_with_retry(
texts: List[str],
batch_size: int = 50,
retries: int = 3,
) -> List[Optional[List[float]]]:
"""Process embeddings in batches with fallback to single items."""
results = [None] * len(texts)
i = 0
while i < len(texts):
end = min(len(texts), i + batch_size)
chunk = texts[i:end]
# Try batch first
try:
emb_chunk = get_embeddings_batch(chunk)
for j, emb in enumerate(emb_chunk):
results[i + j] = emb
i = end
continue
except Exception:
pass
# Fallback: single items
for j, text in enumerate(chunk):
try:
results[i + j] = get_embedding(text)
except Exception:
results[i + j] = None
i = end
return results
```
## Response Validation Pattern
Validate API responses before processing:
```python
def _process_response(self, response: requests.Response) -> Dict:
"""Validate and parse API response."""
response.raise_for_status()
data = response.json()
if "value" not in data:
raise ValueError("Unexpected response format: missing 'value' key")
return data
def _validate_besluit(self, besluit: Dict) -> bool:
"""Check required fields exist."""
required = ["Id", "GewijzigdOp"]
return all(field in besluit for field in required)
```
## Error Handling Patterns
Always provide safe fallbacks:
```python
def safe_api_call(self, endpoint: str, params: Dict = None) -> List[Dict]:
"""Call API with error handling and fallback."""
try:
response = self.session.get(
endpoint,
params=params,
timeout=config.API_TIMEOUT
)
response.raise_for_status()
data = response.json()
return data.get("value", [])
except requests.Timeout:
_logger.warning(f"API timeout for {endpoint}")
return []
except requests.HTTPError as e:
_logger.error(f"HTTP error: {e}")
return []
except Exception as e:
_logger.error(f"API call failed: {e}")
return []
```
## Session Management
Reuse session for connection pooling:
```python
class TweedeKamerAPI:
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
"Accept": "application/json",
"User-Agent": "Dutch-Political-Compass-Tool/1.0",
})
def close(self):
"""Clean up session when done."""
self.session.close()
def __enter__(self):
return self
def __exit__(self, *args):
self.close()
# Usage
with TweedeKamerAPI() as api:
motions = api.get_motions(start_date)
```

@ -0,0 +1,230 @@
# Architectural Patterns
## Repository Pattern
The `MotionDatabase` class acts as a repository, encapsulating all database operations behind a clean interface.
```python
# database.py
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
self._init_database()
def get_motion(self, motion_id: int) -> Optional[Dict]:
"""Get a single motion by ID."""
conn = duckdb.connect(self.db_path)
try:
result = conn.execute(
"SELECT * FROM motions WHERE id = ?", (motion_id,)
).fetchone()
return result
finally:
conn.close()
def get_filtered_motions(
self,
policy_area: str = "Alle",
min_margin: float = 0.0,
max_margin: float = 1.0,
limit: int = 10
) -> List[Dict]:
"""Get filtered list of motions."""
...
```
**Usage**: Import the singleton instance for all DB operations.
```python
from database import db
motions = db.get_filtered_motions(policy_area="Klimaat", limit=20)
```
## Facade Pattern
Simplified interfaces over complex subsystems.
### MotionDatabase Facade
```python
# Single entry point for all database operations
db = MotionDatabase() # Singleton instance
# Operations are abstracted:
db.create_session(total_motions)
db.record_vote(session_id, motion_id, vote)
db.get_party_results(session_id)
```
### API Client Facade
```python
# api_client.py
class TweedeKamerAPI:
def __init__(self):
self.session = requests.Session() # Connection pooling
def get_motions(self, start_date, end_date) -> List[Dict]:
"""Simple interface hiding OData pagination details."""
voting_records, besluit_meta = self._get_voting_records(start_date, end_date)
return self._process_voting_records(voting_records, besluit_meta)
```
## Pipeline Pattern
Sequential phases with explicit dependencies:
```
pipeline/run_pipeline.py
├── Phase 1: fetch_mp_metadata
│ └── pipeline/fetch_mp_metadata.py
├── Phase 2: extract_mp_votes
│ └── pipeline/extract_mp_votes.py
├── Phase 3: svd_pipeline
│ └── pipeline/svd_pipeline.py
├── Phase 4: text_pipeline (gap-fill)
│ └── pipeline/text_pipeline.py
└── Phase 5: fusion (combine SVD + text)
└── pipeline/fusion.py
```
### Phase Orchestration
```python
# pipeline/run_pipeline.py
def run(args: argparse.Namespace) -> int:
db = MotionDatabase(args.db_path)
# Phase 1: MP metadata
if not args.skip_metadata:
from pipeline.fetch_mp_metadata import fetch_mp_metadata
fetch_mp_metadata(db_path=db.db_path)
# Phase 2: Extract votes
if not args.skip_extract:
from pipeline.extract_mp_votes import extract_mp_votes
extract_mp_votes(db_path=db.db_path)
# Phase 3: SVD per window
if not args.skip_svd:
from pipeline.svd_pipeline import run_svd_pipeline
run_svd_pipeline(db, windows, args.svd_k)
# ... additional phases
```
## Strategy Pattern
Interchangeable algorithms for axis computation:
```python
# analysis/political_axis.py
def compute_political_axis(
vectors: Dict[str, np.ndarray],
method: str = "pca" # or "anchor"
) -> Tuple[np.ndarray, np.ndarray]:
"""Compute political axis using specified method.
Methods:
- 'pca': Use first principal component
- 'anchor': Use predefined anchor motions
"""
if method == "pca":
return _compute_pca_axis(vectors)
elif method == "anchor":
        return _compute_anchor_axis(vectors)
    raise ValueError(f"Unknown method: {method}")
```
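As the list of methods grows, the if/elif chain can be replaced by a dispatch table; a minimal sketch, with placeholder strategy functions standing in for the real computations:

```python
from typing import Callable, Dict

def _pca_axis(vectors: Dict[str, list]) -> str:
    # Placeholder for the real PCA computation
    return "pca-axis"

def _anchor_axis(vectors: Dict[str, list]) -> str:
    # Placeholder for the real anchor-based computation
    return "anchor-axis"

_AXIS_METHODS: Dict[str, Callable] = {
    "pca": _pca_axis,
    "anchor": _anchor_axis,
}

def compute_axis(vectors: Dict[str, list], method: str = "pca"):
    """Look up the strategy by name; unknown names fail loudly."""
    try:
        strategy = _AXIS_METHODS[method]
    except KeyError:
        raise ValueError(f"Unknown method: {method}") from None
    return strategy(vectors)
```

Registering a new method is then one dictionary entry, with no change to the dispatch code.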
## Visitor Pattern
External operations on data structures:
```python
# analysis/trajectory.py
def _procrustes_align_windows(
window_vecs: Dict[str, Dict[str, np.ndarray]],
min_overlap: int = 5,
) -> Dict[str, Dict[str, np.ndarray]]:
"""Align SVD vectors across windows using Procrustes rotations.
Takes the first window as reference and aligns each subsequent window
to it via orthogonal Procrustes on the set of common entities.
"""
```
## Builder Pattern
Incremental construction of a configured object, one piece at a time:
```python
# CLI argument parsing
parser = argparse.ArgumentParser(description="Pipeline runner")
parser.add_argument("--db-path", default="data/motions.db")
parser.add_argument("--start-date", default=None)
parser.add_argument("--end-date", default=None)
parser.add_argument("--window-size", choices=["quarterly", "annual"], default="quarterly")
parser.add_argument("--svd-k", type=int, default=50)
```
## Decorator Pattern
Retry logic for transient failures:
```python
# pipeline/ai_provider_wrapper.py
def get_embeddings_with_retry(
texts: List[str],
retries: int = 3,
batch_size: int = 50,
) -> List[Optional[List[float]]]:
"""Return embeddings with automatic retry on failure."""
for attempt in range(1, retries + 1):
try:
return _embedder(texts, batch_size=len(texts))
except Exception as exc:
if attempt == retries:
break
time.sleep(backoff * (2 ** (attempt - 1)))
return [None] * len(texts) # Safe fallback
```
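The function above wraps one specific call; the same idea expressed as a true decorator (a generic sketch, not code from this repo) can wrap any transient operation:

```python
import functools
import time

def retry(retries: int = 3, backoff: float = 0.5):
    """Decorator: retry the wrapped function with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # out of attempts: propagate the error
                    time.sleep(backoff * (2 ** (attempt - 1)))
        return wrapper
    return decorator

@retry(retries=3, backoff=0.01)
def flaky(counter: dict) -> str:
    """Fails twice, then succeeds (simulates a transient error)."""
    counter["calls"] += 1
    if counter["calls"] < 3:
        raise ConnectionError("transient")
    return "ok"
```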
## Data Patterns
### Batch Processing
Process items in chunks to manage memory and API limits:
```python
for i in range(0, len(items), batch_size):
chunk = items[i:i + batch_size]
process_batch(chunk)
```
### Caching
Pre-compute and store expensive results:
```python
# SimilarityCache table stores computed similarities
db.get_similarity(motion_a, motion_b)
```
### Lazy Loading
Load data only when needed:
```python
class MotionDatabase:
@property
def _connection(self):
if self._conn is None:
self._conn = duckdb.connect(self.db_path)
return self._conn
```
### Vectorization
Use numpy for batch operations:
```python
vectors = np.array([v for v in entity_vectors.values()])
normalized = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
```

@ -0,0 +1,239 @@
# DuckDB Database Patterns
## Connection Management
### Pattern 1: Short-lived per Method (Most Common)
Always create a new connection, use try/finally for cleanup:
```python
# database.py
class MotionDatabase:
    def get_motion(self, motion_id: int) -> Optional[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            result = conn.execute(
                "SELECT * FROM motions WHERE id = ?",
                (motion_id,)
            ).fetchone()
            return result
        except Exception:
            return None
        finally:
            conn.close()
    def get_filtered_motions(
        self,
        policy_area: str = "Alle",
        min_margin: float = 0.0,
        max_margin: float = 1.0,
        limit: int = 10
    ) -> List[Dict]:
        conn = duckdb.connect(self.db_path)
        try:
            query = """
                SELECT * FROM motions
                WHERE (? = 'Alle' OR policy_area = ?)
                  AND winning_margin BETWEEN ? AND ?
                ORDER BY RANDOM()
                LIMIT ?
            """
            rows = conn.execute(
                query, (policy_area, policy_area, min_margin, max_margin, limit)
            ).fetchall()
            return rows
        except Exception:
            return []
        finally:
            conn.close()
```
### Pattern 2: With Statement (Cleaner)
```python
def execute_query(self, query: str, params: tuple = ()):
with duckdb.connect(self.db_path) as conn:
return conn.execute(query, params).fetchall()
```
### Pattern 3: Lazy Connection Caching
For frequently accessed connections:
```python
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
self._conn = None
@property
def connection(self):
if self._conn is None:
self._conn = duckdb.connect(self.db_path)
return self._conn
def close(self):
if self._conn:
self._conn.close()
self._conn = None
```
## Table Initialization
Create tables with proper constraints and sequences:
```python
def _init_database(self):
conn = duckdb.connect(self.db_path)
# Create sequence for auto-incrementing IDs
try:
conn.execute("CREATE SEQUENCE IF NOT EXISTS motions_id_seq START 1")
    except Exception:
pass
# Create tables
conn.execute("""
CREATE TABLE IF NOT EXISTS motions (
id INTEGER DEFAULT nextval('motions_id_seq'),
title TEXT NOT NULL,
description TEXT,
date DATE,
policy_area TEXT,
voting_results JSON,
winning_margin FLOAT,
controversy_score FLOAT,
layman_explanation TEXT,
externe_identifier TEXT,
body_text TEXT,
url TEXT UNIQUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
""")
# Add columns to existing tables safely
try:
conn.execute("ALTER TABLE motions ADD COLUMN IF NOT EXISTS body_text TEXT")
except Exception:
pass # Column may already exist
conn.close()
```
## JSON Column Handling
Store and retrieve JSON data:
```python
# Insert JSON
def store_motion(self, motion: Dict):
    conn = duckdb.connect(self.db_path)
    try:
        conn.execute(
            "INSERT INTO motions (title, voting_results) VALUES (?, ?)",
            (motion["title"], json.dumps(motion["voting_results"]))
        )
    finally:
        conn.close()
# Query JSON
def get_motions_with_votes(self, party: str) -> List[Dict]:
    conn = duckdb.connect(self.db_path)
    try:
        # json_extract_string returns a plain string, so the comparison
        # works without JSON quoting
        rows = conn.execute("""
            SELECT title, voting_results
            FROM motions
            WHERE json_extract_string(voting_results, '$.party') = ?
        """, (party,)).fetchall()
        return rows
    except Exception:
        return []
    finally:
        conn.close()
```
## Query Patterns
### Parameterized Queries (Always!)
```python
# SAFE - uses parameterized query
conn.execute("SELECT * FROM motions WHERE id = ?", (motion_id,))
# AVOID - SQL injection risk
# conn.execute(f"SELECT * FROM motions WHERE id = {motion_id}") # BAD!
```
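Parameterizing a variable-length IN list takes one extra step, since the number of placeholders must match the number of values; a sketch (helper name hypothetical):

```python
from typing import List, Tuple

def build_in_query(ids: List[int]) -> Tuple[str, tuple]:
    """Build a parameterized IN query; never interpolate values directly."""
    placeholders = ", ".join("?" for _ in ids)
    query = f"SELECT * FROM motions WHERE id IN ({placeholders})"
    return query, tuple(ids)
```

Only the placeholder *count* is interpolated into the SQL; the values themselves still travel as bound parameters.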
### Batch Inserts
```python
def bulk_insert_motions(self, motions: List[Dict]):
    conn = duckdb.connect(self.db_path)
    try:
        for motion in motions:
            conn.execute(
                """INSERT OR IGNORE INTO motions
                   (title, date, policy_area) VALUES (?, ?, ?)""",
                (motion["title"], motion["date"], motion["policy_area"])
            )
    finally:
        conn.close()
```
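DuckDB's Python API also provides `executemany`, which avoids the per-row execute call; a sketch of the same insert, written against an injected connection so it can be exercised with a stub:

```python
from typing import Dict, List

INSERT_SQL = """INSERT OR IGNORE INTO motions
    (title, date, policy_area) VALUES (?, ?, ?)"""

def bulk_insert_rows(conn, motions: List[Dict]) -> int:
    """Insert all motions in a single executemany call."""
    rows = [(m["title"], m["date"], m["policy_area"]) for m in motions]
    if rows:
        conn.executemany(INSERT_SQL, rows)
    return len(rows)
```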
### Aggregation Queries
```python
def get_party_vote_stats(self, party: str) -> Dict:
    conn = duckdb.connect(self.db_path)
    try:
        result = conn.execute("""
            SELECT
                COUNT(*) as total_votes,
                SUM(CASE WHEN vote = 'Voor' THEN 1 ELSE 0 END) as voor,
                SUM(CASE WHEN vote = 'Tegen' THEN 1 ELSE 0 END) as tegen
            FROM mp_votes
            WHERE party = ?
        """, (party,)).fetchone()
        return {"total": result[0], "voor": result[1], "tegen": result[2]}
    except Exception:
        return {"total": 0, "voor": 0, "tegen": 0}
    finally:
        conn.close()
```
## Error Handling
Always close connections in finally block or with context manager:
```python
def safe_query(self, query: str, params: tuple = ()):
conn = None
try:
conn = duckdb.connect(self.db_path)
result = conn.execute(query, params).fetchall()
return result
except Exception as e:
_logger.error(f"Query failed: {e}")
return []
finally:
if conn:
conn.close()
```
## Testing Without DuckDB
For unit tests when DuckDB is unavailable, fall back to JSON files:
```python
# In MotionDatabase.__init__
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
self._file_mode = duckdb is None
if duckdb is None:
# Create JSON fallback files
for p in (f"{db_path}.embeddings.json", f"{db_path}.similarity_cache.json"):
if not os.path.exists(p):
with open(p, "w") as fh:
fh.write("[]")
else:
self._init_database()
```
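The JSON-fallback bootstrap above can be extracted into a standalone helper (hypothetical name) that is easy to exercise in tests with a temporary directory:

```python
import os

def ensure_fallback_files(db_path: str) -> None:
    """Create empty JSON fallback files next to the (absent) database."""
    for p in (f"{db_path}.embeddings.json", f"{db_path}.similarity_cache.json"):
        if not os.path.exists(p):
            with open(p, "w") as fh:
                fh.write("[]")
```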

@ -0,0 +1,228 @@
# Code Patterns
## 1. Page Wrapper Pattern
Thin Streamlit page files delegate to core modules. Pages contain only route logic, not business logic.
**Example** (pages/1_🗳_Stemwijzer.py):
```python
import streamlit as st
from quiz_module import render_quiz_page
st.set_page_config(...)
render_quiz_page()
```
**Example** (pages/2_🔍_Explorer.py):
```python
import streamlit as st
from explorer import render_explorer
st.set_page_config(...)
render_explorer()
```
**Rule**: Pages should have <20 lines of logic. All complexity lives in modules.
---
## 2. Pipeline Pattern
Data flows: fetch → transform → store
**Location**: `pipeline/` directory
**Pattern**:
```python
def run_pipeline():
raw_data = fetch_from_source()
transformed = transform(raw_data)
store(transformed)
def fetch_from_source():
# API call or DB query
...
def transform(raw):
# Clean, normalize, compute derived fields
...
```
**Usage**: SVD computation pipeline, data ingestion, motion processing
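Filled in with trivial stand-ins, the skeleton composes like this (all names hypothetical; real phases fetch from the OData API and write to DuckDB):

```python
from typing import Dict, List

def fetch_from_source() -> List[Dict]:
    # Stand-in for an API call or DB query
    return [{"title": "  Motie X  ", "vote": "Voor"}]

def transform(raw: List[Dict]) -> List[Dict]:
    # Clean and normalize fields
    return [{**r, "title": r["title"].strip()} for r in raw]

def store(rows: List[Dict], sink: List[Dict]) -> None:
    # Stand-in for a DB write
    sink.extend(rows)

def run_pipeline(sink: List[Dict]) -> int:
    raw = fetch_from_source()
    rows = transform(raw)
    store(rows, sink)
    return len(rows)
```

Keeping each phase a plain function makes the pipeline trivially testable phase by phase.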
---
## 3. API Client Pattern
HTTP client with retry/backoff for external data sources.
**Pattern**:
```python
import time
import requests
def fetch_with_retry(url, max_retries=3):
for attempt in range(max_retries):
try:
response = requests.get(url)
response.raise_for_status()
return response.json()
except requests.RequestException:
if attempt < max_retries - 1:
time.sleep(2 ** attempt) # exponential backoff
else:
raise
```
---
## 4. Pure Helper Functions
Functions in `explorer_helpers.py` have no side effects, no IO.
**Pattern**:
```python
def compute_party_coords(svd_df, party_map, window):
"""Pure function: same inputs → same outputs, no side effects."""
# Filter, compute, return
return result_df
def build_scatter_trace(df, color_col, marker_size=8):
"""Pure: returns Plotly trace dict, no rendering."""
trace = go.Scatter(x=df.x, y=df.y, mode='markers', ...)
return trace
```
**Rule**: No `import streamlit` in helper modules. No file I/O. No global state.
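Purity is what makes these helpers trivially testable; a self-contained sketch in the same spirit as `compute_party_coords` (names and fields hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def compute_party_centroids(
    mp_coords: List[Dict],
) -> Dict[str, Tuple[float, float]]:
    """Pure: average (x, y) per party; no IO, no Streamlit, no globals."""
    # party -> [sum_x, sum_y, count]
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for row in mp_coords:
        acc = sums[row["party"]]
        acc[0] += row["x"]
        acc[1] += row["y"]
        acc[2] += 1
    return {p: (s[0] / s[2], s[1] / s[2]) for p, s in sums.items()}
```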
---
## 5. Dummy Fallbacks for Optional Dependencies
Gracefully degrade when optional packages are unavailable.
**Pattern**:
```python
try:
import umap
HAS_UMAP = True
except ImportError:
HAS_UMAP = False
# or provide dummy stub
def project_to_2d(vectors):
if HAS_UMAP:
return umap.UMAP().fit_transform(vectors)
else:
return vectors[:, :2] # fallback: just take first 2 dims
```
**Used for**: UMAP, Plotly (with fallback to altair or text-only)
---
## 6. Cached Data Loaders
Expensive DB queries wrapped with `@st.cache_data`.
**Pattern**:
```python
@st.cache_data
def load_svd_vectors(window: str) -> pd.DataFrame:
return db.query("SELECT * FROM svd_vectors WHERE window = ?", window)
@st.cache_data
def load_party_centroids(window: str) -> pd.DataFrame:
return db.query("SELECT * FROM party_centroids WHERE window = ?", window)
# Clear cache when data updates
@st.cache_data
def load_motions(category: str | None = None) -> pd.DataFrame:
...
```
**Rule**: Use `ttl=3600` for large datasets. Use `show_spinner=False` where appropriate.
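Conceptually, `@st.cache_data(ttl=3600)` behaves like a keyed memo with expiry; a pure-Python sketch of that mechanism (not Streamlit's actual implementation — the injectable clock exists only to make it testable):

```python
import functools
import time

def cache_data(ttl: float, clock=time.monotonic):
    """Memoize by positional args, discarding entries older than ttl seconds."""
    def decorator(func):
        store = {}
        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # fresh cache hit
            value = func(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator
```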
---
## 7. Plotly Dual-Layer Charts
Charts built with two traces: scatter points + text annotations.
**Pattern**:
```python
def build_dual_layer_chart(df, x_col, y_col, label_col):
# Layer 1: markers
scatter = go.Scatter(
x=df[x_col], y=df[y_col],
mode='markers',
marker=dict(size=10, color=df['color']),
name='Parties'
)
# Layer 2: labels (smaller, non-hoverable)
labels = go.Scatter(
x=df[x_col], y=df[y_col],
mode='text',
text=df[label_col],
textposition='top center',
showlegend=False
)
return [scatter, labels]
```
**Used in**: Explorer tab charts, party position plots
---
## 8. Singleton Module Instances
One shared instance per module, created at import time.
**Pattern**:
```python
# database.py
class MotionDatabase:
def __init__(self, db_path=None):
self.conn = ibis.duckdb.connect(db_path)
self._load_schema()
_db = None
def get_db():
global _db
if _db is None:
_db = MotionDatabase()
return _db
# At module bottom:
db = MotionDatabase() # singleton instance
```
**Also used in**: `config.py` exports `config` and `PARTY_COLOURS`
---
## 9. Dataclass Config Pattern
Configuration centralized in a `@dataclass`.
**Pattern**:
```python
from dataclasses import dataclass, field
@dataclass
class Config:
db_path: str = "data/stemwijzer.duckdb"
default_window: str = "2023"
cache_ttl: int = 3600
party_colours: dict = field(default_factory=lambda: PARTY_COLOURS)
def __post_init__(self):
if not Path(self.db_path).exists():
raise FileNotFoundError(f"Database not found: {self.db_path}")
```
---
## 10. Graceful Degradation with try/except
Core pattern throughout: attempt operation, fall back gracefully.
**Pattern**:
```python
def get_political_position(mp_name, window):
try:
vectors = load_svd_vectors(window)
return vectors[vectors['mp_name'] == mp_name]['vector_2d'].iloc[0]
except (KeyError, IndexError):
return [0.0, 0.0] # neutral fallback
```

@ -0,0 +1,196 @@
# Python-Specific Patterns
## Singleton Pattern
Use module-level instances for shared resources:
```python
# database.py
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
self._init_database()
def _init_database(self):
# Initialize tables on first instantiation
...
# Bottom of file - the singleton
db = MotionDatabase()
```
**Usage across the codebase:**
```python
# In other modules
from database import db
def some_function():
motions = db.get_filtered_motions(limit=10)
return motions
```
Similarly for other singletons:
```python
# summarizer.py
class MotionSummarizer:
def __init__(self):
pass # Stateless
def generate_layman_explanation(self, title: str, body: str) -> str:
...
summarizer = MotionSummarizer()
```
## Dataclass Config Pattern
Use dataclass for configuration with environment variable support:
```python
# config.py
from dataclasses import dataclass, field
from typing import List
import os
@dataclass
class Config:
    # Database settings
    DATABASE_PATH: str = "data/motions.db"
    # API settings
    TWEEDE_KAMER_ODATA_API: str = "https://gegevensmagazijn.tweedekamer.nl/OData/v4/2.0"
    API_TIMEOUT: int = 30
    API_BATCH_SIZE: int = 250
    # AI settings
    OPENROUTER_API_KEY: str = os.getenv("OPENROUTER_API_KEY", "")
    OPENROUTER_BASE_URL: str = "https://openrouter.ai/api/v1"
    QWEN_MODEL: str = "qwen/qwen-2.5-72b-instruct"
    # App settings
    DEFAULT_MOTION_COUNT: int = 10
    SESSION_TIMEOUT_DAYS: int = 30
    # Policy areas (mutable default requires a factory)
    POLICY_AREAS: List[str] = field(default_factory=lambda: [
        "Alle", "Economie", "Klimaat", "Immigratie",
        "Zorg", "Onderwijs", "Defensie", "Sociale Zaken", "Algemeen"
    ])
config = Config()
```
**Usage:**
```python
from config import config
# Access as attributes
timeout = config.API_TIMEOUT
areas = config.POLICY_AREAS
```
## DuckDB Connection Pattern
Short-lived connections with explicit cleanup:
```python
class MotionDatabase:
def get_motion(self, motion_id: int) -> Optional[Dict]:
conn = duckdb.connect(self.db_path)
try:
result = conn.execute(
"SELECT * FROM motions WHERE id = ?",
(motion_id,)
).fetchone()
return result
finally:
conn.close()
def get_filtered_motions(self, **kwargs) -> List[Dict]:
conn = duckdb.connect(self.db_path)
try:
rows = conn.execute(query, params).fetchall()
return rows
except Exception:
return [] # Safe fallback
finally:
conn.close()
```
**Context manager alternative (preferred when applicable):**
```python
def some_operation(self):
with duckdb.connect(self.db_path) as conn:
result = conn.execute("SELECT ...").fetchall()
return result
```
## Try/Except with Fallback Pattern
Always provide safe fallbacks:
```python
def get_motion_or_default(self, motion_id: int) -> Dict:
try:
conn = duckdb.connect(self.db_path)
result = conn.execute("SELECT * FROM motions WHERE id = ?", (motion_id,)).fetchone()
conn.close()
return result if result else {}
except Exception:
return {}
```
## Optional Import Pattern
Handle optional dependencies gracefully:
```python
try:
import duckdb
except Exception: # pragma: no cover
duckdb = None
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self._file_mode = duckdb is None
...
```
## Property Pattern
Lazy initialization of expensive resources:
```python
class MotionDatabase:
def __init__(self, db_path: str = config.DATABASE_PATH):
self.db_path = db_path
self._session_cache = None
@property
def session(self):
"""Lazy-load expensive resources."""
if self._session_cache is None:
self._session_cache = self._create_session()
return self._session_cache
```
## Type Annotation Patterns
```python
from typing import Dict, List, Optional, Tuple, Any
# Optional with None default
def get_motion(self, motion_id: Optional[int] = None) -> Optional[Dict]:
...
# Multiple return types
def parse_vote(self, vote_str: str) -> Tuple[bool, str]:
"""Returns (success, error_message)"""
...
# Generic types
def get_batch(self, ids: List[int]) -> Dict[str, Any]:
...
```

@ -0,0 +1,225 @@
# Streamlit Patterns
## Session State Initialization
Always initialize session state at the start of the main function:
```python
# app.py
import streamlit as st
def main():
# Initialize all session state variables
if "session_id" not in st.session_state:
st.session_state.session_id = None
if "current_motion_index" not in st.session_state:
st.session_state.current_motion_index = 0
if "motions" not in st.session_state:
st.session_state.motions = []
if "show_results" not in st.session_state:
st.session_state.show_results = False
# Rest of app...
```
## Page Configuration
Set page config at the top of each page file:
```python
# pages/1_Stemwijzer.py
import streamlit as st
st.set_page_config(
page_title="Stemwijzer",
page_icon="🗳",
layout="centered",
)
from explorer import build_mp_quiz_tab
build_mp_quiz_tab("data/motions.db")
```
## Thin Page Wrapper Pattern
Pages delegate to shared functions in main modules:
```python
# pages/2_Explorer.py
import streamlit as st
st.set_page_config(
page_title="Explorer",
page_icon="🔭",
layout="wide",
)
from explorer import build_explorer_tab
build_explorer_tab()
```
```python
# explorer.py
def build_explorer_tab():
st.header("🔭 Politiek Explorer")
tab1, tab2, tab3 = st.tabs([
"Compass",
"Trajectories",
"Zoeken"
])
with tab1:
render_compass()
with tab2:
render_trajectories()
with tab3:
render_search()
```
## Sidebar Pattern
Use sidebar for configuration and navigation:
```python
# app.py
def main():
with st.sidebar:
st.header("Instellingen")
motion_count = st.slider(
"Aantal moties",
min_value=5,
max_value=25,
value=10,
)
policy_area = st.selectbox("Beleidsgebied", config.POLICY_AREAS)
if st.button("Start Nieuwe Sessie"):
start_new_session(motion_count, policy_area)
```
## Callback Pattern for State Updates
Use callbacks to handle user interactions:
```python
def on_motion_vote(motion_id: int, vote: str):
"""Callback when user votes on a motion."""
st.session_state.user_votes[motion_id] = vote
    # Move to next motion
    if st.session_state.current_motion_index < len(st.session_state.motions) - 1:
        st.session_state.current_motion_index += 1
    else:
        st.session_state.show_results = True
    # No st.rerun() needed: Streamlit reruns automatically after a callback
# In UI
col1, col2, col3 = st.columns(3)
with col1:
st.button("👍 Voor", on_click=on_motion_vote, args=(motion_id, "Voor"))
with col2:
st.button("👎 Tegen", on_click=on_motion_vote, args=(motion_id, "Tegen"))
with col3:
st.button("❓ Onthouden", on_click=on_motion_vote, args=(motion_id, "Onthouden"))
```
## Container Pattern for Dynamic Content
Use containers for dynamic rendering:
```python
def show_motion_interface():
if not st.session_state.motions:
st.warning("Geen moties geladen")
return
current_idx = st.session_state.current_motion_index
motion = st.session_state.motions[current_idx]
with st.container():
st.subheader(f"Motie {current_idx + 1} van {len(st.session_state.motions)}")
st.markdown(f"**{motion['title']}**")
st.caption(f"📅 {motion['date']} | 🏷 {motion['policy_area']}")
if motion.get("layman_explanation"):
st.info(motion["layman_explanation"])
# Voting buttons...
```
## Expander Pattern for Details
Use expanders for collapsible content:
```python
with st.expander("Meer details"):
st.markdown(f"**Beschrijving:** {motion.get('description', 'N/A')}")
if motion.get("voting_results"):
results = json.loads(motion["voting_results"])
st.json(results)
```
## Form Pattern for Batch Updates
Use forms for multiple related inputs:
```python
with st.form("session_settings"):
st.subheader("Sessie Instellingen")
col1, col2 = st.columns(2)
with col1:
count = st.number_input("Aantal moties", min_value=5, max_value=25)
with col2:
area = st.selectbox("Beleidsgebied", config.POLICY_AREAS)
submitted = st.form_submit_button("Start Sessie")
if submitted:
start_session(count, area)
```
## Caching Pattern
Cache expensive computations:
```python
@st.cache_data(ttl=3600) # Cache for 1 hour
def load_party_positions(window_id: str) -> Dict:
"""Load party positions from database."""
return db.get_party_positions(window_id)
@st.cache_resource
def init_database():
"""Initialize database connection."""
return MotionDatabase(config.DATABASE_PATH)
```
## Home Page Pattern
Landing page with navigation:
```python
# Home.py
import streamlit as st
st.set_page_config(
page_title="Motief: de stematlas",
page_icon="🗺",
layout="centered",
)
def main():
st.title("🗺 Motief: de stematlas")
st.markdown("**Motief** brengt de Nederlandse Tweede Kamer in kaart...")
col1, col2 = st.columns(2)
with col1:
st.page_link("pages/1_Stemwijzer.py", label="Open Stemwijzer", icon="🗳")
with col2:
st.page_link("pages/2_Explorer.py", label="Open Explorer", icon="🔭")
main()
```

@ -0,0 +1,41 @@
# Tech Stack
## Runtime & Language
- **Python ≥3.13** (type: runtime)
- Streamlit (type: web framework) - multi-page app: Home, Stemwijzer, Explorer (4 tabs)
## Data Layer
- **DuckDB** (type: database) - 9 tables: motions, mp_votes, svd_vectors, mp_party_history, etc.
- **ibis** (type: ORM) - DuckDB backend for Pythonic SQL
- Query mode: duckdb:// path or :memory: (see database.py:50-51)
## ML / Analytics
- **scikit-learn** (type: ML) - clustering, Procrustes alignment
- **UMAP** (type: dimensionality reduction) - 2D political compass projection
- **scipy** (type: scientific computing) - spatial/alignment algorithms
- **numpy** (type: numerical computing) - array operations
## Visualization
- **Plotly** (type: charting) - dual-layer interactive charts (scatter + annotations)
## Key Source Files
| File | Purpose |
|------|---------|
| `database.py` | MotionDatabase singleton, DuckDB connection, 9-table schema |
| `explorer.py` | Explorer page with 4 tabs (Motion, MP, Party, Evolution) |
| `explorer_helpers.py` | Pure helper functions, Plotly chart builders, coordinate computation |
| `analysis/` | SVD pipeline, UMAP projection, clustering algorithms |
| `pipeline/` | Data fetch → transform → store pipeline |
| `pages/1_🗳_Stemwijzer.py` | Quiz page (thin wrapper) |
| `pages/2_🔍_Explorer.py` | Explorer page (thin wrapper) |
| `config.py` | Dataclass Config pattern |
## Database Tables
- `motions` - parliamentary motions with id, title, date, category
- `mp_votes` - individual MP votes on motions (1/0/-1)
- `svd_vectors` - SVD-computed political positions (entity_id, window, vector_2d)
- `mp_party_history` - MP-to-party mappings over time
- `party_centroids` - aggregated party positions
- `windows` - time period definitions
- `mp_trajectories` - MP position changes across windows
- Plus 2 additional tables (exact names vary)