# System Overview

## Project: Stemwijzer (Dutch Political Voting Compass)

**Purpose**: A web application that maps the Dutch Tweede Kamer (House of Representatives) using real parliamentary votes. It helps citizens discover which political party best aligns with their views.

## Architecture Summary

### Data Flow

```
TweedeKamer OData API
    ↓
API Client (api_client.py)
    ↓
DuckDB Database (database.py)
    ↓
Pipeline Processing (pipeline/)
    ├── fetch_mp_metadata   # MP party + tenure
    ├── extract_mp_votes    # voting_results → mp_votes
    ├── svd_pipeline        # SVD on vote matrix + Procrustes
    ├── text_pipeline       # AI embeddings via OpenRouter
    └── fusion              # Combine SVD + text vectors
    ↓
Streamlit Web App (Home.py, pages/)
    ├── Home.py             # Landing page
    ├── 1_Stemwijzer.py     # Voting quiz
    └── 2_Explorer.py       # Political compass explorer
```

### Key Components

| Component | Purpose | File(s) |
|-----------|---------|---------|
| **Database** | Motion storage, MP votes, embeddings | `database.py` |
| **API Client** | TweedeKamer OData API integration | `api_client.py` |
| **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
| **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
| **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
| **Explorer Helpers** | Pure functions, chart builders | `explorer_helpers.py` |
| **Web App** | Streamlit UI | `Home.py`, `pages/*.py` |

### Tech Stack

- **Language**: Python 3.13+
- **Web Framework**: Streamlit (multi-page app)
- **Database**: DuckDB, with Ibis as the query layer (DuckDB backend)
- **ML/Analytics**: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
- **AI/LLM**: OpenRouter-compatible API (Qwen embedding and chat models)
- **Visualization**: Plotly (interactive charts), matplotlib (optional)
- **HTTP**: requests with `Session` connection pooling and retry
- **Parsing**: beautifulsoup4, lxml

### Key Patterns

1. **Module-Level Singletons**: `db = MotionDatabase()`, `config = Config()`
2. **Repository Pattern**: `MotionDatabase` class with one method per query
3. **Service Layer**: `TweedeKamerAPI` and `ai_provider`, both with retry/backoff
4. **Pipeline Orchestration**: `ThreadPoolExecutor` for parallel SVD
5. **Short-Lived Connections**: DuckDB connections opened and closed inside `try/finally` blocks
6. **Graceful Degradation**: `try/except` around optional dependencies

### Domain Invariants

⚠️ **CRITICAL RULES** (from AGENTS.md):

1. **Right-wing parties on the RIGHT**: PVV, FVD, JA21, and SGP must appear on the RIGHT side of every axis in visualizations.
2. **SVD labels = voting patterns**: SVD axis labels reflect voting patterns, NOT semantic content.

### Database Tables

| Table | Purpose |
|-------|---------|
| `motions` | Parliamentary motions with id, title, date, category |
| `mp_votes` | Individual MP votes on motions (`Voor`/`Tegen`/`Onthouden`, i.e. for/against/abstain) |
| `mp_metadata` | MP names, parties, tenure info |
| `svd_vectors` | 2D SVD-computed political positions per entity |
| `fused_embeddings` | Combined SVD + text embeddings |
| `embeddings` | Text embeddings for motions |
| `user_sessions` | Voting session tracking |
| `party_results` | Party match results per session |

### Conventions

- **Error Handling**: Catch `Exception` and return a safe fallback (`False`, `[]`, or `None`)
- **Logging**: Use `logging.getLogger(__name__)`; **never use `print()`**
- **Imports**: Three groups, in order: stdlib → third-party → local
- **Type Hints**: Required on public functions, using `typing` module imports
- **DuckDB**: Short-lived connections, always closed with `conn.close()` in a `try/finally` block
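The error-handling, logging, type-hint, and short-lived-connection conventions combine into one recurring shape for a repository-style query method. The sketch below is illustrative only: `get_motion_titles` and its `connect` parameter are hypothetical (not taken from `database.py`), and it assumes the `motions` table has the `title` and `date` columns listed above. The connection factory is injected as an assumption of this sketch so the function can be exercised with stdlib `sqlite3`; by default it lazily imports `duckdb` and calls `duckdb.connect`.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def _default_connect(db_path: str) -> Any:
    # Lazy import: a missing duckdb install raises inside the caller's
    # try-block and becomes the safe fallback instead of an import crash.
    import duckdb

    return duckdb.connect(db_path)


def get_motion_titles(
    db_path: str,
    limit: int = 10,
    connect: Callable[[str], Any] = _default_connect,  # hypothetical injection point
) -> list[str]:
    """Return up to `limit` motion titles, newest first, or [] on any error."""
    conn = None
    try:
        # Short-lived connection: opened per call, never shared.
        conn = connect(db_path)
        rows = conn.execute(
            "SELECT title FROM motions ORDER BY date DESC LIMIT ?", [limit]
        ).fetchall()
        return [row[0] for row in rows]
    except Exception:
        # Convention: catch Exception, log (never print), return a safe fallback.
        logger.exception("Failed to fetch motion titles from %s", db_path)
        return []
    finally:
        # Close only if connect() succeeded.
        if conn is not None:
            conn.close()
```

Because `conn` starts as `None`, a failure inside `connect()` itself skips `close()` and still returns the `[]` fallback; the same `?` placeholder syntax is accepted by both DuckDB and sqlite3.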