# System Overview

## Project: Stemwijzer (Dutch Political Voting Compass)

**Purpose**: A web application that maps the Dutch Tweede Kamer (House of Representatives) based on real parliamentary votes, helping citizens discover which political party aligns best with their views.

## Architecture Summary

### Data Flow

```
TweedeKamer OData API
    ↓
API Client (api_client.py)
    ↓
DuckDB Database (database.py)
    ↓
Pipeline Processing (pipeline/)
    ├── fetch_mp_metadata   # MP party + tenure
    ├── extract_mp_votes    # voting_results → mp_votes
    ├── svd_pipeline        # SVD on vote matrix + Procrustes
    ├── text_pipeline       # AI embeddings via OpenRouter
    └── fusion              # Combine SVD + text vectors
    ↓
Streamlit Web App (Home.py, pages/)
    ├── Home.py             # Landing page
    ├── 1_Stemwijzer.py     # Voting quiz
    └── 2_Explorer.py       # Political compass explorer
```
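
The `svd_pipeline` and `fusion` stages can be illustrated with a short NumPy sketch. It is not the project's implementation: the vote encoding (Voor = +1, Tegen = -1, Onthouden or absent = 0), the column centering, the function names and the equal-weight fusion are all assumptions. The alignment step is the standard orthogonal Procrustes solution, which `scipy.linalg.orthogonal_procrustes` also computes.

```python
import numpy as np

# Hypothetical vote encoding; the real extract_mp_votes step may map values differently.
VOTE_VALUES = {"Voor": 1.0, "Tegen": -1.0, "Onthouden": 0.0}


def svd_positions(votes: np.ndarray, k: int = 2) -> np.ndarray:
    """Project each row (MP or party) of a rows x motions vote matrix to k dimensions."""
    centered = votes - votes.mean(axis=0, keepdims=True)  # remove each motion's mean vote
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]  # row scores, scaled by the top-k singular values


def procrustes_align(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotate/reflect `source` to best match `target` (same rows in the same order).

    Orthogonal Procrustes: with U, S, Vt = svd(source.T @ target), R = U @ Vt
    minimises ||source @ R - target|| over orthogonal R.
    """
    u, _s, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def fuse(svd_vec: np.ndarray, text_vec: np.ndarray, svd_weight: float = 0.5) -> np.ndarray:
    """Hypothetical fusion: L2-normalise each part, weight it, then concatenate."""
    def unit(v: np.ndarray) -> np.ndarray:
        norm = np.linalg.norm(v)
        return v / norm if norm > 0 else v

    return np.concatenate([svd_weight * unit(svd_vec), (1.0 - svd_weight) * unit(text_vec)])


# Toy example: 5 MPs x 5 motions, already encoded with VOTE_VALUES.
votes = np.array([
    [1, 1, -1, -1, 1],
    [1, 1, -1, 0, 1],
    [-1, -1, 1, 1, -1],
    [-1, 0, 1, 1, -1],
    [1, -1, 1, -1, 0],
], dtype=float)
positions = svd_positions(votes)  # shape (5, 2)
```

SVD fixes each axis only up to sign, and (assuming Procrustes is used to align one SVD run, such as a time window, to a reference run) alignment inherits the reference run's arbitrary orientation, so the orientation rule under Domain Invariants still has to be applied.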

### Key Components

| Component | Purpose | File(s) |
|-----------|---------|---------|
| **Database** | Motion storage, MP votes, embeddings | `database.py` |
| **API Client** | TweedeKamer OData API integration | `api_client.py` |
| **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
| **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
| **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
| **Explorer Helpers** | Pure functions, chart builders | `explorer_helpers.py` |
| **Web App** | Streamlit UI | `Home.py`, `pages/*.py` |

### Tech Stack

- **Language**: Python 3.13+
- **Web Framework**: Streamlit (multi-page app)
- **Database**: DuckDB, accessed through Ibis (DuckDB backend)
- **ML/Analytics**: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
- **AI/LLM**: OpenRouter-compatible API (QWEN embeddings + chat)
- **Visualization**: Plotly (interactive charts), matplotlib (optional)
- **HTTP**: requests, with `Session` connection pooling and retries
- **Parsing**: beautifulsoup4, lxml

### Key Patterns

1. **Module-Level Singletons**: `db = MotionDatabase()`, `config = Config()`
2. **Repository Pattern**: the `MotionDatabase` class exposes one method per query
3. **Service Layer**: `TweedeKamerAPI` and `ai_provider` wrap external calls with retry/backoff
4. **Pipeline Orchestration**: `ThreadPoolExecutor` runs SVD computations in parallel
5. **Short-Lived Connections**: DuckDB connections are opened per operation and closed in `try/finally` blocks
6. **Graceful Degradation**: optional dependencies are imported inside `try/except`
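
Patterns 3, 4 and 6 can be illustrated with a small stdlib-only sketch. The helper names `with_retry` and `run_parallel`, the doubling backoff schedule, the `HAS_UMAP` flag and the "one SVD per time window" job split are hypothetical; since the tech stack lists `requests` sessions with retries, the real `TweedeKamerAPI` may handle retries in its HTTP transport instead.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")

# Pattern 6, graceful degradation: umap-learn is optional, so record whether it imported.
try:
    import umap  # noqa: F401  (import name of the umap-learn package)
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False


def with_retry(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Pattern 3: run `call`, retrying with exponential backoff (base_delay, 2x, 4x, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:  # broad catch, as in the project's conventions
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d/%d failed (%s); retrying in %.2fs", attempt, attempts, exc, delay)
            time.sleep(delay)
    raise AssertionError("unreachable")


def run_parallel(jobs: dict[str, Callable[[], T]], max_workers: int = 4) -> dict[str, T]:
    """Pattern 4: run independent jobs (e.g. one SVD per time window) in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(job) for name, job in jobs.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit the SVD case because NumPy's LAPACK routines release the GIL; pure-Python CPU work would need a process pool instead.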

### Domain Invariants

⚠️ **CRITICAL RULES** (from AGENTS.md):

1. **Right-wing parties on the RIGHT**: PVV, FVD, JA21 and SGP must appear on the RIGHT side of all axes in visualizations.
2. **SVD labels = voting patterns**: SVD labels reflect voting patterns, NOT semantic content.
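
One way to enforce the first rule is to flip the sign of any axis on which the listed parties sit, on average, at negative coordinates. This is a hedged sketch: the function name and the mean-based rule are assumptions, it treats "right" as the positive end of each plotted axis, and it guarantees only that the group mean is positive, not that every listed party is.

```python
import numpy as np

RIGHT_WING_PARTIES = ("PVV", "FVD", "JA21", "SGP")


def orient_axes(positions: np.ndarray, parties: list[str]) -> np.ndarray:
    """Flip each axis whose mean over the right-wing parties is negative.

    `positions` has one row per party (same order as `parties`) and one column per axis.
    """
    mask = np.array([party in RIGHT_WING_PARTIES for party in parties])
    if not mask.any():
        return positions.copy()  # no anchor parties present: leave orientation unchanged
    right_mean = positions[mask].mean(axis=0)
    signs = np.where(right_mean < 0, -1.0, 1.0)  # -1 flips an axis, +1 keeps it
    return positions * signs  # broadcasts one sign per column
```

Flipping an axis does not change any distances between parties, because SVD determines each component only up to sign.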

### Database Tables

| Table | Purpose |
|-------|---------|
| `motions` | Parliamentary motions with id, title, date, category |
| `mp_votes` | Individual MP votes on motions (Voor/Tegen/Onthouden) |
| `mp_metadata` | MP names, parties, tenure info |
| `svd_vectors` | 2D SVD-computed political positions per entity |
| `fused_embeddings` | Combined SVD + text embeddings |
| `embeddings` | Text embeddings for motions |
| `user_sessions` | Voting session tracking |
| `party_results` | Party match results per session |
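
The `mp_votes`, `mp_metadata`, `user_sessions` and `party_results` tables suggest how a quiz result could be scored. A minimal sketch, assuming a party's position on a motion is the majority vote of its MPs (rows of `(party, motion_id, vote)`, as a join of `mp_votes` with `mp_metadata` might produce) and that a match score is the share of the user's answers the party agrees with. The row layout and function names are hypothetical, and the real Stemwijzer page may weight answers differently.

```python
from collections import Counter, defaultdict


def party_positions(rows: list[tuple[str, str, str]]) -> dict[tuple[str, str], str]:
    """Majority vote per (party, motion_id), from (party, motion_id, vote) rows.

    Ties resolve to whichever vote value was seen first for that key.
    """
    counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for party, motion_id, vote in rows:
        counts[(party, motion_id)][vote] += 1
    return {key: tally.most_common(1)[0][0] for key, tally in counts.items()}


def match_scores(answers: dict[str, str], positions: dict[tuple[str, str], str]) -> dict[str, float]:
    """Fraction of the user's answered motions on which each party's majority agrees."""
    agree: Counter = Counter()
    total: Counter = Counter()
    for (party, motion_id), party_vote in positions.items():
        user_vote = answers.get(motion_id)
        if user_vote is None:
            continue  # motion skipped by the user
        total[party] += 1
        if party_vote == user_vote:
            agree[party] += 1
    return {party: agree[party] / total[party] for party in total}
```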

### Conventions

- **Error Handling**: Catch `Exception` and return safe fallbacks (`False`/`[]`/`None`)
- **Logging**: Use `logging.getLogger(__name__)`; **never use `print()`**
- **Imports**: stdlib → 3rd party → local (3 groups)
- **Type Hints**: Required on public functions, using imports from the `typing` module
- **DuckDB**: Short-lived connections, closed with `conn.close()` in a `finally` block
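
A function following these conventions might look like the sketch below. It is illustrative only: `fetch_motion_titles` is hypothetical, and the connection factory is injected so the snippet runs without DuckDB installed. In the app it would be something like `lambda: duckdb.connect(path, read_only=True)`, where `path` stands for the database file.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def fetch_motion_titles(connect: Callable[[], Any], limit: int = 10) -> list[str]:
    """Return up to `limit` motion titles, newest first, or [] on any failure."""
    try:
        conn = connect()
    except Exception:
        logger.exception("Could not open database connection")
        return []  # safe fallback instead of propagating the error
    try:
        rows = conn.execute(
            "SELECT title FROM motions ORDER BY date DESC LIMIT ?", [limit]
        ).fetchall()
        return [row[0] for row in rows]
    except Exception:
        logger.exception("Query on motions failed")
        return []
    finally:
        conn.close()  # short-lived connection: always closed, even after an error
```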