System Overview
Project: Stemwijzer (Dutch Political Voting Compass)
Purpose: A web application that maps the Dutch Tweede Kamer (House of Representatives) based on real parliamentary votes, helping citizens discover which political party aligns best with their views.
Architecture Summary
Data Flow
TweedeKamer OData API
↓
API Client (api_client.py)
↓
DuckDB Database (database.py)
↓
Pipeline Processing (pipeline/)
├── fetch_mp_metadata # MP party + tenure
├── extract_mp_votes # voting_results → mp_votes
├── svd_pipeline # SVD on vote matrix + Procrustes
├── text_pipeline # AI embeddings via OpenRouter
└── fusion # Combine SVD + text vectors
↓
Streamlit Web App (Home.py, pages/)
├── Home.py # Landing page
├── 1_Stemwijzer.py # Voting quiz
└── 2_Explorer.py # Political compass explorer
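The `extract_mp_votes` and `svd_pipeline` steps above imply a matrix with one row per MP and one column per motion. The following is a minimal sketch of that encoding, not the project's actual code. It assumes a numeric mapping of Voor = +1, Tegen = -1, Onthouden = 0 and uses a hypothetical helper named `build_vote_matrix`; the real encoding and function names are not given in this document.

```python
from typing import Dict, List, Tuple

# Assumed numeric encoding; the project's actual mapping may differ.
VOTE_VALUES: Dict[str, float] = {"Voor": 1.0, "Tegen": -1.0, "Onthouden": 0.0}


def build_vote_matrix(
    votes: List[Tuple[str, str, str]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Turn (mp, motion_id, vote) rows into a dense MP x motion matrix.

    Cells with no recorded vote are filled with 0.0 (an assumption).
    """
    mps = sorted({mp for mp, _, _ in votes})
    motions = sorted({motion for _, motion, _ in votes})
    row = {mp: i for i, mp in enumerate(mps)}
    col = {motion: j for j, motion in enumerate(motions)}

    matrix = [[0.0] * len(motions) for _ in mps]
    for mp, motion, vote in votes:
        # Unknown vote labels fall back to 0.0 rather than raising.
        matrix[row[mp]][col[motion]] = VOTE_VALUES.get(vote, 0.0)
    return mps, motions, matrix


# Made-up sample rows, for illustration only.
sample = [
    ("MP A", "m1", "Voor"),
    ("MP A", "m2", "Tegen"),
    ("MP B", "m1", "Tegen"),
    ("MP B", "m2", "Onthouden"),
    ("MP C", "m2", "Voor"),
]
mps, motions, matrix = build_vote_matrix(sample)
```

A matrix in this shape is what an SVD step would typically factorize; how the pipeline treats missing votes is not stated here, so the zero fill is only a placeholder choice.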
Key Components
| Component | Purpose | File(s) |
|---|---|---|
| Database | Motion storage, MP votes, embeddings | database.py |
| API Client | TweedeKamer OData API integration | api_client.py |
| AI Provider | OpenRouter API for embeddings/summaries | ai_provider.py |
| Pipeline | Orchestrated data processing | pipeline/run_pipeline.py |
| Analysis | SVD, clustering, trajectory computation | analysis/*.py |
| Explorer Helpers | Pure functions, chart builders | explorer_helpers.py |
| Web App | Streamlit UI | Home.py, pages/*.py |
Tech Stack
- Language: Python 3.13+
- Web Framework: Streamlit (multi-page app)
- Database: DuckDB with ibis as the query layer (DuckDB-native implementation)
- ML/Analytics: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
- AI/LLM: OpenRouter-compatible API (QWEN embeddings + chat)
- Visualization: Plotly (interactive charts), matplotlib (optional)
- HTTP: requests with Session pooling and retry
- Parsing: beautifulsoup4, lxml
Key Patterns
- Module-Level Singletons: db = MotionDatabase(), config = Config()
- Repository Pattern: MotionDatabase class with method-per-query
- Service Layer: TweedeKamerAPI, ai_provider with retry/backoff
- Pipeline Orchestration: ThreadPoolExecutor for parallel SVD
- Short-Lived Connections: DuckDB connections in try/finally blocks
- Graceful Degradation: try/except around optional dependencies
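Two of the patterns listed above, graceful degradation and retry/backoff, can be sketched as follows. This is an illustrative example using only the standard library. The names `HAS_UMAP` and `with_retry`, the default attempt count, and the delay values are assumptions, not taken from the project.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")

# Graceful degradation: umap-learn (imported as `umap`) is optional, so a
# missing install disables the feature instead of crashing at import time.
try:
    import umap  # type: ignore
    HAS_UMAP = True
except ImportError:
    umap = None
    HAS_UMAP = False


def with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func, retrying with exponential backoff.

    Returns None if every attempt raises (a safe fallback value).
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            logger.warning("attempt %d/%d failed: %s", attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                # Delay doubles each time: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; callers that only want the default behaviour can ignore it.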
Domain Invariants
⚠️ CRITICAL RULES (from AGENTS.md):
- Right-wing parties on RIGHT: PVV, FVD, JA21, SGP must appear on RIGHT side of all axes in visualizations
- SVD labels = voting patterns: SVD labels reflect voting patterns, NOT semantic content
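Because the sign of an SVD component is arbitrary, the first rule is usually enforced by flipping axes after the decomposition. The sketch below shows one way to do that, assuming 2D positions keyed by party name and reading "right side" as the positive direction of each axis. The helper `orient_axes` and the sample coordinates are hypothetical.

```python
from typing import Dict, Iterable, Tuple

RIGHT_WING_PARTIES = ("PVV", "FVD", "JA21", "SGP")

Point = Tuple[float, float]


def orient_axes(
    positions: Dict[str, Point],
    right_wing: Iterable[str] = RIGHT_WING_PARTIES,
) -> Dict[str, Point]:
    """Flip any axis on which the right-wing parties average below zero."""
    anchors = [positions[p] for p in right_wing if p in positions]
    if not anchors:
        # No reference party present: leave the orientation untouched.
        return dict(positions)

    signs = []
    for axis in range(2):
        mean = sum(point[axis] for point in anchors) / len(anchors)
        signs.append(-1.0 if mean < 0 else 1.0)

    return {
        party: (x * signs[0], y * signs[1])
        for party, (x, y) in positions.items()
    }


# Made-up coordinates, for illustration only.
raw = {"PVV": (-0.8, 0.2), "SGP": (-0.4, 0.1), "OtherParty": (0.7, -0.3)}
oriented = orient_axes(raw)
```

Flipping a whole axis preserves all distances between entities, so the voting-pattern structure is unchanged; only the orientation of the chart differs.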
Database Tables
| Table | Purpose |
|---|---|
| motions | Parliamentary motions with id, title, date, category |
| mp_votes | Individual MP votes on motions (Voor/Tegen/Onthouden) |
| mp_metadata | MP names, parties, tenure info |
| svd_vectors | 2D SVD-computed political positions per entity |
| fused_embeddings | Combined SVD + text embeddings |
| embeddings | Text embeddings for motions |
| user_sessions | Voting session tracking |
| party_results | Party match results per session |
Conventions
- Error Handling: Catch Exception, return safe fallbacks (False/[]/None)
- Logging: Use logging.getLogger(__name__); never use print()
- Imports: stdlib → 3rd party → local (3 groups)
- Type Hints: Required on public functions with typing module imports
- DuckDB: Short-lived connections with try/finally conn.close()
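The error-handling, logging, import-order and type-hint conventions above might look like this when combined in one small function. The function `parse_vote_counts` and its input format are hypothetical, chosen only to illustrate the style.

```python
# Standard-library imports come first; third-party and local groups
# would follow, each separated by a blank line.
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def parse_vote_counts(raw: Optional[str]) -> List[int]:
    """Parse a comma-separated string of counts; return [] on any failure."""
    try:
        return [int(part) for part in raw.split(",")]
    except Exception:
        # Log with traceback instead of printing, then fall back safely.
        logger.exception("could not parse vote counts: %r", raw)
        return []
```

Returning an empty list keeps callers free of their own try/except blocks, at the cost of hiding the failure unless the log is read.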