# System Overview
## Project: Stemwijzer (Dutch Political Voting Compass)
**Purpose**: A web application that maps the Dutch Tweede Kamer (House of Representatives) based on real parliamentary votes, helping citizens discover which political party aligns best with their views.
## Architecture Summary
### Data Flow
```
TweedeKamer OData API
    ↓
API Client (api_client.py)
    ↓
DuckDB Database (database.py)
    ↓
Pipeline Processing (pipeline/)
├── fetch_mp_metadata # MP party + tenure
├── extract_mp_votes # voting_results → mp_votes
├── svd_pipeline # SVD on vote matrix + Procrustes
├── text_pipeline # AI embeddings via OpenRouter
└── fusion # Combine SVD + text vectors
    ↓
Streamlit Web App (Home.py, pages/)
├── Home.py # Landing page
├── 1_Stemwijzer.py # Voting quiz
└── 2_Explorer.py # Political compass explorer
```
### Key Components
| Component | Purpose | File(s) |
|-----------|---------|---------|
| **Database** | Motion storage, MP votes, embeddings | `database.py` |
| **API Client** | TweedeKamer OData API integration | `api_client.py` |
| **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
| **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
| **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
| **Explorer Helpers** | Pure functions, chart builders | `explorer_helpers.py` |
| **Web App** | Streamlit UI | `Home.py`, `pages/*.py` |
### Tech Stack
- **Language**: Python 3.13+
- **Web Framework**: Streamlit (multi-page app)
- **Database**: DuckDB with the ibis expression API (DuckDB-native implementation)
- **ML/Analytics**: scipy (SVD, Procrustes), scikit-learn (KMeans, cosine_similarity), umap-learn (optional)
- **AI/LLM**: OpenRouter-compatible API (Qwen embeddings + chat)
- **Visualization**: Plotly (interactive charts), matplotlib (optional)
- **HTTP**: requests with Session pooling and retry
- **Parsing**: beautifulsoup4, lxml
### Key Patterns
1. **Module-Level Singletons**: `db = MotionDatabase()`, `config = Config()`
2. **Repository Pattern**: MotionDatabase class with method-per-query
3. **Service Layer**: TweedeKamerAPI, ai_provider with retry/backoff
4. **Pipeline Orchestration**: ThreadPoolExecutor for parallel SVD
5. **Short-Lived Connections**: DuckDB connections in try/finally blocks
6. **Graceful Degradation**: try/except around optional dependencies
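
Patterns 4 and 6 can be sketched as follows. This is a minimal illustration rather than the project's actual code: `run_parallel`, the `job` callable, and the notion of per-window jobs are hypothetical names chosen for the example. Only the optional `umap` import and the use of `ThreadPoolExecutor` come from the description above.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

# Graceful degradation: umap-learn is optional, so a failed import only
# disables the feature instead of breaking the whole module.
try:
    import umap  # provided by the optional umap-learn package
    HAS_UMAP = True
except Exception:
    umap = None
    HAS_UMAP = False


def run_parallel(
    job: Callable[[str], float],
    windows: List[str],
    max_workers: int = 4,
) -> Dict[str, Optional[float]]:
    """Run one job per window in a thread pool; failed windows map to None.

    `job` and `windows` are hypothetical stand-ins for a per-window SVD task.
    """
    results: Dict[str, Optional[float]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the jobs overlap, then collect in order.
        futures = {window: pool.submit(job, window) for window in windows}
        for window, future in futures.items():
            try:
                results[window] = future.result()
            except Exception:
                logger.exception("job failed for window %s", window)
                results[window] = None
    return results
```

Callers can check `HAS_UMAP` before offering a UMAP projection and skip that option otherwise.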
### Domain Invariants
**CRITICAL RULES** (from AGENTS.md):
1. **Right-wing parties on RIGHT**: PVV, FVD, JA21, SGP must appear on RIGHT side of all axes in visualizations
2. **SVD labels = voting patterns**: SVD labels reflect voting patterns, NOT semantic content
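
Singular vectors are only defined up to sign, so an SVD axis can come out mirrored from one run to the next. One way to enforce rule 1 is to flip any axis on which the listed right-wing parties sit, on average, below the other parties. The sketch below assumes positions are stored as a plain dict of party name to coordinate tuple and that "right side" means the positive direction of each axis. `orient_axes` is a hypothetical helper, not a function from the codebase.

```python
from typing import Dict, Iterable, List, Tuple

# Parties named in the invariant above.
RIGHT_WING = ("PVV", "FVD", "JA21", "SGP")


def _axis_mean(points: List[Tuple[float, ...]], axis: int) -> float:
    """Mean coordinate of the given points along one axis."""
    return sum(p[axis] for p in points) / len(points)


def orient_axes(
    positions: Dict[str, Tuple[float, ...]],
    right_wing: Iterable[str] = RIGHT_WING,
) -> Dict[str, Tuple[float, ...]]:
    """Flip each axis so right-wing parties land on its positive side."""
    right = set(right_wing)
    anchors = [p for name, p in positions.items() if name in right]
    others = [p for name, p in positions.items() if name not in right]
    if not anchors or not others:
        # Nothing to compare against, so leave the orientation unchanged.
        return dict(positions)
    signs = [
        -1.0 if _axis_mean(anchors, axis) < _axis_mean(others, axis) else 1.0
        for axis in range(len(anchors[0]))
    ]
    return {
        name: tuple(s * v for s, v in zip(signs, p))
        for name, p in positions.items()
    }
```

Flipping a whole axis changes only the orientation of the chart; distances between parties, and therefore the voting-pattern structure from rule 2, stay the same.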
### Database Tables
| Table | Purpose |
|-------|---------|
| `motions` | Parliamentary motions with id, title, date, category |
| `mp_votes` | Individual MP votes on motions (Voor/Tegen/Onthouden) |
| `mp_metadata` | MP names, parties, tenure info |
| `svd_vectors` | 2D SVD-computed political positions per entity |
| `fused_embeddings` | Combined SVD + text embeddings |
| `embeddings` | Text embeddings for motions |
| `user_sessions` | Voting session tracking |
| `party_results` | Party match results per session |
### Conventions
- **Error Handling**: Catch `Exception`, return safe fallbacks (False/[]/None)
- **Logging**: Use `logging.getLogger(__name__)`; **never use print()**
- **Imports**: stdlib → 3rd party → local (3 groups)
- **Type Hints**: Required on public functions with typing module imports
- **DuckDB**: Short-lived connections, closed with `conn.close()` in a `try`/`finally` block
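
Taken together, these conventions suggest a query helper along the following lines. This is a sketch under assumptions: `fetch_rows` and its `connect` parameter are hypothetical and do not appear in `database.py`. The helper accepts any zero-argument factory whose result offers `execute(...).fetchall()` and `close()`, which is true of the connection object returned by `duckdb.connect(...)`.

```python
import logging
from typing import Any, Callable, List, Tuple

logger = logging.getLogger(__name__)


def fetch_rows(
    connect: Callable[[], Any],
    sql: str,
    params: Tuple[Any, ...] = (),
) -> List[Tuple[Any, ...]]:
    """Run one query on a short-lived connection; return [] on any failure."""
    conn = None
    try:
        conn = connect()
        # Only pass parameters when there are some to bind.
        cursor = conn.execute(sql, list(params)) if params else conn.execute(sql)
        return cursor.fetchall()
    except Exception:
        # Log through the module logger (no print) and fall back to a safe value.
        logger.exception("query failed: %s", sql)
        return []
    finally:
        # Close the connection whether the query succeeded or not.
        if conn is not None:
            conn.close()
```

Because the factory is injected, the helper can be exercised in tests against a throwaway database without touching the real DuckDB file.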