# System Overview

## Project: Stemwijzer (Dutch Political Voting Compass)

**Purpose**: A web application that maps the Dutch Tweede Kamer (House of Representatives) based on real parliamentary votes, helping citizens discover which political party aligns best with their views.

## Architecture Summary

### Data Flow

```
TweedeKamer OData API
        ↓
API Client (api_client.py)
        ↓
DuckDB Database (database.py)
        ↓
Pipeline Processing (pipeline/)
    ├── fetch_mp_metadata   # MP party + tenure
    ├── extract_mp_votes    # voting_results → mp_votes
    ├── svd_pipeline        # SVD on vote matrix + Procrustes
    ├── text_pipeline       # AI embeddings via OpenRouter
    └── fusion              # Combine SVD + text vectors
        ↓
Streamlit Web App (app.py, pages/)
    ├── Home.py             # Landing page
    ├── 1_Stemwijzer.py     # Voting quiz
    └── 2_Explorer.py       # Political compass explorer
```
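
The `fusion` stage combines SVD vectors with text embeddings, but this overview does not say how. A minimal sketch of one plausible approach, assuming each part is L2-normalized and the two are concatenated with a weight; the `fuse` function and its `svd_weight` parameter are hypothetical, not the project's code:

```python
import numpy as np


def fuse(svd_vec: np.ndarray, text_vec: np.ndarray, svd_weight: float = 0.5) -> np.ndarray:
    """Hypothetical fusion: weighted concatenation of two L2-normalized vectors.

    Normalizing first keeps either part from dominating the result
    just because of its scale or dimensionality.
    """
    if not 0.0 <= svd_weight <= 1.0:
        raise ValueError("svd_weight must be between 0 and 1")
    svd_unit = svd_vec / (np.linalg.norm(svd_vec) or 1.0)  # guard against zero vectors
    text_unit = text_vec / (np.linalg.norm(text_vec) or 1.0)
    # sqrt weights: when both parts are non-zero, each contributes exactly its
    # weight to the squared norm, so the fused vector has unit length and
    # cos(fused_a, fused_b) = w * cos(svd_a, svd_b) + (1 - w) * cos(text_a, text_b).
    return np.concatenate(
        [np.sqrt(svd_weight) * svd_unit, np.sqrt(1.0 - svd_weight) * text_unit]
    )
```

The square-root weighting is a design choice: it makes cosine similarity on fused vectors a simple weighted average of the voting-based and text-based similarities.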

### Key Components

| Component | Purpose | File(s) |
|-----------|---------|---------|
| **Database** | Motion storage, MP votes, embeddings | `database.py` |
| **API Client** | TweedeKamer OData API integration | `api_client.py` |
| **AI Provider** | OpenRouter API for embeddings/summaries | `ai_provider.py` |
| **Pipeline** | Orchestrated data processing | `pipeline/run_pipeline.py` |
| **Analysis** | SVD, clustering, trajectory computation | `analysis/*.py` |
| **Similarity** | Motion similarity search | `similarity/*.py` |
| **Web App** | Streamlit UI | `app.py`, `pages/*.py` |
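
The internals of the **Similarity** component are not described here. A minimal sketch of one common technique, assuming each motion is represented by an embedding vector (for example a fused embedding) and motions are ranked by cosine similarity; `top_k_similar` is a hypothetical name, and its output is the kind of result a `SimilarityCache` could store:

```python
from typing import List, Tuple

import numpy as np


def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int = 5) -> List[Tuple[int, float]]:
    """Return up to k (row index, cosine score) pairs most similar to row `query_idx`."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    scores = unit @ unit[query_idx]   # cosine similarity with every motion
    scores[query_idx] = -np.inf       # never return the query motion itself
    k = min(k, len(scores) - 1)
    best = np.argsort(-scores)[:k]    # indices in descending score order
    return [(int(i), float(scores[i])) for i in best]
```

Brute-force search like this is quadratic if run for every motion, which is one reason to pre-compute and cache the results.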

### Data Models

**Core Entities**:

- `Motion`: Parliamentary motion with its voting results
- `MP` / `MPMetadata`: Member of Parliament with party and tenure
- `MPVote`: Individual vote record: `Voor` (for), `Tegen` (against), `Onthouden` (abstained), `Geen stem` (no vote), or `Afwezig` (absent)
- `Party`: Political party
- `UserSession` / `UserVote`: A user's quiz session and the answers given in it
- `SVDVector`: Dimensionality-reduced vote vectors
- `FusedEmbedding`: Combined SVD + text embedding
- `SimilarityCache`: Pre-computed motion similarities
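
The vote options listed for `MPVote` map naturally onto an enum. A minimal sketch, assuming a plain frozen dataclass; the `VoteChoice` enum, the field names, and the numeric encoding in `as_number` are hypothetical illustrations, not the project's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class VoteChoice(str, Enum):
    """Vote options from the MPVote description; values keep the Dutch labels."""
    VOOR = "Voor"            # for
    TEGEN = "Tegen"          # against
    ONTHOUDEN = "Onthouden"  # abstained
    GEEN_STEM = "Geen stem"  # no vote
    AFWEZIG = "Afwezig"      # absent


# Hypothetical encoding for a vote matrix; missing votes map to None.
_ENCODING: Dict[VoteChoice, int] = {
    VoteChoice.VOOR: 1,
    VoteChoice.TEGEN: -1,
    VoteChoice.ONTHOUDEN: 0,
}


@dataclass(frozen=True)
class MPVote:
    motion_id: str
    mp_name: str
    party: str
    vote: VoteChoice

    def as_number(self) -> Optional[int]:
        """+1 for, -1 against, 0 abstained, None for no vote or absent."""
        return _ENCODING.get(self.vote)
```

Because `VoteChoice` subclasses `str`, a raw label parses with `VoteChoice("Tegen")`, and an unknown label raises `ValueError` instead of silently entering the data.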

### Technical Decisions

1. **DuckDB over SQLite**: Chosen for its columnar, OLAP-oriented engine, which suits complex analytical queries
2. **ibis query layer**: Database-agnostic, dataframe-style query building (currently using the DuckDB backend)
3. **SVD + Procrustes**: SVD reduces each time window's vote matrix to low-dimensional voting vectors; because each window's axes are computed independently (and are only defined up to sign), a Procrustes rotation aligns the vectors across time windows
4. **UMAP for visualization**: Non-linear dimensionality reduction for the compass display
5. **OpenRouter API**: Abstraction layer for AI embeddings (currently using a Qwen model)
6. **Module-level singletons**: `db = MotionDatabase()` pattern for shared state
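
Decision 3 can be illustrated with NumPy. A minimal sketch, assuming each time window yields a numeric vote matrix with MPs as rows and that the two windows being aligned share the same MPs in the same row order; `window_vectors` and `align_to` are hypothetical helpers, and the column centering is an assumption about the pipeline:

```python
import numpy as np


def window_vectors(votes: np.ndarray, k: int = 2) -> np.ndarray:
    """Reduce an (MPs x motions) vote matrix to k-dimensional MP vectors via truncated SVD."""
    centered = votes - votes.mean(axis=0, keepdims=True)  # center each motion column (assumption)
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]  # MP coordinates scaled by singular values


def align_to(reference: np.ndarray, moving: np.ndarray) -> np.ndarray:
    """Rotate/reflect `moving` onto `reference` (same rows) via orthogonal Procrustes."""
    # R = U @ Vt, where U S Vt = svd(moving.T @ reference), minimizes
    # ||moving @ R - reference||_F subject to R.T @ R = I.
    u, _s, vt = np.linalg.svd(moving.T @ reference)
    rotation = u @ vt
    return moving @ rotation
```

Because the Procrustes step only rotates or reflects, distances between MPs within a window are preserved; only the orientation of the axes changes.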

### Key Conventions

- **DuckDB connections**: Open a short-lived connection per method and always close it
- **Error handling**: Catch `Exception` and return a safe fallback (`False`, `[]`, or `None`)
- **Logging**: Use `logging.getLogger(__name__)`; avoid `print()`
- **Type hints**: Required on public functions, using imports from the `typing` module
- **Config**: A `Config` dataclass in `config.py`, imported as `from config import config`