---
title: ARCHITECTURE
---

## Overview

Small Python project that collects, stores, and presents Dutch parliamentary motions (Tweede Kamer). It ingests votes via the OData API, stores motions in a DuckDB file, generates short human-readable summaries using an LLM client, and exposes a Streamlit UI for users to vote and view matching results.

## Tech stack

- Language: Python (single-project repository)
- Data: DuckDB (file: data/motions.db)
- Web / UI: Streamlit (app.py, pages/)
- HTTP: requests (ai_provider.py, api_client.py)
- LLM: QWEN (via OpenRouter) / OpenAI-compatible client (ai_provider.py). Prefer QWEN via OpenRouter where possible.
- Analysis: scipy (SVD), scikit-learn (clustering), umap-learn (dimensionality reduction)
- Visualization: Plotly
- Packaging: pyproject.toml

## Top-level layout (annotated)

./
- app.py — Streamlit UI entrypoint (Home.py routing)
- Home.py — Thin wrapper with minimal logic
- database.py — MotionDatabase: DuckDB schema, insert/query/update, party-match calculations
- api_client.py — TweedeKamerAPI: fetch OData voting records and group into motions
- summarizer.py — MotionSummarizer: LLM integration to generate layman_explanation
- config.py — Config dataclass: central configuration (DATABASE_PATH, API/AI settings, constants)
- ai_provider.py — Lightweight HTTP wrapper around OpenRouter/OpenAI-style backends
- explorer.py — Explorer page logic, tab routing, SVD visualization
- explorer_helpers.py — Pure functions for chart builders, coordinate computation
- data/ — data/motions.db (DuckDB file, ~18 GB)
- pyproject.toml — project metadata / dependencies
- .env — environment variables (not printed here)

## Directory structure

- `pages/` — Streamlit pages: 1_Stemwijzer.py, 2_Explorer.py
- `pipeline/` — Data ingestion pipelines: run_pipeline.py, svd_pipeline.py, text_pipeline.py
- `analysis/` — SVD, clustering, trajectory, visualization modules
- `similarity/` — Embedding-based similarity computation
- `scripts/` — Utility scripts for
  data processing
- `tests/` — Test suite using pytest
- `migrations/` — SQL migration files

## Core components

- Streamlit UI (app.py + pages/)
  - Presents the voting UI, reads filtered motions from the database, creates sessions, writes user votes
  - Explorer page (explorer.py) provides SVD visualization and party trajectory analysis
- Storage (database.py)
  - MotionDatabase encapsulates DuckDB schema creation and CRUD for motions and user sessions
  - Exposes a module-level instance `db = MotionDatabase()` used across the codebase
  - Key responsibilities: insert_motion, get_filtered_motions, create_session, update_user_vote, calculate_party_matches
- Ingestion (api_client.py + pipeline/)
  - api_client.py fetches votes via the Tweede Kamer OData API and groups records into motions
  - pipeline/ orchestrates the full ingestion and analysis workflow
- Summarization (summarizer.py)
  - Wraps an OpenRouter/OpenAI-compatible client (QWEN via OpenRouter recommended) to produce short layman explanations and persist them to the DB
  - Reads motions without layman_explanation and updates rows
- Analysis (analysis/)
  - SVD decomposition of voting patterns
  - UMAP for visualization
  - Clustering for motion grouping
  - Trajectory computation for party movement over time

## Data flow (high level)

1. Ingestion
   - The pipeline triggers TweedeKamerAPI.get_motions(...)
   - Each produced motion dict is passed to MotionDatabase.insert_motion()
   - insert_motion writes to DuckDB (data/motions.db)
2. Enrichment
   - summarizer.update_motion_summaries() reads motions lacking layman_explanation, calls the LLM client, and writes summary text back to the DB
3. Analysis
   - pipeline/svd_pipeline.py computes SVD embeddings from the vote matrix
   - Results are stored in the svd_vectors table for visualization
4. Presentation / Interaction
   - app.py (Streamlit) queries motions via db.get_filtered_motions() and displays them
   - Users vote; app.py writes votes into the database via db.update_user_vote()
   - app.py calls db.calculate_party_matches() to compute match percentages for parties

## External integrations & dependencies

- Tweede Kamer OData API (api_client.py)
- HTTP (requests)
- DuckDB (database file at data/motions.db)
- Streamlit for the UI
- OpenRouter/OpenAI-compatible LLM client (ai_provider.py) — configured with environment variables in config.py

## Configuration

- config.py: central Config dataclass. Observed keys / env variables referenced across the codebase include:
  - config.DATABASE_PATH (default "data/motions.db")
  - OPENROUTER_API_KEY / other OPENROUTER_* variables used by ai_provider.py
  - QWEN_MODEL (or another model identifier) referenced in summarizer.py
  - API timeout / batch size constants
- .env file present at the repo root (do not commit secrets)
- Packaging metadata: pyproject.toml

## Build, run & development notes

- Install dependencies via the project's Python packaging (pyproject.toml)
- Use `uv add` and `uv run` to manage dependencies in this directory and run scripts
- Streamlit app: run `uv run streamlit run app.py` from the project root to start the UI (app.py is the intended web entrypoint)
- Never use pip directly!
- Run tests: `uv run pytest tests/`
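The configuration pattern described above (a central Config dataclass whose values come from the environment / .env) can be sketched as follows. This is an illustrative assumption, not the actual config.py: only DATABASE_PATH, OPENROUTER_API_KEY, and QWEN_MODEL are named in this document, and the default model string and constant values here are placeholders.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Config:
    """Central configuration; secrets are read from the environment, never hard-coded."""

    # Documented default location of the DuckDB file.
    database_path: str = "data/motions.db"
    # Used by ai_provider.py; loaded from .env / environment (placeholder lookup).
    openrouter_api_key: str = field(
        default_factory=lambda: os.environ.get("OPENROUTER_API_KEY", "")
    )
    # Model identifier referenced in summarizer.py; default string is a guess.
    qwen_model: str = field(
        default_factory=lambda: os.environ.get("QWEN_MODEL", "qwen-placeholder")
    )
    # Illustrative constants for API timeout / batch size.
    api_timeout_seconds: int = 30
    summary_batch_size: int = 25


# Module-level singleton, mirroring the project's style (db, summarizer, ...).
config = Config()
```

Keeping the singleton at module level matches how `db = MotionDatabase()` is exposed elsewhere in the codebase, at the cost of making tests that need isolated configs construct their own `Config()` instances.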
## Tests

- Test suite in `tests/` using pytest
- Run with `uv run pytest tests/`

## Notes / caveats

- The project is synchronous (no async/await patterns detected)
- Many modules rely on module-level singletons (e.g., `db = MotionDatabase()`, `summarizer = MotionSummarizer()`)
- Error handling frequently catches broad Exception and prints to stdout (see database.py, api_client.py)
- Logging is not centralized (print statements are used)

## Where to look first (for contributors)

- app.py + pages/ — follow the UI flow and see how votes & sessions are used
- database.py — core data model and calculations
- explorer.py — SVD visualization and party analysis
- api_client.py — OData ingestion logic
- summarizer.py — LLM usage and environment variables
- pipeline/ — how ingestion and analysis are orchestrated
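To make the central calculation concrete: the real `calculate_party_matches` lives in database.py and runs against DuckDB, but the idea — a party's match is the share of motions where its vote equals the user's vote — can be sketched in pure Python. Function name aside, the signature, vote labels ("voor"/"tegen"), and data shapes below are illustrative assumptions.

```python
from collections import defaultdict


def calculate_party_matches(user_votes, party_votes):
    """Compute per-party match percentages.

    user_votes:  {motion_id: "voor" | "tegen"}            — the session's votes
    party_votes: {motion_id: {party: "voor" | "tegen"}}   — recorded party votes
    Returns {party: percentage of shared motions where the votes agree}.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for motion_id, user_vote in user_votes.items():
        # Only motions the user actually voted on count toward the percentage.
        for party, party_vote in party_votes.get(motion_id, {}).items():
            total[party] += 1
            if party_vote == user_vote:
                agree[party] += 1
    return {party: round(100 * agree[party] / total[party], 1) for party in total}


# Example: party "A" agrees with the user on both motions, "B" on neither.
user = {1: "voor", 2: "tegen"}
parties = {1: {"A": "voor", "B": "tegen"}, 2: {"A": "tegen", "B": "voor"}}
print(calculate_party_matches(user, parties))  # {'A': 100.0, 'B': 0.0}
```

In the actual codebase this is presumably a SQL aggregation over the motions and user-vote tables rather than a Python loop, but the percentage semantics shown here are what the Stemwijzer-style match display needs.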