---
date: 2026-03-22
topic: "Dynamic motion explorer + analysis refresh"
status: validated
---

## Problem Statement

The parliamentary embedding pipeline now covers 2019–2026 with ~25,000 motions, quarterly SVD windows, fused embeddings, and a similarity cache of 200k+ rows. None of this can be explored interactively. The only outputs today are static HTML files written by `generate_compass.py` (and only if it has been run) and a blog post with placeholder numbers.

We need to:

1. Regenerate all analyses and output graphs with the full dataset
2. Build an interactive Streamlit explorer that surfaces the political compass, party trajectories, and motion similarity search
3. Update the blog post with real numbers and findings

## Constraints

- Do NOT modify `app.py` or `scheduler.py`; together they are the production quiz app
- All DB access in the explorer must be **read-only** (no writes), because the pipeline may be running at the same time
- The explorer must work with the existing `analysis.*` modules; no new analysis logic
- Use `@st.cache_data` aggressively: `compute_2d_axes` runs PCA across all windows and is expensive (seconds, not milliseconds)
- No new external dependencies beyond what is already installed (streamlit, plotly, umap-learn, and scikit-learn are all present)
- Follow the existing code style: functional Python, `logging.getLogger(__name__)`, no print statements in library code

## Approach

**Single-file `explorer.py`** at the project root alongside `app.py`.

Four Streamlit tabs:

1. **Politiek Kompas**: 2D MP/party scatter with a window slider
2. **Partij Trajectories**: line traces of party positions over time on the compass
3. **Motie Zoeken**: free-text + filter search that returns ranked similar motions
4. **Motie Browser**: filterable table of all motions; click a row to expand its detail and similar motions

Run with: `streamlit run explorer.py`

This approach is chosen because:

- It reuses all existing `analysis.*` modules without changes
- A single file means no new package structure to maintain
- Streamlit tabs map naturally to the four distinct views a researcher would want
- Read-only DB access means it can run concurrently with the pipeline

## Architecture

```
explorer.py
├── Tab 1: Politiek Kompas
│   └── analysis.political_axis.compute_2d_axes (cached)
│   └── analysis.visualize.plot_political_compass → Plotly figure
│
├── Tab 2: Partij Trajectories
│   └── analysis.trajectory.compute_2d_trajectories (cached)
│   └── analysis.visualize.plot_2d_trajectories → Plotly figure
│
├── Tab 3: Motie Zoeken
│   └── database.get_all_motions (cached, read-only)
│   └── database.search_similar (similarity_cache lookup)
│   └── Custom search: filter title/description + show voting_results
│
└── Tab 4: Motie Browser
    └── database.get_filtered_motions (cached, read-only)
    └── On click: database.search_similar for related motions
```

## Key Components & Responsibilities

**`explorer.py`**

- Page config: `st.set_page_config(layout="wide", page_title="Parlement Explorer")`
- Sidebar: DB path input (default `data/motions.db`), window-size toggle (annual/quarterly)
- `@st.cache_data` wrappers for all expensive DB reads and computations
- Four tabs via `st.tabs([...])`

**Tab 1: Politiek Kompas**

- Calls `compute_2d_axes(db_path, method='pca', pca_residual=True)` (cached)
- Window selector slider showing the available windows
- Renders the Plotly scatter for the selected window using `_render_compass_for_window(positions_by_window, window_id, party_map, axis_def)`, a thin Plotly figure builder that does not write to file
- Hover: MP name, party, (x, y) coordinates
- Color by party using `_load_party_map(db_path)` (cached)

**Tab 2: Partij Trajectories**

- Same `positions_by_window` data as Tab 1 (shared cache hit)
- Multi-select party filter (default: all major parties)
- Plotly figure: one trace per party, x/y positions connected by lines, labeled by `window_id`
- Toggle between showing MPs or just party centroids (computed as the mean of MP positions per party per window)

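The centroid computation for Tab 2 could be sketched as follows. This is a minimal illustration, not the project's code: it assumes `positions_by_window` has the shape `{window_id: {mp_name: (x, y)}}` and that the party map is a plain `{mp_name: party}` dict. Neither shape is confirmed by this document, and `party_centroids` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import fmean


def party_centroids(positions_by_window, party_map):
    """Return {window_id: {party: (x, y)}} as the mean of MP positions.

    Assumed (not confirmed) input shapes:
      positions_by_window: {window_id: {mp_name: (x, y)}}
      party_map:           {mp_name: party}
    """
    centroids = {}
    for window_id, positions in positions_by_window.items():
        grouped = defaultdict(list)
        for mp_name, (x, y) in positions.items():
            party = party_map.get(mp_name)
            if party is None:
                continue  # MPs without a known party are left out
            grouped[party].append((x, y))
        centroids[window_id] = {
            party: (fmean(p[0] for p in pts), fmean(p[1] for p in pts))
            for party, pts in grouped.items()
        }
    return centroids
```

Because the function is pure and takes only hashable-friendly data once converted, it would be a natural candidate for a `@st.cache_data` wrapper in `explorer.py`.
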
**Tab 3: Motie Zoeken**

- Search input (Dutch text, free-form)
- Filters: year range (slider), policy area (multi-select), controversy score (slider)
- On search: filter the `motions` table in-memory against title + `layman_explanation` text (case-insensitive substring match; no embedding search is needed at this level)
- Results list: each result shows title, date, policy area, controversy, and `layman_explanation`
- Expandable section per result: full description/`body_text` + "Vergelijkbare moties" from `similarity_cache`
- Voting breakdown: parse the `voting_results` JSON to show Voor/Tegen/Onthouden per party

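The in-memory search for Tab 3 might look like the sketch below. Plain dicts stand in for DataFrame rows so the logic stays dependency-free. The field names follow the columns listed in this document, while the ISO `YYYY-MM-DD` date format, the minimum-controversy reading of the slider, and the `filter_motions` name are assumptions made for illustration.

```python
def filter_motions(motions, query="", year_range=None, policy_areas=None,
                   min_controversy=0.0):
    """Return the motions that match the text query and all active filters.

    Assumes each motion is a dict with title, layman_explanation, date
    (ISO string, year first), policy_area and controversy_score keys.
    """
    needle = query.strip().casefold()
    selected = []
    for motion in motions:
        # Case-insensitive substring match over title + layman explanation.
        haystack = " ".join(
            (motion.get("title") or "", motion.get("layman_explanation") or "")
        ).casefold()
        if needle and needle not in haystack:
            continue
        year = int(str(motion["date"])[:4])
        if year_range and not (year_range[0] <= year <= year_range[1]):
            continue
        if policy_areas and motion.get("policy_area") not in policy_areas:
            continue
        if (motion.get("controversy_score") or 0.0) < min_controversy:
            continue
        selected.append(motion)
    return selected
```

An empty query with no filters returns every motion, which matches the expectation that the search box starts out unfiltered.
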
**Tab 4: Motie Browser**

- `st.dataframe` with all motions (title, date, policy_area, controversy_score, winning_margin)
- Column filters at top: year, policy area
- Sort options: date DESC, controversy DESC, winning_margin ASC (most contested first)
- Click row → `st.session_state` stores the selected `motion_id` → detail panel below the table
- Detail panel: full motion text + top-10 similar motions from `similarity_cache`

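One way to express the three orderings for the browser table is a small lookup of sort modes. This sketch reads them as alternative modes rather than one combined multi-key sort, which is an interpretation and not something this document states. `SORT_MODES` and `sort_motions` are hypothetical names, and dates are assumed to be ISO strings so that string order equals chronological order.

```python
# Hypothetical mode names mapped to (column, descending?).
SORT_MODES = {
    "date": ("date", True),                     # newest first
    "controversy": ("controversy_score", True),  # most controversial first
    "most_contested": ("winning_margin", False),  # smallest margin first
}


def sort_motions(motions, mode="date"):
    """Return a new list of motion dicts ordered by the chosen sort mode."""
    field, descending = SORT_MODES[mode]
    return sorted(motions, key=lambda motion: motion[field], reverse=descending)
```
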
## Data Flow

1. On startup: `compute_2d_axes` runs PCA, and the results are held in Streamlit's in-memory cache
2. Tab 1/2: pure reads from `svd_vectors` + `mp_metadata`; all cached after first load
3. Tab 3: on each search, filter the pre-loaded motions DataFrame in-memory (no DB query per keypress)
4. Tab 4: the full motions table is loaded once and cached; similarity lookups hit the `similarity_cache` table via the existing `database.get_cached_similarities`

All DuckDB connections are opened with `read_only=True` so the explorer never takes a write lock. DuckDB permits several read-only processes at once, but a read-only open can still fail while another process holds the file open for writing, so connection errors must be handled gracefully while the pipeline is running.

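The read-only access pattern could be wrapped in a small context manager, sketched below. `read_only_connection` is a hypothetical helper, not an existing function in this project. The `connect` parameter exists only so the helper can be exercised with a stub; by default it falls back to `duckdb.connect`, which accepts a `read_only` keyword.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def read_only_connection(db_path, connect=None):
    """Yield a read-only DB connection and always close it afterwards."""
    if not Path(db_path).is_file():
        # Surface a clear error so the UI can show the path in a banner.
        raise FileNotFoundError(f"Database not found: {db_path}")
    if connect is None:
        import duckdb  # imported lazily so a stub can replace it in tests
        connect = duckdb.connect
    conn = connect(str(db_path), read_only=True)
    try:
        yield conn
    finally:
        conn.close()  # runs even if the body of the with-block raises
```

A caller would write `with read_only_connection("data/motions.db") as conn:` and run its queries inside the block, so no code path can leave a connection open.
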
## Error Handling

- If `compute_2d_axes` fails for a window (insufficient data), skip that window and log a warning; don't crash the app
- If `similarity_cache` has no entries for a motion (e.g., a new motion not yet processed), show a "Nog geen vergelijkbare moties beschikbaar" placeholder
- If the DB file doesn't exist at startup, show an error banner with the path and instructions
- All `duckdb.connect` calls are wrapped in try/finally to guarantee close

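The skip-and-warn behaviour for failing windows can be illustrated with a generic loop. This is a sketch under an assumption: it supposes the per-window work is available as a callable (`compute_window`, a hypothetical name), whereas this document does not say whether `compute_2d_axes` exposes failures per window or handles them internally.

```python
import logging

logger = logging.getLogger(__name__)


def compute_per_window(window_ids, compute_window):
    """Run compute_window for each id, skipping windows that raise."""
    results = {}
    for window_id in window_ids:
        try:
            results[window_id] = compute_window(window_id)
        except Exception as exc:
            # Broad on purpose: one bad window must not take down the app.
            logger.warning("Skipping window %s: %s", window_id, exc)
    return results
```

Windows that fail are simply absent from the returned dict, so the window slider can be built from its keys.
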
## Analysis Refresh Plan

Before building the explorer, regenerate all outputs:

```bash
# 1. Generate political compass HTML for latest window (annual)
.venv/bin/python scripts/generate_compass.py \
  --db data/motions.db --out outputs \
  --method pca --pca-residual

# 2. Generate similarity cache for new windows (2019–2021, 2024 quarters)
# (run_pipeline with --skip-metadata --skip-extract --skip-svd --skip-text)
.venv/bin/python -m pipeline.run_pipeline \
  --db-path data/motions.db \
  --start-date 2019-01-01 --end-date 2025-01-01 \
  --window-size quarterly \
  --skip-metadata --skip-extract --skip-svd --skip-text

# 3. Recompute similarity cache for all windows
.venv/bin/python -c "
from similarity.compute import recompute_all_windows
recompute_all_windows('data/motions.db', window_size='quarterly', top_k=20)
"
```

## Blog Post Updates

Target: `thoughts/blog-post-political-compass.md`

- Replace the placeholder motion counts table with real numbers from a DB query
- Add actual findings from the quarterly analysis (not visible in annual windows):
  - 2020-Q2 COVID vote clustering: parties converge on emergency measures
  - 2022-Q4 nitrogen crisis: sharpest left-right split in the dataset
  - 2023-Q1 → 2024-Q1 gap (data missing for Q2-Q4 2023)
- Add an "Explorer" section describing `explorer.py` and how to run it
- Update the similarity cache row count (was 212k, now higher with new windows)
- Fix the "fused = [10] + [2560] = 2570" claim: verify the actual dimensions

## Testing Strategy

- The explorer has no tests (it's a UI script); verify manually by running `streamlit run explorer.py` after the pipeline completes
- The existing 34 tests stay green, since no library modules change
- Run the tests after completing implementation: `.venv/bin/python -m pytest -q`

## Open Questions

- Should the explorer run on a separate port from `app.py`? (Recommendation: yes. `app.py` stays on its port, and `explorer.py` runs on a different port for internal/research use.)
- Should `Verworpen.` motions be filtered from search results by default? (Recommendation: yes, add a "Toon verworpen" toggle defaulting to off)
- Annual or quarterly windows as the default for the compass? (Recommendation: annual, for less noise and cleaner trajectories; quarterly stays available via the sidebar toggle)