| date | topic | status |
|---|---|---|
| 2026-03-22 | Dynamic motion explorer + analysis refresh | validated |
Problem Statement
The parliamentary embedding pipeline now covers 2019–2026 with ~25,000 motions, quarterly SVD windows, fused embeddings, and a 200k+ similarity cache. None of this is visible to anyone in an interactive form. The only outputs today are static HTML files written by generate_compass.py (if it's been run), and a blog post with placeholder numbers.
We need to:
- Regenerate all analyses and output graphs with the full dataset
- Build an interactive Streamlit explorer that surfaces the political compass, party trajectories, and motion similarity search
- Update the blog post with real numbers and findings
Constraints
- Do NOT modify `app.py` or `scheduler.py`; these are the production quiz app
- All DB access in the explorer must be read-only (no writes), since the pipeline may be running
- Explorer must work with existing `analysis.*` modules; no new analysis logic
- Use `@st.cache_data` aggressively: `compute_2d_axes` runs PCA across all windows and is expensive (seconds, not milliseconds)
- No new external dependencies beyond what's already installed (streamlit, plotly, umap-learn, scikit-learn are all present)
- Follow existing code style: functional Python, `logging.getLogger(__name__)`, no print statements in library code
Approach
Single-file explorer.py at the project root alongside app.py.
Four Streamlit tabs:
- Politiek Kompas — 2D MP/party scatter with a window slider
- Partij Trajectories — Line traces of party positions over time on the compass
- Motie Zoeken — Free-text + filter search, returns ranked similar motions
- Motie Browser — Filterable table of all motions, click to expand detail + similar motions
Run with: streamlit run explorer.py
This approach is chosen because:
- Reuses all existing `analysis.*` modules without changes
- Single file means no new package structure to maintain
- Streamlit tabs map naturally to the four distinct views a researcher would want
- Read-only DB access means it can run concurrently with the pipeline
Architecture
explorer.py
├── Tab 1: Politiek Kompas
│   └── analysis.political_axis.compute_2d_axes (cached)
│   └── analysis.visualize.plot_political_compass → Plotly figure
│
├── Tab 2: Partij Trajectories
│   └── analysis.trajectory.compute_2d_trajectories (cached)
│   └── analysis.visualize.plot_2d_trajectories → Plotly figure
│
├── Tab 3: Motie Zoeken
│   └── database.get_all_motions (cached, read-only)
│   └── database.search_similar (similarity_cache lookup)
│   └── Custom search: filter title/description + show voting_results
│
└── Tab 4: Motie Browser
    └── database.get_filtered_motions (cached, read-only)
    └── On click: database.search_similar for related motions
Key Components & Responsibilities
explorer.py
- Page config: `st.set_page_config(layout="wide", page_title="Parlement Explorer")`
- Sidebar: DB path input (default `data/motions.db`), window-size toggle (annual/quarterly)
- `@st.cache_data` wrappers for all expensive DB reads and computations
- Four tabs via `st.tabs([...])`
Tab 1 — Politiek Kompas
- Calls `compute_2d_axes(db_path, method='pca', pca_residual=True)` (cached)
- Window selector slider showing available windows
- Renders the Plotly scatter for the selected window using `_render_compass_for_window(positions_by_window, window_id, party_map, axis_def)`, a thin Plotly figure builder (not writing to file)
- Hover: MP name, party, (x, y) coordinates
- Color by party using `_load_party_map(db_path)` (cached)
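A minimal sketch of what such a figure builder could look like. It is not the actual `_render_compass_for_window`: the function name `build_compass_figure` is hypothetical, `axis_def` is left out because its shape is not specified here, and it assumes `positions_by_window[window_id]` maps MP names to `(x, y)` tuples and `party_map` maps MP names to party names. The figure is returned as a plain dict in Plotly's figure layout (`data` plus `layout`), which `plotly.graph_objects.Figure` accepts.

```python
from collections import defaultdict


def build_compass_figure(positions_by_window, window_id, party_map):
    """Return a Plotly-style figure dict with one scatter trace per party."""
    # Assumed shape: positions_by_window[window_id] == {mp_name: (x, y)}
    positions = positions_by_window.get(window_id, {})

    # Group MPs by party so each party becomes its own colored trace.
    by_party = defaultdict(list)
    for mp_name, (x, y) in positions.items():
        party = party_map.get(mp_name, "Onbekend")  # fallback label is an assumption
        by_party[party].append((mp_name, x, y))

    traces = []
    for party in sorted(by_party):
        members = by_party[party]
        traces.append({
            "type": "scatter",
            "mode": "markers",
            "name": party,
            "x": [x for _, x, _ in members],
            "y": [y for _, _, y in members],
            "text": [name for name, _, _ in members],
            # Hover shows MP name, party, and (x, y) coordinates.
            "hovertemplate": "%{text}<br>" + party
                             + "<br>(%{x:.2f}, %{y:.2f})<extra></extra>",
        })

    layout = {"title": {"text": f"Politiek Kompas: {window_id}"}}
    return {"data": traces, "layout": layout}
```

Nothing is written to disk; in the explorer the returned figure would be handed to `st.plotly_chart`.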
Tab 2 — Partij Trajectories
- Same `positions_by_window` data from Tab 1 (shared cache hit)
- Multi-select party filter (default: all major parties)
- Plotly figure: one trace per party, x/y positions connected by lines, labeled by window_id
- Toggle between showing MPs or just party centroids (computed as mean of MP positions per party per window)
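The centroid computation described above can be sketched as follows. The function name `party_centroids` is hypothetical, and the sketch assumes `positions_by_window` maps each window id to `{mp_name: (x, y)}` and that window ids sort chronologically as strings (for example `2020-Q1` before `2020-Q2`).

```python
from collections import defaultdict
from statistics import fmean


def party_centroids(positions_by_window, party_map):
    """Return {party: [(window_id, mean_x, mean_y), ...]} ordered by window id."""
    trajectories = defaultdict(list)
    for window_id in sorted(positions_by_window):
        # Collect the positions of all MPs of each party in this window.
        grouped = defaultdict(list)
        for mp_name, (x, y) in positions_by_window[window_id].items():
            party = party_map.get(mp_name)
            if party is None:
                continue  # MPs without a known party are left out
            grouped[party].append((x, y))
        # The centroid is the mean of the MP positions per party per window.
        for party, points in grouped.items():
            cx = fmean([p[0] for p in points])
            cy = fmean([p[1] for p in points])
            trajectories[party].append((window_id, cx, cy))
    return dict(trajectories)
```

Each party's list can then be drawn as one line trace, with the window id as the point label.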
Tab 3 — Motie Zoeken
- Search input (Dutch text, free-form)
- Filters: year range (slider), policy area (multi-select), controversy score (slider)
- On search: filter `motions` table in-memory against title + layman_explanation text (case-insensitive substring; no embedding search needed at this level)
- Results list: each result shows title, date, policy area, controversy, layman_explanation
- Expandable section per result: full description/body_text + "Vergelijkbare moties" from `similarity_cache`
- Voting breakdown: parse `voting_results` JSON to show Voor/Tegen/Onthouden per party
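The in-memory search step could look roughly like this. It is a sketch over plain dicts rather than the cached DataFrame: the function name `filter_motions` and its parameters are hypothetical, the field names follow the columns mentioned above, and `date` is assumed to be an ISO-style string starting with the year.

```python
def filter_motions(motions, query, year_range=None, policy_areas=None,
                   min_controversy=0.0):
    """Case-insensitive substring search over title + layman_explanation."""
    needle = query.strip().casefold()
    results = []
    for motion in motions:
        title = motion.get("title") or ""
        explanation = motion.get("layman_explanation") or ""
        haystack = f"{title} {explanation}".casefold()
        # An empty query matches everything; otherwise require a substring hit.
        if needle and needle not in haystack:
            continue
        year = int(str(motion["date"])[:4])  # assumes dates like "2022-11-03"
        if year_range and not (year_range[0] <= year <= year_range[1]):
            continue
        if policy_areas and motion.get("policy_area") not in policy_areas:
            continue
        if (motion.get("controversy_score") or 0.0) < min_controversy:
            continue
        results.append(motion)
    return results
```

Because the function only touches data already in memory, it can run on every search without opening a DB connection.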
Tab 4 — Motie Browser
- `st.dataframe` with all motions (title, date, policy_area, controversy_score, winning_margin)
- Column filters at top: year, policy area
- Sort by: date DESC, controversy DESC, winning_margin ASC (most contested first)
- Click row → `st.session_state` stores selected motion_id → detail panel below table
- Detail panel: full motion text + top-10 similar motions from similarity_cache
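One way to express the sort options, reading them as three alternative orderings rather than one compound sort (that reading is an assumption). The names `SORT_OPTIONS` and `sort_motions` are hypothetical, and rows are plain dicts here instead of DataFrame rows.

```python
# Each option maps to (column, descending?), matching the list above.
SORT_OPTIONS = {
    "date": ("date", True),                      # newest first
    "controversy": ("controversy_score", True),  # most controversial first
    "winning_margin": ("winning_margin", False), # most contested first
}


def sort_motions(motions, option):
    """Return motions sorted by the chosen option; rows missing the value go last."""
    field, descending = SORT_OPTIONS[option]
    present = [m for m in motions if m.get(field) is not None]
    missing = [m for m in motions if m.get(field) is None]
    return sorted(present, key=lambda m: m[field], reverse=descending) + missing
```

Keeping rows with missing values at the end avoids comparing `None` with numbers, which would raise a `TypeError`.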
Data Flow
- On startup: `compute_2d_axes` runs PCA, results cached in Streamlit's in-memory cache
- Tab 1/2: pure reads from `svd_vectors` + `mp_metadata`; all cached after first load
- Tab 3: on each search, filter pre-loaded motions DataFrame in-memory (no DB query per keypress)
- Tab 4: full motions table loaded once and cached; similarity lookups hit `similarity_cache` table via existing `database.get_cached_similarities`
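All DuckDB connections are opened with read_only=True so the explorer never takes a write lock. Note that DuckDB allows multiple read-only processes only while no other process holds a read-write connection to the same file, so concurrent access works only when the pipeline is not holding a write connection at that moment.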
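A small sketch of a read-only connection helper with guaranteed close. The name `read_only_connection` and its `connect` parameter are hypothetical; the parameter exists only so the helper can be exercised without DuckDB installed. By default it uses `duckdb.connect(path, read_only=True)`.

```python
from contextlib import contextmanager


@contextmanager
def read_only_connection(db_path, connect=None):
    """Yield a read-only DB connection and always close it afterwards."""
    if connect is None:
        import duckdb  # imported lazily; only needed for the default path
        connect = duckdb.connect
    conn = connect(db_path, read_only=True)
    try:
        yield conn
    finally:
        # Runs on normal exit and on exceptions, so no connection is leaked.
        conn.close()
```

Usage would look like `with read_only_connection("data/motions.db") as conn: rows = conn.execute(sql).fetchall()`, where `sql` is whatever read query the caller needs.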
Error Handling
- If `compute_2d_axes` fails (insufficient data for a window), skip that window and log a warning; don't crash the app
- If `similarity_cache` has no entries for a motion (e.g., new motion not yet processed), show "Nog geen vergelijkbare moties beschikbaar" placeholder
- If DB file doesn't exist at startup, show an error banner with the path and instructions
- All `duckdb.connect` calls wrapped in try/finally to guarantee close
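The skip-and-log behavior for failing windows might be sketched like this. `compute_windows_safely` and the `compute_window` callable are hypothetical stand-ins; how `compute_2d_axes` is actually split per window is not specified here.

```python
import logging

logger = logging.getLogger(__name__)


def compute_windows_safely(window_ids, compute_window):
    """Run compute_window per window, skipping windows that raise."""
    results = {}
    for window_id in window_ids:
        try:
            results[window_id] = compute_window(window_id)
        except Exception as exc:  # broad on purpose; narrow it if failure types are known
            # Log and move on so one bad window cannot crash the app.
            logger.warning("Skipping window %s: %s", window_id, exc)
    return results
```

Windows that fail are simply absent from the result, so the slider would only offer windows that have data.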
Analysis Refresh Plan
Before building the explorer, regenerate all outputs:
# 1. Generate political compass HTML for latest window (annual)
.venv/bin/python scripts/generate_compass.py \
  --db data/motions.db --out outputs \
  --method pca --pca-residual

# 2. Generate similarity cache for new windows (2019–2021, 2024 quarters)
# (run_pipeline with --skip-metadata --skip-extract --skip-svd --skip-text)
.venv/bin/python -m pipeline.run_pipeline \
  --db-path data/motions.db \
  --start-date 2019-01-01 --end-date 2025-01-01 \
  --window-size quarterly \
  --skip-metadata --skip-extract --skip-svd --skip-text

# 3. Recompute similarity cache for all windows
.venv/bin/python -c "
from similarity.compute import recompute_all_windows
recompute_all_windows('data/motions.db', window_size='quarterly', top_k=20)
"
Blog Post Updates
Target: thoughts/blog-post-political-compass.md
- Replace placeholder motion counts table with real numbers from DB query
- Add actual findings from quarterly analysis (not visible in annual windows):
  - 2020-Q2 COVID vote clustering: parties converge on emergency measures
  - 2022-Q4 nitrogen crisis: sharpest left-right split in dataset
  - 2023-Q1 → 2024-Q1 gap (data missing for Q2-Q4 2023)
- Add "Explorer" section describing `explorer.py` and how to run it
- Update similarity cache row count (was 212k, now higher with new windows)
- Fix the "fused = [10] + [2560] = 2570" claim; verify actual dimensions
Testing Strategy
- Explorer has no tests (it's a UI script); verify manually by running `streamlit run explorer.py` after pipeline completes
- Existing 34 tests stay green: no changes to library modules
- Run tests after completing implementation: `.venv/bin/python -m pytest -q`
Open Questions
- Should the explorer run on a separate port from `app.py`? (Recommendation: yes, `app.py` stays on its port, `explorer.py` runs on a different port for internal/research use)
- Should `Verworpen.` motions be filtered from search results by default? (Recommendation: yes, add a "Toon verworpen" toggle defaulting to off)
- Annual or quarterly windows as the default for the compass? (Recommendation: annual, for less noise and cleaner trajectories; quarterly available via sidebar toggle)