date	topic	status
2026-03-22	Dynamic motion explorer + analysis refresh	validated

Problem Statement

The parliamentary embedding pipeline now covers 2019–2026 with ~25,000 motions, quarterly SVD windows, fused embeddings, and a 200k+ similarity cache. None of it can be explored interactively: the only outputs today are the static HTML files written by generate_compass.py (if it has been run) and a blog post with placeholder numbers.

We need to:

Regenerate all analyses and output graphs with the full dataset
Build an interactive Streamlit explorer that surfaces the political compass, party trajectories, and motion similarity search
Update the blog post with real numbers and findings

Constraints

Do NOT modify app.py or scheduler.py — these are the production quiz app
All DB access in the explorer must be read-only (no writes) — pipeline may be running
Explorer must work with existing analysis.* modules; no new analysis logic
Use @st.cache_data aggressively — compute_2d_axes runs PCA across all windows and is expensive (seconds, not milliseconds)
No new external dependencies beyond what's already installed (streamlit, plotly, umap-learn, scikit-learn are all present)
Follow existing code style: functional Python, logging.getLogger(__name__), no print statements in library code

Approach

Single-file explorer.py at the project root alongside app.py.

Four Streamlit tabs:

Politiek Kompas — 2D MP/party scatter with a window slider
Partij Trajectories — Line traces of party positions over time on the compass
Motie Zoeken — Free-text + filter search, returns ranked similar motions
Motie Browser — Filterable table of all motions, click to expand detail + similar motions

Run with: streamlit run explorer.py

This approach is chosen because:

Reuses all existing analysis.* modules without changes
Single file means no new package structure to maintain
Streamlit tabs map naturally to the four distinct views a researcher would want
Read-only DB access means it can run concurrently with the pipeline

Architecture

explorer.py
  ├── Tab 1: Politiek Kompas
  │     ├── analysis.political_axis.compute_2d_axes (cached)
  │     └── analysis.visualize.plot_political_compass → Plotly figure
  │
  ├── Tab 2: Partij Trajectories
  │     ├── analysis.trajectory.compute_2d_trajectories (cached)
  │     └── analysis.visualize.plot_2d_trajectories → Plotly figure
  │
  ├── Tab 3: Motie Zoeken
  │     ├── database.get_all_motions (cached, read-only)
  │     ├── database.search_similar (similarity_cache lookup)
  │     └── Custom search: filter title/description + show voting_results
  │
  └── Tab 4: Motie Browser
        ├── database.get_filtered_motions (cached, read-only)
        └── On click: database.search_similar for related motions

Key Components & Responsibilities

explorer.py

Page config: st.set_page_config(layout="wide", page_title="Parlement Explorer")
Sidebar: DB path input (default data/motions.db), window-size toggle (annual/quarterly)
@st.cache_data wrappers for all expensive DB reads and computations
Four tabs via st.tabs([...])

Tab 1 — Politiek Kompas

Calls compute_2d_axes(db_path, method='pca', pca_residual=True) — cached
Window selector slider showing available windows
Renders the Plotly scatter for the selected window using _render_compass_for_window(positions_by_window, window_id, party_map, axis_def) — a thin Plotly figure builder (not writing to file)
Hover: MP name, party, (x, y) coordinates
Color by party using _load_party_map(db_path) — cached
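
The per-window figure builder described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the helper name `compass_traces_for_window` is hypothetical, and it assumes `positions_by_window` maps a window id to `{mp_name: (x, y)}` and `party_map` maps `mp_name` to a party name (neither shape is specified in this document). It returns plain dicts, one per party, so it runs without Plotly; each dict could then be passed to `plotly.graph_objects.Scatter(**trace)`.

```python
from collections import defaultdict


def compass_traces_for_window(positions_by_window, window_id, party_map):
    """Group one window's MP positions into one scatter trace per party."""
    positions = positions_by_window.get(window_id, {})
    grouped = defaultdict(lambda: {"x": [], "y": [], "text": []})
    for mp_name, (x, y) in positions.items():
        # "Onbekend" as the fallback label is an assumption of this sketch.
        party = party_map.get(mp_name, "Onbekend")
        grouped[party]["x"].append(x)
        grouped[party]["y"].append(y)
        grouped[party]["text"].append(mp_name)

    traces = []
    for party, points in sorted(grouped.items()):
        traces.append(
            {
                "name": party,  # one legend entry (and colour) per party
                "mode": "markers",
                "x": points["x"],
                "y": points["y"],
                "text": points["text"],
                # Hover shows MP name, party, and the (x, y) coordinates.
                "hovertemplate": "%{text} (" + party + ")"
                "<br>(%{x:.2f}, %{y:.2f})<extra></extra>",
            }
        )
    return traces


# Illustrative, made-up data.
example = {"2024": {"MP 1": (0.4, -0.1), "MP 2": (-0.3, 0.2)}}
traces = compass_traces_for_window(example, "2024", {"MP 1": "Partij A", "MP 2": "Partij B"})
```

Returning traces rather than a finished figure keeps the builder free of file output, which matches the "not writing to file" requirement above.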

Tab 2 — Partij Trajectories

Same positions_by_window data from Tab 1 (shared cache hit)
Multi-select party filter (default: all major parties)
Plotly figure: one trace per party, x/y positions connected by lines, labeled by window_id
Toggle between showing MPs or just party centroids (computed as mean of MP positions per party per window)
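
The centroid computation mentioned above (mean of MP positions per party per window) can be sketched as below. The data shapes are assumptions, not taken from the codebase: `positions_by_window` is taken to be `{window_id: {mp_name: (x, y)}}` and `party_map` to be `{mp_name: party}`. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def party_centroids(positions_by_window, party_map):
    """Return {window_id: {party: (mean_x, mean_y)}} from per-MP positions."""
    centroids = {}
    for window_id, positions in positions_by_window.items():
        by_party = defaultdict(list)
        for mp_name, xy in positions.items():
            party = party_map.get(mp_name)
            if party is not None:  # MPs without a known party are skipped
                by_party[party].append(xy)
        centroids[window_id] = {
            party: (fmean([x for x, _ in pts]), fmean([y for _, y in pts]))
            for party, pts in by_party.items()
        }
    return centroids


def party_trajectory(centroids, party):
    """Return [(window_id, x, y), ...] for one party, ordered by window id."""
    return [
        (window_id, *by_party[party])
        for window_id, by_party in sorted(centroids.items())
        if party in by_party
    ]
```

Each trajectory list maps directly onto one line trace per party, with `window_id` as the point label. Sorting by window id is lexicographic here, which is only correct if ids sort chronologically as strings (for example `2023` or `2023-Q1`).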

Tab 3 — Motie Zoeken

Search input (Dutch text, free-form)
Filters: year range (slider), policy area (multi-select), controversy score (slider)
On search: filter motions table in-memory against title + layman_explanation text (case-insensitive substring; no embedding search needed at this level)
Results list: each result shows title, date, policy area, controversy, layman_explanation
Expandable section per result: full description/body_text + "Vergelijkbare moties" from similarity_cache
Voting breakdown: parse voting_results JSON to show Voor/Tegen/Onthouden per party
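
The in-memory search step above could look roughly like this. It is a sketch under assumptions: motions are represented as a list of dicts rather than a DataFrame so the example stays dependency-free, the keys reuse the column names mentioned in this document (`title`, `layman_explanation`, `date`, `policy_area`, `controversy_score`), and `date` is assumed to be an ISO string. The function name `filter_motions` is hypothetical.

```python
def filter_motions(motions, query, year_range=None, policy_areas=None,
                   min_controversy=0.0):
    """Case-insensitive substring match on title + layman_explanation,
    combined with the year, policy-area, and controversy filters."""
    needle = query.strip().casefold()
    results = []
    for motion in motions:
        haystack = " ".join(
            [motion.get("title") or "", motion.get("layman_explanation") or ""]
        ).casefold()
        if needle and needle not in haystack:
            continue
        year = int(str(motion["date"])[:4])  # assumes ISO dates like "2022-11-03"
        if year_range and not (year_range[0] <= year <= year_range[1]):
            continue
        if policy_areas and motion.get("policy_area") not in policy_areas:
            continue
        # A missing controversy score is treated as 0.0 in this sketch.
        if (motion.get("controversy_score") or 0.0) < min_controversy:
            continue
        results.append(motion)
    return results
```

An empty query disables the text match, so the filters alone can drive the result list. With a pandas DataFrame the same logic would typically be expressed as boolean masks instead of a loop.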

Tab 4 — Motie Browser

st.dataframe with all motions (title, date, policy_area, controversy_score, winning_margin)
Column filters at top: year, policy area
Sort by: date DESC, controversy DESC, winning_margin ASC (most contested first)
Click row → st.session_state stores selected motion_id → detail panel below table
Detail panel: full motion text + top-10 similar motions from similarity_cache
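
The three sort orders listed above could be driven from a small lookup table, as in this sketch. The option labels, the `SORT_OPTIONS` name, and the choice to push rows with missing values to the end are assumptions made for illustration; rows are plain dicts keyed by the column names used in this document.

```python
# Maps a UI label to (column, descending?). Labels are illustrative.
SORT_OPTIONS = {
    "date DESC": ("date", True),
    "controversy DESC": ("controversy_score", True),
    "winning_margin ASC": ("winning_margin", False),  # most contested first
}


def sort_motions(motions, sort_by="date DESC"):
    """Sort motion rows by the chosen option; rows missing the value go last."""
    column, descending = SORT_OPTIONS[sort_by]
    present = [m for m in motions if m.get(column) is not None]
    missing = [m for m in motions if m.get(column) is None]
    return sorted(present, key=lambda m: m[column], reverse=descending) + missing
```

Sorting on `date` as a string is only chronological if dates are stored in ISO format.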

Data Flow

On startup: compute_2d_axes runs PCA, results cached in Streamlit's in-memory cache
Tab 1/2: pure reads from svd_vectors + mp_metadata — all cached after first load
Tab 3: on each search, filter pre-loaded motions DataFrame in-memory (no DB query per keypress)
Tab 4: full motions table loaded once and cached; similarity lookups hit similarity_cache table via existing database.get_cached_similarities

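All DuckDB connections are opened with read_only=True so the explorer can never write to the database. One caveat to verify before relying on concurrent access: DuckDB allows several read-only processes at once, but a read-only open can fail while another process (such as the pipeline) holds a read-write connection to the same file.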

Error Handling

If compute_2d_axes fails (insufficient data for a window), skip that window and log warning — don't crash the app
If similarity_cache has no entries for a motion (e.g., new motion not yet processed), show "Nog geen vergelijkbare moties beschikbaar" placeholder
If DB file doesn't exist at startup, show an error banner with the path and instructions
All duckdb.connect calls wrapped in try/finally to guarantee close
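
The skip-and-warn behaviour for windows with insufficient data could be sketched as follows. This assumes the work can be split per window through a callable, here the hypothetical `compute_window`; the document does not say whether compute_2d_axes exposes such a hook, so this shows the pattern rather than the module's API.

```python
import logging

logger = logging.getLogger(__name__)


def compute_windows_safely(window_ids, compute_window):
    """Run compute_window for each window id; log and skip any that fail.

    Returns (results, skipped): a dict of successful results keyed by window id
    and a list of the window ids that were skipped.
    """
    results, skipped = {}, []
    for window_id in window_ids:
        try:
            results[window_id] = compute_window(window_id)
        except Exception as exc:  # broad on purpose: one bad window must not crash the app
            logger.warning("Skipping window %s: %s", window_id, exc)
            skipped.append(window_id)
    return results, skipped
```

Returning the skipped ids alongside the results lets the UI state which windows are unavailable instead of silently omitting them from the slider.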

Analysis Refresh Plan

Before building the explorer, regenerate all outputs:

# 1. Generate political compass HTML for latest window (annual)
.venv/bin/python scripts/generate_compass.py \
    --db data/motions.db --out outputs \
    --method pca --pca-residual

# 2. Generate similarity cache for new windows (2019–2021, 2024 quarters)
#    (run_pipeline with --skip-metadata --skip-extract --skip-svd --skip-text)
.venv/bin/python -m pipeline.run_pipeline \
    --db-path data/motions.db \
    --start-date 2019-01-01 --end-date 2025-01-01 \
    --window-size quarterly \
    --skip-metadata --skip-extract --skip-svd --skip-text

# 3. Recompute similarity cache for all windows
.venv/bin/python -c "
from similarity.compute import recompute_all_windows
recompute_all_windows('data/motions.db', window_size='quarterly', top_k=20)
"

Blog Post Updates

Target: thoughts/blog-post-political-compass.md

Replace placeholder motion counts table with real numbers from DB query
Add actual findings from quarterly analysis (not visible in annual windows):
- 2020-Q2 COVID vote clustering — parties converge on emergency measures
- 2022-Q4 nitrogen crisis — sharpest left-right split in dataset
- 2023-Q1 → 2024-Q1 gap (data missing for Q2-Q4 2023)
Add "Explorer" section describing explorer.py and how to run it
Update similarity cache row count (was 212k, now higher with new windows)
Fix the "fused = [10] + [2560] = 2570" claim — verify actual dimensions
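
Before the 2023 gap is reported in the blog post, it could be confirmed from the list of quarterly window ids actually present in the database. The sketch below assumes ids follow the `YYYY-QN` notation used above (for example `2023-Q1`); `missing_quarters` is a hypothetical helper, and the input list would come from a DB query that is not shown here.

```python
def missing_quarters(window_ids):
    """Return quarterly window ids absent between the first and last one present."""

    def to_index(window_id):
        year, quarter = window_id.split("-Q")
        return int(year) * 4 + int(quarter) - 1  # consecutive integer per quarter

    present = {to_index(w) for w in window_ids}
    if not present:
        return []
    return [
        f"{i // 4}-Q{i % 4 + 1}"
        for i in range(min(present), max(present) + 1)
        if i not in present
    ]


# Illustrative input, not real query output.
gaps = missing_quarters(["2022-Q4", "2023-Q1", "2024-Q1"])
# gaps == ["2023-Q2", "2023-Q3", "2023-Q4"]
```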

Testing Strategy

Explorer has no tests (it's a UI script) — verify manually by running streamlit run explorer.py after pipeline completes
Existing 34 tests stay green — no changes to library modules
Run tests after completing implementation: .venv/bin/python -m pytest -q

Open Questions

Should the explorer run on a separate port from app.py? (Recommendation: yes, app.py stays on its port, explorer.py runs on a different port for internal/research use)
Should "Verworpen." (rejected) motions be filtered from search results by default? (Recommendation: yes, add a "Toon verworpen" ("show rejected") toggle defaulting to off)
Annual or quarterly windows as the default for the compass? (Recommendation: annual — less noise, cleaner trajectories; quarterly available via sidebar toggle)
