date: 2026-03-22
topic: Dynamic motion explorer + analysis refresh
status: validated

Problem Statement

The parliamentary embedding pipeline now covers 2019–2026 with ~25,000 motions, quarterly SVD windows, fused embeddings, and a similarity cache of 200k+ rows. None of this can be explored interactively. The only outputs today are static HTML files written by generate_compass.py (if it has been run) and a blog post with placeholder numbers.

We need to:

  1. Regenerate all analyses and output graphs with the full dataset
  2. Build an interactive Streamlit explorer that surfaces the political compass, party trajectories, and motion similarity search
  3. Update the blog post with real numbers and findings

Constraints

  • Do NOT modify app.py or scheduler.py: they belong to the production quiz app
  • All DB access in the explorer must be read-only (no writes) — pipeline may be running
  • Explorer must work with existing analysis.* modules; no new analysis logic
  • Use @st.cache_data aggressively — compute_2d_axes runs PCA across all windows and is expensive (seconds, not milliseconds)
  • No new external dependencies beyond what's already installed (streamlit, plotly, umap-learn, scikit-learn are all present)
  • Follow existing code style: functional Python, logging.getLogger(__name__), no print statements in library code

Approach

Single-file explorer.py at the project root alongside app.py.

Four Streamlit tabs:

  1. Politiek Kompas (political compass): 2D MP/party scatter with a window slider
  2. Partij Trajectories (party trajectories): line traces of party positions over time on the compass
  3. Motie Zoeken (motion search): free-text + filter search, returns ranked similar motions
  4. Motie Browser (motion browser): filterable table of all motions, click to expand detail + similar motions

Run with: streamlit run explorer.py

This approach is chosen because:

  • Reuses all existing analysis.* modules without changes
  • Single file means no new package structure to maintain
  • Streamlit tabs map naturally to the four distinct views a researcher would want
  • Read-only DB access means it can run concurrently with the pipeline

Architecture

explorer.py
  ├── Tab 1: Politiek Kompas
  │     └── analysis.political_axis.compute_2d_axes (cached)
  │     └── analysis.visualize.plot_political_compass → Plotly figure
  │
  ├── Tab 2: Partij Trajectories  
  │     └── analysis.trajectory.compute_2d_trajectories (cached)
  │     └── analysis.visualize.plot_2d_trajectories → Plotly figure
  │
  ├── Tab 3: Motie Zoeken
  │     └── database.get_all_motions (cached, read-only)
  │     └── database.search_similar (similarity_cache lookup)
  │     └── Custom search: filter title/description + show voting_results
  │
  └── Tab 4: Motie Browser
        └── database.get_filtered_motions (cached, read-only)
        └── On click: database.search_similar for related motions

Key Components & Responsibilities

explorer.py

  • Page config: st.set_page_config(layout="wide", page_title="Parlement Explorer")
  • Sidebar: DB path input (default data/motions.db), window-size toggle (annual/quarterly)
  • @st.cache_data wrappers for all expensive DB reads and computations
  • Four tabs via st.tabs([...])

Tab 1 — Politiek Kompas

  • Calls compute_2d_axes(db_path, method='pca', pca_residual=True) — cached
  • Window selector slider showing available windows
  • Renders the Plotly scatter for the selected window with _render_compass_for_window(positions_by_window, window_id, party_map, axis_def), a thin Plotly figure builder that returns the figure instead of writing it to a file
  • Hover: MP name, party, (x, y) coordinates
  • Color by party using _load_party_map(db_path) — cached

Tab 2 — Partij Trajectories

  • Same positions_by_window data from Tab 1 (shared cache hit)
  • Multi-select party filter (default: all major parties)
  • Plotly figure: one trace per party, x/y positions connected by lines, labeled by window_id
  • Toggle between individual MPs and party centroids (each centroid is the mean of MP positions per party per window)
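
The centroid computation in the last bullet can be sketched as follows. This is a minimal illustration, not the explorer's actual code: it assumes positions_by_window has the shape {window_id: {mp_name: (x, y)}} and that the party map is {mp_name: party}. Both shapes, and the function name party_centroids, are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def party_centroids(positions_by_window, party_map):
    """Return {window_id: {party: (mean_x, mean_y)}}.

    Assumed input shapes (hypothetical, not confirmed by the design):
      positions_by_window: {window_id: {mp_name: (x, y)}}
      party_map:           {mp_name: party}
    """
    centroids = {}
    for window_id, positions in positions_by_window.items():
        grouped = defaultdict(list)
        for mp_name, (x, y) in positions.items():
            party = party_map.get(mp_name)
            if party is not None:  # skip MPs without a known party
                grouped[party].append((x, y))
        centroids[window_id] = {
            party: (fmean(p[0] for p in pts), fmean(p[1] for p in pts))
            for party, pts in grouped.items()
        }
    return centroids
```

Because the centroids are derived purely from the cached positions, the toggle would not need a second PCA run.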

Tab 3 — Motie Zoeken

  • Search input (Dutch text, free-form)
  • Filters: year range (slider), policy area (multi-select), controversy score (slider)
  • On search: filter motions table in-memory against title + layman_explanation text (case-insensitive substring; no embedding search needed at this level)
  • Results list: each result shows title, date, policy area, controversy, layman_explanation
  • Expandable section per result: full description/body_text + "Vergelijkbare moties" (similar motions) from similarity_cache
  • Voting breakdown: parse voting_results JSON to show Voor/Tegen/Onthouden (for/against/abstain) per party
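
A possible shape for the voting breakdown is sketched below. The layout of the voting_results JSON is not specified in this design, so the sketch assumes a flat object mapping party name to vote, e.g. {"VVD": "voor", "SP": "tegen"}. That layout and the helper name vote_breakdown are assumptions and would need adjusting to the real column contents.

```python
import json

VOTE_LABELS = ("Voor", "Tegen", "Onthouden")


def vote_breakdown(voting_results_json):
    """Group parties by vote label.

    Assumes (hypothetically) a JSON object of the form {party: vote}.
    Unknown vote values and malformed JSON are ignored rather than raised,
    so a bad row cannot break the results list.
    """
    breakdown = {label: [] for label in VOTE_LABELS}
    try:
        raw = json.loads(voting_results_json or "{}")
    except json.JSONDecodeError:
        return breakdown
    if not isinstance(raw, dict):
        return breakdown
    for party, vote in raw.items():
        label = str(vote).strip().capitalize()
        if label in breakdown:
            breakdown[label].append(party)
    return {label: sorted(parties) for label, parties in breakdown.items()}
```

Each label's party list could then be rendered as one column in the expandable section.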

Tab 4 — Motie Browser

  • st.dataframe with all motions (title, date, policy_area, controversy_score, winning_margin)
  • Column filters at top: year, policy area
  • Sort by: date DESC, controversy DESC, winning_margin ASC (most contested first)
  • Click row → st.session_state stores selected motion_id → detail panel below table
  • Detail panel: full motion text + top-10 similar motions from similarity_cache
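
The sort options could be expressed as a small lookup table. The sketch below reads the three "Sort by" entries as alternative orderings the user picks from (an assumption; the bullet could also describe a compound sort). Rows are modelled as plain dicts with the column names listed above and ISO-formatted date strings. The dict representation and the names SORT_OPTIONS and sort_motions are hypothetical.

```python
# Each option maps to (key function, reverse flag).
SORT_OPTIONS = {
    "date DESC": (lambda m: m["date"], True),
    "controversy DESC": (lambda m: m["controversy_score"], True),
    # Smallest margin first, i.e. most contested motions at the top.
    "winning_margin ASC": (lambda m: m["winning_margin"], False),
}


def sort_motions(motions, option):
    """Return a new list of motion dicts ordered by the chosen option."""
    key, reverse = SORT_OPTIONS[option]
    return sorted(motions, key=key, reverse=reverse)
```

ISO date strings (YYYY-MM-DD) sort correctly as text, so no date parsing is needed under that assumption.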

Data Flow

  1. On startup: compute_2d_axes runs PCA, results cached in Streamlit's in-memory cache
  2. Tab 1/2: pure reads from svd_vectors + mp_metadata — all cached after first load
  3. Tab 3: on each search, filter pre-loaded motions DataFrame in-memory (no DB query per keypress)
  4. Tab 4: full motions table loaded once and cached; similarity lookups hit similarity_cache table via existing database.get_cached_similarities
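
Step 3 can be illustrated with a plain in-memory filter. The sketch below uses a list of dicts instead of a DataFrame to stay dependency-free. The field names title, layman_explanation and policy_area come from this document; the ISO date format and the function name filter_motions are assumptions. It preserves input order and does not attempt any ranking.

```python
def filter_motions(motions, query, year_range=None, policy_areas=None):
    """Case-insensitive substring search over title + layman_explanation.

    year_range is an inclusive (first_year, last_year) tuple or None;
    policy_areas is a collection of allowed areas or None.
    Dates are assumed to be ISO strings starting with the year.
    """
    needle = query.strip().casefold()
    results = []
    for m in motions:
        haystack = f"{m.get('title') or ''} {m.get('layman_explanation') or ''}".casefold()
        if needle and needle not in haystack:
            continue
        if year_range is not None:
            year = int(str(m["date"])[:4])
            if not (year_range[0] <= year <= year_range[1]):
                continue
        if policy_areas and m.get("policy_area") not in policy_areas:
            continue
        results.append(m)
    return results
```

Since the motions are already loaded and cached, this runs on every search without touching the database.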

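All DuckDB connections are opened with read_only=True so the explorer never writes to the database. Note that DuckDB allows multiple processes to hold read-only connections only while no other process has the file open for writing, so opening the database may fail while the pipeline holds a write connection.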
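
A read-only connection helper might look like the sketch below. duckdb.connect(path, read_only=True) is the real DuckDB Python call; the context-manager wrapper and its name readonly_connection are hypothetical. The helper also checks that the file exists first, so a missing database produces a clear error instead of a driver-level one.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def readonly_connection(db_path):
    """Yield a read-only DuckDB connection and always close it."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"Database not found: {path}")
    # Imported here so the path check above runs before the driver is loaded.
    import duckdb

    con = duckdb.connect(str(path), read_only=True)
    try:
        yield con
    finally:
        con.close()  # runs even if the caller's query raises
```

Callers would use it as `with readonly_connection("data/motions.db") as con: ...`, which keeps the try/finally in one place rather than at every call site.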

Error Handling

  • If compute_2d_axes fails (insufficient data for a window), skip that window and log warning — don't crash the app
  • If similarity_cache has no entries for a motion (e.g., a new motion not yet processed), show the placeholder "Nog geen vergelijkbare moties beschikbaar" ("No similar motions available yet")
  • If DB file doesn't exist at startup, show an error banner with the path and instructions
  • All duckdb.connect calls wrapped in try/finally to guarantee close
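
The "skip and log" behaviour from the first bullet can be sketched generically. The design does not say whether compute_2d_axes exposes a per-window entry point, so the sketch takes a hypothetical compute_window callable. The broad except clause is a placeholder until the actual exception type for insufficient data is known.

```python
import logging

logger = logging.getLogger(__name__)


def compute_windows_safely(window_ids, compute_window):
    """Run compute_window for each window, skipping the ones that fail.

    compute_window is a hypothetical callable taking a window_id; it
    stands in for whatever per-window computation the explorer wraps.
    """
    results = {}
    for window_id in window_ids:
        try:
            results[window_id] = compute_window(window_id)
        except Exception as exc:  # narrow this once the real error type is known
            logger.warning("Skipping window %s: %s", window_id, exc)
    return results
```

The window slider would then be populated from the keys of the returned dict, so failed windows simply do not appear.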

Analysis Refresh Plan

Before building the explorer, regenerate all outputs:

# 1. Generate political compass HTML for latest window (annual)
.venv/bin/python scripts/generate_compass.py \
    --db data/motions.db --out outputs \
    --method pca --pca-residual

# 2. Generate similarity cache for new windows (2019–2021, 2024 quarters)
#    (run_pipeline with --skip-metadata --skip-extract --skip-svd --skip-text)
.venv/bin/python -m pipeline.run_pipeline \
    --db-path data/motions.db \
    --start-date 2019-01-01 --end-date 2025-01-01 \
    --window-size quarterly \
    --skip-metadata --skip-extract --skip-svd --skip-text

# 3. Recompute similarity cache for all windows
.venv/bin/python -c "
from similarity.compute import recompute_all_windows
recompute_all_windows('data/motions.db', window_size='quarterly', top_k=20)
"

Blog Post Updates

Target: thoughts/blog-post-political-compass.md

  • Replace placeholder motion counts table with real numbers from DB query
  • Add actual findings from quarterly analysis (not visible in annual windows):
    • 2020-Q2 COVID vote clustering — parties converge on emergency measures
    • 2022-Q4 nitrogen crisis — sharpest left-right split in dataset
    • 2023-Q1 → 2024-Q1 gap (data missing for Q2-Q4 2023)
  • Add "Explorer" section describing explorer.py and how to run it
  • Update similarity cache row count (was 212k, now higher with new windows)
  • Fix the "fused = [10] + [2560] = 2570" claim — verify actual dimensions

Testing Strategy

  • Explorer has no tests (it's a UI script) — verify manually by running streamlit run explorer.py after pipeline completes
  • Existing 34 tests stay green — no changes to library modules
  • Run tests after completing implementation: .venv/bin/python -m pytest -q

Open Questions

  • Should the explorer run on a separate port from app.py? (Recommendation: yes. app.py stays on its port and explorer.py runs on a different port for internal/research use.)
  • Should Verworpen. (rejected) motions be filtered from search results by default? (Recommendation: yes, add a "Toon verworpen" (show rejected) toggle defaulting to off)
  • Annual or quarterly windows as the default for the compass? (Recommendation: annual — less noise, cleaner trajectories; quarterly available via sidebar toggle)