---
date: 2026-04-16
topic: political-compass-blog-update
status: draft
---

Problem Statement

We need the "political compass" blog post under thoughts/ to show figures and numbers that exactly match the repository's canonical pipeline outputs. That requires producing reproducible assets (scree plots, party-agreement CSVs and heatmaps) from the codebase, placing them in docs/research, and making minimal edits to the blog HTML to reference those files.

Key constraint: All numbers and figures must come from the canonical functions or the authoritative DB (data/motions.db). No invented values.

Constraints

Non-negotiables:

  • Use canonical functions (analysis.political_axis.compute_svd_spectrum, analysis.explorer_data.load_scree_data) as data sources.
  • Place generated files under docs/research/ with reproducible, deterministic filenames.
  • Keep blog edits minimal and reversible: swap the markdown table for an HTML table and insert image and CSV links.

Operational constraints:

  • Plotly SVG export requires kaleido; provide a reliable matplotlib fallback.
  • data/motions.db must contain required rows (e.g. singular_values) or we must run compute_svd_spectrum first.

Approach (chosen)

I'm choosing a single, pragmatic approach that balances reproducibility, low-risk changes, and minimal new dependencies:

Chosen approach: write a small export script (scripts/export_blog_assets.py) that:

  • Calls analysis.political_axis.compute_svd_spectrum(db_path) for the multi-window scree and analysis.explorer_data.load_scree_data(db_path) for the current_parliament scree fallback.
  • Re-uses the explorer._render_scree_plot logic (or extracts the Plotly-building code into a helper) to build a Plotly Figure and export SVG via fig.write_image(..., format='svg') when kaleido is available.
  • Falls back to matplotlib-based rendering if fig.write_image fails.
  • Computes pairwise party agreement / GL–PvdA trajectory using SQL and the logic from scripts/generate_extra_charts.py, writes CSV with pandas.DataFrame.to_csv(...), and writes a heatmap SVG to docs/research.
  • Writes assets with deterministic filenames into docs/research/ and prints/returns the exact paths and the key numeric values (EVR% for caption).
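The steps above reduce to a small orchestration loop. In this sketch the canonical functions are injected as callables (the real names are analysis.political_axis.compute_svd_spectrum and the Plotly/matplotlib renderer); the filename scheme and return shape are assumptions for illustration:

```python
from pathlib import Path

def export_assets(compute_spectrum, render_svg, out_dir="docs/research"):
    """Orchestration sketch: pull EVR data from the canonical source,
    render each window to a deterministically named SVG, and return the
    paths plus the top EVR value needed for blog captions.

    compute_spectrum stands in for compute_svd_spectrum(db_path);
    render_svg stands in for the Plotly/matplotlib renderer. Both are
    injected so the sketch stays self-contained.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    spectrum = compute_spectrum()           # assumed {window_id: [evr, ...]}
    if not spectrum:
        raise SystemExit("no SVD windows found; run the pipeline first")
    written = {}
    for window, evr in sorted(spectrum.items()):
        path = out / f"scree_{window}.svg"  # deterministic filename
        render_svg(evr, path)
        written[window] = (path, evr[0])    # top EVR for the caption
    return written
```

Sorting the windows before iterating keeps output order (and any printed caption text) deterministic across runs.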

Why this approach:

  • It uses the canonical functions already present in the codebase so numbers match UI and tests.
  • Keeps edits limited to a single script and the blog HTML, making review and rollback trivial.
  • Provides a clear fallback for environments without kaleido.

Alternatives considered (brief):

  1. Modify existing scripts (scripts/generate_extra_charts.py) to write into docs/research.
  • Pro: reuses plotting code directly.
  • Con: those scripts are opinionated about output layout and write HTML, not SVG/CSV; harder to keep minimal change.
  2. Recompute everything via pipeline.run_pipeline and copy pipeline outputs to docs/research.
  • Pro: purely canonical pipeline outputs.
  • Con: heavier — pipeline run may be slow and more intrusive; more environment setup.

I rejected them because the export-script approach is lighter, reproducible, and gives explicit control over filenames and fallbacks.

Architecture

High-level: a small command-line script (scripts/export_blog_assets.py) driven by the canonical DB, the analysis layer, and the visualize helpers.

Major pieces:

  • Exporter script: orchestrates reads from DB, computes metrics, builds figures, writes CSV/SVG into docs/research.
  • Canonical analysis functions: analysis.political_axis and analysis.explorer_data (data source only, no side effects).
  • Plot builders: reuse of explorer._render_scree_plot / analysis.visualize helpers to produce Plotly Figure objects.
  • Fallback renderer: minimal matplotlib routines producing PNG/SVG if Plotly image export fails.
  • Blog edit: minimal HTML changes in thoughts/blog-post-political-compass.html to reference the generated assets.

Components and Responsibilities

scripts/export_blog_assets.py (new)

  • Inputs: path to DB (default data/motions.db), optional --window (e.g. 2023Q3 or 'current_parliament'), output directory (default docs/research).
  • Responsibilities:
    • Run compute_svd_spectrum(db_path) and/or load_scree_data(db_path).
    • Build scree Plotly figures and export SVGs (multi-window and current_parliament).
    • Compute party agreement matrices, export CSVs and heatmap SVGs for requested window(s).
    • Print the EVR numbers and paths for copy into blog captions.
    • Exit non-zero on fatal errors (missing DB, empty results) with clear messages.
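The CLI surface implied by the inputs above, sketched with argparse. Only --db, --window, --dry-run, and their defaults come from this doc; the --out-dir spelling and help strings are assumptions:

```python
import argparse

def parse_args(argv=None):
    """Argument parsing sketch for scripts/export_blog_assets.py."""
    p = argparse.ArgumentParser(
        description="Export blog assets from the canonical DB")
    p.add_argument("--db", default="data/motions.db",
                   help="path to the motions database")
    p.add_argument("--window", default="current_parliament",
                   help="window id, e.g. 2023Q3 or current_parliament")
    p.add_argument("--out-dir", default="docs/research",
                   help="directory for generated assets")
    p.add_argument("--dry-run", action="store_true",
                   help="compute and print paths without writing files")
    return p.parse_args(argv)
```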

Explorer / analysis helpers

  • analysis.political_axis.compute_svd_spectrum(db_path): canonical EVR source for multi-window scree.
  • analysis.explorer_data.load_scree_data(db_path): canonical loader for current_parliament scree (fallback).
  • explorer._render_scree_plot(importances): returns Plotly figure in Streamlit — reuse the building logic to return a Figure for export.
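One way to make that logic reusable without importing Streamlit into the exporter: extract a pure builder that returns a Plotly-compatible figure dict (Plotly figures are JSON-serializable, so plotly.graph_objects.Figure(build_scree_figure(evr)) reconstructs the object for export). The trace and layout details below are a guess at what the explorer draws, not the actual code:

```python
def build_scree_figure(importances):
    """Pure figure builder: returns a dict in Plotly's figure schema.

    Both explorer._render_scree_plot (for Streamlit) and the exporter
    (for SVG export) could call this, keeping the two displays in sync.
    """
    n = len(importances)
    return {
        "data": [{
            "type": "bar",
            "x": list(range(1, n + 1)),
            "y": list(importances),
            "name": "explained variance ratio",
        }],
        "layout": {
            "title": {"text": "Scree plot"},
            "xaxis": {"title": {"text": "component"}},
            "yaxis": {"title": {"text": "EVR"}, "tickformat": ".0%"},
        },
    }
```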

Fallback renderer

  • Minimal matplotlib code that takes the EVR vector and draws a bar/scree-like chart and saves as SVG/PNG.
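A minimal version of that fallback, assuming the EVR vector is a plain sequence of floats; styling choices here are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

def render_scree_fallback(evr, out_path):
    """Matplotlib scree renderer used when Plotly's SVG export
    (kaleido) is unavailable. Draws a bar chart of the EVR vector
    and saves it as SVG."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    components = range(1, len(evr) + 1)
    ax.bar(components, evr)
    ax.set_xlabel("component")
    ax.set_ylabel("explained variance ratio")
    ax.set_title("Scree plot (matplotlib fallback)")
    fig.tight_layout()
    fig.savefig(out_path, format="svg")
    plt.close(fig)
```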

Blog file edits

  • thoughts/blog-post-political-compass.html: replace the markdown pipe table with an HTML table and insert image tags plus CSV links.

Data Flow

  1. Exporter reads data from data/motions.db.
  2. Calls compute_svd_spectrum(db_path) to get multi-window EVR arrays.
  3. Calls load_scree_data(db_path) to get 'current_parliament' singular values if available.
  4. Builds Plotly Figures for scree plots (multi-window and current_parliament).
  5. Exports Figures to docs/research/*.svg (uses fig.write_image when kaleido is present, otherwise matplotlib fallback).
  6. Computes party agreement matrices via the SQL used in scripts/generate_extra_charts.py, writes CSVs to docs/research/.
  7. Writes a party-heatmap SVG to docs/research/.
  8. The blog HTML references those files via relative paths (../docs/research/...).
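Step 6 can be sketched with stdlib sqlite3. The votes(window_id, motion_id, party, vote) table used here is illustrative only; the authoritative SQL lives in scripts/generate_extra_charts.py and should be lifted from there:

```python
import sqlite3
from itertools import combinations

def party_agreement(conn, window_id):
    """Pairwise agreement sketch: for each party pair, the fraction of
    motions (within the window) on which both voted the same way.

    Table and column names are assumptions for illustration.
    """
    rows = conn.execute(
        "SELECT motion_id, party, vote FROM votes WHERE window_id = ?",
        (window_id,),
    ).fetchall()
    by_motion = {}
    for motion, party, vote in rows:
        by_motion.setdefault(motion, {})[party] = vote
    parties = sorted({p for votes in by_motion.values() for p in votes})
    agreement = {}
    for a, b in combinations(parties, 2):
        shared = [v for v in by_motion.values() if a in v and b in v]
        if shared:
            same = sum(1 for v in shared if v[a] == v[b])
            agreement[(a, b)] = same / len(shared)
    return agreement
```

The resulting dict maps straightforwardly onto a pandas DataFrame for the to_csv export and the heatmap.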

Error Handling Strategy

Fail early with informative messages.

  • If DB is missing or unreadable: exit with a clear error and suggestion to run the pipeline or point --db to a valid file.
  • If compute_svd_spectrum returns empty / no windows: print guidance to run scripts/recompute_svd.py or pipeline.run_pipeline and exit non-zero.
  • If Plotly image export fails (kaleido missing): log the error, attempt matplotlib fallback, and continue.
  • If CSV or SVG write fails due to IO permissions: log path and permission error and exit non-zero (don't silently drop assets).

All non-fatal warnings are printed with suggested remediation steps.
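The kaleido branch above can be sketched as a wrapper. Figure.write_image is Plotly's real export API; the fallback callable is whatever matplotlib routine the script provides, injected here to keep the sketch self-contained:

```python
import logging

log = logging.getLogger("export_blog_assets")

def export_svg(fig, out_path, fallback):
    """Try Plotly's kaleido-backed write_image; on failure, warn with a
    remediation hint and hand off to the matplotlib fallback.

    Returns which renderer produced the file, so the script can report it.
    """
    try:
        fig.write_image(str(out_path), format="svg")
        return "plotly"
    except Exception as exc:  # kaleido missing or broken
        log.warning("Plotly SVG export failed (%s); trying matplotlib "
                    "fallback. Consider: pip install kaleido", exc)
    fallback(out_path)
    return "matplotlib"
```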

Testing Strategy

Local verification steps (automated script + manual checks):

  • Unit smoke: run scripts/export_blog_assets.py --db data/motions.db --dry-run to verify the functions produce non-empty arrays and print expected output paths.
  • Functional: run the script to produce assets and assert the files exist: docs/research/scree_multiwindow.svg, docs/research/scree_current_parliament.svg, docs/research/party_agreement_<window>.csv, docs/research/party_agreement_<window>.svg.
  • Sanity numbers: script prints the top EVR values used in captions. Cross-check printed EVR against explorer UI numbers (run explorer locally if needed).
  • Blog preview: open thoughts/blog-post-political-compass.html in browser (file://) and confirm images render and captions match printed numbers.

Add a basic test under tests/ that runs the exporter against a small fixture DB (or a tmp DB produced from tests/test_political_compass.py fixtures) to assert the script creates at least the CSV and a PNG/SVG.
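A minimal shape for that test, with the exporter entry point injected so the fixture-DB wiring stays out of the sketch; the glob patterns assume the deterministic filenames described above:

```python
import csv
from pathlib import Path

def exporter_smoke(export_fn, out_dir):
    """Smoke-test shape for tests/: run the exporter (export_fn stands
    in for the real entry point, wired to a fixture DB) and assert the
    minimum asset set exists and the CSV has data rows."""
    export_fn(out_dir)
    out = Path(out_dir)
    csvs = list(out.glob("party_agreement*.csv"))
    figures = list(out.glob("*.svg")) + list(out.glob("*.png"))
    assert csvs, "exporter produced no agreement CSV"
    assert figures, "exporter produced no figure"
    with open(csvs[0]) as f:
        assert len(list(csv.reader(f))) > 1, "CSV has no data rows"
```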

Effort Estimate & Schedule

  • Draft exporter script and fallback renderer: 2–3 hours.
  • Wire up SQL for party agreement and CSV export: 1 hour.
  • Run and verify assets locally (including possible compute_svd if DB missing): 30–60 minutes.
  • Blog HTML edits and quick preview: 30 minutes.
  • Add a minimal test + docs: 1 hour.

Total: ~5–6 hours of focused work (assuming data/motions.db is present and reasonably up-to-date). If compute_svd must be run across many windows or pipeline.run_pipeline is required, add 30–90 minutes.

Risks & Mitigations

  • Missing singular_values row for current_parliament. Mitigation: script detects and runs compute_svd_spectrum or instructs operator to run scripts/recompute_svd.py.
  • Kaleido not installed causing fig.write_image to fail. Mitigation: implement matplotlib fallback and print clear message recommending pip install kaleido.
  • DB schema drift or missing party ids. Mitigation: script validates expected tables/columns and fails with actionable message.
  • Assets not committed to git. Mitigation: recommend the maintainer commit the generated files; optionally script can print a git add/commit suggestion but must not auto-commit without user request.
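The schema-drift mitigation could look like the check below. The REQUIRED_TABLES set is an assumption and must be synced with the real schema before relying on it:

```python
import sqlite3

REQUIRED_TABLES = {"motions", "votes", "singular_values"}  # illustrative names

def validate_schema(conn):
    """Guard against schema drift: check sqlite_master for the tables
    the exporter reads and fail with an actionable message."""
    present = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    missing = REQUIRED_TABLES - present
    if missing:
        raise SystemExit(
            f"database is missing tables {sorted(missing)}; "
            "run the pipeline or point --db at a complete database")
```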

Open Questions

  • Which specific window id(s) do we want for the GL–PvdA CSV/heatmap? (I'll default to 'current_parliament' and allow an explicit --window flag.)
  • Should the script auto-commit generated assets to git, or should it stop and ask human to commit? (I recommend manual commit.)

I'm proceeding to create the design doc. Interrupt if you want changes.