---
date: 2026-04-16
topic: political-compass-blog-update
status: draft
---

Problem Statement

We need the "political compass" blog post under thoughts/ to show figures and numbers that exactly match the repository's canonical pipeline outputs. That requires producing reproducible assets (scree plots, party-agreement CSVs and heatmaps) from the codebase, placing them in docs/research, and making minimal edits to the blog HTML to reference those files.

Key constraint: All numbers and figures must come from the canonical functions or the authoritative DB (data/motions.db). No invented values.

Constraints

Non-negotiables:

  • Use canonical functions (analysis.political_axis.compute_svd_spectrum, analysis.explorer_data.load_scree_data) as data sources.
  • Place generated files under docs/research/ with reproducible, deterministic filenames.
  • Keep blog edits minimal and reversible: swap the markdown table for an HTML table and insert image and CSV links.

Operational constraints:

  • Plotly SVG export requires kaleido; provide a reliable matplotlib fallback.
  • data/motions.db must contain required rows (e.g. singular_values) or we must run compute_svd_spectrum first.

Approach (chosen)

I'm choosing a single, pragmatic approach that balances reproducibility, low-risk changes, and minimal new dependencies:

Chosen approach: write a small export script (scripts/export_blog_assets.py) that:

  • Calls analysis.political_axis.compute_svd_spectrum(db_path) for the multi-window scree and analysis.explorer_data.load_scree_data(db_path) for the current_parliament scree fallback.
  • Re-uses the explorer._render_scree_plot logic (or extracts the Plotly-building code into a helper) to build a Plotly Figure and export SVG via fig.write_image(..., format='svg') when kaleido is available.
  • Falls back to matplotlib-based rendering if fig.write_image fails.
  • Computes pairwise party agreement / GL–PvdA trajectory using SQL and the logic from scripts/generate_extra_charts.py, writes CSV with pandas.DataFrame.to_csv(...), and writes a heatmap SVG to docs/research.
  • Writes assets with deterministic filenames into docs/research/ and prints/returns the exact paths and the key numeric values (EVR% for caption).
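The steps above reduce to a small orchestration loop. In this sketch the canonical functions are injected as callables (the real names are analysis.political_axis.compute_svd_spectrum and the Plotly/matplotlib renderer); the filename scheme and return shape are assumptions for illustration:

```python
from pathlib import Path

def export_assets(compute_spectrum, render_svg, out_dir="docs/research"):
    """Orchestration sketch: pull EVR data from the canonical source,
    render each window to a deterministically named SVG, and return the
    paths plus the top EVR value needed for blog captions.

    compute_spectrum stands in for compute_svd_spectrum(db_path);
    render_svg stands in for the Plotly/matplotlib renderer. Both are
    injected so the sketch stays self-contained.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    spectrum = compute_spectrum()           # assumed {window_id: [evr, ...]}
    if not spectrum:
        raise SystemExit("no SVD windows found; run the pipeline first")
    written = {}
    for window, evr in sorted(spectrum.items()):
        path = out / f"scree_{window}.svg"  # deterministic filename
        render_svg(evr, path)
        written[window] = (path, evr[0])    # top EVR for the caption
    return written
```

Sorting the windows before iterating keeps output order (and any printed caption text) deterministic across runs.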

Why this approach:

  • It uses the canonical functions already present in the codebase so numbers match UI and tests.
  • Keeps edits limited to a single script and the blog HTML, making review and rollback trivial.
  • Provides a clear fallback for environments without kaleido.

Alternatives considered (brief):

  1. Modify existing scripts (scripts/generate_extra_charts.py) to write into docs/research.
  • Pro: reuses plotting code directly.
  • Con: those scripts are opinionated about output layout and write HTML, not SVG/CSV; harder to keep minimal change.
  2. Recompute everything via pipeline.run_pipeline and copy pipeline outputs to docs/research.
  • Pro: purely canonical pipeline outputs.
  • Con: heavier — pipeline run may be slow and more intrusive; more environment setup.

I rejected them because the export-script approach is lighter, reproducible, and gives explicit control over filenames and fallbacks.

Architecture

High-level: a small command-line script (scripts/export_blog_assets.py) driven by the canonical DB, the analysis layer, and the visualize helpers.

Major pieces:

  • Exporter script: orchestrates reads from DB, computes metrics, builds figures, writes CSV/SVG into docs/research.
  • Canonical analysis functions: analysis.political_axis and analysis.explorer_data (data source only, no side effects).
  • Plot builders: reuse of explorer._render_scree_plot / analysis.visualize helpers to produce Plotly Figure objects.
  • Fallback renderer: minimal matplotlib routines producing PNG/SVG if Plotly image export fails.
  • Blog edit: minimal HTML changes in thoughts/blog-post-political-compass.html to reference the generated assets.

Components and Responsibilities

scripts/export_blog_assets.py (new)

  • Inputs: path to DB (default data/motions.db), optional --window (e.g. 2023Q3 or 'current_parliament'), output directory (default docs/research).
  • Responsibilities:
    • Run compute_svd_spectrum(db_path) and/or load_scree_data(db_path).
    • Build scree Plotly figures and export SVGs (multi-window and current_parliament).
    • Compute party agreement matrices, export CSVs and heatmap SVGs for requested window(s).
    • Print the EVR numbers and paths for copy into blog captions.
    • Exit non-zero on fatal errors (missing DB, empty results) with clear messages.
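The CLI surface implied by the inputs above, sketched with argparse. Only --db, --window, --dry-run, and their defaults come from this doc; the --out-dir spelling and help strings are assumptions:

```python
import argparse

def parse_args(argv=None):
    """Argument parsing sketch for scripts/export_blog_assets.py."""
    p = argparse.ArgumentParser(
        description="Export blog assets from the canonical DB")
    p.add_argument("--db", default="data/motions.db",
                   help="path to the motions database")
    p.add_argument("--window", default="current_parliament",
                   help="window id, e.g. 2023Q3 or current_parliament")
    p.add_argument("--out-dir", default="docs/research",
                   help="directory for generated assets")
    p.add_argument("--dry-run", action="store_true",
                   help="compute and print paths without writing files")
    return p.parse_args(argv)
```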

Explorer / analysis helpers

  • analysis.political_axis.compute_svd_spectrum(db_path): canonical EVR source for multi-window scree.
  • analysis.explorer_data.load_scree_data(db_path): canonical loader for current_parliament scree (fallback).
  • explorer._render_scree_plot(importances): returns Plotly figure in Streamlit — reuse the building logic to return a Figure for export.
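One way to make that logic reusable without importing Streamlit into the exporter: extract a pure builder that returns a Plotly-compatible figure dict (Plotly figures are JSON-serializable, so plotly.graph_objects.Figure(build_scree_figure(evr)) reconstructs the object for export). The trace and layout details below are a guess at what the explorer draws, not the actual code:

```python
def build_scree_figure(importances):
    """Pure figure builder: returns a dict in Plotly's figure schema.

    Both explorer._render_scree_plot (for Streamlit) and the exporter
    (for SVG export) could call this, keeping the two displays in sync.
    """
    n = len(importances)
    return {
        "data": [{
            "type": "bar",
            "x": list(range(1, n + 1)),
            "y": list(importances),
            "name": "explained variance ratio",
        }],
        "layout": {
            "title": {"text": "Scree plot"},
            "xaxis": {"title": {"text": "component"}},
            "yaxis": {"title": {"text": "EVR"}, "tickformat": ".0%"},
        },
    }
```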

Fallback renderer

  • Minimal matplotlib code that takes the EVR vector and draws a bar/scree-like chart and saves as SVG/PNG.
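A minimal version of that fallback, assuming the EVR vector is a plain sequence of floats; styling choices here are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

def render_scree_fallback(evr, out_path):
    """Matplotlib scree renderer used when Plotly's SVG export
    (kaleido) is unavailable. Draws a bar chart of the EVR vector
    and saves it as SVG."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    components = range(1, len(evr) + 1)
    ax.bar(components, evr)
    ax.set_xlabel("component")
    ax.set_ylabel("explained variance ratio")
    ax.set_title("Scree plot (matplotlib fallback)")
    fig.tight_layout()
    fig.savefig(out_path, format="svg")
    plt.close(fig)
```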

Blog file edits

  • thoughts/blog-post-political-compass.html: replace the markdown pipe table with an HTML table and insert image tags plus CSV links.

Data Flow

  1. Exporter reads data from data/motions.db.
  2. Calls compute_svd_spectrum(db_path) to get multi-window EVR arrays.
  3. Calls load_scree_data(db_path) to get 'current_parliament' singular values if available.
  4. Builds Plotly Figures for scree plots (multi-window and current_parliament).
  5. Exports Figures to docs/research/*.svg (uses fig.write_image when kaleido is present, otherwise matplotlib fallback).
  6. Computes party agreement matrices via the SQL used in scripts/generate_extra_charts.py, writes CSVs to docs/research/.
  7. Writes a party-heatmap SVG to docs/research/.
  8. The blog HTML references those files via relative paths (../docs/research/...).
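Step 6 can be sketched with stdlib sqlite3. The votes(window_id, motion_id, party, vote) table used here is illustrative only; the authoritative SQL lives in scripts/generate_extra_charts.py and should be lifted from there:

```python
import sqlite3
from itertools import combinations

def party_agreement(conn, window_id):
    """Pairwise agreement sketch: for each party pair, the fraction of
    motions (within the window) on which both voted the same way.

    Table and column names are assumptions for illustration.
    """
    rows = conn.execute(
        "SELECT motion_id, party, vote FROM votes WHERE window_id = ?",
        (window_id,),
    ).fetchall()
    by_motion = {}
    for motion, party, vote in rows:
        by_motion.setdefault(motion, {})[party] = vote
    parties = sorted({p for votes in by_motion.values() for p in votes})
    agreement = {}
    for a, b in combinations(parties, 2):
        shared = [v for v in by_motion.values() if a in v and b in v]
        if shared:
            same = sum(1 for v in shared if v[a] == v[b])
            agreement[(a, b)] = same / len(shared)
    return agreement
```

The resulting dict maps straightforwardly onto a pandas DataFrame for the to_csv export and the heatmap.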

Error Handling Strategy

Fail early with informative messages.

  • If DB is missing or unreadable: exit with a clear error and suggestion to run the pipeline or point --db to a valid file.
  • If compute_svd_spectrum returns empty / no windows: print guidance to run scripts/recompute_svd.py or pipeline.run_pipeline and exit non-zero.
  • If Plotly image export fails (kaleido missing): log the error, attempt matplotlib fallback, and continue.
  • If CSV or SVG write fails due to IO permissions: log path and permission error and exit non-zero (don't silently drop assets).

All non-fatal warnings are printed with suggested remediation steps.
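The kaleido branch above can be sketched as a wrapper. Figure.write_image is Plotly's real export API; the fallback callable is whatever matplotlib routine the script provides, injected here to keep the sketch self-contained:

```python
import logging

log = logging.getLogger("export_blog_assets")

def export_svg(fig, out_path, fallback):
    """Try Plotly's kaleido-backed write_image; on failure, warn with a
    remediation hint and hand off to the matplotlib fallback.

    Returns which renderer produced the file, so the script can report it.
    """
    try:
        fig.write_image(str(out_path), format="svg")
        return "plotly"
    except Exception as exc:  # kaleido missing or broken
        log.warning("Plotly SVG export failed (%s); trying matplotlib "
                    "fallback. Consider: pip install kaleido", exc)
    fallback(out_path)
    return "matplotlib"
```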

Testing Strategy

Local verification steps (automated script + manual checks):

  • Unit smoke: run scripts/export_blog_assets.py --db data/motions.db --dry-run to verify the functions produce non-empty arrays and print expected output paths.
  • Functional: run the script to produce assets and assert the files exist: docs/research/scree_multiwindow.svg, docs/research/scree_current_parliament.svg, docs/research/party_agreement_<window>.csv, docs/research/party_agreement_<window>.svg.
  • Sanity numbers: script prints the top EVR values used in captions. Cross-check printed EVR against explorer UI numbers (run explorer locally if needed).
  • Blog preview: open thoughts/blog-post-political-compass.html in browser (file://) and confirm images render and captions match printed numbers.

Add a basic test under tests/ that runs the exporter against a small fixture DB (or a tmp DB produced from tests/test_political_compass.py fixtures) to assert the script creates at least the CSV and a PNG/SVG.
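A minimal shape for that test, with the exporter entry point injected so the fixture-DB wiring stays out of the sketch; the glob patterns assume the deterministic filenames described above:

```python
import csv
from pathlib import Path

def exporter_smoke(export_fn, out_dir):
    """Smoke-test shape for tests/: run the exporter (export_fn stands
    in for the real entry point, wired to a fixture DB) and assert the
    minimum asset set exists and the CSV has data rows."""
    export_fn(out_dir)
    out = Path(out_dir)
    csvs = list(out.glob("party_agreement*.csv"))
    figures = list(out.glob("*.svg")) + list(out.glob("*.png"))
    assert csvs, "exporter produced no agreement CSV"
    assert figures, "exporter produced no figure"
    with open(csvs[0]) as f:
        assert len(list(csv.reader(f))) > 1, "CSV has no data rows"
```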

Effort Estimate & Schedule

  • Draft exporter script and fallback renderer: 2–3 hours.
  • Wire up SQL for party agreement and CSV export: 1 hour.
  • Run and verify assets locally (including possible compute_svd if DB missing): 30–60 minutes.
  • Blog HTML edits and quick preview: 30 minutes.
  • Add a minimal test + docs: 1 hour.

Total: ~5–6 hours of focused work (assuming data/motions.db is present and reasonably up-to-date). If compute_svd must be run across many windows or pipeline.run_pipeline is required, add 30–90 minutes.

Risks & Mitigations

  • Missing singular_values row for current_parliament. Mitigation: script detects and runs compute_svd_spectrum or instructs operator to run scripts/recompute_svd.py.
  • Kaleido not installed causing fig.write_image to fail. Mitigation: implement matplotlib fallback and print clear message recommending pip install kaleido.
  • DB schema drift or missing party ids. Mitigation: script validates expected tables/columns and fails with actionable message.
  • Assets not committed to git. Mitigation: recommend the maintainer commit the generated files; optionally script can print a git add/commit suggestion but must not auto-commit without user request.
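The schema-drift mitigation could look like the check below. The REQUIRED_TABLES set is an assumption and must be synced with the real schema before relying on it:

```python
import sqlite3

REQUIRED_TABLES = {"motions", "votes", "singular_values"}  # illustrative names

def validate_schema(conn):
    """Guard against schema drift: check sqlite_master for the tables
    the exporter reads and fail with an actionable message."""
    present = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    missing = REQUIRED_TABLES - present
    if missing:
        raise SystemExit(
            f"database is missing tables {sorted(missing)}; "
            "run the pipeline or point --db at a complete database")
```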

Open Questions

  • Which specific window id(s) do we want for the GL–PvdA CSV/heatmap? (I'll default to 'current_parliament' and allow an explicit --window flag.)
  • Should the script auto-commit generated assets to git, or should it stop and ask human to commit? (I recommend manual commit.)

I'm proceeding to create the design doc. Interrupt if you want changes.