---
date: 2026-04-16
topic: "political-compass-blog-update"
status: draft
---
## Problem Statement
We need the "political compass" blog post under thoughts/ to show figures and numbers that exactly match the repository's canonical pipeline outputs. That requires producing reproducible assets (scree plots, party-agreement CSVs and heatmaps) from the codebase, placing them in docs/research, and making minimal edits to the blog HTML to reference those files.
**Key constraint:** All numbers and figures must come from the canonical functions or the authoritative DB (data/motions.db). No invented values.
## Constraints
**Non-negotiables:**
- Use canonical functions (analysis.political_axis.compute_svd_spectrum, analysis.explorer_data.load_scree_data) as data sources.
- Place generated files under **docs/research/** with reproducible, deterministic filenames.
- Keep blog edits minimal and reversible: swap the markdown table for an HTML table and insert <img> and CSV links.
**Operational constraints:**
- Plotly SVG export requires kaleido; provide a reliable matplotlib fallback.
- data/motions.db must contain required rows (e.g. singular_values) or we must run compute_svd_spectrum first.
## Approach (chosen)
I'm choosing a single, pragmatic approach that balances reproducibility, low-risk changes, and minimal new dependencies:
**Chosen approach:** write a small export script (scripts/export_blog_assets.py) that:
- Calls **analysis.political_axis.compute_svd_spectrum(db_path)** for the multi-window scree and **analysis.explorer_data.load_scree_data(db_path)** for the current_parliament scree fallback.
- Reuses the explorer._render_scree_plot logic (or extracts the Plotly figure-building code into a helper) to build a Plotly Figure, exporting SVG via **fig.write_image(..., format='svg')** when kaleido is available.
- Falls back to matplotlib-based rendering if fig.write_image fails.
- Computes pairwise party agreement / GL–PvdA trajectory using SQL and the logic from scripts/generate_extra_charts.py, writes CSV with pandas.DataFrame.to_csv(...), and writes a heatmap SVG to docs/research.
- Writes assets with deterministic filenames into **docs/research/** and prints/returns the exact paths and the key numeric values (EVR% for caption).
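The deterministic naming scheme could be centralized in a small helper like this sketch (the helper itself is hypothetical; the filenames follow the ones listed in the testing section):

```python
from pathlib import Path


def asset_paths(out_dir: str, window: str) -> dict:
    """Deterministic output locations for the exported blog assets."""
    out = Path(out_dir)
    return {
        "scree_multiwindow": out / "scree_multiwindow.svg",
        "scree_window": out / f"scree_{window}.svg",
        "agreement_csv": out / f"party_agreement_{window}.csv",
        "agreement_svg": out / f"party_agreement_{window}.svg",
    }
```

Keeping all paths in one place makes the "print the exact paths" step trivial and guarantees the blog HTML and the exporter never disagree on filenames.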
Why this approach:
- It uses the canonical functions already present in the codebase so numbers match UI and tests.
- Keeps edits limited to a single script and the blog HTML, making review and rollback trivial.
- Provides a clear fallback for environments without kaleido.
Alternatives considered (brief):
1) Modify existing scripts (scripts/generate_extra_charts.py) to write into docs/research.
- Pro: reuses plotting code directly.
- Con: those scripts are opinionated about output layout and write HTML rather than SVG/CSV, making it harder to keep the change minimal.
2) Recompute everything via pipeline.run_pipeline and copy pipeline outputs to docs/research.
- Pro: purely canonical pipeline outputs.
- Con: heavier — pipeline run may be slow and more intrusive; more environment setup.
I rejected them because the export-script approach is lighter, reproducible, and gives explicit control over filenames and fallbacks.
## Architecture
High-level: a small command-line script (scripts/export_blog_assets.py) driven by the canonical DB, the analysis layer, and the visualize helpers.
**Major pieces:**
- **Exporter script**: orchestrates reads from DB, computes metrics, builds figures, writes CSV/SVG into docs/research.
- **Canonical analysis functions**: analysis.political_axis and analysis.explorer_data (data source only, no side effects).
- **Plot builders**: reuse of explorer._render_scree_plot / analysis.visualize helpers to produce Plotly Figure objects.
- **Fallback renderer**: minimal matplotlib routines producing PNG/SVG if Plotly image export fails.
- **Blog edit**: minimal HTML changes in thoughts/blog-post-political-compass.html to reference the generated assets.
## Components and Responsibilities
**scripts/export_blog_assets.py** (new)
- Inputs: path to DB (default data/motions.db), optional --window (e.g. 2023Q3 or 'current_parliament'), output directory (default docs/research).
- Responsibilities:
- Run compute_svd_spectrum(db_path) and/or load_scree_data(db_path).
- Build scree Plotly figures and export SVGs (multi-window and current_parliament).
- Compute party agreement matrices, export CSVs and heatmap SVGs for requested window(s).
- Print the EVR numbers and paths for copy into blog captions.
- Exit non-zero on fatal errors (missing DB, empty results) with clear messages.
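A sketch of the CLI surface implied by the inputs above (the `--out` flag name is my assumption; the defaults are the ones listed):

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """CLI for the exporter; flags mirror the responsibilities listed above."""
    p = argparse.ArgumentParser(description="Export blog assets from the canonical DB")
    p.add_argument("--db", type=Path, default=Path("data/motions.db"),
                   help="path to the canonical SQLite database")
    p.add_argument("--window", default="current_parliament",
                   help="window id, e.g. 2023Q3 or current_parliament")
    p.add_argument("--out", type=Path, default=Path("docs/research"),
                   help="output directory for generated assets")
    p.add_argument("--dry-run", action="store_true",
                   help="verify data loads and print target paths without writing files")
    return p.parse_args(argv)
```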
**Explorer / analysis helpers**
- analysis.political_axis.compute_svd_spectrum(db_path): canonical EVR source for multi-window scree.
- analysis.explorer_data.load_scree_data(db_path): canonical loader for current_parliament scree (fallback).
- explorer._render_scree_plot(importances): builds the Plotly figure shown in Streamlit; reuse its figure-building logic so a Figure can be returned for export.
**Fallback renderer**
- Minimal matplotlib code that takes the EVR vector and draws a bar/scree-like chart and saves as SVG/PNG.
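A minimal sketch of that fallback (axis labels and styling are placeholders, not the canonical chart):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt


def render_scree_fallback(evr, out_path, title="Scree plot (fallback render)"):
    """Draw a simple bar scree chart from an explained-variance-ratio vector."""
    fig, ax = plt.subplots(figsize=(6, 3.5))
    components = range(1, len(evr) + 1)
    ax.bar(components, [v * 100 for v in evr])
    ax.set_xlabel("Component")
    ax.set_ylabel("Explained variance (%)")
    ax.set_title(title)
    fig.tight_layout()
    fig.savefig(out_path)  # output format inferred from the extension (.svg or .png)
    plt.close(fig)
```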
**Blog file edits**
- thoughts/blog-post-political-compass.html: replace markdown pipe table with an HTML table and insert <img src="../docs/research/scree_multiwindow.svg"> and <img src="../docs/research/scree_current_parliament.svg"> plus CSV links.
## Data Flow
1. Exporter reads data from **data/motions.db**.
2. Calls compute_svd_spectrum(db_path) to get multi-window EVR arrays.
3. Calls load_scree_data(db_path) to get 'current_parliament' singular values if available.
4. Builds Plotly Figures for scree plots (multi-window and current_parliament).
5. Exports Figures to **docs/research/*.svg** (uses fig.write_image when kaleido is present, otherwise matplotlib fallback).
6. Computes party agreement matrices via the SQL used in scripts/generate_extra_charts.py, writes CSVs to **docs/research/**.
7. Writes a party-heatmap SVG to **docs/research/**.
8. The blog HTML references those files via relative paths (../docs/research/...).
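The agreement computation in step 6 can be sketched in pandas as below (the column names and the +1/-1 vote encoding are assumptions; the authoritative SQL lives in scripts/generate_extra_charts.py):

```python
import pandas as pd


def party_agreement(votes: pd.DataFrame) -> pd.DataFrame:
    """Pairwise fraction of motions on which two parties voted the same way.

    votes: long-format frame with columns motion_id, party, vote (+1 for, -1 against).
    """
    wide = votes.pivot_table(index="motion_id", columns="party", values="vote")
    parties = list(wide.columns)
    agree = pd.DataFrame(index=parties, columns=parties, dtype=float)
    for a in parties:
        for b in parties:
            if a == b:
                agree.loc[a, b] = 1.0
                continue
            both = wide[[a, b]].dropna()  # only motions where both parties voted
            agree.loc[a, b] = float((both[a] == both[b]).mean()) if len(both) else float("nan")
    return agree
```

The result then goes through `DataFrame.to_csv(out_path)` to produce the CSV the blog links to.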
## Error Handling Strategy
**Fail early with informative messages.**
- If DB is missing or unreadable: exit with a clear error and suggestion to run the pipeline or point --db to a valid file.
- If compute_svd_spectrum returns empty / no windows: print guidance to run scripts/recompute_svd.py or pipeline.run_pipeline and exit non-zero.
- If Plotly image export fails (kaleido missing): log the error, attempt matplotlib fallback, and continue.
- If CSV or SVG write fails due to IO permissions: log path and permission error and exit non-zero (don't silently drop assets).
All non-fatal warnings are printed with suggested remediation steps.
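The kaleido fallback above can be wrapped in a small dispatcher; in this sketch `fallback_render` stands in for the matplotlib routine:

```python
def export_svg(fig, out_path, fallback_render):
    """Try Plotly's kaleido-backed SVG export; fall back to matplotlib on failure."""
    try:
        fig.write_image(str(out_path), format="svg")  # requires kaleido
    except Exception as exc:
        print(f"warning: Plotly SVG export failed ({exc}); "
              f"falling back to matplotlib (pip install kaleido to fix)")
        fallback_render(out_path)
```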
## Testing Strategy
Local verification steps (automated script + manual checks):
- Unit smoke: run scripts/export_blog_assets.py --db data/motions.db --dry-run to verify the functions produce non-empty arrays and print expected output paths.
- Functional: run the script to produce assets and assert files exist: docs/research/scree_multiwindow.svg, docs/research/scree_current_parliament.svg, docs/research/party_agreement_<window>.csv, docs/research/party_agreement_<window>.svg.
- Sanity numbers: script prints the top EVR values used in captions. Cross-check printed EVR against explorer UI numbers (run explorer locally if needed).
- Blog preview: open thoughts/blog-post-political-compass.html in browser (file://) and confirm images render and captions match printed numbers.
Add a basic test under tests/ that runs the exporter against a small fixture DB (or a tmp DB produced from tests/test_political_compass.py fixtures) to assert the script creates at least the CSV and a PNG/SVG.
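For that fixture-based test, a tiny stand-in DB can be built with stdlib sqlite3; the singular_values schema below is an assumption and should be matched to the real pipeline schema:

```python
import sqlite3


def make_fixture_db(path: str) -> None:
    """Write a minimal motions.db stand-in with one window's singular values."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE singular_values (window TEXT, component INTEGER, value REAL)"
    )
    con.executemany(
        "INSERT INTO singular_values VALUES (?, ?, ?)",
        [("current_parliament", i, v) for i, v in enumerate([9.0, 4.0, 1.0])],
    )
    con.commit()
    con.close()
```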
## Effort Estimate & Schedule
- Draft exporter script and fallback renderer: 2–3 hours.
- Wire up SQL for party agreement and CSV export: 1 hour.
- Run and verify assets locally (including possible compute_svd if DB missing): 30–60 minutes.
- Blog HTML edits and quick preview: 30 minutes.
- Add a minimal test + docs: 1 hour.
Total: ~5–6 hours of focused work (assuming data/motions.db is present and reasonably up-to-date). If compute_svd must be run across many windows or pipeline.run_pipeline is required, add 30–90 minutes.
## Risks & Mitigations
- **Missing singular_values row for current_parliament.** Mitigation: script detects and runs compute_svd_spectrum or instructs operator to run scripts/recompute_svd.py.
- **Kaleido not installed causing fig.write_image to fail.** Mitigation: implement matplotlib fallback and print clear message recommending pip install kaleido.
- **DB schema drift or missing party ids.** Mitigation: script validates expected tables/columns and fails with actionable message.
- **Assets not committed to git.** Mitigation: recommend the maintainer commit the generated files; optionally script can print a git add/commit suggestion but must not auto-commit without user request.
## Open Questions
- Which specific window id(s) do we want for the GL–PvdA CSV/heatmap? (I'll default to 'current_parliament' and allow an explicit --window flag.)
- Should the script auto-commit generated assets to git, or should it stop and ask human to commit? (I recommend manual commit.)