parent
3a240fd907
commit
be4375b303
@@ -0,0 +1,92 @@
---
title: Always Derive Blog Numbers from Pipeline Outputs, Not Memory
date: 2026-04-16
category: docs/solutions/best-practices
module: documentation
problem_type: best_practice
component: documentation
severity: medium
applies_when:
  - Writing or updating a data-driven blog post
  - Adding EVR percentages, vote counts, or any quantitative claims
  - Referencing pipeline components (embeddings, fusion, similarity) in public-facing docs
tags: [blog, pipeline, evr, svd, canonical-outputs, data-driven-docs]
---

# Always Derive Blog Numbers from Pipeline Outputs, Not Memory

## Context

The political compass blog post was written with hardcoded numbers (an explained variance ratio, EVR, of ~32%/~21% for PC1/PC2, and 38 windows) that drifted from the pipeline's actual outputs as the data and methodology evolved. A maintenance session was required to bring every figure back in sync, generate supporting visuals, and strip references to pipeline components not yet deployed to production.

## Guidance

**Pull every quantitative claim directly from its canonical pipeline source:**

| Claim | Canonical source |
|-------|------------------|
| EVR percentages | `analysis.political_axis.compute_svd_spectrum(window_ids=[...])` |
| Vote/motion counts | `SELECT COUNT(*) FROM motions` or `SELECT COUNT(*) FROM mp_votes` against `data/motions.db` |
| Window count | `analysis.political_axis` (count of aligned windows) |
| Party agreement | `analysis.explorer_data` or direct SQL on `mp_votes` |

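
As a minimal sketch of the vote/motion count row, the counts can be read with a small helper rather than typed by hand. The helper name `count_rows`, the allow-list, and the in-memory SQLite stand-in with made-up rows are illustrative assumptions; the real project queries `data/motions.db`, and only the table names come from the table above.

```python
import sqlite3

# Tables a blog post may cite; names taken from the table above.
ALLOWED_TABLES = {"motions", "mp_votes"}


def count_rows(con, table):
    """Return COUNT(*) for an allow-listed table.

    `con` is any connection whose execute() returns a cursor with
    fetchone(); the stand-in below uses sqlite3 for that purpose.
    """
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table!r}")
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Stand-in database with made-up rows, only to show the call shape.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE motions (id INTEGER PRIMARY KEY)")
demo.execute("CREATE TABLE mp_votes (motion_id INTEGER, vote TEXT)")
demo.executemany("INSERT INTO motions VALUES (?)", [(1,), (2,)])
demo.executemany(
    "INSERT INTO mp_votes VALUES (?, ?)",
    [(1, "for"), (1, "against"), (2, "for")],
)

counts = {table: count_rows(demo, table) for table in sorted(ALLOWED_TABLES)}
print(counts)
```

The allow-list keeps the table name out of free-form string interpolation, so the helper can only ever count the two tables a post is expected to cite.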
**Never reference pipeline components that are not in production.** If `fused_embeddings` rows exist in the DB but the fusion pipeline is not yet in active use, do not describe it as part of the current workflow in blog copy.

**Generate supporting visuals programmatically** with matplotlib, write them to `docs/research/`, and embed them by relative path in the blog HTML. Regenerating them is then a single rerun when the numbers change.

## Why This Matters

Hardcoded numbers in blog copy inevitably drift from reality as:
- More parliamentary windows are added (38 → 41 → …)
- SVD methodology changes (e.g., Procrustes alignment, window selection)
- Pipeline components are added or removed from production

When numbers drift, the post loses credibility and requires an expensive archaeology pass to fix. Generating them from the pipeline makes each update a single script run.

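
Drift of this kind can also be detected mechanically. The sketch below is one possible check, not part of the existing pipeline: it extracts every percentage cited in a piece of blog HTML and reports those that match none of the freshly computed values. The function name `find_stale_percentages`, the tolerance, and the `fresh` values are hypothetical placeholders.

```python
import re

# Matches figures such as "~29%" or "11.5%" in blog copy.
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def find_stale_percentages(html, expected, tolerance=0.5):
    """Return percentages cited in `html` that match no expected value.

    `expected` holds fractions (0.25 means 25%). A cited figure counts as
    current when it lies within `tolerance` percentage points of one of them.
    """
    cited = [float(m) for m in PERCENT_RE.findall(html)]
    targets = [value * 100 for value in expected]
    return [c for c in cited if not any(abs(c - t) <= tolerance for t in targets)]


# Stale copy from the old post, checked against placeholder values
# that stand in for real pipeline output.
blog_html = "<p>PC1 explains ~32% of the variance and PC2 explains ~21%.</p>"
fresh = [0.25, 0.125]
stale = find_stale_percentages(blog_html, fresh)
print(stale)
```

A non-empty result is a signal to regenerate the copy, not proof of an error: a post may legitimately cite percentages that come from somewhere other than the values passed in.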
## When to Apply

- Before publishing or updating any post that cites quantitative pipeline outputs
- When the pipeline has changed (new windows, new methodology) and existing posts reference old numbers
- When adding or removing a pipeline stage: audit all docs for references to that stage

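
The audit in the last bullet can be sketched as a plain text search over the docs tree. The helper name `find_stage_references`, the file suffixes, and the throwaway demo files are assumptions made for illustration; only the `fused_embeddings` term comes from this document.

```python
import tempfile
from pathlib import Path


def find_stage_references(docs_root, terms, suffixes=(".md", ".html")):
    """Return (path, line_number, line) for every line mentioning a term."""
    hits = []
    lowered = [t.lower() for t in terms]
    for path in sorted(Path(docs_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if any(t in line.lower() for t in lowered):
                hits.append((path, number, line.strip()))
    return hits


# Demo on a throwaway directory; file names and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "post.html").write_text(
        "<p>Intro</p>\n<p>We use fused_embeddings.</p>\n", encoding="utf-8"
    )
    (root / "notes.md").write_text("No pipeline stages here.\n", encoding="utf-8")
    hits = [
        (path.name, number, line)
        for path, number, line in find_stage_references(root, ["fused_embeddings"])
    ]

print(hits)
```

The search is case-insensitive and purely textual, so it will also flag mentions that are fine to keep, such as a note that a stage is planned; each hit still needs a human decision.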
## Examples

**Before (hardcoded, stale):**

```html
<p>PC1 explains ~32% of the variance and PC2 explains ~21% &mdash; together ~52%.</p>
```

**After (derived from pipeline, accurate):**

```python
# scripts/generate_blog_assets.py
from analysis.political_axis import compute_svd_spectrum

evr = compute_svd_spectrum(window_ids=["current_parliament"])
# evr[0] = 0.290, evr[1] = 0.1146 → PC1~29%, PC2~11.5%, total~41%
```

```html
<p>PC1 explains ~29% of the variance and PC2 explains ~11.5% &mdash; together ~41%.</p>
```

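
To avoid retyping the figures into the HTML at all, the sentence itself can be rendered from the values. This is a sketch under the assumption that the spectrum is indexable as `evr[0]`, `evr[1]`, as in the comment above; the function name `render_evr_sentence`, the one-decimal formatting, and the input ratios are hypothetical placeholders, not pipeline output.

```python
def render_evr_sentence(evr):
    """Build the blog sentence from the first two explained-variance ratios."""
    pc1, pc2 = evr[0] * 100, evr[1] * 100
    return (
        f"<p>PC1 explains ~{pc1:.1f}% of the variance and PC2 explains "
        f"~{pc2:.1f}%, together ~{pc1 + pc2:.1f}%.</p>"
    )


# Placeholder ratios, not real pipeline output.
sentence = render_evr_sentence([0.25, 0.125])
print(sentence)
```

Because the total is computed from the same two values in the same call, the three figures in the sentence cannot disagree with each other.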
**Multi-window EVR (Procrustes-aligned across all 41 windows):**

```python
from analysis.political_axis import compute_svd_spectrum

evr_multi = compute_svd_spectrum()  # no window_ids → all windows
# evr_multi[0] = 0.1463, evr_multi[1] = 0.1310
```

**Party agreement for a specific window:**

```python
import duckdb

con = duckdb.connect("data/motions.db")

# Share of matching votes between MPs of two parties on one quarter's motions
sql = """
SELECT AVG(CASE WHEN a.vote = b.vote THEN 1.0 ELSE 0.0 END)
FROM mp_votes a JOIN mp_votes b USING (motion_id)
WHERE a.party = 'GroenLinks' AND b.party = 'PvdA'
  AND a.motion_id IN (SELECT id FROM motions WHERE window_id = '2023-Q3')
"""

# AVG yields one row with one column; it is None when no votes match the filter
agreement = con.execute(sql).fetchone()[0]
print(agreement)
```

## Related

- `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md`: companion guidance on keeping SVD axis *labels* aligned with voting data rather than semantic assumptions