parent
3a240fd907
commit
be4375b303
@ -0,0 +1,92 @@ |
||||
--- |
||||
title: Always Derive Blog Numbers from Pipeline Outputs, Not Memory |
||||
date: 2026-04-16 |
||||
category: docs/solutions/best-practices |
||||
module: documentation |
||||
problem_type: best_practice |
||||
component: documentation |
||||
severity: medium |
||||
applies_when: |
||||
- Writing or updating a data-driven blog post |
||||
- Adding EVR percentages, vote counts, or any quantitative claims |
||||
- Referencing pipeline components (embeddings, fusion, similarity) in public-facing docs |
||||
tags: [blog, pipeline, evr, svd, canonical-outputs, data-driven-docs] |
||||
--- |
||||
|
||||
# Always Derive Blog Numbers from Pipeline Outputs, Not Memory |
||||
|
||||
## Context |
||||
|
||||
The political compass blog post was written with hardcoded numbers (EVR ~32%/~21%, 38 windows) that drifted from the pipeline's actual outputs as the data and methodology evolved. A maintenance session was required to bring every figure back in sync, generate supporting visuals, and strip references to pipeline components not yet deployed to production. |
||||
|
||||
## Guidance |
||||
|
||||
**Pull every quantitative claim directly from the canonical pipeline functions:** |
||||
|
||||
| Claim | Canonical source | |
||||
|-------|-----------------| |
||||
| EVR percentages | `analysis.political_axis.compute_svd_spectrum(window_ids=[...])` | |
||||
| Vote/motion counts | `SELECT COUNT(*) FROM motions` and `SELECT COUNT(*) FROM mp_votes`, run against `data/motions.db` |
||||
| Window count | `analysis.political_axis` — count of aligned windows | |
||||
| Party agreement | `analysis.explorer_data` or direct SQL on `mp_votes` | |
||||
|
||||
**Never reference pipeline components that are not in production.** If `fused_embeddings` rows exist in the DB but the fusion pipeline is not yet in active use, do not describe it as part of the current workflow in blog copy. |
||||
|
||||
**Generate supporting visuals programmatically** (matplotlib → `docs/research/`) and embed them by relative path in the blog HTML. This makes regeneration trivial when numbers change. |
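
Embedding by relative path can be sketched as follows. This is a minimal illustration and not the project's actual script: the helper name `relative_img_tag` and both file paths are hypothetical, chosen only to show how the `src` attribute follows from where the blog HTML and the generated figure live.

```python
import os
from html import escape


def relative_img_tag(blog_html: str, image: str, alt: str) -> str:
    """Build an <img> tag whose src is relative to the blog HTML file."""
    # Resolve the image path relative to the directory holding the HTML.
    rel = os.path.relpath(image, start=os.path.dirname(blog_html))
    # HTML paths use forward slashes regardless of the host OS.
    rel = rel.replace(os.sep, "/")
    return f'<img src="{escape(rel, quote=True)}" alt="{escape(alt, quote=True)}">'


# Hypothetical locations, for illustration only.
tag = relative_img_tag(
    "docs/blog/political-compass.html",
    "docs/research/evr_scree.png",
    "EVR per component",
)
print(tag)
```

Because the tag is computed from the two paths, moving either the post or the figure directory only requires rerunning the script.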
||||
|
||||
## Why This Matters |
||||
|
||||
Hardcoded numbers in blog copy inevitably drift from reality as: |
||||
- More parliamentary windows are added (38 → 41 → …) |
||||
- SVD methodology changes (e.g., Procrustes alignment, window selection) |
||||
- Pipeline components are added or removed from production |
||||
|
||||
When numbers drift, the post loses credibility and requires an expensive archaeology pass to fix. Generating them from the pipeline makes each update a single script run. |
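
One way to catch drift early is a small check that compares the percentages cited in a post against freshly computed values. The sketch below is an assumption-laden illustration: `stale_percentages` is a hypothetical helper, the 0.5-point tolerance is arbitrary, and the expected values are passed in by hand instead of being read from the pipeline.

```python
import re

# Matches numbers directly followed by a percent sign, e.g. "32%" or "11.5%".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def stale_percentages(html: str, expected: list[float], tol: float = 0.5) -> list[float]:
    """Return percentages cited in `html` that match no expected value within `tol`."""
    cited = [float(m) for m in PERCENT.findall(html)]
    return [c for c in cited if not any(abs(c - e) <= tol for e in expected)]


# Expected values would normally come from the pipeline; hardcoded here
# only so the sketch runs on its own.
expected = [29.0, 11.46, 40.46]
old_copy = "<p>PC1 explains ~32% of the variance and PC2 explains ~21%.</p>"
print(stale_percentages(old_copy, expected))
```

A non-empty result signals that the copy cites a figure the pipeline no longer produces.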
||||
|
||||
## When to Apply |
||||
|
||||
- Before publishing or updating any post that cites quantitative pipeline outputs |
||||
- When the pipeline has changed (new windows, new methodology) and existing posts reference old numbers |
||||
- When removing or adding a pipeline stage — audit all docs for references to that stage |
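
The audit in the last bullet can be approximated with a plain text search. The following is a rough sketch, assuming docs are `.md` and `.html` files under one directory; the function name `find_stage_references` and the example arguments are hypothetical.

```python
from pathlib import Path


def find_stage_references(root, stage_names, suffixes=(".md", ".html")):
    """Return (path, line number, stage name) for every mention of a stage."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for name in stage_names:
                if name in line:
                    hits.append((str(path), lineno, name))
    return hits


# Example call (paths and stage name are illustrative):
# find_stage_references("docs", ["fused_embeddings"])
```

A simple substring match will also flag legitimate historical mentions, so the output is a list to review by hand, not a list to delete.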
||||
|
||||
## Examples |
||||
|
||||
**Before (hardcoded, stale):** |
||||
```html |
||||
<p>PC1 explains ~32% of the variance and PC2 explains ~21% — together ~52%.</p> |
||||
``` |
||||
|
||||
**After (derived from pipeline, accurate):** |
||||
```python |
||||
# scripts/generate_blog_assets.py |
||||
from analysis.political_axis import compute_svd_spectrum |
||||
|
||||
evr = compute_svd_spectrum(window_ids=["current_parliament"]) |
||||
# evr[0] = 0.290, evr[1] = 0.1146 → PC1 ~29%, PC2 ~11.5%, total ~40%
```
```html
<p>PC1 explains ~29% of the variance and PC2 explains ~11.5%, together ~40%.</p>
||||
``` |
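
To avoid retyping the figures at all, the sentence itself can be rendered from the returned values. This is a minimal sketch under one assumption suggested by the example above: that `compute_svd_spectrum` returns a sequence of variance fractions ordered by component. The helper `evr_sentence` is hypothetical.

```python
def evr_sentence(evr) -> str:
    """Render the blog sentence from the first two explained-variance fractions."""
    pc1 = evr[0] * 100
    pc2 = evr[1] * 100
    total = pc1 + pc2
    return (
        f"<p>PC1 explains ~{pc1:.1f}% of the variance and "
        f"PC2 explains ~{pc2:.1f}%, together ~{total:.1f}%.</p>"
    )


# Values copied from the example above; in practice they would be the
# return value of compute_svd_spectrum(...).
print(evr_sentence([0.290, 0.1146]))
```

Computing the total inside the function keeps the three figures in one sentence arithmetically consistent with each other.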
||||
|
||||
**Multi-window EVR (Procrustes-aligned across all 41 windows):** |
||||
```python |
||||
evr_multi = compute_svd_spectrum() # no window_ids → all windows |
||||
# evr_multi[0] = 0.1463, evr_multi[1] = 0.1310 |
||||
``` |
||||
|
||||
**Party agreement for a specific window:** |
||||
```python |
||||
import duckdb |
||||
con = duckdb.connect("data/motions.db") |
||||
# Agreement between two parties in a quarter |
||||
sql = """ |
||||
SELECT AVG(CASE WHEN a.vote = b.vote THEN 1.0 ELSE 0.0 END) |
||||
FROM mp_votes a JOIN mp_votes b USING (motion_id) |
||||
WHERE a.party = 'GroenLinks' AND b.party = 'PvdA' |
||||
AND a.motion_id IN (SELECT id FROM motions WHERE window_id = '2023-Q3') |
||||
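"""
# Run the query; the result is the fraction of matching MP-pair votes
agreement = con.execute(sql).fetchone()[0]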
||||
``` |
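
To make the semantics of these queries concrete, here is a self-contained toy run. It is only a sketch: it uses the standard-library `sqlite3` with an in-memory database so that it runs without `data/motions.db`, and the table layout (`mp`, `party`, `vote`, `window_id` columns) and vote values are assumptions made for illustration, not the project's real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE motions (id INTEGER PRIMARY KEY, window_id TEXT);
CREATE TABLE mp_votes (motion_id INTEGER, mp TEXT, party TEXT, vote TEXT);
INSERT INTO motions VALUES (1, '2023-Q3'), (2, '2023-Q3'), (3, '2023-Q4');
INSERT INTO mp_votes VALUES
  (1, 'g1', 'GroenLinks', 'for'),
  (1, 'p1', 'PvdA', 'for'),
  (1, 'p2', 'PvdA', 'against'),
  (2, 'g1', 'GroenLinks', 'against'),
  (2, 'p1', 'PvdA', 'against'),
  (2, 'p2', 'PvdA', 'against'),
  (3, 'g1', 'GroenLinks', 'for'),
  (3, 'p1', 'PvdA', 'against');
""")


def party_agreement(con, party_a, party_b, window_id):
    """Fraction of cross-party MP pairs that cast the same vote in a window."""
    sql = """
    SELECT AVG(CASE WHEN a.vote = b.vote THEN 1.0 ELSE 0.0 END)
    FROM mp_votes a JOIN mp_votes b ON a.motion_id = b.motion_id
    WHERE a.party = ? AND b.party = ?
      AND a.motion_id IN (SELECT id FROM motions WHERE window_id = ?)
    """
    return con.execute(sql, (party_a, party_b, window_id)).fetchone()[0]


motion_count = con.execute("SELECT COUNT(*) FROM motions").fetchone()[0]
vote_count = con.execute("SELECT COUNT(*) FROM mp_votes").fetchone()[0]
agreement = party_agreement(con, "GroenLinks", "PvdA", "2023-Q3")
print(motion_count, vote_count, agreement)
```

Note that the join compares every MP of one party with every MP of the other on each motion, so the result is an average over MP pairs, not over motions. Passing the parties and window as query parameters avoids editing the SQL string for each comparison.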
||||
|
||||
## Related |
||||
|
||||
- `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md` — companion guidance on keeping SVD axis *labels* aligned with voting data rather than semantic assumptions |
||||