---
title: Always Derive Blog Numbers from Pipeline Outputs, Not Memory
date: 2026-04-16
category: docs/solutions/best-practices
module: documentation
problem_type: best_practice
component: documentation
severity: medium
applies_when:
  - Writing or updating a data-driven blog post
  - Adding EVR percentages, vote counts, or any quantitative claims
  - Referencing pipeline components (embeddings, fusion, similarity) in public-facing docs
tags: [blog, pipeline, evr, svd, canonical-outputs, data-driven-docs]
---

# Always Derive Blog Numbers from Pipeline Outputs, Not Memory

## Context

The political compass blog post was written with hardcoded numbers (PC1/PC2 EVR of ~32%/~21%, 38 windows). These drifted from the pipeline's actual outputs as the data and methodology evolved. A maintenance session was needed to bring every figure back in sync, generate supporting visuals, and strip references to pipeline components not yet deployed to production.

## Guidance

**Pull every quantitative claim directly from the canonical pipeline sources:**

| Claim | Canonical source |
|-------|------------------|
| EVR percentages | `analysis.political_axis.compute_svd_spectrum(window_ids=[...])` |
| Vote/motion counts | `SELECT COUNT(*) FROM motions` / `SELECT COUNT(*) FROM mp_votes` on `data/motions.db` |
| Window count | `analysis.political_axis` (count of aligned windows) |
| Party agreement | `analysis.explorer_data` or direct SQL on `mp_votes` |

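The SQL-backed rows of this table can be collected by one function that the blog-asset script calls at build time. This is a minimal sketch under a few assumptions:

- The helper name `collect_counts` is hypothetical.
- It assumes only the `motions` and `mp_votes` tables named above.
- It accepts any connection whose `execute(...)` result has `fetchone()`. Both `duckdb` and the standard-library `sqlite3` meet this; `sqlite3` is used here only for a self-contained demo.

```python
import sqlite3


def collect_counts(con) -> dict[str, int]:
    """Return row counts for the tables whose sizes the blog cites.

    `con` can be e.g. duckdb.connect("data/motions.db", read_only=True)
    or a sqlite3 connection; both support con.execute(sql).fetchone().
    """
    counts: dict[str, int] = {}
    # Table names are fixed literals, never user input, so building the SQL
    # with an f-string is safe here (identifiers can't be bound as parameters).
    for table in ("motions", "mp_votes"):
        (n,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        counts[table] = n
    return counts


if __name__ == "__main__":
    # In-memory stand-in for data/motions.db, only to show the call shape.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE motions (id TEXT)")
    demo.execute("CREATE TABLE mp_votes (motion_id TEXT, party TEXT)")
    demo.execute("INSERT INTO motions VALUES ('m1')")
    demo.executemany("INSERT INTO mp_votes VALUES (?, ?)",
                     [("m1", "GroenLinks"), ("m1", "PvdA")])
    print(collect_counts(demo))  # {'motions': 1, 'mp_votes': 2}
```

Passing the connection in, rather than opening `data/motions.db` inside the helper, keeps the function testable against a throwaway database.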
**Never reference pipeline components that are not in production.** If `fused_embeddings` rows exist in the DB but the fusion pipeline is not yet in active use, do not describe it as part of the current workflow in blog copy.

**Generate supporting visuals programmatically** (matplotlib → `docs/research/`) and embed them by relative path in the blog HTML. This makes regeneration trivial when numbers change.

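Embedding by relative path keeps the blog HTML independent of where the repository is checked out. The hypothetical `figure_tag` helper below uses only the standard library to compute the path from the blog page to a figure under `docs/research/`. The blog location `docs/blog/political-compass.html` and the figure filename are assumptions for illustration.

```python
import html
import posixpath


def figure_tag(blog_html_path: str, figure_path: str, alt: str) -> str:
    """Build an <img> tag whose src is relative to the blog page's directory.

    Both arguments are repo-relative POSIX paths (hypothetical layout).
    """
    blog_dir = posixpath.dirname(blog_html_path)
    src = posixpath.relpath(figure_path, start=blog_dir)
    # Escape both attributes so quotes or ampersands can't break the markup.
    return f'<img src="{html.escape(src)}" alt="{html.escape(alt)}">'


print(figure_tag("docs/blog/political-compass.html",
                 "docs/research/evr_spectrum.png",
                 "Explained variance per component"))
# <img src="../research/evr_spectrum.png" alt="Explained variance per component">
```

One option is to call this in the same script that writes the image with matplotlib's `fig.savefig(...)`. The figure and its tag are then always regenerated together.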
## Why This Matters

Hardcoded numbers in blog copy inevitably drift from reality as:

- More parliamentary windows are added (38 → 41 → …)
- SVD methodology changes (e.g., Procrustes alignment, window selection)
- Pipeline components are added to or removed from production

When numbers drift, the post loses credibility and needs an expensive archaeology pass to fix. Generating them from the pipeline makes each update a single script run.

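Here is a minimal sketch of the single-script idea: the blog sentence is rendered from the EVR values rather than typed by hand. A few assumptions apply:

- The function names and the one-decimal display rule are hypothetical.
- The input only needs to be indexable like the `evr` result shown in the examples.
- The total is summed from the unrounded ratios, so it can differ slightly from adding the two displayed percentages.

```python
def pct(x: float) -> str:
    """Format a fraction as a percentage with at most one decimal, e.g. 0.1146 -> '11.5'."""
    s = f"{x * 100:.1f}"
    # Drop a trailing ".0" so 0.29 renders as "29" rather than "29.0".
    return s[:-2] if s.endswith(".0") else s


def evr_sentence(evr) -> str:
    """Render the PC1/PC2 sentence from the first two explained-variance ratios."""
    pc1, pc2 = float(evr[0]), float(evr[1])
    # Sum the raw ratios, not the rounded strings, to avoid compounding rounding error.
    return (
        f"<p>PC1 explains ~{pct(pc1)}% of the variance and "
        f"PC2 explains ~{pct(pc2)}%, together ~{pct(pc1 + pc2)}%.</p>"
    )
```

In the asset script this would be called as `evr_sentence(compute_svd_spectrum(window_ids=["current_parliament"]))`. After a pipeline change, rerunning the script is then the whole update.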
## When to Apply

- Before publishing or updating any post that cites quantitative pipeline outputs
- When the pipeline has changed (new windows, new methodology) and existing posts reference old numbers
- When removing or adding a pipeline stage: audit all docs for references to that stage

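The stage audit in the last bullet can be a plain text scan. This is a minimal sketch, with these assumptions:

- The docs are `.md`/`.html` files under one directory.
- Each stage is identified by a literal name such as `fused_embeddings`.
- The function name and the term list are hypothetical.

```python
import re
from pathlib import Path


def find_stage_mentions(root: str, terms: list[str],
                        suffixes: tuple[str, ...] = (".md", ".html")) -> list[tuple[str, int, str]]:
    """Return (file, line number, matched text) for every mention of a listed stage."""
    # Whole-word, case-insensitive match; since "_" counts as a word character,
    # `fused_embeddings` does not match inside `fused_embeddings_v2`.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in suffixes or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for m in pattern.finditer(line):
                hits.append((str(path), lineno, m.group(1)))
    return hits


if __name__ == "__main__":
    for file, lineno, term in find_stage_mentions("docs", ["fused_embeddings"]):
        print(f"{file}:{lineno}: mentions {term}")
```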
## Examples

**Before (hardcoded, stale):**

```html
<p>PC1 explains ~32% of the variance and PC2 explains ~21% — together ~52%.</p>
```

**After (derived from pipeline, accurate):**

```python
# scripts/generate_blog_assets.py
from analysis.political_axis import compute_svd_spectrum

evr = compute_svd_spectrum(window_ids=["current_parliament"])
# evr[0] = 0.290, evr[1] = 0.1146 → PC1 ~29%, PC2 ~11.5%, total ~41%
```

```html
<p>PC1 explains ~29% of the variance and PC2 explains ~11.5% — together ~41%.</p>
```

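This is for readers checking these figures by hand. It assumes `compute_svd_spectrum` uses the standard PCA definition, which is not confirmed here. Under that assumption, the explained-variance ratio of component $k$ comes from the singular values $\sigma_j$ of the centered vote matrix:

$$
\mathrm{EVR}_k = \frac{\sigma_k^2}{\sum_{j} \sigma_j^2}
$$

So `evr[0] + evr[1]` is the share of variance captured by the two-dimensional compass, and the ratios over all components sum to 1.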
**Multi-window EVR (Procrustes-aligned across all 41 windows):**

```python
evr_multi = compute_svd_spectrum()  # no window_ids → all windows
# evr_multi[0] = 0.1463, evr_multi[1] = 0.1310
```

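**Party agreement for a specific window:**

```python
import duckdb

# Read-only: this query never writes to the pipeline DB
con = duckdb.connect("data/motions.db", read_only=True)

# Pairwise MP agreement between two parties within one window (here a quarter):
# every MP of party A is compared with every MP of party B on each shared motion.
sql = """
SELECT AVG(CASE WHEN a.vote = b.vote THEN 1.0 ELSE 0.0 END)
FROM mp_votes a
JOIN mp_votes b ON a.motion_id = b.motion_id
WHERE a.party = ? AND b.party = ?
  AND a.motion_id IN (SELECT id FROM motions WHERE window_id = ?)
"""
(agreement,) = con.execute(sql, ["GroenLinks", "PvdA", "2023-Q3"]).fetchone()
if agreement is None:
    print("No shared motions for these parties in this window")
else:
    print(f"GroenLinks vs PvdA agreement in 2023-Q3: {agreement:.1%}")
```
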
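A cheap guard against stale numbers like those in the Before example is to compare the figures quoted in the HTML with freshly computed ones before publishing. This is a minimal sketch, with these assumptions:

- Every quoted figure uses the `~NN%` / `~NN.N%` pattern seen above.
- The expected fractions are listed in the order they appear in the text.
- The function name and the one-point tolerance are hypothetical choices.

```python
import re

# Matches the "~29%" / "~11.5%" style used in the blog copy.
PCT = re.compile(r"~(\d+(?:\.\d+)?)%")


def stale_percentages(html_text: str, expected: list[float],
                      tol_pts: float = 1.0) -> list[tuple[float, float]]:
    """Return (quoted, expected) pairs that differ by more than tol_pts percentage points.

    `expected` holds fractions (e.g. EVR values) in the order they appear in the text.
    """
    quoted = [float(m.group(1)) for m in PCT.finditer(html_text)]
    if len(quoted) != len(expected):
        raise ValueError(f"found {len(quoted)} quoted figures, expected {len(expected)}")
    return [
        (q, round(e * 100, 2))
        for q, e in zip(quoted, expected)
        if abs(q - e * 100) > tol_pts
    ]
```

The check was run with the single-window values above, `[0.290, 0.1146, 0.290 + 0.1146]`. All three figures in the Before HTML are flagged, and the After HTML passes. The tolerance absorbs the "~" rounding in the copy.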
## Related

- `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md`: companion guidance on keeping SVD axis *labels* aligned with voting data rather than semantic assumptions