parent
9f98dbae60
commit
1a83f0f319
@ -0,0 +1,288 @@ |
# Trajectory Plots Not Showing: Debugging Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Identify why trajectory plots are invisible or empty in the Streamlit Explorer UI, then fix the root cause.

**Architecture:** A systematic, stage-by-stage trace of the pipeline from the UI down to the DB. Each stage has explicit "what should I see" checkpoints so we can pinpoint exactly where the data goes missing.

**Tech Stack:** Streamlit, Plotly, DuckDB, Python ≥3.13, uv

---

## Debugging Pipeline (Stage-by-Stage Checkpoints)

```
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 0: UI Layer - what does the user actually see?                   │
│    explorer.py → build_trajectories_tab()                               │
│    → Is the tab visible? Empty chart? Error message? No chart at all?   │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 1: positions_by_window - are MP positions loaded?                │
│    load_positions(db_path, "annual")                                    │
│    → Expected: 12 windows, ~150-200 MPs per window                      │
│    → Check: _last_trajectories_diagnostics["stage"]                     │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 2: party_map - are MP→party mappings loaded?                     │
│    load_party_map(db_path)                                              │
│    → Expected: ~1036 entries                                            │
│    → Check: party_map is non-empty dict                                 │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 3: party centroids - are party means computed?                   │
│    compute_party_centroids() / compute_party_coords()                   │
│    → Expected: CDA, D66, VVD, PVV, SP, GroenLinks-PvdA centroids exist  │
│    → Check: plottable_parties > 0                                       │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 4: select_trajectory_plot_data - does it return traces?          │
│    → Expected: fig with 3-6 colored scatter traces, trace_count > 0     │
│    → Check: banner_text is None (no fallback), trace_count ≥ 3          │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 5: Plotly render - is the figure rendered in the browser?        │
│    st.plotly_chart(fig, use_container_width=True)                       │
│    → Expected: visible chart with colored party lines                   │
│    → Check: browser DOM, no JS errors                                   │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## Task 1: Instrument the app to print real-time pipeline state

**Files:**
- Modify: `explorer.py` (add print statements at each stage)
- Test: Run `uv run streamlit run explorer.py` with `EXPLORER_DEBUG_TRAJECTORIES=1`

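
The print calls in the steps that follow are unconditional, so `EXPLORER_DEBUG_TRAJECTORIES=1` only has an effect if something in `explorer.py` reads it. A minimal sketch of one way to gate the checkpoints on that variable is shown here. `traj_debug` and `traj_debug_enabled` are hypothetical helper names, not existing functions in the codebase, and the set of accepted truthy values is an assumption.

```python
import os

# Name taken from this plan; whether explorer.py already reads it is not shown here.
DEBUG_ENV_VAR = "EXPLORER_DEBUG_TRAJECTORIES"


def traj_debug_enabled() -> bool:
    """Return True when the debug flag is set to a truthy value (assumed set)."""
    return os.environ.get(DEBUG_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


def traj_debug(message: str) -> None:
    """Print a [TRAJ DEBUG] line, flushed immediately, only when the flag is on."""
    if traj_debug_enabled():
        print(f"[TRAJ DEBUG] {message}", flush=True)
```

With a helper like this, a checkpoint could be written as `traj_debug(f"load_positions → {len(positions_by_window)} windows")`. The `flush=True` keeps lines from sitting in a buffer when the output is piped.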
- [ ] **Step 1: Add stage-0 checkpoint at top of `build_trajectories_tab`**

Read `explorer.py` lines 1601-1650. Add a print statement at the start of `build_trajectories_tab`:

```python
print(f"[TRAJ DEBUG] build_trajectories_tab called — db_path={db_path}, window_size={window_size}")
```

- [ ] **Step 2: Add stage-1 checkpoint after `load_positions`**

Read `explorer.py` lines 1605-1610. After the call to `load_positions`, add:

```python
positions_by_window, axis_def = load_positions(db_path, window_size)
print(f"[TRAJ DEBUG] load_positions → {len(positions_by_window)} windows, "
      f"total MPs={sum(len(v) for v in positions_by_window.values())}")
```

- [ ] **Step 3: Add stage-2 checkpoint after `load_party_map`**

Read `explorer.py` lines 1638-1642. After the call to `load_party_map`, add:

```python
party_map = load_party_map(db_path)
print(f"[TRAJ DEBUG] load_party_map → {len(party_map)} entries, "
      f"sample={list(party_map.items())[:3]}")
```

- [ ] **Step 4: Add stage-3 checkpoint after centroid computation**

Read `explorer.py` lines 1641-1670. After the inline centroid loop, add:

```python
# Parties with at least one positioned MP; unmapped MPs (None / "Unknown") are dropped.
mapped = {party_map.get(mp) for window in positions_by_window.values() for mp in window}
all_parties = sorted(mapped - {None, "Unknown"})
print(f"[TRAJ DEBUG] all_parties (raw from party_map) → {len(all_parties)} parties: {all_parties[:10]}")
```

- [ ] **Step 5: Add stage-4 checkpoint before `st.plotly_chart`**

Read `explorer.py` around line 2105. Before the `st.plotly_chart` call, add:

```python
print(f"[TRAJ DEBUG] About to render plotly chart — trace_count={trace_count}, "
      f"banner={banner_text}, fig has {len(fig.data)} traces")
```

- [ ] **Step 6: Run the app and capture all debug output**

```bash
EXPLORER_DEBUG_TRAJECTORIES=1 PYTHONUNBUFFERED=1 uv run streamlit run explorer.py 2>&1 | grep TRAJ
```

`PYTHONUNBUFFERED=1` stops Python from block-buffering stdout while it is piped into `grep`. Open the app in a browser once it starts: Streamlit only runs the script when a session connects, so nothing prints before then.

Expected output (all stages should print):
```
[TRAJ DEBUG] build_trajectories_tab called — db_path=..., window_size=annual
[TRAJ DEBUG] load_positions → 12 windows, total MPs=...
[TRAJ DEBUG] load_party_map → 1036 entries, sample=[(...), (...), (...)]
[TRAJ DEBUG] all_parties (raw from party_map) → N parties: [...]
[TRAJ DEBUG] About to render plotly chart — trace_count=N, banner=None, fig has N traces
```

**If any stage is missing or shows 0/empty, that's the bug location. Document which stage fails and proceed to the matching failure mode in Task 2.**

- [ ] **Step 7: Commit**

```bash
git add explorer.py
git commit -m "chore: add TRAJ DEBUG print checkpoints to build_trajectories_tab"
```

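
Reading the Step 6 output by eye works, but the "first stage that is missing or zero" rule is mechanical enough to script. The sketch below assumes the log lines have exactly the wording of the checkpoints in Steps 1-5. `first_failing_stage` is a hypothetical helper, and it does not inspect the `banner` value of stage 4. Feeding it the captured output (for example the contents of a file produced by redirecting Step 6) returns `None` when every stage reported a non-zero count.

```python
import re
from typing import Optional

# Markers printed by the Task 1 checkpoints, in pipeline order.
STAGE_PATTERNS = [
    ("stage 0", re.compile(r"build_trajectories_tab called")),
    ("stage 1", re.compile(r"load_positions → (\d+) windows")),
    ("stage 2", re.compile(r"load_party_map → (\d+) entries")),
    ("stage 3", re.compile(r"all_parties \(raw from party_map\) → (\d+) parties")),
    ("stage 4", re.compile(r"trace_count=(\d+)")),
]


def first_failing_stage(log_text: str) -> Optional[str]:
    """Return the first stage whose line is missing or reports a zero count."""
    for name, pattern in STAGE_PATTERNS:
        match = pattern.search(log_text)
        if match is None:
            return f"{name}: no output"
        # Stage 0 has no count group; every other pattern captures one.
        if match.groups() and int(match.group(1)) == 0:
            return f"{name}: count is 0"
    return None
```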
---

## Task 2: Fix each failure mode

Based on the Task 1 output, one of the following failure modes should match:

### Failure Mode A: `positions_by_window` is empty (Stage 1)

**Symptom:** `load_positions → 0 windows`

**Root causes to check:**
- `get_uniform_dim_windows` returns `[]` (no dim-50 windows in DB)
- `compute_2d_axes` silently fails on all windows
- DB path is wrong or `data/motions.db` is missing

**Fix:**
- [ ] Run: `uv run python -c "from explorer import get_uniform_dim_windows; print(get_uniform_dim_windows('data/motions.db'))"`
- [ ] If empty: query the DB directly with `uv run duckdb data/motions.db "SELECT COUNT(*) FROM svd_vectors WHERE entity_type='mp'"` and check the dimension distribution. This needs the DuckDB CLI on `PATH`; without it, `uv run python -c "import duckdb; print(duckdb.connect('data/motions.db', read_only=True).execute(\"SELECT COUNT(*) FROM svd_vectors WHERE entity_type='mp'\").fetchall())"` runs the same query.
- [ ] If `compute_2d_axes` fails: add try/except with print at `explorer.py:584`
- [ ] If DB path wrong: fix `run_app()` to resolve relative path

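
A wrong relative path is easy to miss with DuckDB, because `duckdb.connect` in its default read-write mode creates a new, empty database file when the path does not exist instead of raising. The sketch below shows one way to resolve the path and fail loudly before connecting. `resolve_db_path` is a hypothetical helper, and how `run_app()` actually receives its path is not shown in this plan.

```python
from pathlib import Path


def resolve_db_path(db_path: str, base_dir: Path) -> Path:
    """Resolve db_path against base_dir when it is relative; raise if the file is missing."""
    path = Path(db_path).expanduser()
    if not path.is_absolute():
        path = base_dir / path
    path = path.resolve()
    if not path.is_file():
        raise FileNotFoundError(f"DuckDB file not found: {path}")
    return path
```

Inside `explorer.py` this could be called as `resolve_db_path("data/motions.db", Path(__file__).resolve().parent)`, so the result no longer depends on the directory Streamlit was launched from.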
### Failure Mode B: `party_map` is empty (Stage 2)

**Symptom:** `load_party_map → 0 entries`

**Root causes:**
- `mp_metadata` and `mp_votes` tables are empty or missing
- DuckDB connection fails
- DB path points to wrong file

**Fix:**
- [ ] Run: `uv run python -c "from analysis.visualize import _load_party_map; print(len(_load_party_map('data/motions.db')))"`
- [ ] If 0: query `SELECT COUNT(*) FROM mp_metadata`, `SELECT COUNT(*) FROM mp_votes`
- [ ] If tables missing: run data pipeline to populate them
- [ ] If DuckDB fails to import: confirm `duckdb` is a project dependency (`uv add duckdb`); a bare `pip install duckdb` may install into a different environment than the one `uv run` uses

### Failure Mode C: `all_parties` is empty (Stage 3)

**Symptom:** `all_parties (raw from party_map) → 0 parties`

**Root causes:**
- All MP names in `positions_by_window` have no match in `party_map` (name mismatch)
- Every MP maps to `"Unknown"` or `None`

**Fix:**
- [ ] Run: `uv run python -c "from explorer import load_positions, load_party_map; pw = load_positions('data/motions.db', 'annual')[0]; pm = load_party_map('data/motions.db'); sample_mps = list(pw[list(pw.keys())[0]].keys())[:5]; print({mp: pm.get(mp, 'NO MATCH') for mp in sample_mps})"`
- [ ] If name mismatches: investigate `_strip_paren` fallback logic in `compute_party_coords` (`explorer_helpers.py:165-170`)
- [ ] If too many mismatches: add name normalization (strip titles, standardize suffixes)
- [ ] Commit fix with test

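
To judge whether name normalization is worth adding, it helps to measure how many names it would actually recover. The following is only a sketch: `normalize_mp_name` and `match_report` are hypothetical helpers, the list of title prefixes is a guess, and the parenthetical stripping is an illustration of the idea rather than a copy of the existing `_strip_paren` logic.

```python
import re

# Assumed title prefixes; adjust to whatever actually appears in the data.
_TITLE_PREFIX = re.compile(r"^(?:dr|mr|drs|ir|prof)\.?\s+", re.IGNORECASE)
_PAREN_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def normalize_mp_name(name: str) -> str:
    """Drop a trailing parenthetical and leading titles, collapse whitespace, casefold."""
    name = _PAREN_SUFFIX.sub("", name.strip())
    previous = None
    while previous != name:  # titles can stack, e.g. "Prof. dr. ..."
        previous = name
        name = _TITLE_PREFIX.sub("", name)
    return " ".join(name.split()).casefold()


def match_report(position_names, party_map):
    """Count exact matches, matches recovered by normalization, and list the misses."""
    normalized_map = {normalize_mp_name(key) for key in party_map}
    exact, recovered, missing = 0, 0, []
    for name in position_names:
        if name in party_map:
            exact += 1
        elif normalize_mp_name(name) in normalized_map:
            recovered += 1
        else:
            missing.append(name)
    return {"exact": exact, "recovered": recovered, "missing": missing}
```

A large `recovered` count would suggest normalization alone fixes the mapping, while a long `missing` list points at a deeper mismatch between the two name sources.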
### Failure Mode D: `trace_count == 0` (Stage 4)

**Symptom:** `About to render plotly chart — trace_count=0`, or `banner` is not `None`

**Root causes:**
- All party centroids are NaN (every MP position is NaN)
- `compute_party_coords` filters out all parties (NaN/Inf in all positions)
- `select_trajectory_plot_data` falls back to MP trajectories but MP fallback also fails

**Fix:**
- [ ] Add debug print inside `compute_party_coords`: `print(f"[TRAJ DEBUG] compute_party_coords window={window_id} → {len(party_coords)} parties: {list(party_coords.keys())[:5]}")`
- [ ] Check if NaN comes from `compute_2d_axes` output (PCA on svd_vectors)
- [ ] Run: `uv run python -c "from explorer import load_positions; pw = load_positions('data/motions.db', 'annual')[0]; win = list(pw.values())[0]; print(dict(list(win.items())[:3]))"`. If all values are `(nan, nan)`, the PCA step is producing NaN
- [ ] If PCA produces NaN: check `analysis/political_axis.py:compute_2d_axes` for the specific window's SVD vectors

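
Sampling three MPs from the first window can miss a problem that only affects some windows. A rough sketch of a per-window count of usable positions follows. It assumes each window maps MP name to an `(x, y)` pair, as the `(nan, nan)` check above suggests, and `finite_position_summary` is a hypothetical helper.

```python
import math


def finite_position_summary(positions_by_window):
    """Return {window: (finite_count, total_count)} for each window's MP positions."""
    summary = {}
    for window, positions in positions_by_window.items():
        finite = sum(
            1
            for xy in positions.values()
            # A position counts only if both coordinates are real, finite numbers.
            if len(xy) >= 2 and all(math.isfinite(float(c)) for c in xy[:2])
        )
        summary[window] = (finite, len(positions))
    return summary
```

Windows where the first number is 0 are the ones to inspect in `compute_2d_axes`.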
### Failure Mode E: Chart not visible in browser (Stage 5)

**Symptom:** All stages pass but chart is blank in browser

**Root causes:**
- Plotly `fig` is empty (no traces added to figure)
- Streamlit `st.plotly_chart` suppressed by CSS/JS error
- Container width is 0 (layout issue)

**Fix:**
- [ ] Add debug print: `print(f"[TRAJ DEBUG] st.plotly_chart called with fig.data={[(t.mode, len(t.x) if t.x is not None else 0, len(t.y) if t.y is not None else 0) for t in fig.data]}")` (the `None` guards matter because a trace with no data has `x`/`y` set to `None`)
- [ ] Check browser console for JavaScript errors (Plotly.js errors)
- [ ] Check if `use_container_width=True` causes issues: try `use_container_width=False`
- [ ] Write the figure to a standalone file with `fig.write_html("traj_debug.html")` and open it in a browser. `st.write(fig)` is less useful here because it renders through the same Plotly path as `st.plotly_chart`; if the standalone file shows the chart, the problem is on the Streamlit side

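
A trace can be present and still draw nothing if all of its points are NaN or `None`. The sketch below summarizes traces by how many points are actually drawable. It relies only on the `name`, `mode`, `x`, and `y` attributes, so it should work on `fig.data`, but `summarize_traces` is a hypothetical helper and has not been run against the real figure.

```python
import math


def _is_finite(value) -> bool:
    """True for real, finite numbers; False for None, NaN, Inf, or non-numeric values."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def summarize_traces(traces):
    """Return one dict per trace: name, mode, point count, and drawable (finite) point count."""
    rows = []
    for trace in traces:
        xs = list(trace.x) if getattr(trace, "x", None) is not None else []
        ys = list(trace.y) if getattr(trace, "y", None) is not None else []
        finite = sum(1 for x, y in zip(xs, ys) if _is_finite(x) and _is_finite(y))
        rows.append({
            "name": getattr(trace, "name", None),
            "mode": getattr(trace, "mode", None),
            "points": min(len(xs), len(ys)),
            "finite_points": finite,
        })
    return rows
```

Calling `summarize_traces(fig.data)` just before `st.plotly_chart` and printing the result shows at a glance whether any trace has `finite_points == 0`.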
### Failure Mode F: Chart renders but looks empty to the user

**Symptom:** `trace_count > 0` but chart looks empty to user

**Root causes:**
- All traces are transparent/white-on-white
- X/Y axes have huge range and all data is in a tiny corner
- Party lines overlap completely (all parties at same position)

**Fix:**
- [ ] Print axis ranges: `print(f"[TRAJ DEBUG] xaxis range={fig.layout.xaxis.range or 'auto'}, yaxis range={fig.layout.yaxis.range or 'auto'}")`
- [ ] Check if centroids are all at `(0, 0)`. Run: `uv run python -c "from explorer import load_positions, load_party_map; from explorer_helpers import compute_party_coords; ..."`
- [ ] Check if PARTY_COLOURS assignment is broken (all traces same color)
- [ ] Verify window ordering is correct (chronological left-to-right)

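
The "all parties at the same position" and "data in a tiny corner" cases can both be checked numerically by comparing the spread of the centroids with the axis range. The sketch below assumes the centroids for one window are available as a mapping from party name to an `(x, y)` pair; the real return shape of `compute_party_coords` is not shown in this plan, and both function names are hypothetical. A span that is tiny compared with the printed axis range, or `looks_collapsed(...)` returning `True`, would explain a chart that renders but appears empty.

```python
import math


def centroid_spread(centroids):
    """Return (x_span, y_span) over the finite centroids, or None if there are none."""
    points = [
        (float(x), float(y))
        for x, y in centroids.values()
        if math.isfinite(float(x)) and math.isfinite(float(y))
    ]
    if not points:
        return None
    xs, ys = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys))


def looks_collapsed(centroids, tolerance=1e-9):
    """True when every finite centroid sits at (numerically) the same point."""
    spread = centroid_spread(centroids)
    return spread is not None and max(spread) <= tolerance
```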
---

## Task 3: Write regression test

**Files:**
- Create: `tests/test_trajectories_pipeline_integration.py`

- [ ] **Step 1: Write integration test**

```python
"""Integration test: full trajectory pipeline produces non-empty plot."""
from explorer import load_positions, load_party_map, select_trajectory_plot_data
from explorer_helpers import compute_party_centroids


def test_trajectory_pipeline_produces_traces():
    db_path = "data/motions.db"
    window_size = "annual"

    # Stages 1-2: load MP positions per window and the MP -> party mapping.
    positions_by_window, _ = load_positions(db_path, window_size)
    party_map = load_party_map(db_path)
    windows = list(positions_by_window.keys())

    # Stages 3-4: compute centroids, then build the figure for up to six parties.
    centroids, mp_positions = compute_party_centroids(positions_by_window, party_map, windows)
    fig, trace_count, banner = select_trajectory_plot_data(
        positions_by_window, party_map, windows,
        selected_parties=list(centroids.keys())[:6],
        smooth_alpha=0.35,
    )

    assert trace_count > 0, f"Expected traces but got trace_count={trace_count}, banner={banner}"
    assert banner is None, f"Expected no fallback banner but got: {banner}"
    assert len(fig.data) == trace_count
```

- [ ] **Step 2: Run the test**

```bash
uv run pytest tests/test_trajectories_pipeline_integration.py -v
```

Expected: PASS

- [ ] **Step 3: Commit**

```bash
git add tests/test_trajectories_pipeline_integration.py
git commit -m "test: add trajectory pipeline integration test"
```

---

## Execution Order

1. **Task 1 first:** Run the instrumented app and capture which stage fails
2. **Task 2:** Fix the specific failure mode based on Task 1 output
3. **Task 3:** Write regression test once the fix is confirmed

**Estimated time:** 15-30 minutes for Task 1 (identifying the stage), 10-30 minutes for the Task 2 fix (depends on which mode), 5 minutes for Task 3.