motief/thoughts/shared/plans/2026-03-31-debug-trajectori...

Trajectory Plots Not Showing — Debugging Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Identify why trajectory plots are invisible or empty in the Streamlit Explorer UI, then fix the root cause.

Architecture: Systematic step-by-step pipeline trace from UI → DB. Each stage has explicit "what should I see" checkpoints so we can pinpoint exactly where data becomes invisible.

Tech Stack: Streamlit, Plotly, DuckDB, Python ≥3.13, uv


Debugging Pipeline (Stage-by-Stage Checkpoints)

┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 0: UI Layer — what does the user actually see?                  │
│  explorer.py → build_trajectories_tab()                                 │
│  → Is the tab visible? Empty chart? Error message? No chart at all?     │
└─────────────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 1: positions_by_window — are MP positions loaded?               │
│  load_positions(db_path, "annual")                                      │
│  → Expected: 12 windows, ~150-200 MPs per window                        │
│  → Check: _last_trajectories_diagnostics["stage"]                        │
└─────────────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 2: party_map — are MP→party mappings loaded?                    │
│  load_party_map(db_path)                                                │
│  → Expected: ~1036 entries                                              │
│  → Check: party_map is non-empty dict                                   │
└─────────────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 3: party centroids — are party means computed?                  │
│  compute_party_centroids() / compute_party_coords()                     │
│  → Expected: CDA, D66, VVD, PVV, SP, GroenLinks-PvdA centroids exist   │
│  → Check: plottable_parties > 0                                        │
└─────────────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 4: select_trajectory_plot_data — does it return traces?          │
│  → Expected: fig with 3-6 colored scatter traces, trace_count > 0       │
│  → Check: banner_text is None (no fallback), trace_count ≥ 3           │
└─────────────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────────────┐
│  STAGE 5: Plotly render — is the figure rendered in the browser?       │
│  st.plotly_chart(fig, use_container_width=True)                         │
│  → Expected: visible chart with colored party lines                     │
│  → Check: browser DOM, no JS errors                                     │
└─────────────────────────────────────────────────────────────────────────┘
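
The checkpoints above amount to a "first stage that lost the data" search. A minimal sketch of that idea, assuming each stage can be reduced to a single count; the stage labels and the `first_failing_stage` helper are hypothetical and do not exist in explorer.py:

```python
from typing import Mapping, Optional

# Stage order mirrors the diagram above. The labels are illustrative only.
STAGE_ORDER = [
    "positions_by_window",  # Stage 1: number of windows
    "party_map",            # Stage 2: number of MP-to-party entries
    "party_centroids",      # Stage 3: number of plottable parties
    "trace_count",          # Stage 4: traces returned by the selector
    "rendered_traces",      # Stage 5: traces present on the figure
]


def first_failing_stage(counts: Mapping[str, Optional[int]]) -> Optional[str]:
    """Return the first stage whose count is missing or zero, else None."""
    for stage in STAGE_ORDER:
        value = counts.get(stage)
        if value is None or value <= 0:
            return stage
    return None


healthy = {
    "positions_by_window": 12,
    "party_map": 1036,
    "party_centroids": 6,
    "trace_count": 6,
    "rendered_traces": 6,
}
broken = dict(healthy, party_centroids=0)

print(first_failing_stage(healthy))  # None
print(first_failing_stage(broken))   # party_centroids
```

Later stages are not inspected once an earlier one fails, since their output depends on the data that was already lost.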

Task 1: Instrument the app to print real-time pipeline state

Files:

  • Modify: explorer.py (add print statements at each stage)

  • Test: Run uv run streamlit run explorer.py with EXPLORER_DEBUG_TRAJECTORIES=1
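
The checkpoints in the steps that follow use plain print calls, while the test command sets EXPLORER_DEBUG_TRAJECTORIES=1. If the output should appear only when that variable is set, a small gate could look like the sketch below. The `traj_debug` and `traj_debug_enabled` names are hypothetical, and the set of accepted truthy values is an assumption; this plan does not show how explorer.py reads the variable.

```python
import os
import sys


def traj_debug_enabled() -> bool:
    """Return True when EXPLORER_DEBUG_TRAJECTORIES holds a truthy value."""
    value = os.environ.get("EXPLORER_DEBUG_TRAJECTORIES", "")
    # Assumed convention: 1/true/yes/on enable debugging, anything else disables it.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def traj_debug(message: str) -> bool:
    """Print a tagged checkpoint line when debugging is enabled.

    Returns True if the line was printed, which keeps the helper easy to test.
    """
    if not traj_debug_enabled():
        return False
    # flush=True so the line is not held in a buffer when stdout is piped.
    print(f"[TRAJ DEBUG] {message}", file=sys.stdout, flush=True)
    return True
```

With this in place, a checkpoint such as `traj_debug(f"load_party_map → {len(party_map)} entries")` stays silent in normal runs and keeps the same `[TRAJ DEBUG]` prefix that the grep in Step 6 looks for.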

  • Step 1: Add stage-0 checkpoint at top of build_trajectories_tab

Read explorer.py lines 1601-1650. Add a print statement at the start of build_trajectories_tab:

print(f"[TRAJ DEBUG] build_trajectories_tab called — db_path={db_path}, window_size={window_size}")
  • Step 2: Add stage-1 checkpoint after load_positions

Read explorer.py lines 1605-1610. After the call to load_positions, add:

positions_by_window, axis_def = load_positions(db_path, window_size)
print(f"[TRAJ DEBUG] load_positions → {len(positions_by_window)} windows, "
      f"total MPs={sum(len(v) for v in positions_by_window.values())}")
  • Step 3: Add stage-2 checkpoint after load_party_map

Read explorer.py lines 1638-1642. After the call to load_party_map, add:

party_map = load_party_map(db_path)
print(f"[TRAJ DEBUG] load_party_map → {len(party_map)} entries, "
      f"sample={list(party_map.items())[:3]}")
  • Step 4: Add stage-3 checkpoint after centroid computation

Read explorer.py lines 1641-1670. After the inline centroid loop, add:

# Every party that appears for a positioned MP, excluding unmapped and "Unknown" entries.
all_parties = sorted(
    {party_map.get(mp) for mps in positions_by_window.values() for mp in mps} - {None, "Unknown"}
)
print(f"[TRAJ DEBUG] all_parties (raw from party_map) → {len(all_parties)} parties: {all_parties[:10]}")
  • Step 5: Add stage-4 checkpoint before st.plotly_chart

Read explorer.py around line 2105. Before the st.plotly_chart call, add:

print(f"[TRAJ DEBUG] About to render plotly chart — trace_count={trace_count}, "
      f"banner={banner_text}, fig has {len(fig.data)} traces")
  • Step 6: Run the app and capture all debug output
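# PYTHONUNBUFFERED=1 and --line-buffered keep checkpoint lines from stalling in pipe buffers
EXPLORER_DEBUG_TRAJECTORIES=1 PYTHONUNBUFFERED=1 uv run streamlit run explorer.py 2>&1 | grep --line-buffered TRAJ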

Expected output once the app page has loaded in a browser (all stages should print):

[TRAJ DEBUG] build_trajectories_tab called — db_path=..., window_size=annual
[TRAJ DEBUG] load_positions → 12 windows, total MPs=...
[TRAJ DEBUG] load_party_map → 1036 entries, sample=[(...), (...), (...)]
[TRAJ DEBUG] all_parties (raw from party_map) → N parties: [...]
[TRAJ DEBUG] About to render plotly chart — trace_count=N, banner=None, fig has N traces

The first stage whose line is missing, or that reports 0 or an empty value, is where the data is lost. Record which stage that is and proceed to the matching failure mode in Task 2.

  • Step 7: Commit
git add explorer.py
git commit -m "chore: add TRAJ DEBUG print checkpoints to build_trajectories_tab"
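
The Step 6 output can also be checked mechanically instead of by eye. The following is a rough sketch, assuming the checkpoint lines keep the exact wording used in Steps 1 to 5; the `locate_failure` function and the stage labels are hypothetical:

```python
import re
from typing import Optional

# One pattern per checkpoint, in pipeline order.
# Patterns with a capture group expose the count that must be non-zero.
CHECKPOINTS = [
    ("stage 0: build_trajectories_tab", re.compile(r"build_trajectories_tab called")),
    ("stage 1: load_positions", re.compile(r"load_positions → (\d+) windows")),
    ("stage 2: load_party_map", re.compile(r"load_party_map → (\d+) entries")),
    ("stage 3: all_parties", re.compile(r"all_parties .*→ (\d+) parties")),
    ("stage 4: trace_count", re.compile(r"trace_count=(\d+)")),
]


def locate_failure(log_text: str) -> Optional[str]:
    """Return a description of the first missing or zero checkpoint, else None."""
    for name, pattern in CHECKPOINTS:
        match = pattern.search(log_text)
        if match is None:
            return f"{name} (line missing)"
        # Stage 0 has no capture group, so only its presence is checked.
        if match.groups() and int(match.group(1)) == 0:
            return f"{name} (count is 0)"
    return None
```

Feeding it the captured grep output returns None when every checkpoint printed a non-zero count, or a label such as "stage 2: load_party_map (count is 0)" that maps directly onto a failure mode in Task 2. It does not inspect the banner value, so a non-None banner still needs a manual look.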

Task 2: Fix each failure mode

Based on Task 1 output, one of these will be the culprit:

Failure Mode A: positions_by_window is empty (Stage 1)

Symptom: load_positions → 0 windows

Root causes to check:

  • get_uniform_dim_windows returns [] (no dim-50 windows in DB)
  • compute_2d_axes silently fails on all windows
  • DB path is wrong or data/motions.db is missing

Fix:

  • Run: uv run python -c "from explorer import get_uniform_dim_windows; print(get_uniform_dim_windows('data/motions.db'))"
  • If empty: query DB directly — uv run duckdb data/motions.db "SELECT COUNT(*) FROM svd_vectors WHERE entity_type='mp'" and check dimension distribution
  • If compute_2d_axes fails: add try/except with print at explorer.py:584
  • If DB path wrong: fix run_app() to resolve relative path
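
For the relative-path case, one option is to resolve the DB path against a fixed base directory rather than the current working directory. A minimal sketch; `resolve_db_path` and `check_db_path` are hypothetical helpers, and choosing the directory that contains explorer.py as the base is an assumption about how run_app() should behave:

```python
from pathlib import Path


def resolve_db_path(db_path: str, base_dir: Path) -> Path:
    """Resolve db_path against base_dir unless it is already absolute."""
    path = Path(db_path).expanduser()
    if not path.is_absolute():
        path = base_dir / path
    return path.resolve()


def check_db_path(db_path: str, base_dir: Path) -> Path:
    """Resolve the path and fail loudly if the file does not exist."""
    resolved = resolve_db_path(db_path, base_dir)
    if not resolved.is_file():
        raise FileNotFoundError(f"DuckDB file not found: {resolved}")
    return resolved
```

Inside run_app(), the base could be `Path(__file__).resolve().parent`, so that `data/motions.db` resolves the same way no matter where `streamlit run` was launched from. Raising early turns a silent "0 windows" into an explicit error.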

Failure Mode B: party_map is empty (Stage 2)

Symptom: load_party_map → 0 entries

Root causes:

  • mp_metadata and mp_votes tables are empty or missing
  • DuckDB connection fails
  • DB path points to wrong file

Fix:

  • Run: uv run python -c "from analysis.visualize import _load_party_map; print(len(_load_party_map('data/motions.db')))"
  • If 0: query SELECT COUNT(*) FROM mp_metadata, SELECT COUNT(*) FROM mp_votes
  • If tables missing: run data pipeline to populate them
  • If DuckDB fails to import: confirm duckdb is installed in the uv environment (uv pip list | grep duckdb) and add it with uv add duckdb if it is missing
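
The two COUNT(*) checks can be wrapped so that a missing table is reported rather than raising. This sketch runs against an in-memory sqlite3 database as a stand-in so it works without data/motions.db. The `table_counts` helper is hypothetical, and the column names in the stand-in table are invented for the demo.

```python
import sqlite3
from typing import Dict, Iterable, Optional


def table_counts(con, tables: Iterable[str]) -> Dict[str, Optional[int]]:
    """Return row counts per table; None marks a table that could not be queried."""
    counts: Dict[str, Optional[int]] = {}
    for table in tables:
        try:
            # Table names come from a fixed list here, never from user input.
            row = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            counts[table] = int(row[0])
        except Exception:  # e.g. missing table or closed connection
            counts[table] = None
    return counts


# Stand-in database: mp_metadata exists with one row, mp_votes does not exist.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mp_metadata (mp_name TEXT, party TEXT)")
con.execute("INSERT INTO mp_metadata VALUES ('Example MP', 'D66')")

print(table_counts(con, ["mp_metadata", "mp_votes"]))
# {'mp_metadata': 1, 'mp_votes': None}
```

Against the real database, the connection would come from `duckdb.connect("data/motions.db", read_only=True)`, which supports the same `execute(...).fetchone()` calls. A count of 0 points to an unpopulated table, and None points to a missing table or a failed connection.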

Failure Mode C: all_parties is empty (Stage 3)

Symptom: all_parties (raw from party_map) → 0 parties

Root causes:

  • All MP names in positions_by_window have no match in party_map (name mismatch)
  • Every MP maps to "Unknown" or None

Fix:

  • Run: uv run python -c "from explorer import load_positions, load_party_map; pw = load_positions('data/motions.db', 'annual')[0]; pm = load_party_map('data/motions.db'); sample_mps = list(pw[list(pw.keys())[0]].keys())[:5]; print({mp: pm.get(mp, 'NO MATCH') for mp in sample_mps})"
  • If name mismatches: investigate _strip_paren fallback logic in compute_party_coords (explorer_helpers.py:165-170)
  • If too many mismatches: add name normalization (strip titles, standardize suffixes)
  • Commit fix with test
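
If the mismatch turns out to be cosmetic (parenthetical suffixes, casing, stray whitespace), a normalization pass along these lines may be enough. This is a sketch only: `normalize_name` and `match_report` are hypothetical, the rules are assumptions about what the mismatches look like, and the names in the example are invented sample data. It is not a description of what _strip_paren does.

```python
import re
from typing import Dict, Iterable, Optional

_PAREN_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")
_WHITESPACE = re.compile(r"\s+")


def normalize_name(name: str) -> str:
    """Drop a trailing parenthetical, collapse whitespace, and casefold."""
    stripped = _PAREN_SUFFIX.sub("", name)
    return _WHITESPACE.sub(" ", stripped).strip().casefold()


def match_report(mp_names: Iterable[str], party_map: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each MP name to a party, trying an exact match first, then a normalized one."""
    normalized_map = {normalize_name(key): party for key, party in party_map.items()}
    report: Dict[str, Optional[str]] = {}
    for name in mp_names:
        party = party_map.get(name)
        if party is None:
            party = normalized_map.get(normalize_name(name))
        report[name] = party
    return report


# Invented sample data for illustration only.
sample_party_map = {"Jansen, J.": "CDA", "de Vries, A.": "VVD"}
sample_names = ["Jansen, J. (CDA)", "de  Vries, a.", "Bakker, B."]
print(match_report(sample_names, sample_party_map))
# {'Jansen, J. (CDA)': 'CDA', 'de  Vries, a.': 'VVD', 'Bakker, B.': None}
```

Names that still map to None after normalization are the ones worth inspecting by hand, since they indicate a real gap in party_map and not a formatting difference.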

Failure Mode D: trace_count == 0 (Stage 4)

Symptom: About to render plotly chart — trace_count=0, or banner is not None

Root causes:

  • All party centroids are NaN (every MP position is NaN)
  • compute_party_coords filters out all parties (NaN/Inf in all positions)
  • select_trajectory_plot_data falls back to MP trajectories but MP fallback also fails

Fix:

  • Add debug print inside compute_party_coords: print(f"[TRAJ DEBUG] compute_party_coords window={window_id} → {len(party_coords)} parties: {list(party_coords.keys())[:5]}")
  • Check if NaN comes from compute_2d_axes output (PCA on svd_vectors)
  • Run: uv run python -c "from explorer import load_positions; pw = load_positions('data/motions.db', 'annual')[0]; win = list(pw.values())[0]; sample = list(win.items())[:3]; print(dict(sample))". If every value is (nan, nan), the PCA step is producing NaN
  • If PCA produces NaN: check analysis/political_axis.py:compute_2d_axes for the specific window's SVD vectors
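
To see how widespread the NaN problem is, it can help to measure the share of finite positions per window instead of sampling three MPs. A minimal sketch, assuming positions_by_window maps each window id to a dict of MP name to an (x, y) tuple; the helper names are hypothetical:

```python
import math
from typing import Dict, Mapping, Tuple

Position = Tuple[float, float]


def finite_fraction(positions: Mapping[str, Position]) -> float:
    """Return the share of MPs whose x and y are both finite (0.0 for an empty window)."""
    if not positions:
        return 0.0
    finite = sum(1 for x, y in positions.values() if math.isfinite(x) and math.isfinite(y))
    return finite / len(positions)


def nan_report(positions_by_window: Mapping[str, Mapping[str, Position]]) -> Dict[str, float]:
    """Return the finite-position share for every window."""
    return {window: finite_fraction(pos) for window, pos in positions_by_window.items()}


nan = float("nan")
example = {
    "2019": {"A": (0.1, 0.2), "B": (nan, nan)},
    "2020": {"A": (nan, nan)},
}
print(nan_report(example))
# {'2019': 0.5, '2020': 0.0}
```

A window at 0.0 cannot contribute any centroid. If every window is at 0.0, the problem sits upstream in the PCA step and not in the centroid filter.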

Failure Mode E: Chart not visible in browser (Stage 5)

Symptom: All stages pass but chart is blank in browser

Root causes:

  • Plotly fig has traces but they carry no points (empty or missing x/y data)
  • Streamlit st.plotly_chart suppressed by CSS/JS error
  • Container width is 0 (layout issue)

Fix:

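  • Add debug print (guarded so that a trace with x or y set to None does not raise): print(f"[TRAJ DEBUG] st.plotly_chart called with fig.data={[(t.mode, 0 if t.x is None else len(t.x), 0 if t.y is None else len(t.y)) for t in fig.data]}")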
  • Check browser console for JavaScript errors (Plotly.js errors)
  • Check if use_container_width=True causes issues — try use_container_width=False
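  • Render the figure outside Streamlit for comparison, e.g. fig.write_html("traj_debug.html") opened directly in a browser; st.write(fig) hands Plotly figures to the same chart renderer as st.plotly_chart, so it is only a weak cross-check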
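
A trace summary that tolerates missing data makes it easier to spot traces that exist but are empty. The sketch below uses SimpleNamespace objects as stand-ins for Plotly traces so it runs without Plotly installed. `summarize_traces` and `empty_trace_names` are hypothetical helpers, and in the app the input would be fig.data.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional, Tuple


def _safe_len(values) -> int:
    """Length of a sequence, treating None as empty."""
    return 0 if values is None else len(values)


def summarize_traces(traces: Iterable) -> List[Tuple[Optional[str], Optional[str], int, int]]:
    """Return (name, mode, len(x), len(y)) for each trace."""
    return [
        (
            getattr(trace, "name", None),
            getattr(trace, "mode", None),
            _safe_len(getattr(trace, "x", None)),
            _safe_len(getattr(trace, "y", None)),
        )
        for trace in traces
    ]


def empty_trace_names(traces: Iterable) -> List[Optional[str]]:
    """Names of traces that have no x or no y points."""
    return [name for name, _mode, n_x, n_y in summarize_traces(traces) if n_x == 0 or n_y == 0]


# Stand-in traces: one populated, one with x and y unset.
demo_traces = [
    SimpleNamespace(name="VVD", mode="lines+markers", x=[1, 2, 3], y=[0.1, 0.2, 0.3]),
    SimpleNamespace(name="SP", mode="lines+markers", x=None, y=None),
]
print(summarize_traces(demo_traces))
print(empty_trace_names(demo_traces))  # ['SP']
```

If every trace shows up in the empty list, the figure is technically non-empty while drawing nothing, which points back to Stage 3 or Stage 4 and away from the browser.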

Failure Mode F: All stages pass, chart still shows blank

Symptom: trace_count > 0 but chart looks empty to user

Root causes:

  • All traces are transparent/white-on-white
  • X/Y axes have huge range and all data is in a tiny corner
  • Party lines overlap completely (all parties at same position)

Fix:

  • Print axis ranges: print(f"[TRAJ DEBUG] xaxis range={fig.layout.xaxis.range or 'auto'}, yaxis range={fig.layout.yaxis.range or 'auto'}")
  • Check if centroids are all at (0, 0) — run: uv run python -c "from explorer import load_positions, load_party_map; from explorer_helpers import compute_party_coords; ..."
  • Check if PARTY_COLOURS assignment is broken (all traces same color)
  • Verify window ordering is correct (chronological left-to-right)
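
Two of these causes can be tested numerically: centroids that all coincide, and data that occupies only a sliver of the axis range. A rough sketch, assuming centroids are available as (x, y) tuples; the helper names and the tolerance are hypothetical:

```python
import math
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]


def bounding_box(points: Iterable[Point]) -> Optional[Tuple[float, float, float, float]]:
    """Return (min_x, max_x, min_y, max_y) over finite points, or None if there are none."""
    finite = [(x, y) for x, y in points if math.isfinite(x) and math.isfinite(y)]
    if not finite:
        return None
    xs, ys = zip(*finite)
    return min(xs), max(xs), min(ys), max(ys)


def is_collapsed(points: Iterable[Point], tol: float = 1e-9) -> bool:
    """True when all finite points sit at (almost) the same position, or none are finite."""
    box = bounding_box(points)
    if box is None:
        return True
    min_x, max_x, min_y, max_y = box
    return (max_x - min_x) <= tol and (max_y - min_y) <= tol


def axis_coverage(points: Iterable[Point], x_range: Point, y_range: Point) -> Tuple[float, float]:
    """Fraction of the x and y axis ranges that the data actually spans."""
    box = bounding_box(points)
    if box is None:
        return 0.0, 0.0
    min_x, max_x, min_y, max_y = box
    x_span = x_range[1] - x_range[0]
    y_span = y_range[1] - y_range[0]
    x_frac = (max_x - min_x) / x_span if x_span else 0.0
    y_frac = (max_y - min_y) / y_span if y_span else 0.0
    return x_frac, y_frac
```

For example, `is_collapsed([(0.0, 0.0), (0.0, 0.0)])` is True, and `axis_coverage([(0, 0), (1, 2)], (0, 100), (0, 100))` is about (0.01, 0.02), meaning the lines would be squeezed into roughly 1 to 2 percent of each axis.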

Task 3: Write regression test

Files:

  • Create: tests/test_trajectories_pipeline_integration.py

  • Step 1: Write integration test

"""Integration test: full trajectory pipeline produces non-empty plot."""
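from explorer import load_party_map, load_positions, select_trajectory_plot_data
from explorer_helpers import compute_party_centroids


def test_trajectory_pipeline_produces_traces():
    # Requires the populated project database at this relative path.
    db_path = "data/motions.db"
    window_size = "annual"

    # Stages 1 and 2: positions and party mappings must both be non-empty.
    positions_by_window, _ = load_positions(db_path, window_size)
    party_map = load_party_map(db_path)
    assert positions_by_window, "load_positions returned no windows"
    assert party_map, "load_party_map returned no entries"
    windows = list(positions_by_window.keys())

    # Stage 3: at least one party centroid must be computed.
    centroids, _mp_positions = compute_party_centroids(positions_by_window, party_map, windows)
    assert centroids, "compute_party_centroids returned no parties"

    # Stage 4: the selector must return party traces without falling back.
    fig, trace_count, banner = select_trajectory_plot_data(
        positions_by_window, party_map, windows,
        selected_parties=list(centroids.keys())[:6],
        smooth_alpha=0.35,
    )

    assert trace_count > 0, f"Expected traces but got trace_count={trace_count}, banner={banner}"
    assert banner is None, f"Expected no fallback banner but got: {banner}"
    assert len(fig.data) == trace_count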
  • Step 2: Run the test
uv run pytest tests/test_trajectories_pipeline_integration.py -v

Expected: PASS

  • Step 3: Commit
git add tests/test_trajectories_pipeline_integration.py
git commit -m "test: add trajectory pipeline integration test"

Execution Order

  1. Task 1 first — Run the instrumented app and capture which stage fails
  2. Task 2 — Fix the specific failure mode based on Task 1 output
  3. Task 3 — Write regression test once the fix is confirmed

Estimated time: 15-30 minutes for Task 1 (identifying the stage), 10-30 minutes for Task 2 fix (depends on which mode), 5 minutes for Task 3.