docs: add diagnose-no-plot-trajectories design (2026-03-30)

Sven Geboers 1 month ago
parent ce1fc86bcb
commit 525cd157c0
      thoughts/shared/designs/2026-03-30-diagnose-no-plot-trajectories-design.md

---
date: 2026-03-30
topic: "diagnose-no-plot-trajectories"
status: draft
---
## Problem Statement
The Trajectories tab currently shows **no Plotly chart at all** (not just an empty chart). We need a low-risk way to determine exactly which runtime gate or swallowed exception prevents the plot from rendering, and then fix it so that either the chart appears or a clear error message is shown.
**Key observation:** the rendering path contains multiple early returns (taken when there is no data) and broad `except`/`pass` handlers that can silently swallow exceptions; either one can cause the UI to skip calling `st.plotly_chart` entirely.
## Constraints
- Keep changes small and reversible.
- Do not change user-facing defaults unless gated by an explicit debug toggle or environment variable.
- Prefer adding diagnostics and logging over big refactors; short-term changes must be removable after diagnosis.
- Preserve public function locations and names used by other code/tests.
## Chosen approach (what I'll do)
I'm choosing a focused instrumentation strategy: add a temporary, opt-in **debug mode** that surfaces each runtime decision and any exception raised along the Trajectories rendering path, and un-silence the key broad `except` handlers so their stack traces become visible.
**Why:** It's the fastest, lowest-risk way to get definitive evidence of why the plot doesn't render, and it avoids changing production logic except under an explicit debug toggle.
**High-level changes:**
- Add a **DEBUG toggle** (UI checkbox + env var EXPLORER_DEBUG_TRAJECTORIES) that enables verbose diagnostics in the Trajectories UI.
- When debug is enabled, show step-by-step status for each early-return gate: result of load_positions, axis_def presence, length of positions_by_window, centroids size, mp_positions size, helper returns (fig/trace_count) and any exception tracebacks.
- Replace the broad `except Exception: pass` around the select_trajectory_plot_data call with a handler that always logs the exception, displays it in the UI only when debug is enabled, and increments a visible diagnostic counter.
- Add compact, structured diagnostics to the existing DEBUG expander (windows_count, party_map_count, centroids_sample, mp_positions_sample, helper_trace_count, helper_exception_string).
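The per-gate status described above could be captured with a small recorder that every early return passes through. This is a minimal sketch: `GateTrace`, `render_path`, and the gate names are hypothetical, and the real gates in `build_trajectories_tab` may be ordered or named differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class GateTrace:
    """Records each early-return decision on the Trajectories path (hypothetical)."""

    steps: List[Dict[str, Any]] = field(default_factory=list)

    def check(self, name: str, passed: Any, detail: Any = None) -> bool:
        # Record the outcome, then return it so callers keep their existing
        # `if not ...: return` structure.
        ok = bool(passed)
        self.steps.append({"gate": name, "passed": ok, "detail": detail})
        return ok

    def first_failure(self) -> Optional[str]:
        # Name of the gate that stopped rendering, or None if all gates passed.
        for step in self.steps:
            if not step["passed"]:
                return step["gate"]
        return None


def render_path(positions_by_window: Any, axis_def: Any, trace: GateTrace) -> bool:
    """Illustrative stand-in for the first two gates in build_trajectories_tab."""
    windows = positions_by_window or {}
    if not trace.check("positions_loaded", windows, f"{len(windows)} windows"):
        return False
    if not trace.check("axis_def_present", axis_def is not None):
        return False
    return True
```

In the debug panel, `trace.steps` could be rendered as a table and `trace.first_failure()` names the gate that stopped the chart. Because `check` returns the same boolean the original condition produced, control flow is unchanged when debug is off.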
## Alternatives considered (brief)
1. Force-show MP fallback unconditionally. Pros: quickly confirms the plotting plumbing works. Cons: noisy, may mask the root cause, and changes production behaviour.
2. Heavy refactor to move pure plotting logic into an import-safe separate module and run offline tests. Pros: clean separation and easier tests. Cons: slower and higher-risk for this urgent diagnosis.
I rejected both for immediate work because they are heavier than necessary to learn the root cause.
## Architecture (where changes live)
- Explorer UI (explorer.py) — add debug checkbox and diagnostic panel wiring inside build_trajectories_tab.
- Diagnostics collector (small helper in explorer_helpers.py or local helper) — produce structured status dicts (counts, samples) used by the UI.
- Error surfacer — modify the select_trajectory_plot_data call-site to log exceptions (logger.exception) and, when debug enabled, call st.exception(...) or st.text_area(...) with the traceback.
## Components and responsibilities
- **Debug toggle UI:** checkbox + env var binding; enables/disables verbose diagnostics.
- **Diagnostic collector:** pure helper that inspects positions_by_window, party_map, centroids, mp_positions and returns compact samples and counts.
- **Exception handler change:** convert the broad `except Exception: pass` at the helper boundary into a handler that calls `logger.exception(...)`, stores the formatted traceback in the diagnostics dict (as `helper_exception_string`), and calls `st.exception(e)` when debug is enabled.
- **Temporary UX:** display a compact, clearly labeled diagnostics block inside the DEBUG expander. Make it obvious this is a temporary troubleshooting aid.
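One way to bind the checkbox to the environment variable, sketched under the assumption that any of `1`, `true`, `yes`, or `on` (case-insensitive) should enable debug mode. The function name and accepted values are illustrative, not an existing API in the project.

```python
import os
from typing import Mapping, Optional

DEBUG_ENV_VAR = "EXPLORER_DEBUG_TRAJECTORIES"
# Assumed set of values that count as "on"; anything else (including unset) is off.
_TRUTHY = {"1", "true", "yes", "on"}


def env_debug_default(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the debug env var holds a truthy value (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return env.get(DEBUG_ENV_VAR, "").strip().lower() in _TRUTHY
```

In `build_trajectories_tab` this value could seed the checkbox, e.g. `st.checkbox("Debug trajectories", value=env_debug_default())`, so the env var pre-ticks the box while the UI can still turn it off. Accepting `environ` as a parameter keeps the function testable without touching the real process environment.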
## Data flow (quick)
- load_positions(db) -> positions_by_window, axis_def
- diagnostic collector inspects positions_by_window and party_map
- build_trajectories_tab calls select_trajectory_plot_data(...) inside a try/except
- on success: use returned fig and trace_count to decide whether to call st.plotly_chart
- on exception: diagnostic collector records traceback and UI shows it if debug enabled
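The guarded call site from this data flow might look like the following. It is a sketch: `render_with_diagnostics` is a hypothetical wrapper, the `(fig, trace_count)` return shape is taken from the data-flow description above, and the Streamlit calls are injected as callables (`st.plotly_chart` for `plot_fn`, `st.exception` for `show_exception`) so the logic can be tested without a running app.

```python
import logging
import traceback
from typing import Any, Callable, Optional, Tuple

logger = logging.getLogger("explorer.trajectories")


def render_with_diagnostics(
    select_fn: Callable[[], Tuple[Any, int]],
    plot_fn: Callable[[Any], None],
    diagnostics: dict,
    debug: bool,
    show_exception: Optional[Callable[[BaseException], None]] = None,
) -> bool:
    """Return True only if plot_fn was called (hypothetical wrapper)."""
    try:
        fig, trace_count = select_fn()
    except Exception as exc:
        # Always log with the full traceback; never swallow silently.
        logger.exception("select_trajectory_plot_data failed")
        diagnostics["helper_exception_string"] = traceback.format_exc()
        diagnostics["helper_exception_count"] = diagnostics.get("helper_exception_count", 0) + 1
        # Only surface the exception in the UI when debug mode is on.
        if debug and show_exception is not None:
            show_exception(exc)
        return False

    diagnostics["helper_trace_count"] = trace_count
    if fig is None or not trace_count:
        # The helper succeeded but there is nothing to draw: record why.
        diagnostics["skip_reason"] = "no figure" if fig is None else "zero traces"
        return False

    plot_fn(fig)
    return True
```

Injecting `select_fn` also makes the planned monkeypatch test straightforward: pass a function that raises and assert on the diagnostics dict.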
## Error handling strategy
- Do not swallow exceptions silently at the helper boundary. Always log with logger.exception(...).
- Only surface full tracebacks to the Streamlit UI when **debug mode** is enabled.
- Keep production behaviour unchanged when debug mode is off.
## Testing approach
- Unit tests for the diagnostic collector with synthetic positions_by_window covering: empty data, partial centroids, and full centroids.
- Unit test that simulates the helper raising an exception (monkeypatch) and asserts that the exception is logged and, when debug is enabled, that the diagnostics struct contains the exception string.
- Manual reproduction: run Streamlit locally with EXPLORER_DEBUG_TRAJECTORIES=1 and the same DB used in production to capture the diagnostics panel and fix the underlying issue.
## Open questions
- Can you reproduce the issue locally (same DB and same command to start Streamlit)? I assume yes and will base debug advice on that.
- Are we allowed to enable a short-lived debug toggle in production logs if needed, or will you only run this locally?
---
## Environment management (use uv, not pip)
We will not use pip directly. Use the project's `uv` tool to manage dependencies and run scripts so the environment is reproducible and follows local project conventions.
Recommended commands:
- Add duckdb to the project virtual environment:
  - `uv add duckdb`
- Run the diagnostic CLI with debug enabled:
  - `EXPLORER_DEBUG_TRAJECTORIES=1 uv run python scripts/diagnose_trajectories_cli.py`
- Start Streamlit inside the uv-managed environment (example):
  - `uv run streamlit run pages/2_Explorer.py`
Notes:
- If the planner or any follow-up steps need to install or run packages, they should use `uv add` and `uv run` rather than `pip install` or direct interpreter calls.
- If `uv` is not on PATH in a particular environment, prefer `python -m uv` or consult the project README/ARCHITECTURE.md for local developer environment instructions.