docs: add diagnose-no-plot-trajectories design (2026-03-30)

1 month ago · 525cd157c0
parent ce1fc86bcb
commit 525cd157c0
1 changed files with 96 additions and 0 deletions
--- a/thoughts/shared/designs/2026-03-30-diagnose-no-plot-trajectories-design.md
+++ b/thoughts/shared/designs/2026-03-30-diagnose-no-plot-trajectories-design.md
@@ -0,0 +1,96 @@
+---
+date: 2026-03-30
+topic: "diagnose-no-plot-trajectories"
+status: draft
+---
+
+## Problem Statement
+
+The Trajectories tab currently shows **no Plotly chart at all** (not just an empty chart). We need a low-risk way to determine exactly which runtime gate or swallowed exception prevents the plot from rendering, and then to fix it so that either the chart appears or a clear error message is shown.
+
+**Key observation:** upstream code contains multiple early returns (taken when there is no data) and broad `except`/`pass` handlers that can silently swallow exceptions. Either can cause the UI to skip calling st.plotly_chart entirely.
+
+## Constraints
+
+- Keep changes small and reversible.
+- Do not change user-facing defaults unless gated by an explicit debug toggle or environment variable.
+- Prefer adding diagnostics and logging over big refactors; short-term changes must be removable after diagnosis.
+- Preserve public function locations and names used by other code/tests.
+
+## Chosen approach (what I'll do)
+
+I'm choosing a focused instrumentation strategy: add a temporary, opt-in **debug mode** that surfaces the exact runtime decisions and any exceptions taken along the Trajectories rendering path, and un-silence key broad excepts so we can observe stack traces.
+
+**Why:** It's the fastest, lowest-risk way to get definitive evidence of why the plot doesn't render, and it avoids changing production logic except under an explicit debug toggle.
+
+**High-level changes:**
+- Add a **DEBUG toggle** (UI checkbox + env var EXPLORER_DEBUG_TRAJECTORIES) that enables verbose diagnostics in the Trajectories UI.
+- When debug is enabled, show step-by-step status for each early-return gate: result of load_positions, axis_def presence, length of positions_by_window, centroids size, mp_positions size, helper returns (fig/trace_count) and any exception tracebacks.
+- Replace the helper-call swallow (`except Exception: pass`) around select_trajectory_plot_data with a handler that logs and displays the exception (only when debug is enabled) and increments a visible diagnostic counter.
+- Add compact, structured diagnostics to the existing DEBUG expander (windows_count, party_map_count, centroids_sample, mp_positions_sample, helper_trace_count, helper_exception_string).
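
The toggle described above could be resolved with a small pure function, sketched here under assumptions: the name `debug_enabled` and the set of accepted truthy strings are illustrative and not taken from the existing code. Only the `EXPLORER_DEBUG_TRAJECTORIES` variable name comes from this design.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "EXPLORER_DEBUG_TRAJECTORIES"
# Assumption: which strings count as "on" is a choice made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(checkbox_value: bool,
                  environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if either the UI checkbox or the env var turns debug on."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "")
    return bool(checkbox_value) or raw.strip().lower() in _TRUTHY
```

Inside build_trajectories_tab this would presumably be called as `debug_enabled(st.checkbox("Debug trajectories", value=False))`, so the default stays off unless the user or the environment opts in.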
+
+## Alternatives considered (brief)
+
+1. Force-show MP fallback unconditionally. Pros: quickly confirm plotting plumbing works. Cons: noisy, may mask root cause and changes production behaviour.
+2. Heavy refactor to move pure plotting logic into an import-safe separate module and run offline tests. Pros: clean separation and easier tests. Cons: slower and higher-risk for this urgent diagnosis.
+
+I rejected both for immediate work because they are heavier than necessary to learn the root cause.
+
+## Architecture (where changes live)
+
+- Explorer UI (explorer.py) — add debug checkbox and diagnostic panel wiring inside build_trajectories_tab.
+- Diagnostics collector (small helper in explorer_helpers.py or local helper) — produce structured status dicts (counts, samples) used by the UI.
+- Error surfacer — modify the select_trajectory_plot_data call-site to log exceptions (logger.exception) and, when debug enabled, call st.exception(...) or st.text_area(...) with the traceback.
+
+## Components and responsibilities
+
+- **Debug toggle UI:** checkbox + env var binding; enables/disables verbose diagnostics.
+- **Diagnostic collector:** pure helper that inspects positions_by_window, party_map, centroids, mp_positions and returns compact samples and counts.
+- **Exception handler change:** convert the broad `except Exception: pass` at the helper boundary into `except Exception as e: logger.exception("select_trajectory_plot_data failed"); diagnostic['select_helper_exception'] = traceback.format_exc(); if debug: st.exception(e)`.
+- **Temporary UX:** display a compact, clearly labeled diagnostics block inside the DEBUG expander. Make it obvious this is a temporary troubleshooting aid.
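
A minimal sketch of the diagnostic collector, assuming that positions_by_window, party_map, centroids and mp_positions are dict-like or `None`. Their real shapes are not specified in this design, and `collect_diagnostics` and `_sample` are hypothetical names. The returned keys mirror the field names listed for the DEBUG expander.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple


def _sample(items: Optional[Mapping[Any, Any]],
            limit: int = 3) -> List[Tuple[Any, Any]]:
    """Return up to `limit` (key, value) pairs for compact display."""
    if not items:
        return []
    return list(items.items())[:limit]


def collect_diagnostics(positions_by_window: Optional[Mapping[Any, Any]],
                        party_map: Optional[Mapping[Any, Any]],
                        centroids: Optional[Mapping[Any, Any]],
                        mp_positions: Optional[Mapping[Any, Any]]) -> Dict[str, Any]:
    """Build a compact status dict; pure, with no Streamlit or DB access."""
    return {
        "windows_count": len(positions_by_window or {}),
        "party_map_count": len(party_map or {}),
        "centroids_sample": _sample(centroids),
        "mp_positions_sample": _sample(mp_positions),
        # Filled in later by the call-site that wraps the plotting helper.
        "helper_trace_count": None,
        "helper_exception_string": None,
    }
```

Keeping the collector free of Streamlit calls is what makes the unit tests with synthetic inputs straightforward.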
+
+## Data flow (quick)
+
+- load_positions(db) -> positions_by_window, axis_def
+- diagnostic collector inspects positions_by_window and party_map
+- build_trajectories_tab calls select_trajectory_plot_data(...) inside a try/except
+- on success: use returned fig and trace_count to decide whether to call st.plotly_chart
+- on exception: diagnostic collector records traceback and UI shows it if debug enabled
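
The flow above can be turned into a single "which gate blocked the plot?" answer. This is only a sketch: the ordered gates below are inferred from this data flow and are an assumption about explorer.py, not a description of its actual checks, and `first_blocking_gate` is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def first_blocking_gate(positions_by_window: Optional[Mapping[Any, Any]],
                        axis_def: Any,
                        helper_error: Optional[str],
                        trace_count: Optional[int]) -> Optional[str]:
    """Return a label for the first gate that would skip the chart, or None."""
    gates = [
        ("no positions loaded", not positions_by_window),
        ("axis_def missing", axis_def is None),
        ("helper raised", helper_error is not None),
        ("helper returned no traces", not trace_count),
    ]
    for label, blocked in gates:
        if blocked:
            return label
    return None  # nothing blocks: st.plotly_chart should be reached
```

Showing this label at the top of the diagnostics block would give a one-line answer before anyone reads the detailed counts and samples.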
+
+## Error handling strategy
+
+- Do not swallow exceptions silently at the helper boundary. Always log with logger.exception(...).
+- Only surface full tracebacks to the Streamlit UI when **debug mode** is enabled.
+- Keep production behaviour unchanged when debug mode is off.
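
One way the strategy above could look at the call-site, as a hedged sketch: `call_helper_safely` and the `helper_exception_count` key are hypothetical, and the `surface` callback stands in for `st.exception` so the function can be tested without Streamlit.

```python
import logging
import traceback
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def call_helper_safely(helper: Callable[[], Any],
                       diagnostics: Dict[str, Any],
                       debug: bool,
                       surface: Optional[Callable[[BaseException], None]] = None) -> Any:
    """Call helper(); on failure always log, record, and optionally surface."""
    try:
        return helper()
    except Exception as exc:
        # Always log, regardless of debug mode.
        logger.exception("select_trajectory_plot_data failed")
        diagnostics["select_helper_exception"] = traceback.format_exc()
        diagnostics["helper_exception_count"] = (
            diagnostics.get("helper_exception_count", 0) + 1
        )
        # Only show the traceback in the UI when debug mode is on.
        if debug and surface is not None:
            surface(exc)
        return None
```

The real call would be wrapped in a zero-argument lambda and passed as `helper`, with `surface=st.exception`. With debug off, the only visible change from today's behaviour is the extra log entry.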
+
+## Testing approach
+
+- Unit tests for the diagnostic collector with synthetic positions_by_window covering: empty data, partial centroids, and full centroids.
+- Unit test that simulates the helper raising an exception (monkeypatch) and asserts that the exception is logged and (when debug enabled) that the diagnostics struct contains the exception string.
+- Manual reproduction: run Streamlit locally with EXPLORER_DEBUG_TRAJECTORIES=1 and the same DB used in production to capture the diagnostics panel and fix the underlying issue.
+
+## Open questions
+
+- Can you reproduce the issue locally (same DB and same command to start Streamlit)? I assume yes and will base debug advice on that.
+- Are we allowed to enable a short-lived debug toggle in production (with its output going to the logs) if needed, or will this only be run locally?
+
+---
+
+
+## Environment management (use uv, not pip)
+
+We will not use pip directly. Use the project's `uv` tool to manage dependencies and run scripts so the environment is reproducible and follows local project conventions.
+
+Recommended commands:
+
+- Add duckdb as a project dependency (this updates `pyproject.toml` and the project virtual environment):
+  - `uv add duckdb`
+- Run the diagnostic CLI with debug enabled:
+  - `EXPLORER_DEBUG_TRAJECTORIES=1 uv run python scripts/diagnose_trajectories_cli.py`
+- Start Streamlit inside the uv-managed environment (example):
+  - `uv run streamlit run pages/2_Explorer.py`
+
+Notes:
+
+- If the planner or any follow-up steps need to install or run packages, they should use `uv add` and `uv run` rather than `pip install` or direct interpreter calls.
+- If `uv` is not on PATH in a particular environment, prefer `python -m uv` (which works only when uv is installed as a package in that interpreter) or consult the project README/ARCHITECTURE.md for local developer environment instructions.