---
date: 2026-04-04
topic: explorer-refactor
---

# Explorer.py Refactor: Extract to analysis/

## Problem Frame

explorer.py is 3715 lines with 39 functions mixing:

- Data loading (DuckDB queries)
- Business logic (SVD projections, trajectory alignment)
- UI rendering (Streamlit components)

This makes the file:

- Hard to navigate (no clear boundaries)
- Hard to test (requires Streamlit + DuckDB)
- Hard to review (changes affect everything)

**Goal**: Improve navigability by extracting computation-heavy logic to `analysis/`, leaving explorer.py as a UI orchestration layer.

## Requirements

### Data Layer

- **R1.1**: Create `analysis/explorer_data.py` containing all data loading functions currently in explorer.py:
  - `get_available_windows()`
  - `get_uniform_dim_windows()`
  - `load_positions()`
  - `load_party_map()`
  - `load_active_mps()`
  - `load_party_axis_scores()`
  - `load_party_scores_all_windows()`
  - `load_party_scores_all_windows_aligned()`
  - `load_party_mp_vectors()`
  - `load_scree_data()`
  - `load_motions_df()`
- **R1.2**: All extracted functions must be callable without Streamlit imports (no `@st.cache_data`, no `st.*` calls)
- **R1.3**: Functions return plain data structures (DataFrames, dicts, lists), never Plotly figures

### Business Logic Layer

- **R2.1**: Move computation functions to `analysis/` modules based on domain:
  - `_should_swap_axes()`, `_swap_axes()` → `analysis/axis_utils.py` (new)
  - `compute_party_discipline()` → `analysis/trajectories.py`
  - Trajectory computation functions → `analysis/trajectories.py`
  - SVD projection functions → `analysis/svd_labels.py` or new `analysis/projections.py`
- **R2.2**: Computations must be pure functions (no IO, deterministic outputs)

### UI Layer (explorer.py)

- **R3.1**: explorer.py becomes a thin orchestration layer:
  - Imports from `analysis/explorer_data.py` for data
  - Imports from `analysis/` modules for computations
  - Contains only Streamlit UI code and `@st.cache_data` wrappers
- **R3.2**: Render functions (`_render_*`) stay in explorer.py (they're UI-only)
- **R3.3**: Tab-building functions (`build_*_tab()`) stay in explorer.py but delegate to imported functions

### Import Safety

- **R4.1**: New `analysis/` modules must not import from `explorer.py` (no circular dependencies)
- **R4.2**: `analysis/explorer_data.py` may import from `database.py` (already exists)

### Testing

- **R5.1**: Extracted data functions should be testable with mocked DuckDB connections
- **R5.2**: Extracted computation functions should be pure and testable without a database

## Success Criteria

- explorer.py reduced to under 1500 lines (from 3715)
- No function in explorer.py exceeds 100 lines
- Clear module boundaries: data → computation → UI
- All extracted functions have docstrings and type hints
- No circular imports between `analysis/` and `explorer.py`

## Scope Boundaries

**Included:**

- Data loading functions
- Computation/transformation logic
- Clear separation of concerns

**Excluded:**

- UI rendering functions (they can stay in explorer.py)
- Database schema changes
- New features or behavior changes
- Test suite updates (handled separately)

## Key Decisions

- **Domain-based splitting**: Computation goes to the relevant `analysis/` module, not all to one file
- **Import direction**: `explorer.py` imports from `analysis/`, never vice versa
- **Preserve function signatures**: Refactoring shouldn't change public APIs

## Dependencies / Assumptions

- `database.py` provides the `MotionDatabase` singleton; data functions will use this
- The `explorer_helpers.py` pattern is already established; follow its conventions
- Streamlit caching (`@st.cache_data`) stays in explorer.py as the orchestration layer

## Outstanding Questions

### Deferred to Planning

- [ ] [Implementation] Should `_load_mp_vectors_by_party()` and variants be merged or kept separate?
- [ ] [Implementation] Should we create `analysis/projections.py` or extend existing `analysis/axis_classifier.py`?
- [ ] [Implementation] How to handle the `_cached_bootstrap_cis()` function: move to `analysis/` or keep as a cache wrapper?

## Next Steps

→ `/ce:plan` for structured implementation planning
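
As input to planning, here is a minimal sketch of how R1.2, R3.1, and R5.1 might fit together for one loader. Everything beyond the function name is an assumption made for illustration: the `conn` parameter, the `windows` table, the `window_id` column, and the sample window ids are hypothetical, and the real `get_available_windows()` may keep a no-argument signature and reach the `MotionDatabase` singleton internally. The only DuckDB behavior relied on is the `execute(...).fetchall()` call chain, which the fake connection imitates.

```python
"""Sketch of the data/UI split in R1.2, R3.1, and R5.1 (hypothetical schema)."""
from typing import Any
from unittest.mock import MagicMock


# Would live in analysis/explorer_data.py: no Streamlit imports (R1.2).
def get_available_windows(conn: Any) -> list[str]:
    """Return the distinct window ids in ascending order.

    `conn` is any object offering DuckDB's `execute(sql).fetchall()` chain.
    The table and column names below are assumptions, not the real schema.
    """
    rows = conn.execute(
        "SELECT DISTINCT window_id FROM windows ORDER BY window_id"
    ).fetchall()
    # fetchall() yields one tuple per row; keep only the first column.
    return [row[0] for row in rows]


# Would stay in explorer.py as a thin cache wrapper (R3.1). Shown as a
# comment so the sketch runs without Streamlit installed:
#
#   @st.cache_data
#   def cached_available_windows() -> list[str]:
#       return get_available_windows(conn)  # conn obtained via database.py


# Test double for R5.1: mimics execute(...).fetchall() without a database.
def make_fake_connection(rows: list[tuple]) -> MagicMock:
    conn = MagicMock()
    conn.execute.return_value.fetchall.return_value = rows
    return conn


# Sample ids are made up for the example.
fake = make_fake_connection([("2019-Q1",), ("2019-Q2",)])
windows = get_available_windows(fake)
print(windows)
```

Passing the connection in is only one way to satisfy R5.1. If preserving the existing signatures takes priority, patching the singleton in tests (for example with `unittest.mock.patch`) would achieve the same isolation without adding a parameter.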