---
title: "refactor: Extract business logic from explorer.py to analysis/"
type: refactor
status: active
date: 2026-04-04
origin: docs/brainstorms/2026-04-04-explorer-refactor-requirements.md
---

# Refactor: Extract Business Logic from explorer.py to analysis/

## Overview

Split the 3715-line `explorer.py` into clear layers: data loading, business logic, and UI. This improves navigability and testability while preserving all existing behavior.

## Problem Frame

`explorer.py` mixes three concerns (data loading, computation, UI), making it:

- Hard to navigate: no clear boundaries
- Hard to test: requires Streamlit + DuckDB
- Hard to review: changes affect everything

## Requirements Trace

- R1.1: Create `analysis/explorer_data.py` with data loading functions
- R1.2: Data functions callable without Streamlit imports
- R1.3: Functions return pure Python data structures
- R2.1: Move computation to domain-appropriate `analysis/` modules
- R2.2: Computations are pure functions
- R3.1: explorer.py becomes thin orchestration layer
- R3.2: `_render_*` functions stay in explorer.py
- R3.3: `build_*_tab()` functions delegate to imported functions
- R4.1: No circular imports
- R5.1: Data functions testable with mocked DuckDB
- R5.2: Computation functions pure and testable

## Key Technical Decisions

- **Domain-based splitting**: Computation goes to relevant `analysis/` module
- **Import direction**: `explorer.py` imports from `analysis/`, never vice versa
- **Preserve signatures**: Refactoring doesn't change public APIs
- **`_load_mp_vectors_by_party` variants**: Keep separate (serve different use cases)
- **`analysis/projections.py`**: Create new file (distinct from axis_classifier.py)
- **`_cached_bootstrap_cis()`**: Keep as cache wrapper in explorer.py, move computation to analysis/

## Open Questions

### Resolved During Planning

- **`_load_mp_vectors_by_party` variants**: Keep separate; they have different signatures and use cases
- **`analysis/projections.py`**: Create new file; projections are distinct from axis classification
- **`_cached_bootstrap_cis()`**: Keep wrapper in explorer.py, move computation to analysis/trajectories.py

### Deferred to Implementation

- Exact function grouping within `analysis/explorer_data.py`: will be refined during extraction
- Whether to add `__all__` exports: decide based on usage patterns after extraction

## Implementation Units

- [ ] **Unit 1: Create `analysis/explorer_data.py` skeleton**

**Goal:** Create the data loading module with extracted functions

**Requirements:** R1.1, R1.2, R1.3

**Dependencies:** None

**Files:**
- Create: `analysis/explorer_data.py`

**Approach:**
1. Create module with docstring and imports
2. Add stub functions with original signatures (no implementation)
3. Copy docstrings and type hints from explorer.py

**Functions to extract:**
- `get_available_windows(db_path: str) -> List[str]`
- `get_uniform_dim_windows(db_path: str) -> List[str]`
- `load_positions(db_path: str, window_size: str) -> pd.DataFrame`
- `load_party_map(db_path: str) -> Dict[str, str]`
- `load_active_mps(db_path: str) -> set`
- `load_party_axis_scores(db_path: str) -> Dict[str, List[float]]`
- `load_party_axis_scores_for_window(db_path: str, window: str) -> Dict[str, List[float]]`
- `load_party_scores_all_windows(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_scores_all_windows_aligned(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_mp_vectors(db_path: str) -> Dict[str, List[np.ndarray]]`
- `load_scree_data(db_path: str) -> List[float]`
- `load_motions_df(db_path: str) -> pd.DataFrame`

**Patterns to follow:**
- `explorer_helpers.py` conventions (pure functions, no IO side effects)
- `database.py` for DuckDB connection patterns

**Verification:**
- Module imports without errors
- All functions have correct signatures

---

- [ ] **Unit 2: Create `analysis/projections.py`**

**Goal:** Create module for SVD projection and axis utilities

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Create: `analysis/projections.py`

**Approach:**
1. Extract `_should_swap_axes()` and `_swap_axes()` from explorer.py
2. Add pure projection computation functions

**Functions to extract:**
- `_should_swap_axes(axis_def: dict) -> bool`
- `_swap_axes(axis_def: dict) -> dict`
- `project_motions_onto_axis(motion_ids, scores) -> List[Tuple[int, float]]` (stub)

**Patterns to follow:**
- Pure function conventions from `explorer_helpers.py`

**Verification:**
- Functions work without Streamlit/DuckDB imports

---

- [ ] **Unit 3: Update `analysis/trajectories.py`**

**Goal:** Add trajectory computation functions from explorer.py

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Modify: `analysis/trajectories.py`

**Approach:**
1. Add `compute_party_discipline()` and related functions
2. Add `compute_trajectory_points()` (pure computation)

**Functions to add:**
- `compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]`
- `compute_2d_trajectories(positions_by_window, party_axis_scores)` (stub)
- `compute_aligned_trajectories(positions_by_window, party_scores_all)` (stub)

**Verification:**
- Functions are pure (no IO)
- Existing trajectory.py tests pass

---

- [ ] **Unit 4: Wire up imports in explorer.py**

**Goal:** Update explorer.py to import from new modules

**Requirements:** R3.1, R3.3, R4.1

**Dependencies:** Units 1, 2, 3

**Files:**
- Modify: `explorer.py`

**Approach:**
1. Replace local function definitions with imports
2. Keep wrapper functions where needed for `@st.cache_data`
3. Verify no circular imports

**Verification:**
- explorer.py imports work
- No circular import errors
- Streamlit app runs correctly

---

- [ ] **Unit 5: Final cleanup and verification**

**Goal:** Ensure explorer.py meets success criteria

**Requirements:** All

**Dependencies:** Unit 4

**Approach:**
1. Count lines in explorer.py (target: under 1500)
2. Check no function exceeds 100 lines
3. Verify all extracted functions have docstrings
4. Run existing tests

**Verification:**
- `wc -l explorer.py` < 1500
- All functions under 100 lines
- Tests pass

## System-Wide Impact

- **Interaction graph:** explorer.py imports from analysis/; no reverse imports
- **Error propagation:** Data functions raise exceptions on DB errors (same as before)
- **API surface parity:** All function signatures preserved
- **Unchanged invariants:** UI behavior identical, no new features

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Breaking existing function signatures | Preserve exact signatures, update in place |
| Circular imports | One-way import direction (explorer → analysis only) |
| Regression in UI behavior | Test after each unit, verify Streamlit app runs |

## Documentation / Operational Notes

- Update `ARCHITECTURE.md` to document new `analysis/explorer_data.py` module
- No changes to deployment or configuration needed

## Sources & References

- **Requirements doc:** `docs/brainstorms/2026-04-04-explorer-refactor-requirements.md`
- Related code: `explorer.py`, `explorer_helpers.py`, `analysis/trajectories.py`
- Pattern reference: `explorer_helpers.py` (pure function conventions)
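
**Verification sketch (Unit 5):** The Unit 5 checks (file under 1500 lines, no function over 100 lines, docstrings present) could be automated with a small script built on the standard-library `ast` module. This is a minimal sketch, not part of the plan itself: the helper names `function_lengths`, `missing_docstrings`, and `check_source` are hypothetical, and the script assumes Python 3.8+ (for `end_lineno`). It checks every function in the given source, not only the extracted ones.

```python
import ast
from typing import Dict, List

MAX_FILE_LINES = 1500      # Unit 5 target: file must stay under this
MAX_FUNCTION_LINES = 100   # Unit 5 target: no function may exceed this

_FUNC_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def function_lengths(source: str) -> Dict[str, int]:
    """Map each function name in `source` to its length in lines.

    Nested functions and methods are included; if two functions share
    a name, the one visited last wins (good enough for a quick check).
    """
    lengths: Dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _FUNC_NODES):
            # lineno is the `def` line; decorators are not counted.
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


def missing_docstrings(source: str) -> List[str]:
    """Return sorted names of functions in `source` without a docstring."""
    return sorted(
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, _FUNC_NODES) and ast.get_docstring(node) is None
    )


def check_source(source: str) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems: List[str] = []
    n_lines = len(source.splitlines())
    if n_lines >= MAX_FILE_LINES:
        problems.append(f"file: {n_lines} lines (target: under {MAX_FILE_LINES})")
    for name, length in sorted(function_lengths(source).items()):
        if length > MAX_FUNCTION_LINES:
            problems.append(f"{name}: {length} lines (limit: {MAX_FUNCTION_LINES})")
    for name in missing_docstrings(source):
        problems.append(f"{name}: missing docstring")
    return problems
```

To use it, read the file's text (for example `open("explorer.py", encoding="utf-8").read()`), pass it to `check_source`, and fail the check if the returned list is non-empty.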