---
title: "refactor: Extract business logic from explorer.py to analysis/"
type: refactor
status: active
date: 2026-04-04
origin: docs/brainstorms/2026-04-04-explorer-refactor-requirements.md
---

# Refactor: Extract Business Logic from explorer.py to analysis/

## Overview

Split the 3715-line `explorer.py` into clear layers: data loading, business logic, and UI. This improves navigability and testability while preserving all existing behavior.

## Problem Frame

`explorer.py` mixes three concerns (data loading, computation, and UI), which makes it:
- Hard to navigate — no clear boundaries
- Hard to test — requires Streamlit + DuckDB
- Hard to review — changes affect everything

## Requirements Trace

- R1.1: Create `analysis/explorer_data.py` with data loading functions
- R1.2: Data functions callable without Streamlit imports
- R1.3: Functions return plain data structures (Python built-ins, pandas DataFrames, NumPy arrays), never Streamlit objects
- R2.1: Move computation to domain-appropriate `analysis/` modules
- R2.2: Computations are pure functions
- R3.1: `explorer.py` becomes a thin orchestration layer
- R3.2: `_render_*` functions stay in explorer.py
- R3.3: `build_*_tab()` functions delegate to imported functions
- R4.1: No circular imports
- R5.1: Data functions testable with mocked DuckDB
- R5.2: Computation functions pure and testable

## Key Technical Decisions

- **Domain-based splitting**: Computation goes to relevant `analysis/` module
- **Import direction**: `explorer.py` imports from `analysis/`, never vice versa
- **Preserve signatures**: Refactoring doesn't change public APIs
- **`_load_mp_vectors_by_party` variants**: Keep separate (serve different use cases)
- **`analysis/projections.py`**: Create new file (distinct from axis_classifier.py)
- **`_cached_bootstrap_cis()`**: Keep as cache wrapper in explorer.py, move computation to analysis/

## Open Questions

### Resolved During Planning

- **`_load_mp_vectors_by_party` variants**: Keep separate — they have different signatures and use cases
- **`analysis/projections.py`**: Create new file — projections are distinct from axis classification
- **`_cached_bootstrap_cis()`**: Keep wrapper in explorer.py, move computation to analysis/trajectories.py

### Deferred to Implementation

- Exact function grouping within `analysis/explorer_data.py` — will be refined during extraction
- Whether to add `__all__` exports — decide based on usage patterns after extraction

## Implementation Units

- [ ] **Unit 1: Create `analysis/explorer_data.py` skeleton**

**Goal:** Create the data loading module skeleton with stubbed functions

**Requirements:** R1.1, R1.2, R1.3

**Dependencies:** None

**Files:**
- Create: `analysis/explorer_data.py`

**Approach:**
1. Create module with docstring and imports
2. Add stub functions with original signatures (no implementation)
3. Copy docstrings and type hints from explorer.py

**Functions to extract:**
- `get_available_windows(db_path: str) -> List[str]`
- `get_uniform_dim_windows(db_path: str) -> List[str]`
- `load_positions(db_path: str, window_size: str) -> pd.DataFrame`
- `load_party_map(db_path: str) -> Dict[str, str]`
- `load_active_mps(db_path: str) -> set`
- `load_party_axis_scores(db_path: str) -> Dict[str, List[float]]`
- `load_party_axis_scores_for_window(db_path: str, window: str) -> Dict[str, List[float]]`
- `load_party_scores_all_windows(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_scores_all_windows_aligned(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_mp_vectors(db_path: str) -> Dict[str, List[np.ndarray]]`
- `load_scree_data(db_path: str) -> List[float]`
- `load_motions_df(db_path: str) -> pd.DataFrame`

**Patterns to follow:**
- `explorer_helpers.py` conventions (pure functions) where applicable; loaders necessarily read from DuckDB but should have no other side effects
- `database.py` for DuckDB connection patterns

**Verification:**
- Module imports without errors
- All functions have correct signatures
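
One possible shape for a stub once it is filled in, together with a test that mocks the connection (R5.1). This is a sketch under assumptions: the table name `positions`, the column `window_id`, and the optional `connect` parameter are hypothetical and not taken from `explorer.py`. Only `duckdb.connect(..., read_only=True)`, `execute`, `fetchall`, and `close` are real DuckDB calls.

```python
from typing import Callable, List, Optional
from unittest.mock import MagicMock


def get_available_windows(
    db_path: str, connect: Optional[Callable] = None
) -> List[str]:
    """Return the distinct window ids stored in the database, sorted.

    `connect` is a hypothetical injection point for tests; production
    callers omit it and keep the original one-argument call.
    """
    if connect is None:
        import duckdb  # imported lazily so tests need no DuckDB install

        connect = duckdb.connect
    con = connect(db_path, read_only=True)
    try:
        # Table and column names are placeholders, not the real schema.
        rows = con.execute(
            "SELECT DISTINCT window_id FROM positions ORDER BY window_id"
        ).fetchall()
    finally:
        con.close()  # always release the connection
    return [row[0] for row in rows]


# Test with a mocked connection: no database file, no Streamlit.
fake_con = MagicMock()
fake_con.execute.return_value.fetchall.return_value = [("2019",), ("2020",)]
fake_connect = MagicMock(return_value=fake_con)

windows = get_available_windows("unused.duckdb", connect=fake_connect)
```

An alternative that leaves the signature untouched is to patch `duckdb.connect` with `unittest.mock.patch` in the test; the injected parameter is shown here only because it keeps the example free of a DuckDB install.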

---

- [ ] **Unit 2: Create `analysis/projections.py`**

**Goal:** Create module for SVD projection and axis utilities

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Create: `analysis/projections.py`

**Approach:**
1. Extract `_should_swap_axes()` and `_swap_axes()` from explorer.py
2. Add pure projection computation functions

**Functions to extract:**
- `_should_swap_axes(axis_def: dict) -> bool`
- `_swap_axes(axis_def: dict) -> dict`
- `project_motions_onto_axis(motion_ids, scores) -> List[Tuple[int, float]]` (stub)

**Patterns to follow:**
- Pure function conventions from `explorer_helpers.py`

**Verification:**
- Functions work without Streamlit/DuckDB imports

---

- [ ] **Unit 3: Update `analysis/trajectories.py`**

**Goal:** Add trajectory computation functions from explorer.py

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Modify: `analysis/trajectories.py`

**Approach:**
1. Add `compute_party_discipline()` and related functions
2. Add `compute_trajectory_points()` (pure computation)

**Functions to add:**
- `compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]`
- `compute_2d_trajectories(positions_by_window, party_axis_scores)` (stub)
- `compute_aligned_trajectories(positions_by_window, party_scores_all)` (stub)

**Verification:**
- Functions are pure (no IO)
- Existing tests for `analysis/trajectories.py` pass
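
The purity check can be approximated in a unit test. The helper below is a sketch: `assert_pure` is a hypothetical name, and the body of `compute_party_discipline` is a placeholder (population standard deviation per key) chosen only to make the example runnable, not the metric used in `explorer.py`. The helper catches input mutation and non-determinism; it cannot prove the absence of IO.

```python
import copy
import statistics
from typing import Callable, Dict, List


def compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Placeholder body: spread of scores per key (NOT the real metric)."""
    return {key: statistics.pstdev(scores) for key, scores in mp_scores.items()}


def assert_pure(fn: Callable, *args) -> None:
    """Fail if fn mutates its arguments or returns different results
    for identical inputs."""
    before = copy.deepcopy(args)
    first = fn(*args)
    second = fn(*copy.deepcopy(before))
    assert args == before, f"{fn.__name__} mutated its inputs"
    assert first == second, f"{fn.__name__} is not deterministic"


sample = {"A": [0.1, 0.3], "B": [0.5, 0.5, 0.5]}
assert_pure(compute_party_discipline, sample)
```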

---

- [ ] **Unit 4: Wire up imports in explorer.py**

**Goal:** Update explorer.py to import from new modules

**Requirements:** R3.1, R3.3, R4.1

**Dependencies:** Units 1, 2, 3

**Files:**
- Modify: `explorer.py`

**Approach:**
1. Replace local function definitions with imports
2. Keep wrapper functions where needed for `@st.cache_data`
3. Verify no circular imports

**Verification:**
- explorer.py imports work
- No circular import errors
- Streamlit app runs correctly
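
The one-way import rule can also be checked statically rather than only by running the app. The following is a sketch using the standard-library `ast` module; `imported_modules` and `find_forbidden_imports` are hypothetical helper names, and the forbidden set used in the usage note is an assumption based on R1.2 and R4.1.

```python
import ast
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def imported_modules(source: str) -> Set[str]:
    """Return the top-level names of all absolute imports in `source`."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            found.add(node.module.split(".")[0])
    return found


def find_forbidden_imports(
    package_dir: str, forbidden: Iterable[str]
) -> List[Tuple[str, str]]:
    """List (file, module) pairs where a file under package_dir imports
    a forbidden module."""
    banned = set(forbidden)
    hits = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        modules = imported_modules(path.read_text(encoding="utf-8"))
        for name in sorted(modules & banned):
            hits.append((str(path), name))
    return hits
```

Run from the repository root, `find_forbidden_imports("analysis", {"explorer", "streamlit"})` should return an empty list once the refactor is complete. Because only the top-level module name is compared, an import of `explorer_helpers` is not flagged.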

---

- [ ] **Unit 5: Final cleanup and verification**

**Goal:** Ensure explorer.py meets success criteria

**Requirements:** All

**Dependencies:** Unit 4

**Approach:**
1. Count lines in explorer.py — target under 1500
2. Check no function exceeds 100 lines
3. Verify all extracted functions have docstrings
4. Run existing tests

**Verification:**
- `wc -l explorer.py` reports fewer than 1500 lines
- All functions under 100 lines
- Tests pass
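
The 100-line limit can be checked mechanically. The following is a sketch using `ast`; `long_functions` is a hypothetical helper name, and measuring from the `def` line through the last body line (decorators excluded) is one possible definition of function length.

```python
import ast
from typing import List, Tuple


def long_functions(source: str, max_lines: int = 100) -> List[Tuple[str, int]]:
    """Return (name, length) for every function longer than max_lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the def line through the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return sorted(offenders)
```

For example, passing the text of `explorer.py` to `long_functions` should yield an empty list once this unit is done.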

## System-Wide Impact

- **Interaction graph:** explorer.py imports from analysis/ — no reverse imports
- **Error propagation:** Data functions raise exceptions on DB errors (same as before)
- **API surface parity:** All function signatures preserved
- **Unchanged invariants:** UI behavior identical, no new features

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Breaking existing function signatures | Preserve exact signatures, update in place |
| Circular imports | One-way import direction (explorer → analysis only) |
| Regression in UI behavior | Test after each unit, verify Streamlit app runs |

## Documentation / Operational Notes

- Update `ARCHITECTURE.md` to document the new `analysis/explorer_data.py` and `analysis/projections.py` modules
- No changes to deployment or configuration needed

## Sources & References

- **Requirements doc:** `docs/brainstorms/2026-04-04-explorer-refactor-requirements.md`
- **Related code:** `explorer.py`, `explorer_helpers.py`, `analysis/trajectories.py`
- **Pattern reference:** `explorer_helpers.py` (pure function conventions)