---
title: "refactor: Extract business logic from explorer.py to analysis/"
type: refactor
status: active
date: 2026-04-04
origin: docs/brainstorms/2026-04-04-explorer-refactor-requirements.md
---
# Refactor: Extract Business Logic from explorer.py to analysis/
## Overview
Split the 3715-line `explorer.py` into clear layers: data loading, business logic, and UI. This improves navigability and testability while preserving all existing behavior.
## Problem Frame
`explorer.py` mixes three concerns (data loading, computation, and UI), which makes it:
- Hard to navigate — no clear boundaries
- Hard to test — requires Streamlit + DuckDB
- Hard to review — changes affect everything
## Requirements Trace
- R1.1: Create `analysis/explorer_data.py` with data loading functions
- R1.2: Data functions callable without Streamlit imports
- R1.3: Functions return pure Python data structures
- R2.1: Move computation to domain-appropriate `analysis/` modules
- R2.2: Computations are pure functions
- R3.1: explorer.py becomes thin orchestration layer
- R3.2: `_render_*` functions stay in explorer.py
- R3.3: `build_*_tab()` functions delegate to imported functions
- R4.1: No circular imports
- R5.1: Data functions testable with mocked DuckDB
- R5.2: Computation functions pure and testable
## Key Technical Decisions
- **Domain-based splitting**: Computation goes to relevant `analysis/` module
- **Import direction**: `explorer.py` imports from `analysis/`, never vice versa
- **Preserve signatures**: Refactoring doesn't change public APIs
- **`_load_mp_vectors_by_party` variants**: Keep separate (serve different use cases)
- **`analysis/projections.py`**: Create new file (distinct from axis_classifier.py)
- **`_cached_bootstrap_cis()`**: Keep as cache wrapper in explorer.py, move computation to analysis/
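
The import-direction decision can be checked mechanically. The following is a minimal sketch, assuming the check is done by statically scanning each `analysis/` module's source; the helper names `imported_modules` and `violates_import_direction` are hypothetical and not part of the plan.

```python
import ast
from typing import Iterable, Set


def imported_modules(source: str) -> Set[str]:
    """Return the top-level names of all absolutely imported modules in `source`."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` counts as an import of the top-level package `a`.
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the package and are skipped.
            names.add(node.module.split(".")[0])
    return names


def violates_import_direction(
    source: str, forbidden: Iterable[str] = ("explorer",)
) -> bool:
    """True if `source` imports any module that analysis/ code must not depend on."""
    return bool(imported_modules(source) & set(forbidden))
```

A test could read every file under `analysis/` and assert that `violates_import_direction(text)` is false for each one.
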
## Open Questions
### Resolved During Planning
- **`_load_mp_vectors_by_party` variants**: Keep separate — they have different signatures and use cases
- **`analysis/projections.py`**: Create new file — projections are distinct from axis classification
- **`_cached_bootstrap_cis()`**: Keep wrapper in explorer.py, move computation to analysis/trajectories.py
### Deferred to Implementation
- Exact function grouping within `analysis/explorer_data.py` — will be refined during extraction
- Whether to add `__all__` exports — decide based on usage patterns after extraction
## Implementation Units
- [ ] **Unit 1: Create `analysis/explorer_data.py` skeleton**
**Goal:** Create the data loading module with extracted functions
**Requirements:** R1.1, R1.2, R1.3
**Dependencies:** None
**Files:**
- Create: `analysis/explorer_data.py`
**Approach:**
1. Create module with docstring and imports
2. Add stub functions with original signatures (no implementation)
3. Copy docstrings and type hints from explorer.py
**Functions to extract:**
- `get_available_windows(db_path: str) -> List[str]`
- `get_uniform_dim_windows(db_path: str) -> List[str]`
- `load_positions(db_path: str, window_size: str) -> pd.DataFrame`
- `load_party_map(db_path: str) -> Dict[str, str]`
- `load_active_mps(db_path: str) -> set`
- `load_party_axis_scores(db_path: str) -> Dict[str, List[float]]`
- `load_party_axis_scores_for_window(db_path: str, window: str) -> Dict[str, List[float]]`
- `load_party_scores_all_windows(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_scores_all_windows_aligned(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_mp_vectors(db_path: str) -> Dict[str, List[np.ndarray]]`
- `load_scree_data(db_path: str) -> List[float]`
- `load_motions_df(db_path: str) -> pd.DataFrame`
**Patterns to follow:**
- `explorer_helpers.py` conventions (pure functions, no IO side effects)
- `database.py` for DuckDB connection patterns
**Verification:**
- Module imports without errors
- All functions have correct signatures
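
One way R1.2 and R5.1 could be satisfied together is sketched below. It is an illustration under stated assumptions, not the project's implementation: the table name `mp_metadata` and its columns are hypothetical, and importing `duckdb` lazily inside the function is an assumed design choice that lets a test substitute a fake module.

```python
import sys
from typing import Dict
from unittest import mock


def load_party_map(db_path: str) -> Dict[str, str]:
    """Return a mapping of MP name to party (signature taken from the plan)."""
    # Assumption: duckdb is imported lazily so the module imports without
    # DuckDB installed and a fake `duckdb` can be injected in tests.
    import duckdb

    con = duckdb.connect(db_path, read_only=True)
    try:
        # Hypothetical table and column names, for illustration only.
        rows = con.execute("SELECT mp_name, party FROM mp_metadata").fetchall()
    finally:
        con.close()
    return {name: party for name, party in rows}


def test_load_party_map_with_mocked_duckdb() -> None:
    """Exercise the loader against a fake DuckDB module, with no real database."""
    fake_con = mock.MagicMock()
    fake_con.execute.return_value.fetchall.return_value = [("A", "X"), ("B", "Y")]
    fake_duckdb = mock.MagicMock()
    fake_duckdb.connect.return_value = fake_con

    with mock.patch.dict(sys.modules, {"duckdb": fake_duckdb}):
        result = load_party_map("ignored.duckdb")

    assert result == {"A": "X", "B": "Y"}
    fake_duckdb.connect.assert_called_once_with("ignored.duckdb", read_only=True)
    fake_con.close.assert_called_once()
```

The function returns a plain `dict`, which matches R1.3, and the test touches neither Streamlit nor a real DuckDB file.
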
---
- [ ] **Unit 2: Create `analysis/projections.py`**
**Goal:** Create module for SVD projection and axis utilities
**Requirements:** R2.1, R2.2
**Dependencies:** Unit 1
**Files:**
- Create: `analysis/projections.py`
**Approach:**
1. Extract `_should_swap_axes()` and `_swap_axes()` from explorer.py
2. Add pure projection computation functions
**Functions to extract:**
- `_should_swap_axes(axis_def: dict) -> bool`
- `_swap_axes(axis_def: dict) -> dict`
- `project_motions_onto_axis(motion_ids, scores) -> List[Tuple[int, float]]` (stub)
**Patterns to follow:**
- Pure function conventions from `explorer_helpers.py`
**Verification:**
- Functions work without Streamlit/DuckDB imports
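
The "works without Streamlit/DuckDB imports" check can be automated. The sketch below assumes a fresh interpreter is an acceptable way to observe import side effects; the helper name `module_avoids` is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def module_avoids(module_name: str, forbidden: Sequence[str]) -> bool:
    """Import `module_name` in a fresh interpreter and report whether none of
    the `forbidden` modules were loaded as a side effect.

    Also returns False if the module itself fails to import, because the
    child process then exits with a non-zero status.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({module_name!r})\n"
        f"bad = [m for m in {list(forbidden)!r} if m in sys.modules]\n"
        "sys.exit(1 if bad else 0)\n"
    )
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode == 0
```

For this unit the call would be along the lines of `module_avoids("analysis.projections", ["streamlit", "duckdb"])`, run from the repository root so the package is importable.
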
---
- [ ] **Unit 3: Update `analysis/trajectories.py`**
**Goal:** Add trajectory computation functions from explorer.py
**Requirements:** R2.1, R2.2
**Dependencies:** Unit 1
**Files:**
- Modify: `analysis/trajectories.py`
**Approach:**
1. Add `compute_party_discipline()` and related functions
2. Add `compute_trajectory_points()` (pure computation)
**Functions to add:**
- `compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]`
- `compute_2d_trajectories(positions_by_window, party_axis_scores)` (stub)
- `compute_aligned_trajectories(positions_by_window, party_scores_all)` (stub)
**Verification:**
- Functions are pure (no IO)
- Existing tests for `analysis/trajectories.py` pass
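
To illustrate what "pure (no IO)" means for this unit, here is a sketch built on the `compute_party_discipline` signature above. The plan does not define the discipline metric, so the population standard deviation used here is only a placeholder, and treating the dictionary keys as party names is likewise an assumption.

```python
from statistics import pstdev
from typing import Dict, List


def compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Placeholder metric: spread of MP scores per party (lower means tighter).

    Assumptions: keys are party names, values are per-MP scores on one axis.
    Parties with no scores are omitted from the result.
    """
    # No database access, no Streamlit calls, and the input is not mutated:
    # the result depends only on the argument.
    return {
        party: pstdev(scores)
        for party, scores in mp_scores.items()
        if scores
    }
```

Because the function depends only on its argument, a unit test needs nothing more than a small literal dictionary.
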
---
- [ ] **Unit 4: Wire up imports in explorer.py**
**Goal:** Update explorer.py to import from new modules
**Requirements:** R3.1, R3.3, R4.1
**Dependencies:** Units 1, 2, 3
**Files:**
- Modify: `explorer.py`
**Approach:**
1. Replace local function definitions with imports
2. Keep wrapper functions where needed for `@st.cache_data`
3. Verify no circular imports
**Verification:**
- explorer.py imports work
- No circular import errors
- Streamlit app runs correctly
---
- [ ] **Unit 5: Final cleanup and verification**
**Goal:** Ensure explorer.py meets success criteria
**Requirements:** All
**Dependencies:** Unit 4
**Approach:**
1. Count lines in explorer.py — target under 1500
2. Check no function exceeds 100 lines
3. Verify all extracted functions have docstrings
4. Run existing tests
**Verification:**
- `wc -l explorer.py` < 1500
- All functions under 100 lines
- Tests pass
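
The function-length and docstring checks could be scripted with the standard-library `ast` module. This is a sketch, not part of the plan: the helper names are hypothetical, and it reads "no function exceeds 100 lines" as a length strictly greater than the limit.

```python
import ast
from typing import List, Tuple

_FUNC_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def long_functions(source: str, max_lines: int = 100) -> List[Tuple[str, int]]:
    """Return (name, length) for every function longer than `max_lines`.

    Length is counted from the `def` line to the last line of the body;
    decorator lines are not included.
    """
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _FUNC_NODES):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders


def functions_missing_docstrings(source: str) -> List[str]:
    """Return the names of functions that have no docstring."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, _FUNC_NODES) and ast.get_docstring(node) is None
    ]
```

Both helpers take source text, so they can be run over `explorer.py` and each extracted module by reading the file and passing its contents.
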
## System-Wide Impact
- **Interaction graph:** explorer.py imports from analysis/; no reverse imports
- **Error propagation:** Data functions raise exceptions on DB errors (same as before)
- **API surface parity:** All function signatures preserved
- **Unchanged invariants:** UI behavior identical, no new features
## Risks & Dependencies
| Risk | Mitigation |
|------|------------|
| Breaking existing function signatures | Preserve exact signatures, update in place |
| Circular imports | One-way import direction (explorer → analysis only) |
| Regression in UI behavior | Test after each unit, verify Streamlit app runs |
## Documentation / Operational Notes
- Update `ARCHITECTURE.md` to document the new `analysis/explorer_data.py` and `analysis/projections.py` modules
- No changes to deployment or configuration needed
## Sources & References
- **Requirements doc:** `docs/brainstorms/2026-04-04-explorer-refactor-requirements.md`
- Related code: `explorer.py`, `explorer_helpers.py`, `analysis/trajectories.py`
- Pattern reference: `explorer_helpers.py` (pure function conventions)