---
title: "refactor: Extract business logic from explorer.py to analysis/"
type: refactor
status: active
date: 2026-04-04
origin: docs/brainstorms/2026-04-04-explorer-refactor-requirements.md
---

# Refactor: Extract Business Logic from explorer.py to analysis/

## Overview

Split the 3715-line `explorer.py` into clear layers: data loading, business logic, and UI. This improves navigability and testability while preserving all existing behavior.

## Problem Frame

`explorer.py` mixes three concerns (data loading, computation, and UI), making it:
- Hard to navigate: no clear boundaries
- Hard to test: requires Streamlit + DuckDB
- Hard to review: a change to one concern can affect the others

## Requirements Trace

- R1.1: Create `analysis/explorer_data.py` with data loading functions
- R1.2: Data functions callable without Streamlit imports
- R1.3: Functions return pure Python data structures
- R2.1: Move computation to domain-appropriate `analysis/` modules
- R2.2: Computations are pure functions
- R3.1: `explorer.py` becomes a thin orchestration layer
- R3.2: `_render_*` functions stay in `explorer.py`
- R3.3: `build_*_tab()` functions delegate to imported functions
- R4.1: No circular imports
- R5.1: Data functions testable with mocked DuckDB
- R5.2: Computation functions pure and testable

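R1.2 and R5.1 together suggest a data function that obtains its connection through an injectable factory, so a test can pass a mock instead of opening a real DuckDB file. The following is a minimal sketch under stated assumptions: the `connect` keyword argument is a hypothetical test hook (the planned signature is `load_party_map(db_path: str)`), and the `mp_party` table with `mp_id` and `party` columns is invented for illustration.

```python
from typing import Any, Callable, Dict, Optional
from unittest.mock import MagicMock


def _default_connect(db_path: str) -> Any:
    # Imported lazily so this module can be imported without DuckDB installed.
    import duckdb

    return duckdb.connect(db_path, read_only=True)


def load_party_map(
    db_path: str,
    connect: Optional[Callable[[str], Any]] = None,  # hypothetical test hook
) -> Dict[str, str]:
    """Return a plain dict mapping MP id to party name."""
    con = (connect or _default_connect)(db_path)
    try:
        # Table and column names are assumptions made for illustration only.
        rows = con.execute("SELECT mp_id, party FROM mp_party").fetchall()
    finally:
        con.close()
    return {str(mp_id): str(party) for mp_id, party in rows}


# A test can pass a mock connection instead of opening a real database.
fake_con = MagicMock()
fake_con.execute.return_value.fetchall.return_value = [("mp1", "A"), ("mp2", "B")]
party_map = load_party_map("unused.duckdb", connect=lambda _path: fake_con)
```

Because the extra argument is keyword-only in practice and defaults to `None`, existing call sites that pass only `db_path` would keep working.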
## Key Technical Decisions

- **Domain-based splitting**: Computation goes to relevant `analysis/` module
- **Import direction**: `explorer.py` imports from `analysis/`, never vice versa
- **Preserve signatures**: Refactoring doesn't change public APIs
- **`_load_mp_vectors_by_party` variants**: Keep separate (serve different use cases)
- **`analysis/projections.py`**: Create new file (distinct from axis_classifier.py)
- **`_cached_bootstrap_cis()`**: Keep as cache wrapper in explorer.py, move computation to analysis/

## Open Questions

### Resolved During Planning

- **`_load_mp_vectors_by_party` variants**: Keep separate; they have different signatures and use cases
- **`analysis/projections.py`**: Create new file; projections are distinct from axis classification
- **`_cached_bootstrap_cis()`**: Keep wrapper in explorer.py, move computation to analysis/trajectories.py

### Deferred to Implementation

- Exact function grouping within `analysis/explorer_data.py`: to be refined during extraction
- Whether to add `__all__` exports: decide based on usage patterns after extraction

## Implementation Units

- [ ] **Unit 1: Create `analysis/explorer_data.py` skeleton**

**Goal:** Create the data loading module with extracted functions

**Requirements:** R1.1, R1.2, R1.3

**Dependencies:** None

**Files:**
- Create: `analysis/explorer_data.py`

**Approach:**
1. Create module with docstring and imports
2. Add stub functions with original signatures (no implementation)
3. Copy docstrings and type hints from explorer.py

**Functions to extract:**
- `get_available_windows(db_path: str) -> List[str]`
- `get_uniform_dim_windows(db_path: str) -> List[str]`
- `load_positions(db_path: str, window_size: str) -> pd.DataFrame`
- `load_party_map(db_path: str) -> Dict[str, str]`
- `load_active_mps(db_path: str) -> set`
- `load_party_axis_scores(db_path: str) -> Dict[str, List[float]]`
- `load_party_axis_scores_for_window(db_path: str, window: str) -> Dict[str, List[float]]`
- `load_party_scores_all_windows(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_scores_all_windows_aligned(db_path: str) -> Dict[str, List[List[float]]]`
- `load_party_mp_vectors(db_path: str) -> Dict[str, List[np.ndarray]]`
- `load_scree_data(db_path: str) -> List[float]`
- `load_motions_df(db_path: str) -> pd.DataFrame`

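The approach steps for this unit could produce a skeleton along the following lines. This sketch covers only three of the listed functions and assumes each stub raises `NotImplementedError` until its body is moved over. The docstrings are placeholders, since the originals in `explorer.py` are not reproduced in this plan.

```python
"""Data loading functions for the explorer (skeleton; no Streamlit imports)."""

from typing import Dict, List


def get_available_windows(db_path: str) -> List[str]:
    """Placeholder docstring: copy the original from explorer.py."""
    raise NotImplementedError


def load_party_map(db_path: str) -> Dict[str, str]:
    """Placeholder docstring: copy the original from explorer.py."""
    raise NotImplementedError


def load_active_mps(db_path: str) -> set:
    """Placeholder docstring: copy the original from explorer.py."""
    raise NotImplementedError
```

The remaining functions in the list would follow the same pattern, with `pandas` and `numpy` imported only for the signatures that need them.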
**Patterns to follow:**
- `explorer_helpers.py` conventions (pure functions, no IO side effects)
- `database.py` for DuckDB connection patterns

**Verification:**
- Module imports without errors
- All functions have correct signatures

---

- [ ] **Unit 2: Create `analysis/projections.py`**

**Goal:** Create module for SVD projection and axis utilities

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Create: `analysis/projections.py`

**Approach:**
1. Extract `_should_swap_axes()` and `_swap_axes()` from explorer.py
2. Add pure projection computation functions

**Functions to extract:**
- `_should_swap_axes(axis_def: dict) -> bool`
- `_swap_axes(axis_def: dict) -> dict`
- `project_motions_onto_axis(motion_ids, scores) -> List[Tuple[int, float]]` (stub)

**Patterns to follow:**
- Pure function conventions from `explorer_helpers.py`

**Verification:**
- Functions work without Streamlit/DuckDB imports

---

- [ ] **Unit 3: Update `analysis/trajectories.py`**

**Goal:** Add trajectory computation functions from explorer.py

**Requirements:** R2.1, R2.2

**Dependencies:** Unit 1

**Files:**
- Modify: `analysis/trajectories.py`

**Approach:**
1. Add `compute_party_discipline()` and related functions
2. Add `compute_trajectory_points()` (pure computation)

**Functions to add:**
- `compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]`
- `compute_2d_trajectories(positions_by_window, party_axis_scores)` (stub)
- `compute_aligned_trajectories(positions_by_window, party_scores_all)` (stub)

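For illustration, a pure `compute_party_discipline` with the listed signature might look like the sketch below. The metric is an assumption: it uses the population standard deviation of each party's MP scores as a stand-in, with lower spread read as higher discipline. The formula currently in `explorer.py` may differ.

```python
from statistics import pstdev
from typing import Dict, List


def compute_party_discipline(mp_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each party to the spread of its MPs' scores (stand-in metric).

    Pure: no IO, no globals, and the input dict is not mutated.
    """
    result: Dict[str, float] = {}
    for party, scores in mp_scores.items():
        # A party with fewer than two MPs has no measurable spread.
        result[party] = pstdev(scores) if len(scores) >= 2 else 0.0
    return result


spread = compute_party_discipline(
    {"A": [0.2, 0.2, 0.2], "B": [-1.0, 1.0], "C": [0.5]}
)
```

Since the function takes and returns plain dicts, it can be unit tested with literal inputs and no database or Streamlit session.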
**Verification:**
- Functions are pure (no IO)
- Existing `analysis/trajectories.py` tests pass

---

- [ ] **Unit 4: Wire up imports in explorer.py**

**Goal:** Update explorer.py to import from new modules

**Requirements:** R3.1, R3.3, R4.1

**Dependencies:** Units 1, 2, 3

**Files:**
- Modify: `explorer.py`

**Approach:**
1. Replace local function definitions with imports
2. Keep wrapper functions where needed for `@st.cache_data`
3. Verify no circular imports

**Verification:**
- explorer.py imports work
- No circular import errors
- Streamlit app runs correctly

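One way to check the import direction statically is to parse each module under `analysis/` and flag forbidden top-level package names. The sketch below assumes the forbidden set is `explorer` and `streamlit`. The helper name `forbidden_imports` and the `build_overview_tab` name in the sample string are hypothetical.

```python
import ast
from typing import Iterable, List


def forbidden_imports(source: str, forbidden: Iterable[str]) -> List[str]:
    """Return the forbidden top-level modules imported by `source`."""
    banned = set(forbidden)
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "streamlit.runtime" -> "streamlit"
            if root in banned:
                found.add(root)
    return sorted(found)


# Sample sources stand in for file contents read from analysis/*.py.
clean = "import numpy as np\nfrom analysis.trajectories import compute_party_discipline\n"
dirty = "import streamlit as st\nfrom explorer import build_overview_tab\n"
clean_hits = forbidden_imports(clean, ["explorer", "streamlit"])
dirty_hits = forbidden_imports(dirty, ["explorer", "streamlit"])
```

A test could loop over the files in `analysis/`, read each one as text, and assert that the returned list is empty. Because the check only parses source, it does not need Streamlit or DuckDB installed.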
---

- [ ] **Unit 5: Final cleanup and verification**

**Goal:** Ensure explorer.py meets success criteria

**Requirements:** All

**Dependencies:** Unit 4

**Approach:**
1. Count lines in explorer.py (target: under 1500)
2. Check no function exceeds 100 lines
3. Verify all extracted functions have docstrings
4. Run existing tests

**Verification:**
- `wc -l explorer.py` < 1500
- All functions under 100 lines
- Tests pass

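The function-length check can be automated with the standard `ast` module. This is a minimal sketch: the helper name `long_functions` is hypothetical, decorators are not counted toward the length, and a function of exactly `max_lines` lines is treated as passing, which is one reading of the criterion above.

```python
import ast
from typing import List, Tuple


def long_functions(source: str, max_lines: int = 100) -> List[Tuple[str, int]]:
    """Return (name, length) for every function longer than `max_lines`."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8+.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return sorted(offenders)


# A tiny sample source; in practice this would be the text of explorer.py.
sample = (
    "def short():\n"
    "    return 1\n"
    "\n"
    "def longer():\n"
    "    a = 1\n"
    "    b = 2\n"
    "    return a + b\n"
)
report = long_functions(sample, max_lines=3)
```

Run against the real file, the call would be `long_functions(Path("explorer.py").read_text())` with `pathlib.Path` imported, and an empty list means the criterion is met.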
## System-Wide Impact

- **Interaction graph:** explorer.py imports from analysis/; no reverse imports
- **Error propagation:** Data functions raise exceptions on DB errors (same as before)
- **API surface parity:** All function signatures preserved
- **Unchanged invariants:** UI behavior identical, no new features

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Breaking existing function signatures | Preserve exact signatures, update in place |
| Circular imports | One-way import direction (explorer → analysis only) |
| Regression in UI behavior | Test after each unit, verify Streamlit app runs |

## Documentation / Operational Notes

- Update `ARCHITECTURE.md` to document new `analysis/explorer_data.py` module
- No changes to deployment or configuration needed

## Sources & References

- **Requirements doc:** `docs/brainstorms/2026-04-04-explorer-refactor-requirements.md`
- Related code: `explorer.py`, `explorer_helpers.py`, `analysis/trajectories.py`
- Pattern reference: `explorer_helpers.py` (pure function conventions)