date: 2026-04-04
topic: explorer-refactor

Explorer.py Refactor: Extract to analysis/

Problem Frame

explorer.py is 3715 lines long and contains 39 functions that mix:

  • Data loading (DuckDB queries)
  • Business logic (SVD projections, trajectory alignment)
  • UI rendering (Streamlit components)

This makes the file:

  • Hard to navigate (no clear boundaries)
  • Hard to test (requires Streamlit + DuckDB)
  • Hard to review (changes affect everything)

Goal: Improve navigability by extracting computation-heavy logic to analysis/, leaving explorer.py as a UI orchestration layer.

Requirements

Data Layer

  • R1.1: Create analysis/explorer_data.py containing all data loading functions currently in explorer.py:

    • get_available_windows()
    • get_uniform_dim_windows()
    • load_positions()
    • load_party_map()
    • load_active_mps()
    • load_party_axis_scores()
    • load_party_scores_all_windows()
    • load_party_scores_all_windows_aligned()
    • load_party_mp_vectors()
    • load_scree_data()
    • load_motions_df()
  • R1.2: All extracted functions must be callable without Streamlit imports (no @st.cache_data, no st.* calls)

  • R1.3: Functions return pure Python data structures (DataFrames, dicts, lists) - no Plotly figures
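
A minimal sketch of what R1.1 to R1.3 could look like for one loader. The connection parameter, the SQL, and the table and column names are assumptions made for illustration; the real load_party_map() may obtain its connection from MotionDatabase and query a different schema.

```python
from typing import Any, Dict


def load_party_map(con: Any) -> Dict[str, str]:
    """Return a mapping of MP name to party.

    `con` is any object exposing `execute(sql).fetchall()`, which DuckDB
    connections provide. The table and column names below are hypothetical.
    No Streamlit import is needed, and the result is a plain dict.
    """
    rows = con.execute("SELECT mp_name, party FROM mp_metadata").fetchall()
    return {mp_name: party for mp_name, party in rows}
```

Because nothing here touches st.*, the caching decorator can be applied to a thin wrapper in explorer.py rather than to the loader itself.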

Business Logic Layer

  • R2.1: Move computation functions to analysis/ modules based on domain:

    • _should_swap_axes(), _swap_axes() → analysis/axis_utils.py (new)
    • compute_party_discipline() → analysis/trajectories.py
    • Trajectory computation functions → analysis/trajectories.py
    • SVD projection functions → analysis/svd_labels.py or new analysis/projections.py
  • R2.2: Computations must be pure functions (no IO, deterministic outputs)
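
To illustrate what "pure" means in R2.2, here is a small sketch. The function name and signature are hypothetical and are not the actual _swap_axes() from explorer.py, whose signature is not shown in this document.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def swap_axes_pure(points: List[Point]) -> List[Point]:
    """Return a new list with x and y exchanged for every point.

    Hypothetical example: no IO, no global state, and the input list is
    left untouched, so the same input always yields the same output.
    """
    return [(y, x) for x, y in points]
```

A function of this shape can be tested with plain assertions and needs neither Streamlit nor a database.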

UI Layer (explorer.py)

  • R3.1: explorer.py becomes a thin orchestration layer:

    • Imports from analysis/explorer_data.py for data
    • Imports from analysis/ modules for computations
    • Contains only Streamlit UI code and @st.cache_data wrappers
  • R3.2: Render functions (_render_*) stay in explorer.py (they're UI-only)

  • R3.3: Tab-building functions (build_*_tab()) stay in explorer.py but delegate to imported functions

Import Safety

  • R4.1: New analysis/ modules must not import from explorer.py (no circular dependencies)

  • R4.2: analysis/explorer_data.py may import from database.py (already exists)
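
R4.1 could be checked mechanically. The following is a sketch under the assumption that a static scan of import statements is sufficient; the helper name is hypothetical, and dynamic imports (for example via importlib) would not be detected.

```python
import ast


def imports_module(source: str, forbidden: str = "explorer") -> bool:
    """Return True if `source` imports `forbidden` or one of its submodules."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names = [node.module]
            else:
                # `from . import explorer` has module=None; check the aliases.
                names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            if name == forbidden or name.startswith(forbidden + "."):
                return True
    return False
```

Running this over the text of each file in analysis/ and failing on any True result would flag a violation. The match is exact on the module name, so explorer_helpers is not flagged.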

Testing

  • R5.1: Extracted data functions should be testable with mocked DuckDB connections

  • R5.2: Extracted computation functions should be pure and testable without database
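
For R5.1, a test with a mocked connection might look like the sketch below. The body of get_available_windows(), its connection parameter, and the SQL are assumptions made for illustration only; the mocking pattern itself relies solely on unittest.mock from the standard library.

```python
from unittest.mock import MagicMock


def get_available_windows(con):
    """Hypothetical body: return distinct window ids in sorted order."""
    rows = con.execute("SELECT DISTINCT window_id FROM positions").fetchall()
    return sorted(row[0] for row in rows)


def test_get_available_windows_sorts_ids():
    # The mock stands in for a DuckDB connection; no database is opened.
    con = MagicMock()
    con.execute.return_value.fetchall.return_value = [("2024-Q2",), ("2024-Q1",)]

    assert get_available_windows(con) == ["2024-Q1", "2024-Q2"]
    con.execute.assert_called_once()
```

The test exercises only the Python-side handling of rows. Whether the SQL is valid against the real schema would still need an integration test.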

Success Criteria

  • explorer.py reduced to under 1500 lines (from 3715)
  • No function in explorer.py exceeds 100 lines
  • Clear module boundaries: data → computation → UI
  • All extracted functions have docstrings with type hints
  • No circular imports between analysis/ and explorer.py
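
The 100-line limit per function could be verified with a short script. This is a sketch with hypothetical helper names. It measures from the def line to the last line of the body (decorators are not counted), and functions sharing a name overwrite each other in the result.

```python
import ast
from typing import Dict, List


def function_lengths(source: str) -> Dict[str, int]:
    """Map each function name in `source` to its length in lines."""
    lengths: Dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available from Python 3.8 onwards.
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


def oversized(source: str, limit: int = 100) -> List[str]:
    """Return the sorted names of functions longer than `limit` lines."""
    return sorted(n for n, size in function_lengths(source).items() if size > limit)
```

Passing the contents of explorer.py to oversized() and expecting an empty list would cover the second criterion. The total line count can be checked separately with len(source.splitlines()).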

Scope Boundaries

Included:

  • Data loading functions
  • Computation/transformation logic
  • Clear separation of concerns

Excluded:

  • UI rendering functions (they can stay in explorer.py)
  • Database schema changes
  • New features or behavior changes
  • Test suite updates (handled separately)

Key Decisions

  • Domain-based splitting: Computation goes to relevant analysis/ module, not all to one file
  • Import direction: explorer.py imports from analysis/, never vice versa
  • Preserve function signatures: Refactoring shouldn't change public APIs

Dependencies / Assumptions

  • database.py provides MotionDatabase singleton - data functions will use this
  • explorer_helpers.py pattern is already established - follow its conventions
  • Streamlit caching (@st.cache_data) stays in explorer.py as the orchestration layer

Outstanding Questions

Deferred to Planning

  • [Implementation] Should _load_mp_vectors_by_party() and variants be merged or kept separate?
  • [Implementation] Should we create analysis/projections.py or extend existing analysis/axis_classifier.py?
  • [Implementation] How should _cached_bootstrap_cis() be handled: moved to analysis/ or kept in explorer.py as a cache wrapper?

Next Steps

Run /ce:plan for structured implementation planning.