You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
motief/docs/plans/2026-04-05-001-make-modules...

4.9 KiB

Title: Make modules import-safe (duckdb/plotly)

Why

  • Enable lightweight unit tests and imports in environments without heavy runtime deps (duckdb, plotly) without changing runtime behaviour when those deps are present.

Scope

  • Primary focus: library modules that are commonly imported by tests or other modules (not CLI scripts that are only executed).
  • Initial rollout: small, reviewable batches. Do not push or change remote branches.

Non-goals

  • Remove duckdb/plotly dependency from runtime environments.
  • Refactor functionality beyond import-time safety.

Approach

  • Two safe patterns (apply conservatively):
    1. Pattern A — module-level guard

      try:
          import duckdb
      except Exception:  # pragma: no cover
          duckdb = None  # type: ignore
      

      Use when multiple functions in the module call duckdb and adding the guard is least invasive.

    2. Pattern B — function-local import (preferred for DB helpers) Move import duckdb into the function that uses it and raise a clear RuntimeError when invoked without duckdb:

      def open_conn(path):
          try:
              import duckdb
          except Exception:
              raise RuntimeError("duckdb is required for open_conn") from None
          return duckdb.connect(path)
      

Targets (first batch — high impact)

  • database.py — Pattern B (move DB imports into helpers; provide clear RuntimeError when called without duckdb). Tests import database.py so make this robust.
  • app.py — Pattern B (app modules often get imported during test runs; delay duckdb until handlers that need it).
  • pipeline/svd_pipeline.py — Pattern A (guard top-level import; pipeline code is heavy and module-level guard is fine).

Expanded target list (subsequent batches)

  • pipeline/text_pipeline.py, pipeline/fusion.py — Pattern A
  • pipeline/extract_mp_votes.py — Pattern B
  • similarity/compute.py, summarizer.py — Pattern A or B after inspection
  • scripts under scripts/ only if tests import them (prefer moving to main())

Step-by-step rollout

  1. Prepare patches for batch 1 (three files). Create one patch file per file so changes are atomic and easy to revert.
  2. Apply edits in a feature branch or local commit. Run focused tests:
    • pytest tests/test_database_audit.py -q
    • pytest tests/test_political_compass.py::test_* -q (if applicable)
  3. If focused tests pass, run full suite in .venv:
    • .venv/bin/python -m pytest tests/ -q
  4. If failures occur, inspect tracebacks for missing duckdb at runtime and either revert the specific change or convert Pattern A ↔ Pattern B as needed.
  5. Repeat for next batches until all targeted modules are covered.

Verification

  • After each file change, run the focused tests that touch the module. After the batch, run full test suite (local .venv recommended):
    • .venv/bin/python -m pytest tests/ -q — expect no new failures.
  • Confirm importability in empty environment (simulate by temporarily renaming .venv or running in environment without duckdb):
    • python -c "import analysis; print('ok')" — should not raise ImportError for guarded modules.

Rollback strategy

  • Make one file change per commit. If tests fail, revert the last commit and open an issue with the failure trace.

Patch previews (examples)

  • Pattern A (top-level guard):

    • replace import duckdb with:
      try:
          import duckdb
      except Exception:  # pragma: no cover
          duckdb = None  # type: ignore
      
  • Pattern B (function-local import + clear error):

    • before:
      import duckdb
      
      def open_conn(path):
          return duckdb.connect(path)
      
    • after:
      def open_conn(path):
          try:
              import duckdb
          except Exception:
              raise RuntimeError("duckdb is required for open_conn") from None
          return duckdb.connect(path)
      

Risks & mitigations

  • Risk: hiding missing dependency until runtime. Mitigation: when using Pattern B raise descriptive RuntimeError at call site so failures are explicit.
  • Risk: tests that intentionally require duckdb may break if we change behavior incorrectly. Mitigation: run focused tests that import duckdb intentionally and keep those files unchanged.

Owner & next actions for me

  • I can generate the exact patch diffs for batch 1 (three files) and present them for review before applying. This is recommended to keep the change small and reviewable.
  • Reply with:
    • prepare — I will create the patch diffs for database.py, app.py, pipeline/svd_pipeline.py and show them to you (no files modified yet), or
    • apply — I will apply the first batch now and run focused tests locally.

Notes

  • I already applied import-guards in several analysis/ modules (trajectory, explorer_data, clustering, political_axis) during earlier review; this plan continues that conservative approach.

References

  • Examples: analysis/explorer_data.py, analysis/trajectory.py, analysis/visualize.py