You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
108 lines
4.9 KiB
108 lines
4.9 KiB
Title: Make modules import-safe (duckdb/plotly)
|
|
|
|
Why
|
|
- Enable lightweight unit tests and imports in environments without heavy runtime deps (duckdb, plotly) without changing runtime behaviour when those deps are present.
|
|
|
|
Scope
|
|
- Primary focus: library modules that are commonly imported by tests or other modules (not CLI scripts that are only executed).
|
|
- Initial rollout: small, reviewable batches. Do not push or change remote branches.
|
|
|
|
Non-goals
|
|
- Remove duckdb/plotly dependency from runtime environments.
|
|
- Refactor functionality beyond import-time safety.
|
|
|
|
Approach
|
|
- Two safe patterns (apply conservatively):
|
|
1) Pattern A — module-level guard
|
|
```py
|
|
try:
|
|
import duckdb
|
|
except Exception: # pragma: no cover
|
|
duckdb = None # type: ignore
|
|
```
|
|
Use when multiple functions in the module call duckdb and adding the guard is least invasive.
|
|
|
|
2) Pattern B — function-local import (preferred for DB helpers)
|
|
Move `import duckdb` into the function that uses it and raise a clear RuntimeError when invoked without duckdb:
|
|
```py
|
|
def open_conn(path):
|
|
try:
|
|
import duckdb
|
|
except Exception:
|
|
raise RuntimeError("duckdb is required for open_conn") from None
|
|
return duckdb.connect(path)
|
|
```
|
|
|
|
Targets (first batch — high impact)
|
|
- `database.py` — Pattern B (move DB imports into helpers; provide clear RuntimeError when called without duckdb). Tests import `database.py` so make this robust.
|
|
- `app.py` — Pattern B (app modules often get imported during test runs; delay duckdb until handlers that need it).
|
|
- `pipeline/svd_pipeline.py` — Pattern A (guard top-level import; pipeline code is heavy and module-level guard is fine).
|
|
|
|
Expanded target list (subsequent batches)
|
|
- `pipeline/text_pipeline.py`, `pipeline/fusion.py` — Pattern A
|
|
- `pipeline/extract_mp_votes.py` — Pattern B
|
|
- `similarity/compute.py`, `summarizer.py` — Pattern A or B after inspection
|
|
- scripts under `scripts/` only if tests import them (prefer moving to `main()`)
|
|
|
|
Step-by-step rollout
|
|
1) Prepare patches for batch 1 (three files). Create one patch file per file so changes are atomic and easy to revert.
|
|
2) Apply edits in a feature branch or local commit. Run focused tests:
|
|
- `pytest tests/test_database_audit.py -q`
|
|
- `pytest tests/test_political_compass.py::test_* -q` (if applicable)
|
|
3) If focused tests pass, run full suite in .venv:
|
|
- `.venv/bin/python -m pytest tests/ -q`
|
|
4) If failures occur, inspect tracebacks for missing duckdb at runtime and either revert the specific change or convert Pattern A ↔ Pattern B as needed.
|
|
5) Repeat for next batches until all targeted modules are covered.
|
|
|
|
Verification
|
|
- After each file change, run the focused tests that touch the module. After the batch, run full test suite (local `.venv` recommended):
|
|
- `.venv/bin/python -m pytest tests/ -q` — expect no new failures.
|
|
- Confirm importability in empty environment (simulate by temporarily renaming `.venv` or running in environment without duckdb):
|
|
- `python -c "import analysis; print('ok')"` — should not raise ImportError for guarded modules.
|
|
|
|
Rollback strategy
|
|
- Make one file change per commit. If tests fail, revert the last commit and open an issue with the failure trace.
|
|
|
|
Patch previews (examples)
|
|
- Pattern A (top-level guard):
|
|
- replace `import duckdb` with:
|
|
```py
|
|
try:
|
|
import duckdb
|
|
except Exception: # pragma: no cover
|
|
duckdb = None # type: ignore
|
|
```
|
|
|
|
- Pattern B (function-local import + clear error):
|
|
- before:
|
|
```py
|
|
import duckdb
|
|
|
|
def open_conn(path):
|
|
return duckdb.connect(path)
|
|
```
|
|
- after:
|
|
```py
|
|
def open_conn(path):
|
|
try:
|
|
import duckdb
|
|
except Exception:
|
|
raise RuntimeError("duckdb is required for open_conn") from None
|
|
return duckdb.connect(path)
|
|
```
|
|
|
|
Risks & mitigations
|
|
- Risk: hiding missing dependency until runtime. Mitigation: when using Pattern B raise descriptive RuntimeError at call site so failures are explicit.
|
|
- Risk: tests that intentionally require duckdb may break if we change behavior incorrectly. Mitigation: run focused tests that import duckdb intentionally and keep those files unchanged.
|
|
|
|
Owner & next actions for me
|
|
- I can generate the exact patch diffs for batch 1 (three files) and present them for review before applying. This is recommended to keep the change small and reviewable.
|
|
- Reply with:
|
|
- `prepare` — I will create the patch diffs for `database.py`, `app.py`, `pipeline/svd_pipeline.py` and show them to you (no files modified yet), or
|
|
- `apply` — I will apply the first batch now and run focused tests locally.
|
|
|
|
Notes
|
|
- I already applied import-guards in several `analysis/` modules (trajectory, explorer_data, clustering, political_axis) during earlier review; this plan continues that conservative approach.
|
|
|
|
References
|
|
- Examples: `analysis/explorer_data.py`, `analysis/trajectory.py`, `analysis/visualize.py`
|
|
|