---
title: "refactor: Replace print() calls with structured logging"
type: refactor
status: active
date: 2026-04-24
---

# Replace print() with Structured Logging

## Overview

There are approximately 225 `print()` calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult: no log levels, no timestamps, no module context.

## Problem Frame

- `print()` outputs are invisible in production logs or mixed with Streamlit UI
- No log levels (INFO, WARNING, ERROR) to filter noise
- No module names to identify which component logged what
- Ingestion and API errors are silently swallowed by broad except blocks
- Scripts produce unstructured output that is hard to parse or aggregate

## Requirements Trace

- R1. Replace all `print()` calls with appropriate `logging` levels
- R2. Configure a project-wide logger with module-level naming
- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
- R4. Update CODE_STYLE.md to mandate logging over print
- R5. All tests pass after migration

## Scope Boundaries

**Included:**
- database.py, api_client.py, summarizer.py, ai_provider.py
- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
- scripts/ (batch migration, one script at a time)

**Excluded:**
- explorer.py Streamlit UI prints (these may be intentional UI feedback)
- app.py user-facing prints
- Third-party code

## Key Technical Decisions

- **Use standard library `logging`**: no external dependency needed. If structlog is desired later, it wraps logging.
- **Module-level loggers**: `logger = logging.getLogger(__name__)` pattern
- **Root config in config.py**: basicConfig or dictConfig at app startup
- **Streamlit compatibility**: In Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls

## Context & Research

### Relevant Code and Patterns

- `database.py`: prints in insert/update paths, ~50+ prints
- `api_client.py`: prints in fetch/pagination logic
- `scripts/`: 22 scripts, many with progress prints
- `CODE_STYLE.md`: already recommends structured logging

### Institutional Learnings

- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md`: mechanical changes should be verified with full test suite

## Implementation Units

- [ ] U1. **Set up logging configuration and test harness**

**Goal:** Create the logging infrastructure and tests before touching any print statements.

**Requirements:** R2

**Dependencies:** None

**Files:**
- Modify: `config.py`
- Create: `tests/test_logging_config.py`

**Approach:**
- Add `configure_logging(level=logging.INFO)` to config.py
- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- Create test that verifies logger hierarchy and formatting

**Execution note:** Test-first: write `test_logging_config.py` before any implementation.

**Test scenarios:**
- Happy path: `configure_logging()` sets up root logger with correct format
- Happy path: Module logger `logging.getLogger("database")` inherits level
- Edge case: Calling configure_logging twice is idempotent

**Verification:**
- `uv run pytest tests/test_logging_config.py -v` passes

---

- [ ] U2. **Migrate database.py prints to logging**

**Goal:** Replace all print() calls in database.py with logger calls.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `database.py`
- Modify: `tests/test_database_audit.py` (if it checks output)

**Approach:**
- Add `logger = logging.getLogger(__name__)` at module level
- Replace progress prints with `logger.info()`
- Replace error/warning prints with `logger.warning()` / `logger.error()`
- Keep behavior identical (same messages)

**Execution note:** Test-first: write a test that asserts `caplog` captures a database log message before changing any code.

**Test scenarios:**
- Happy path: `caplog` captures `logger.info` during motion insert
- Error path: `caplog` captures `logger.error` on DB failure
- Edge case: No prints leak to stdout (use capsys to verify)

**Verification:**
- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
- `uv run pytest tests/test_database_audit.py -v` passes

---

- [ ] U3. **Migrate api_client.py prints to logging**

**Goal:** Replace all print() calls in api_client.py with logger calls.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `api_client.py`
- Modify: `tests/test_api_client.py` (create if missing)

**Approach:**
- Same pattern as U2: module logger, map prints to levels
- API pagination progress → `logger.info`
- Rate limit / retry messages → `logger.warning`

**Execution note:** Test-first: characterize current behavior with a capsys test, then migrate.

**Test scenarios:**
- Happy path: API fetch logs pagination progress at INFO level
- Error path: Failed request logs at ERROR level
- Integration: Log output includes module name (`api_client`)

**Verification:**
- `grep -n "print(" api_client.py` returns nothing
- Existing API tests pass

---

- [ ] U4. **Migrate pipeline modules**

**Goal:** Replace prints in pipeline/ with logging.

**Requirements:** R1, R5

**Dependencies:** U1, U2 (for database.py patterns to follow)

**Files:**
- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`

**Approach:**
- Batch migration of 4 files
- Progress bars / step completion → `logger.info`
- Warnings about missing data → `logger.warning`

**Test scenarios:**
- Happy path: Pipeline run emits structured logs for each stage
- Error path: Missing embeddings logged at WARNING, not silently skipped

**Verification:**
- `grep -rn "print(" pipeline/` returns nothing
- Pipeline tests pass

---

- [ ] U5. **Migrate scripts/ batch**

**Goal:** Replace prints in scripts/ with logging.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `scripts/*.py` (batch, mechanical)

**Approach:**
- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
- CLI progress prints → `logger.info`
- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)

**Execution note:** Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep `print(json.dumps(result))` style outputs.

**Test scenarios:**
- Happy path: Script progress is logged, result output is preserved
- Edge case: Scripts that parse their own output still work

**Verification:**
- Scripts that produce machine-readable output still do so
- `uv run pytest tests/scripts/ -q` passes

---

- [ ] U6. **Update CODE_STYLE.md and add lint rule**

**Goal:** Prevent new print() calls from being introduced.

**Requirements:** R4

**Dependencies:** U1–U5

**Files:**
- Modify: `CODE_STYLE.md`
- Modify: `.pre-commit-config.yaml` (add ruff rule for print)

**Approach:**
- Add "Use logging, not print" section to CODE_STYLE.md
- Add ruff rule: `T201` (print found) to enforce

**Test expectation:** none (documentation and config change).

**Verification:**
- `ruff check .` fails if any new print() is added

---

## System-Wide Impact

- **Interaction graph:** All modules that previously printed to stdout now use logging handlers
- **Error propagation:** Logging does not change exception flow, but error messages are now timestamped and leveled
- **State lifecycle risks:** None: logging is side-effect-only
- **Unchanged invariants:** All existing behavior preserved; only output channel changes

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Missing a print() call | Use `grep -rn "print(" --include="*.py"` as final check |
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to `st.info` first |
| Tests that assert on stdout break | Update to use `caplog` fixture |
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |

## Documentation / Operational Notes

- Update CODE_STYLE.md logging section
- Consider adding a logging configuration section to ARCHITECTURE.md

## Sources & References

- CODE_STYLE.md logging guidance
- Python logging docs: https://docs.python.org/3/library/logging.html
- Existing prints: `grep -rn "print(" --include="*.py" .`