motief/docs/plans/2026-04-24-007-replace-prin...

---
title: "refactor: Replace print() calls with structured logging"
type: refactor
status: active
date: 2026-04-24
---

# Replace print() with Structured Logging

## Overview

There are approximately 225 `print()` calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.

## Problem Frame

- `print()` outputs are invisible in production logs or mixed with Streamlit UI
- No log levels (INFO, WARNING, ERROR) to filter noise
- No module names to identify which component logged what
- Ingestion and API errors are silently swallowed by broad except blocks
- Scripts produce unstructured output that is hard to parse or aggregate

## Requirements Trace

- R1. Replace all `print()` calls with appropriate `logging` levels
- R2. Configure a project-wide logger with module-level naming
- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
- R4. Update CODE_STYLE.md to mandate logging over print
- R5. All tests pass after migration

## Scope Boundaries

**Included:**
- database.py, api_client.py, summarizer.py, ai_provider.py
- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
- scripts/ (batch migration, one script at a time)

**Excluded:**
- explorer.py Streamlit UI prints (these may be intentional UI feedback)
- app.py user-facing prints
- Third-party code

## Key Technical Decisions

- **Use standard library `logging`** — no external dependency needed. If structlog is desired later, it wraps logging.
- **Module-level loggers** — `logger = logging.getLogger(__name__)` pattern
- **Root config in config.py** — basicConfig or dictConfig at app startup
- **Streamlit compatibility** — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls

## Context & Research

### Relevant Code and Patterns

- `database.py` — prints in insert/update paths, ~50+ prints
- `api_client.py` — prints in fetch/pagination logic
- `scripts/` — 22 scripts, many with progress prints
- `CODE_STYLE.md` — already recommends structured logging

### Institutional Learnings

- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md` — mechanical changes should be verified with full test suite

## Implementation Units

- [ ] U1. **Set up logging configuration and test harness**

**Goal:** Create the logging infrastructure and tests before touching any print statements.

**Requirements:** R2

**Dependencies:** None

**Files:**
- Modify: `config.py`
- Create: `tests/test_logging_config.py`

**Approach:**
- Add `configure_logging(level=logging.INFO)` to config.py
- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- Create test that verifies logger hierarchy and formatting

**Execution note:** Test-first — write `test_logging_config.py` before any implementation.

**Test scenarios:**
- Happy path: `configure_logging()` sets up root logger with correct format
- Happy path: Module logger `logging.getLogger("database")` inherits level
- Edge case: Calling configure_logging twice is idempotent

**Verification:**
- `uv run pytest tests/test_logging_config.py -v` passes

---

- [ ] U2. **Migrate database.py prints to logging**

**Goal:** Replace all print() calls in database.py with logger calls.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `database.py`
- Modify: `tests/test_database_audit.py` (if it checks output)

**Approach:**
- Add `logger = logging.getLogger(__name__)` at module level
- Replace progress prints with `logger.info()`
- Replace error/warning prints with `logger.warning()` / `logger.error()`
- Keep behavior identical (same messages)

**Execution note:** Test-first — write a test that asserts `caplog` captures a database log message before changing any code.

**Test scenarios:**
- Happy path: `caplog` captures `logger.info` during motion insert
- Error path: `caplog` captures `logger.error` on DB failure
- Edge case: No prints leak to stdout (use capsys to verify)

**Verification:**
- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
- `uv run pytest tests/test_database_audit.py -v` passes

---

- [ ] U3. **Migrate api_client.py prints to logging**

**Goal:** Replace all print() calls in api_client.py with logger calls.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `api_client.py`
- Modify: `tests/test_api_client.py` (create if missing)

**Approach:**
- Same pattern as U2: module logger, map prints to levels
- API pagination progress → `logger.info`
- Rate limit / retry messages → `logger.warning`

**Execution note:** Test-first — characterize current behavior with a capsys test, then migrate.

**Test scenarios:**
- Happy path: API fetch logs pagination progress at INFO level
- Error path: Failed request logs at ERROR level
- Integration: Log output includes module name (`api_client`)

**Verification:**
- `grep -n "print(" api_client.py` returns nothing
- Existing API tests pass

---

- [ ] U4. **Migrate pipeline modules**

**Goal:** Replace prints in pipeline/ with logging.

**Requirements:** R1, R5

**Dependencies:** U1, U2 (for database.py patterns to follow)

**Files:**
- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`

**Approach:**
- Batch migration of 4 files
- Progress bars / step completion → `logger.info`
- Warnings about missing data → `logger.warning`

**Test scenarios:**
- Happy path: Pipeline run emits structured logs for each stage
- Error path: Missing embeddings logged at WARNING, not silently skipped

**Verification:**
- `grep -rn "print(" pipeline/` returns nothing
- Pipeline tests pass

---

- [ ] U5. **Migrate scripts/ batch**

**Goal:** Replace prints in scripts/ with logging.

**Requirements:** R1, R5

**Dependencies:** U1

**Files:**
- Modify: `scripts/*.py` (batch, mechanical)

**Approach:**
- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
- CLI progress prints → `logger.info`
- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)

**Execution note:** Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep `print(json.dumps(result))` style outputs.

**Test scenarios:**
- Happy path: Script progress is logged, result output is preserved
- Edge case: Scripts that parse their own output still work

**Verification:**
- Scripts that produce machine-readable output still do so
- `uv run pytest tests/scripts/ -q` passes

---

- [ ] U6. **Update CODE_STYLE.md and add lint rule**

**Goal:** Prevent new print() calls from being introduced.

**Requirements:** R4

**Dependencies:** U1–U5

**Files:**
- Modify: `CODE_STYLE.md`
- Modify: `.pre-commit-config.yaml` (add ruff rule for print)

**Approach:**
- Add "Use logging, not print" section to CODE_STYLE.md
- Add ruff rule: `T201` (print found) to enforce

**Test expectation:** none — documentation and config change.

**Verification:**
- `ruff check .` fails if any new print() is added

---

## System-Wide Impact

- **Interaction graph:** All modules that previously printed to stdout now use logging handlers
- **Error propagation:** Logging does not change exception flow, but error messages are now timestamped and leveled
- **State lifecycle risks:** None — logging is side-effect-only
- **Unchanged invariants:** All existing behavior preserved; only output channel changes

## Risks & Dependencies

| Risk | Mitigation |
|------|------------|
| Missing a print() call | Use `grep -rn "print(" --include="*.py"` as final check |
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to `st.info` first |
| Tests that assert on stdout break | Update to use `caplog` fixture |
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |

## Documentation / Operational Notes

- Update CODE_STYLE.md logging section
- Consider adding a logging configuration section to ARCHITECTURE.md

## Sources & References

- CODE_STYLE.md logging guidance
- Python logging docs: https://docs.python.org/3/library/logging.html
- Existing prints: `grep -rn "print(" --include="*.py" .`