You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
motief/docs/plans/2026-04-24-007-replace-prin...

256 lines
8.3 KiB

---
title: "refactor: Replace print() calls with structured logging"
type: refactor
status: active
date: 2026-04-24
---
# Replace print() with Structured Logging
## Overview
There are approximately 225 `print()` calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.
## Problem Frame
- `print()` outputs are invisible in production logs or mixed with Streamlit UI
- No log levels (INFO, WARNING, ERROR) to filter noise
- No module names to identify which component logged what
- Ingestion and API errors are silently swallowed by broad except blocks
- Scripts produce unstructured output that is hard to parse or aggregate
## Requirements Trace
- R1. Replace all `print()` calls with appropriate `logging` levels
- R2. Configure a project-wide logger with module-level naming
- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
- R4. Update CODE_STYLE.md to mandate logging over print
- R5. All tests pass after migration
## Scope Boundaries
**Included:**
- database.py, api_client.py, summarizer.py, ai_provider.py
- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
- scripts/ (batch migration, one script at a time)
**Excluded:**
- explorer.py Streamlit UI prints (these may be intentional UI feedback)
- app.py user-facing prints
- Third-party code
## Key Technical Decisions
- **Use standard library `logging`** — no external dependency needed. If structlog is desired later, it wraps logging.
- **Module-level loggers** — `logger = logging.getLogger(__name__)` pattern
- **Root config in config.py** — basicConfig or dictConfig at app startup
- **Streamlit compatibility** — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls
## Context & Research
### Relevant Code and Patterns
- `database.py` — prints in insert/update paths, ~50+ prints
- `api_client.py` — prints in fetch/pagination logic
- `scripts/` — 22 scripts, many with progress prints
- `CODE_STYLE.md` — already recommends structured logging
### Institutional Learnings
- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md` — mechanical changes should be verified with full test suite
## Implementation Units
- [ ] U1. **Set up logging configuration and test harness**
**Goal:** Create the logging infrastructure and tests before touching any print statements.
**Requirements:** R2
**Dependencies:** None
**Files:**
- Modify: `config.py`
- Create: `tests/test_logging_config.py`
**Approach:**
- Add `configure_logging(level=logging.INFO)` to config.py
- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- Create test that verifies logger hierarchy and formatting
**Execution note:** Test-first — write `test_logging_config.py` before any implementation.
**Test scenarios:**
- Happy path: `configure_logging()` sets up root logger with correct format
- Happy path: Module logger `logging.getLogger("database")` inherits level
- Edge case: Calling configure_logging twice is idempotent
**Verification:**
- `uv run pytest tests/test_logging_config.py -v` passes
---
- [ ] U2. **Migrate database.py prints to logging**
**Goal:** Replace all print() calls in database.py with logger calls.
**Requirements:** R1, R5
**Dependencies:** U1
**Files:**
- Modify: `database.py`
- Modify: `tests/test_database_audit.py` (if it checks output)
**Approach:**
- Add `logger = logging.getLogger(__name__)` at module level
- Replace progress prints with `logger.info()`
- Replace error/warning prints with `logger.warning()` / `logger.error()`
- Keep behavior identical (same messages)
**Execution note:** Test-first — write a test that asserts `caplog` captures a database log message before changing any code.
**Test scenarios:**
- Happy path: `caplog` captures `logger.info` during motion insert
- Error path: `caplog` captures `logger.error` on DB failure
- Edge case: No prints leak to stdout (use capsys to verify)
**Verification:**
- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
- `uv run pytest tests/test_database_audit.py -v` passes
---
- [ ] U3. **Migrate api_client.py prints to logging**
**Goal:** Replace all print() calls in api_client.py with logger calls.
**Requirements:** R1, R5
**Dependencies:** U1
**Files:**
- Modify: `api_client.py`
- Modify: `tests/test_api_client.py` (create if missing)
**Approach:**
- Same pattern as U2: module logger, map prints to levels
- API pagination progress → `logger.info`
- Rate limit / retry messages → `logger.warning`
**Execution note:** Test-first — characterize current behavior with a capsys test, then migrate.
**Test scenarios:**
- Happy path: API fetch logs pagination progress at INFO level
- Error path: Failed request logs at ERROR level
- Integration: Log output includes module name (`api_client`)
**Verification:**
- `grep -n "print(" api_client.py` returns nothing
- Existing API tests pass
---
- [ ] U4. **Migrate pipeline modules**
**Goal:** Replace prints in pipeline/ with logging.
**Requirements:** R1, R5
**Dependencies:** U1, U2 (for database.py patterns to follow)
**Files:**
- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`
**Approach:**
- Batch migration of 4 files
- Progress bars / step completion → `logger.info`
- Warnings about missing data → `logger.warning`
**Test scenarios:**
- Happy path: Pipeline run emits structured logs for each stage
- Error path: Missing embeddings logged at WARNING, not silently skipped
**Verification:**
- `grep -rn "print(" pipeline/` returns nothing
- Pipeline tests pass
---
- [ ] U5. **Migrate scripts/ batch**
**Goal:** Replace prints in scripts/ with logging.
**Requirements:** R1, R5
**Dependencies:** U1
**Files:**
- Modify: `scripts/*.py` (batch, mechanical)
**Approach:**
- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
- CLI progress prints → `logger.info`
- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)
**Execution note:** Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep `print(json.dumps(result))` style outputs.
**Test scenarios:**
- Happy path: Script progress is logged, result output is preserved
- Edge case: Scripts that parse their own output still work
**Verification:**
- Scripts that produce machine-readable output still do so
- `uv run pytest tests/scripts/ -q` passes
---
- [ ] U6. **Update CODE_STYLE.md and add lint rule**
**Goal:** Prevent new print() calls from being introduced.
**Requirements:** R4
**Dependencies:** U1–U5
**Files:**
- Modify: `CODE_STYLE.md`
- Modify: `.pre-commit-config.yaml` (add ruff rule for print)
**Approach:**
- Add "Use logging, not print" section to CODE_STYLE.md
- Add ruff rule: `T201` (print found) to enforce
**Test expectation:** none — documentation and config change.
**Verification:**
- `ruff check .` fails if any new print() is added
---
## System-Wide Impact
- **Interaction graph:** All modules that previously printed to stdout now use logging handlers
- **Error propagation:** Logging does not change exception flow, but error messages are now timestamped and leveled
- **State lifecycle risks:** None — logging is side-effect-only
- **Unchanged invariants:** All existing behavior preserved; only output channel changes
## Risks & Dependencies
| Risk | Mitigation |
|------|------------|
| Missing a print() call | Use `grep -rn "print(" --include="*.py"` as final check |
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to `st.info` first |
| Tests that assert on stdout break | Update to use `caplog` fixture |
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |
## Documentation / Operational Notes
- Update CODE_STYLE.md logging section
- Consider adding a logging configuration section to ARCHITECTURE.md
## Sources & References
- CODE_STYLE.md logging guidance
- Python logging docs: https://docs.python.org/3/library/logging.html
- Existing prints: `grep -rn "print(" --include="*.py" .`