| title | type | status | date |
|---|---|---|---|
| refactor: Replace print() calls with structured logging | refactor | active | 2026-04-24 |
Replace print() with Structured Logging
Overview
There are approximately 225 print() calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.
Problem Frame
- `print()` outputs are invisible in production logs or mixed with Streamlit UI
- No log levels (INFO, WARNING, ERROR) to filter noise
- No module names to identify which component logged what
- Ingestion and API errors are silently swallowed by broad except blocks
- Scripts produce unstructured output that is hard to parse or aggregate
Requirements Trace
- R1. Replace all `print()` calls with appropriate `logging` levels
- R2. Configure a project-wide logger with module-level naming
- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
- R4. Update CODE_STYLE.md to mandate logging over print
- R5. All tests pass after migration
Scope Boundaries
Included:
- database.py, api_client.py, summarizer.py, ai_provider.py
- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
- scripts/ (batch migration, one script at a time)
Excluded:
- explorer.py Streamlit UI prints (these may be intentional UI feedback)
- app.py user-facing prints
- Third-party code
Key Technical Decisions
- Use standard library `logging`: no external dependency needed. If structlog is desired later, it can wrap the standard `logging` module.
- Module-level loggers: `logger = logging.getLogger(__name__)` pattern
- Root config in config.py: basicConfig or dictConfig at app startup
- Streamlit compatibility: in Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls
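The module-level logger decision relies on dotted logger names forming a hierarchy. The sketch below illustrates that behavior with the standard library only; the names `pipeline` and `pipeline.fusion` are stand-ins for what `__name__` would evaluate to inside those modules, not code taken from the repository.

```python
import logging

# In a real module this line would be: logger = logging.getLogger(__name__)
# "pipeline.fusion" stands in for the value of __name__ in pipeline/fusion.py.
logger = logging.getLogger("pipeline.fusion")

# Dotted names form a hierarchy: "pipeline.fusion" is a child of "pipeline",
# which in turn is a child of the root logger.
package_logger = logging.getLogger("pipeline")

# A logger with no level of its own (NOTSET) inherits the level of its
# nearest ancestor that has one.
package_logger.setLevel(logging.WARNING)


def effective_level_name(log: logging.Logger) -> str:
    """Return the name of the level a logger actually filters at."""
    return logging.getLevelName(log.getEffectiveLevel())
```

Because levels are inherited, a single root configuration can control every module logger, while an individual package can still be made quieter or more verbose by setting a level on its own logger.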
Context & Research
Relevant Code and Patterns
- `database.py`: prints in insert/update paths, ~50+ prints
- `api_client.py`: prints in fetch/pagination logic
- `scripts/`: 22 scripts, many with progress prints
- `CODE_STYLE.md`: already recommends structured logging
Institutional Learnings
- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md`: mechanical changes should be verified with the full test suite
Implementation Units
- U1. Set up logging configuration and test harness
Goal: Create the logging infrastructure and tests before touching any print statements.
Requirements: R2
Dependencies: None
Files:
- Modify: `config.py`
- Create: `tests/test_logging_config.py`
Approach:
- Add `configure_logging(level=logging.INFO)` to config.py
- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
- Create test that verifies logger hierarchy and formatting
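One possible shape for `configure_logging` is sketched below. It is an illustration under assumptions, not the final config.py: the `_project_handler` marker attribute and the `LOG_FORMAT` constant name are hypothetical, and the sketch installs an explicit `StreamHandler` rather than calling `logging.basicConfig`, because `basicConfig` does nothing when the root logger already has handlers (which can happen under a test runner).

```python
import logging

# Format string taken from the plan above.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger; safe to call more than once."""
    root = logging.getLogger()
    root.setLevel(level)

    # Idempotence: if this function already installed its handler,
    # only the level is updated and no second handler is added.
    for existing in root.handlers:
        if getattr(existing, "_project_handler", False):
            return

    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler._project_handler = True  # hypothetical marker attribute
    root.addHandler(handler)
```

If the simpler behavior is acceptable, `logging.basicConfig(level=level, format=LOG_FORMAT)` covers the same ground in one call, with the caveat that a second call will not change the level.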
Execution note: Test-first — write test_logging_config.py before any implementation.
Test scenarios:
- Happy path: `configure_logging()` sets up root logger with correct format
- Happy path: Module logger `logging.getLogger("database")` inherits level
- Edge case: Calling configure_logging twice is idempotent
Verification:
- `uv run pytest tests/test_logging_config.py -v` passes
- U2. Migrate database.py prints to logging
Goal: Replace all print() calls in database.py with logger calls.
Requirements: R1, R5
Dependencies: U1
Files:
- Modify: `database.py`
- Modify: `tests/test_database_audit.py` (if it checks output)
Approach:
- Add `logger = logging.getLogger(__name__)` at module level
- Replace progress prints with `logger.info()`
- Replace error/warning prints with `logger.warning()`/`logger.error()`
- Keep behavior identical (same messages)
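The mapping from prints to log levels might look like the sketch below. Everything about the data layer here is an assumption made for illustration: the plan does not show database.py, so `insert_motion`, the `motions` table, and the use of `sqlite3` are hypothetical, and the "Was:" comments show invented examples of the kind of print being replaced.

```python
import logging
import sqlite3

# Module-level logger, as described in the approach above.
logger = logging.getLogger(__name__)


def insert_motion(conn: sqlite3.Connection, motion_id: str, title: str) -> bool:
    """Hypothetical insert path; the real schema is not part of this plan."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO motions (id, title) VALUES (?, ?)",
                (motion_id, title),
            )
    except sqlite3.Error:
        # Was: print(f"Error inserting motion {motion_id}: {e}")
        # exc_info=True attaches the traceback so the error is not swallowed.
        logger.error("Error inserting motion %s", motion_id, exc_info=True)
        return False

    # Was: print(f"Inserted motion {motion_id}")
    logger.info("Inserted motion %s", motion_id)
    return True
```

Passing the values as arguments (`"... %s", motion_id`) rather than pre-formatting an f-string keeps the message text the same while deferring formatting until a handler actually emits the record.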
Execution note: Test-first — write a test that asserts caplog captures a database log message before changing any code.
Test scenarios:
- Happy path: `caplog` captures `logger.info` during motion insert
- Error path: `caplog` captures `logger.error` on DB failure
- Edge case: No prints leak to stdout (use capsys to verify)
Verification:
- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
- `uv run pytest tests/test_database_audit.py -v` passes
- U3. Migrate api_client.py prints to logging
Goal: Replace all print() calls in api_client.py with logger calls.
Requirements: R1, R5
Dependencies: U1
Files:
- Modify: `api_client.py`
- Modify: `tests/test_api_client.py` (create if missing)
Approach:
- Same pattern as U2: module logger, map prints to levels
- API pagination progress → `logger.info`
- Rate limit / retry messages → `logger.warning`
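The level mapping for the API client can be sketched as a pagination loop with retries. This is not the actual api_client.py: `fetch_all_pages`, the `RateLimited` exception, and the injected `fetch_page` callable are hypothetical names chosen so the example stays self-contained and makes no network calls.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class RateLimited(Exception):
    """Hypothetical signal that the remote API asked the client to slow down."""


def fetch_all_pages(
    fetch_page: Callable[[int], list],
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
) -> list:
    """Collect items page by page until an empty page is returned."""
    items: list = []
    page = 1
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                batch = fetch_page(page)
                break
            except RateLimited:
                # Retry messages are WARNING: noteworthy but recoverable.
                logger.warning(
                    "Rate limited on page %d (attempt %d/%d)",
                    page, attempt, max_retries,
                )
                time.sleep(backoff_seconds)
        else:
            # Every attempt failed: log at ERROR and surface the failure.
            logger.error("Giving up on page %d after %d attempts", page, max_retries)
            raise RuntimeError(f"page {page} failed after {max_retries} attempts")

        if not batch:
            logger.info("Pagination finished: %d items over %d pages", len(items), page - 1)
            return items

        items.extend(batch)
        # Pagination progress is INFO.
        logger.info("Fetched page %d (%d items)", page, len(batch))
        page += 1
```

Raising after the ERROR log keeps the failure visible to callers, which addresses the concern that API errors are currently swallowed.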
Execution note: Test-first — characterize current behavior with a capsys test, then migrate.
Test scenarios:
- Happy path: API fetch logs pagination progress at INFO level
- Error path: Failed request logs at ERROR level
- Integration: Log output includes module name (`api_client`)
Verification:
- `grep -n "print(" api_client.py` returns nothing
- Existing API tests pass
- U4. Migrate pipeline modules
Goal: Replace prints in pipeline/ with logging.
Requirements: R1, R5
Dependencies: U1, U2 (for database.py patterns to follow)
Files:
- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`
Approach:
- Batch migration of 4 files
- Progress bars / step completion → `logger.info`
- Warnings about missing data → `logger.warning`
Test scenarios:
- Happy path: Pipeline run emits structured logs for each stage
- Error path: Missing embeddings logged at WARNING, not silently skipped
Verification:
- `grep -rn "print(" pipeline/` returns nothing
- Pipeline tests pass
- U5. Migrate scripts/ batch
Goal: Replace prints in scripts/ with logging.
Requirements: R1, R5
Dependencies: U1
Files:
- Modify: `scripts/*.py` (batch, mechanical)
Approach:
- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
- CLI progress prints → `logger.info`
- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)
Execution note: Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep print(json.dumps(result)) style outputs.
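The split between diagnostic output and product output can be sketched as follows. The `analyze` function and its result shape are hypothetical placeholders for a script's real work; only the logger name comes from the plan. Diagnostics go through the logger to stderr, while the JSON result stays a `print` to stdout so that piping the script's output keeps working.

```python
import json
import logging
import sys

logger = logging.getLogger("scripts.drift_analysis")


def analyze(values: list) -> dict:
    """Hypothetical stand-in for the script's real computation."""
    # Was a progress print; now a diagnostic log record.
    logger.info("Analyzing %d values", len(values))
    mean = sum(values) / len(values) if values else 0.0
    return {"count": len(values), "mean": mean}


def main(argv: list) -> int:
    # Send log records to stderr so stdout carries only the result.
    logging.basicConfig(
        level=logging.INFO,
        stream=sys.stderr,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )
    values = [float(arg) for arg in argv]
    result = analyze(values)

    # The JSON result is the script's product, so it remains a print.
    # The noqa marker exempts this line from the ruff T201 rule.
    print(json.dumps(result))  # noqa: T201
    return 0
```

A script built this way would typically end with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard. Redirecting stdout to a file then captures only the JSON, while progress messages remain visible on the terminal.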
Test scenarios:
- Happy path: Script progress is logged, result output is preserved
- Edge case: Scripts that parse their own output still work
Verification:
- Scripts that produce machine-readable output still do so
- `uv run pytest tests/scripts/ -q` passes
- U6. Update CODE_STYLE.md and add lint rule
Goal: Prevent new print() calls from being introduced.
Requirements: R4
Dependencies: U1–U5
Files:
- Modify: `CODE_STYLE.md`
- Modify: `.pre-commit-config.yaml` (add ruff rule for print)
Approach:
- Add "Use logging, not print" section to CODE_STYLE.md
- Add ruff rule: `T201` (print found) to enforce
Test expectation: none — documentation and config change.
Verification:
- `ruff check .` fails if any new print() is added
System-Wide Impact
- Interaction graph: All modules that previously printed to stdout now use logging handlers
- Error propagation: Logging does not change exception flow, but error messages are now timestamped and leveled
- State lifecycle risks: None — logging is side-effect-only
- Unchanged invariants: All existing behavior preserved; only output channel changes
Risks & Dependencies
| Risk | Mitigation |
|---|---|
| Missing a print() call | Use `grep -rn "print(" --include="*.py" .` as final check |
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to st.info first |
| Tests that assert on stdout break | Update to use caplog fixture |
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |
Documentation / Operational Notes
- Update CODE_STYLE.md logging section
- Consider adding a logging configuration section to ARCHITECTURE.md
Sources & References
- CODE_STYLE.md logging guidance
- Python logging docs: https://docs.python.org/3/library/logging.html
- Existing prints: `grep -rn "print(" --include="*.py" .`