8.3 KiB

Raw Blame History

title	type	status	date
refactor: Replace print() calls with structured logging	refactor	active	2026-04-24

Replace print() with Structured Logging

Overview

There are approximately 225 print() calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.

Problem Frame

print() outputs are invisible in production logs or mixed with Streamlit UI
No log levels (INFO, WARNING, ERROR) to filter noise
No module names to identify which component logged what
Ingestion and API errors are silently swallowed by broad except blocks
Scripts produce unstructured output that is hard to parse or aggregate

Requirements Trace

R1. Replace all print() calls with appropriate logging levels
R2. Configure a project-wide logger with module-level naming
R3. Preserve existing output behavior in Streamlit contexts (use st.info/st.warning where appropriate)
R4. Update CODE_STYLE.md to mandate logging over print
R5. All tests pass after migration

Scope Boundaries

Included:

database.py, api_client.py, summarizer.py, ai_provider.py
pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
scripts/ (batch migration, one script at a time)

Excluded:

explorer.py Streamlit UI prints (these may be intentional UI feedback)
app.py user-facing prints
Third-party code

Key Technical Decisions

Use standard library logging — no external dependency needed. If structlog is desired later, it wraps logging.
Module-level loggers — logger = logging.getLogger(__name__) pattern
Root config in config.py — basicConfig or dictConfig at app startup
Streamlit compatibility — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with st.* calls

Context & Research

Relevant Code and Patterns

database.py — prints in insert/update paths, ~50+ prints
api_client.py — prints in fetch/pagination logic
scripts/ — 22 scripts, many with progress prints
CODE_STYLE.md — already recommends structured logging

Institutional Learnings

docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md — mechanical changes should be verified with full test suite

Implementation Units

U1. Set up logging configuration and test harness

Goal: Create the logging infrastructure and tests before touching any print statements.

Requirements: R2

Dependencies: None

Files:

Modify: config.py
Create: tests/test_logging_config.py

Approach:

Add configure_logging(level=logging.INFO) to config.py
Use standard format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
Create test that verifies logger hierarchy and formatting

Execution note: Test-first — write test_logging_config.py before any implementation.

Test scenarios:

Happy path: configure_logging() sets up root logger with correct format
Happy path: Module logger logging.getLogger("database") inherits level
Edge case: Calling configure_logging twice is idempotent

Verification:

uv run pytest tests/test_logging_config.py -v passes

U2. Migrate database.py prints to logging

Goal: Replace all print() calls in database.py with logger calls.

Requirements: R1, R5

Dependencies: U1

Files:

Modify: database.py
Modify: tests/test_database_audit.py (if it checks output)

Approach:

Add logger = logging.getLogger(__name__) at module level
Replace progress prints with logger.info()
Replace error/warning prints with logger.warning() / logger.error()
Keep behavior identical (same messages)

Execution note: Test-first — write a test that asserts caplog captures a database log message before changing any code.

Test scenarios:

Happy path: caplog captures logger.info during motion insert
Error path: caplog captures logger.error on DB failure
Edge case: No prints leak to stdout (use capsys to verify)

Verification:

grep -n "print(" database.py returns nothing (or only intentional UI prints)
uv run pytest tests/test_database_audit.py -v passes

U3. Migrate api_client.py prints to logging

Goal: Replace all print() calls in api_client.py with logger calls.

Requirements: R1, R5

Dependencies: U1

Files:

Modify: api_client.py
Modify: tests/test_api_client.py (create if missing)

Approach:

Same pattern as U2: module logger, map prints to levels
API pagination progress → logger.info
Rate limit / retry messages → logger.warning

Execution note: Test-first — characterize current behavior with a capsys test, then migrate.

Test scenarios:

Happy path: API fetch logs pagination progress at INFO level
Error path: Failed request logs at ERROR level
Integration: Log output includes module name (api_client)

Verification:

grep -n "print(" api_client.py returns nothing
Existing API tests pass

U4. Migrate pipeline modules

Goal: Replace prints in pipeline/ with logging.

Requirements: R1, R5

Dependencies: U1, U2 (for database.py patterns to follow)

Files:

Modify: pipeline/run_pipeline.py, pipeline/svd_pipeline.py, pipeline/text_pipeline.py, pipeline/fusion.py

Approach:

Batch migration of 4 files
Progress bars / step completion → logger.info
Warnings about missing data → logger.warning

Test scenarios:

Happy path: Pipeline run emits structured logs for each stage
Error path: Missing embeddings logged at WARNING, not silently skipped

Verification:

grep -rn "print(" pipeline/ returns nothing
Pipeline tests pass

U5. Migrate scripts/ batch

Goal: Replace prints in scripts/ with logging.

Requirements: R1, R5

Dependencies: U1

Files:

Modify: scripts/*.py (batch, mechanical)

Approach:

Script-level loggers: logger = logging.getLogger("scripts.drift_analysis")
CLI progress prints → logger.info
Results summary prints → logger.info (or keep as print if they are actual CLI output)

Execution note: Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep print(json.dumps(result)) style outputs.

Test scenarios:

Happy path: Script progress is logged, result output is preserved
Edge case: Scripts that parse their own output still work

Verification:

Scripts that produce machine-readable output still do so
uv run pytest tests/scripts/ -q passes

U6. Update CODE_STYLE.md and add lint rule

Goal: Prevent new print() calls from being introduced.

Requirements: R4

Dependencies: U1–U5

Files:

Modify: CODE_STYLE.md
Modify: .pre-commit-config.yaml (add ruff rule for print)

Approach:

Add "Use logging, not print" section to CODE_STYLE.md
Add ruff rule: T201 (print found) to enforce

Test expectation: none — documentation and config change.

Verification:

ruff check . fails if any new print() is added

System-Wide Impact

Interaction graph: All modules that previously printed to stdout now use logging handlers
Error propagation: Logging does not change exception flow, but error messages are now timestamped and leveled
State lifecycle risks: None — logging is side-effect-only
Unchanged invariants: All existing behavior preserved; only output channel changes

Risks & Dependencies

Risk	Mitigation
Missing a print() call	Use `grep -rn "print(" --include="*.py"` as final check
Streamlit UI breaks from missing prints	Identify and convert intentional UI prints to `st.info` first
Tests that assert on stdout break	Update to use `caplog` fixture
Scripts that pipe their own output	Keep result prints; only migrate diagnostic prints

Documentation / Operational Notes

Update CODE_STYLE.md logging section
Consider adding a logging configuration section to ARCHITECTURE.md

Sources & References

CODE_STYLE.md logging guidance
Python logging docs: https://docs.python.org/3/library/logging.html
Existing prints: grep -rn "print(" --include="*.py" .

8.3 KiB Raw Blame History

Replace print() with Structured Logging

Overview

Problem Frame

Requirements Trace

Scope Boundaries

Key Technical Decisions

Context & Research

Relevant Code and Patterns

Institutional Learnings

Implementation Units

System-Wide Impact

Risks & Dependencies

Documentation / Operational Notes

Sources & References

8.3 KiB

Raw Blame History