title: refactor: Replace print() calls with structured logging
type: refactor
status: active
date: 2026-04-24

Replace print() with Structured Logging

Overview

There are approximately 225 print() calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.

Problem Frame

  • print() output is either missing from production logs or interleaved with Streamlit UI output
  • No log levels (INFO, WARNING, ERROR) to filter noise
  • No module names to identify which component logged what
  • Ingestion and API errors are silently swallowed by broad except blocks
  • Scripts produce unstructured output that is hard to parse or aggregate

Requirements Trace

  • R1. Replace all print() calls with appropriate logging levels
  • R2. Configure a project-wide logger with module-level naming
  • R3. Preserve existing output behavior in Streamlit contexts (use st.info/st.warning where appropriate)
  • R4. Update CODE_STYLE.md to mandate logging over print
  • R5. All tests pass after migration

Scope Boundaries

Included:

  • database.py, api_client.py, summarizer.py, ai_provider.py
  • pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
  • scripts/ (batch migration, one script at a time)

Excluded:

  • explorer.py Streamlit UI prints (these may be intentional UI feedback)
  • app.py user-facing prints
  • Third-party code

Key Technical Decisions

  • Use standard library logging — no external dependency needed. If structlog is desired later, it wraps logging.
  • Module-level loggers: use the logger = logging.getLogger(__name__) pattern
  • Root config in config.py — basicConfig or dictConfig at app startup
  • Streamlit compatibility — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with st.* calls

Context & Research

Relevant Code and Patterns

  • database.py — prints in insert/update paths, ~50+ prints
  • api_client.py — prints in fetch/pagination logic
  • scripts/ — 22 scripts, many with progress prints
  • CODE_STYLE.md — already recommends structured logging

Institutional Learnings

  • docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md — mechanical changes should be verified with full test suite

Implementation Units

  • U1. Set up logging configuration and test harness

Goal: Create the logging infrastructure and tests before touching any print statements.

Requirements: R2

Dependencies: None

Files:

  • Modify: config.py
  • Create: tests/test_logging_config.py

Approach:

  • Add configure_logging(level=logging.INFO) to config.py
  • Use standard format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
  • Create test that verifies logger hierarchy and formatting

Execution note: Test-first — write test_logging_config.py before any implementation.

Test scenarios:

  • Happy path: configure_logging() sets up root logger with correct format
  • Happy path: Module logger logging.getLogger("database") inherits level
  • Edge case: Calling configure_logging twice is idempotent
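Two of these scenarios might be expressed as the tests below. This is a sketch under assumptions: `format_sample` is a hypothetical helper that renders a record with the planned format string directly, so the tests do not depend on how config.py ends up being written.

```python
import logging
import re

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def format_sample(name: str, level: int, message: str) -> str:
    """Render one record with the planned format, without touching global state."""
    record = logging.LogRecord(
        name=name, level=level, pathname="sample.py", lineno=1,
        msg=message, args=(), exc_info=None,
    )
    return logging.Formatter(LOG_FORMAT).format(record)


def test_format_has_timestamp_name_and_level():
    line = format_sample("database", logging.INFO, "Inserted motion m1")
    # The default asctime rendering looks like "2026-04-24 12:00:00,123".
    pattern = (
        r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
        r" - database - INFO - Inserted motion m1$"
    )
    assert re.match(pattern, line)


def test_module_logger_inherits_root_level():
    root = logging.getLogger()
    previous = root.level
    try:
        root.setLevel(logging.WARNING)
        child = logging.getLogger("database")
        assert child.level == logging.NOTSET  # nothing set on the child itself
        assert child.getEffectiveLevel() == logging.WARNING
    finally:
        root.setLevel(previous)  # leave global state as we found it
```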

Verification:

  • uv run pytest tests/test_logging_config.py -v passes

  • U2. Migrate database.py prints to logging

Goal: Replace all print() calls in database.py with logger calls.

Requirements: R1, R5

Dependencies: U1

Files:

  • Modify: database.py
  • Modify: tests/test_database_audit.py (if it checks output)

Approach:

  • Add logger = logging.getLogger(__name__) at module level
  • Replace progress prints with logger.info()
  • Replace error/warning prints with logger.warning() / logger.error()
  • Keep behavior identical (same messages)
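As an illustration of the mapping above, a migrated insert path might look like the sketch below. `insert_motion` and its dict-backed store are hypothetical stand-ins, since the real functions in database.py are not shown in this plan; the commented "was:" lines indicate the kind of print() each logger call would replace.

```python
import logging

logger = logging.getLogger(__name__)


def insert_motion(store: dict, motion_id, title: str) -> bool:
    """Hypothetical stand-in for an insert path in database.py."""
    try:
        if motion_id in store:
            # was: print(f"Motion {motion_id} already exists, skipping")
            logger.warning("Motion %s already exists, skipping", motion_id)
            return False
        store[motion_id] = title
        # was: print(f"Inserted motion {motion_id}")
        logger.info("Inserted motion %s", motion_id)
        return True
    except Exception:
        # was: print(f"Error inserting motion {motion_id}")
        # exc_info=True attaches the traceback; control flow is left unchanged.
        logger.error("Error inserting motion %s", motion_id, exc_info=True)
        return False
```

Passing the arguments separately (rather than using an f-string) keeps the message text identical while deferring string formatting until a handler actually emits the record.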

Execution note: Test-first — write a test that asserts caplog captures a database log message before changing any code.

Test scenarios:

  • Happy path: caplog captures logger.info during motion insert
  • Error path: caplog captures logger.error on DB failure
  • Edge case: No prints leak to stdout (use capsys to verify)
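The happy-path and edge-case scenarios could be combined in a single pytest test along these lines. It is a sketch only: `insert_motion` is a hypothetical stand-in for a migrated function, and the logger name "database" assumes that database.py is imported as a top-level module.

```python
import logging

# Assumed to match the value of __name__ inside database.py.
logger = logging.getLogger("database")


def insert_motion(motion_id: str) -> None:
    """Hypothetical stand-in for a migrated function in database.py."""
    logger.info("Inserted motion %s", motion_id)


def test_insert_logs_at_info_and_prints_nothing(caplog, capsys):
    # caplog and capsys are built-in pytest fixtures.
    with caplog.at_level(logging.INFO, logger="database"):
        insert_motion("m1")

    # record_tuples holds (logger_name, level, message) triples.
    assert ("database", logging.INFO, "Inserted motion m1") in caplog.record_tuples
    # Nothing should have leaked to stdout.
    assert capsys.readouterr().out == ""
```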

Verification:

  • grep -n "print(" database.py returns nothing (or only intentional UI prints)
  • uv run pytest tests/test_database_audit.py -v passes

  • U3. Migrate api_client.py prints to logging

Goal: Replace all print() calls in api_client.py with logger calls.

Requirements: R1, R5

Dependencies: U1

Files:

  • Modify: api_client.py
  • Modify: tests/test_api_client.py (create if missing)

Approach:

  • Same pattern as U2: module logger, map prints to levels
  • API pagination progress → logger.info
  • Rate limit / retry messages → logger.warning
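The level mapping for pagination and retries might look like this in practice. The sketch assumes a hypothetical `fetch_all` helper that receives a `fetch_page` callable; the real structure of api_client.py is not described in this plan, and ConnectionError is used only as an example of a retryable failure.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def fetch_all(fetch_page: Callable[[int], list], max_retries: int = 3) -> list:
    """Hypothetical paginated fetch: INFO for progress, WARNING for retries."""
    results: list = []
    page = 1
    while True:
        items = None
        for attempt in range(1, max_retries + 1):
            try:
                items = fetch_page(page)
                break
            except ConnectionError as exc:
                logger.warning(
                    "Page %d failed (attempt %d/%d): %s",
                    page, attempt, max_retries, exc,
                )
        if items is None:
            # Every attempt failed: report at ERROR instead of failing silently.
            logger.error("Giving up on page %d after %d attempts", page, max_retries)
            break
        if not items:
            break  # an empty page marks the end of the data
        logger.info("Fetched page %d (%d items)", page, len(items))
        results.extend(items)
        page += 1
    return results
```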

Execution note: Test-first — characterize current behavior with a capsys test, then migrate.

Test scenarios:

  • Happy path: API fetch logs pagination progress at INFO level
  • Error path: Failed request logs at ERROR level
  • Integration: Log output includes module name (api_client)

Verification:

  • grep -n "print(" api_client.py returns nothing
  • Existing API tests pass

  • U4. Migrate pipeline modules

Goal: Replace prints in pipeline/ with logging.

Requirements: R1, R5

Dependencies: U1, U2 (for database.py patterns to follow)

Files:

  • Modify: pipeline/run_pipeline.py, pipeline/svd_pipeline.py, pipeline/text_pipeline.py, pipeline/fusion.py

Approach:

  • Batch migration of 4 files
  • Progress bars / step completion → logger.info
  • Warnings about missing data → logger.warning

Test scenarios:

  • Happy path: Pipeline run emits structured logs for each stage
  • Error path: Missing embeddings logged at WARNING, not silently skipped
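The error-path scenario could be satisfied by something like the sketch below. `collect_embeddings` and the logger name "pipeline.fusion" are illustrative assumptions; the actual pipeline functions may be shaped quite differently.

```python
import logging

# Assumed logger name; inside the module this would be logging.getLogger(__name__).
logger = logging.getLogger("pipeline.fusion")


def collect_embeddings(motion_ids: list, embeddings: dict) -> dict:
    """Hypothetical stage: keep motions that have an embedding, report the rest."""
    found = {}
    missing = []
    for motion_id in motion_ids:
        vector = embeddings.get(motion_id)
        if vector is None:
            missing.append(motion_id)
            continue
        found[motion_id] = vector

    if missing:
        # One aggregated WARNING rather than a silent skip per motion.
        logger.warning(
            "Skipping %d of %d motions with no embedding: %s",
            len(missing), len(motion_ids), missing,
        )
    logger.info("Stage collect_embeddings complete: %d motions", len(found))
    return found
```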

Verification:

  • grep -rn "print(" pipeline/ returns nothing
  • Pipeline tests pass

  • U5. Migrate scripts/ batch

Goal: Replace prints in scripts/ with logging.

Requirements: R1, R5

Dependencies: U1

Files:

  • Modify: scripts/*.py (batch, mechanical)

Approach:

  • Script-level loggers: logger = logging.getLogger("scripts.drift_analysis")
  • CLI progress prints → logger.info
  • Results summary prints → logger.info (or keep as print if they are actual CLI output)

Execution note: Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep print(json.dumps(result)) style outputs.
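To make the distinction concrete, a script that separates diagnostics from its actual output might look like the following sketch. The `run` function and its sample input are hypothetical; only the logger name comes from the approach above.

```python
import json
import logging
import sys

logger = logging.getLogger("scripts.drift_analysis")


def run(values: list) -> dict:
    """Hypothetical analysis step; diagnostics go through the logger."""
    logger.info("Analysing %d values", len(values))
    mean = sum(values) / len(values) if values else None
    logger.info("Analysis complete")
    return {"count": len(values), "mean": mean}


def main() -> None:
    # Diagnostics go to stderr, so stdout carries only the result.
    logging.basicConfig(
        level=logging.INFO,
        stream=sys.stderr,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )
    result = run([1.0, 3.0])  # sample input for illustration
    # stdout is the product here, so this print stays.
    print(json.dumps(result))  # noqa: T201


if __name__ == "__main__":
    main()
```

Because log records go to stderr, a caller can still pipe the script's stdout into another tool. The noqa comment is one way to exempt a deliberate print once the T201 rule is enabled.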

Test scenarios:

  • Happy path: Script progress is logged, result output is preserved
  • Edge case: Scripts that parse their own output still work

Verification:

  • Scripts that produce machine-readable output still do so
  • uv run pytest tests/scripts/ -q passes

  • U6. Update CODE_STYLE.md and add lint rule

Goal: Prevent new print() calls from being introduced.

Requirements: R4

Dependencies: U1–U5

Files:

  • Modify: CODE_STYLE.md
  • Modify: .pre-commit-config.yaml (add ruff rule for print)

Approach:

  • Add "Use logging, not print" section to CODE_STYLE.md
  • Add ruff rule: T201 (print found) to enforce

Test expectation: none — documentation and config change.

Verification:

  • ruff check . fails if any new print() is added

System-Wide Impact

  • Interaction graph: All modules that previously printed to stdout now use logging handlers
  • Error propagation: Logging does not change exception flow, but error messages are now timestamped and leveled
  • State lifecycle risks: None — logging is side-effect-only
  • Unchanged invariants: All existing behavior preserved; only output channel changes

Risks & Dependencies

| Risk | Mitigation |
| --- | --- |
| Missing a print() call | Use grep -rn "print(" --include="*.py" as final check |
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to st.info first |
| Tests that assert on stdout break | Update to use caplog fixture |
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |
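A plain-text grep for "print(" also matches pprint( calls, commented-out code and string literals. As a complementary check (not part of the plan itself), the standard-library ast module can find only genuine calls to print. The helper name `find_print_calls` is hypothetical.

```python
import ast


def find_print_calls(source: str) -> list:
    """Return the line numbers of real print(...) calls in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )


sample = (
    "import logging\n"
    "print('progress')\n"        # a real call, on line 2
    "text = 'print(ignored)'\n"  # string literal, not a call
    "# print('commented out')\n"
    "pprint(data)\n"             # a different function name
)
print(find_print_calls(sample))
```

The source is only parsed, never executed, so undefined names in the scanned code are not a problem.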

Documentation / Operational Notes

  • Update CODE_STYLE.md logging section
  • Consider adding a logging configuration section to ARCHITECTURE.md

Sources & References