You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
256 lines
8.3 KiB
256 lines
8.3 KiB
---
|
|
title: "refactor: Replace print() calls with structured logging"
|
|
type: refactor
|
|
status: active
|
|
date: 2026-04-24
|
|
---
|
|
|
|
# Replace print() with Structured Logging
|
|
|
|
## Overview
|
|
|
|
There are approximately 225 `print()` calls across the codebase (database.py, api_client.py, scripts/, pipeline/). CODE_STYLE.md already recommends structured logging, but it is not consistently applied. This makes production debugging difficult — no log levels, no timestamps, no module context.
|
|
|
|
## Problem Frame
|
|
|
|
- `print()` outputs are invisible in production logs or mixed with Streamlit UI
|
|
- No log levels (INFO, WARNING, ERROR) to filter noise
|
|
- No module names to identify which component logged what
|
|
- Ingestion and API errors are silently swallowed by broad except blocks
|
|
- Scripts produce unstructured output that is hard to parse or aggregate
|
|
|
|
## Requirements Trace
|
|
|
|
- R1. Replace all `print()` calls with appropriate `logging` levels
|
|
- R2. Configure a project-wide logger with module-level naming
|
|
- R3. Preserve existing output behavior in Streamlit contexts (use `st.info`/`st.warning` where appropriate)
|
|
- R4. Update CODE_STYLE.md to mandate logging over print
|
|
- R5. All tests pass after migration
|
|
|
|
## Scope Boundaries
|
|
|
|
**Included:**
|
|
- database.py, api_client.py, summarizer.py, ai_provider.py
|
|
- pipeline/ modules (run_pipeline.py, svd_pipeline.py, text_pipeline.py, fusion.py)
|
|
- scripts/ (batch migration, one script at a time)
|
|
|
|
**Excluded:**
|
|
- explorer.py Streamlit UI prints (these may be intentional UI feedback)
|
|
- app.py user-facing prints
|
|
- Third-party code
|
|
|
|
## Key Technical Decisions
|
|
|
|
- **Use standard library `logging`** — no external dependency needed. If structlog is desired later, it wraps logging.
|
|
- **Module-level loggers** — `logger = logging.getLogger(__name__)` pattern
|
|
- **Root config in config.py** — basicConfig or dictConfig at app startup
|
|
- **Streamlit compatibility** — In Streamlit contexts, logging to stderr still works; replace intentional UI prints with `st.*` calls
|
|
|
|
## Context & Research
|
|
|
|
### Relevant Code and Patterns
|
|
|
|
- `database.py` — prints in insert/update paths, ~50+ prints
|
|
- `api_client.py` — prints in fetch/pagination logic
|
|
- `scripts/` — 22 scripts, many with progress prints
|
|
- `CODE_STYLE.md` — already recommends structured logging
|
|
|
|
### Institutional Learnings
|
|
|
|
- `docs/solutions/best-practices/working-tree-hygiene-dependency-groups-and-gitignore-2026-04-24.md` — mechanical changes should be verified with full test suite
|
|
|
|
## Implementation Units
|
|
|
|
- [ ] U1. **Set up logging configuration and test harness**
|
|
|
|
**Goal:** Create the logging infrastructure and tests before touching any print statements.
|
|
|
|
**Requirements:** R2
|
|
|
|
**Dependencies:** None
|
|
|
|
**Files:**
|
|
- Modify: `config.py`
|
|
- Create: `tests/test_logging_config.py`
|
|
|
|
**Approach:**
|
|
- Add `configure_logging(level=logging.INFO)` to config.py
|
|
- Use standard format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
|
|
- Create test that verifies logger hierarchy and formatting
|
|
|
|
**Execution note:** Test-first — write `test_logging_config.py` before any implementation.
|
|
|
|
**Test scenarios:**
|
|
- Happy path: `configure_logging()` sets up root logger with correct format
|
|
- Happy path: Module logger `logging.getLogger("database")` inherits level
|
|
- Edge case: Calling configure_logging twice is idempotent
|
|
|
|
**Verification:**
|
|
- `uv run pytest tests/test_logging_config.py -v` passes
|
|
|
|
---
|
|
|
|
- [ ] U2. **Migrate database.py prints to logging**
|
|
|
|
**Goal:** Replace all print() calls in database.py with logger calls.
|
|
|
|
**Requirements:** R1, R5
|
|
|
|
**Dependencies:** U1
|
|
|
|
**Files:**
|
|
- Modify: `database.py`
|
|
- Modify: `tests/test_database_audit.py` (if it checks output)
|
|
|
|
**Approach:**
|
|
- Add `logger = logging.getLogger(__name__)` at module level
|
|
- Replace progress prints with `logger.info()`
|
|
- Replace error/warning prints with `logger.warning()` / `logger.error()`
|
|
- Keep behavior identical (same messages)
|
|
|
|
**Execution note:** Test-first — write a test that asserts `caplog` captures a database log message before changing any code.
|
|
|
|
**Test scenarios:**
|
|
- Happy path: `caplog` captures `logger.info` during motion insert
|
|
- Error path: `caplog` captures `logger.error` on DB failure
|
|
- Edge case: No prints leak to stdout (use capsys to verify)
|
|
|
|
**Verification:**
|
|
- `grep -n "print(" database.py` returns nothing (or only intentional UI prints)
|
|
- `uv run pytest tests/test_database_audit.py -v` passes
|
|
|
|
---
|
|
|
|
- [ ] U3. **Migrate api_client.py prints to logging**
|
|
|
|
**Goal:** Replace all print() calls in api_client.py with logger calls.
|
|
|
|
**Requirements:** R1, R5
|
|
|
|
**Dependencies:** U1
|
|
|
|
**Files:**
|
|
- Modify: `api_client.py`
|
|
- Modify: `tests/test_api_client.py` (create if missing)
|
|
|
|
**Approach:**
|
|
- Same pattern as U2: module logger, map prints to levels
|
|
- API pagination progress → `logger.info`
|
|
- Rate limit / retry messages → `logger.warning`
|
|
|
|
**Execution note:** Test-first — characterize current behavior with a capsys test, then migrate.
|
|
|
|
**Test scenarios:**
|
|
- Happy path: API fetch logs pagination progress at INFO level
|
|
- Error path: Failed request logs at ERROR level
|
|
- Integration: Log output includes module name (`api_client`)
|
|
|
|
**Verification:**
|
|
- `grep -n "print(" api_client.py` returns nothing
|
|
- Existing API tests pass
|
|
|
|
---
|
|
|
|
- [ ] U4. **Migrate pipeline modules**
|
|
|
|
**Goal:** Replace prints in pipeline/ with logging.
|
|
|
|
**Requirements:** R1, R5
|
|
|
|
**Dependencies:** U1, U2 (for database.py patterns to follow)
|
|
|
|
**Files:**
|
|
- Modify: `pipeline/run_pipeline.py`, `pipeline/svd_pipeline.py`, `pipeline/text_pipeline.py`, `pipeline/fusion.py`
|
|
|
|
**Approach:**
|
|
- Batch migration of 4 files
|
|
- Progress bars / step completion → `logger.info`
|
|
- Warnings about missing data → `logger.warning`
|
|
|
|
**Test scenarios:**
|
|
- Happy path: Pipeline run emits structured logs for each stage
|
|
- Error path: Missing embeddings logged at WARNING, not silently skipped
|
|
|
|
**Verification:**
|
|
- `grep -rn "print(" pipeline/` returns nothing
|
|
- Pipeline tests pass
|
|
|
|
---
|
|
|
|
- [ ] U5. **Migrate scripts/ batch**
|
|
|
|
**Goal:** Replace prints in scripts/ with logging.
|
|
|
|
**Requirements:** R1, R5
|
|
|
|
**Dependencies:** U1
|
|
|
|
**Files:**
|
|
- Modify: `scripts/*.py` (batch, mechanical)
|
|
|
|
**Approach:**
|
|
- Script-level loggers: `logger = logging.getLogger("scripts.drift_analysis")`
|
|
- CLI progress prints → `logger.info`
|
|
- Results summary prints → `logger.info` (or keep as print if they are actual CLI output)
|
|
|
|
**Execution note:** Some scripts may legitimately be CLI tools where stdout output is the product. Only migrate diagnostic/progress prints; keep `print(json.dumps(result))` style outputs.
|
|
|
|
**Test scenarios:**
|
|
- Happy path: Script progress is logged, result output is preserved
|
|
- Edge case: Scripts that parse their own output still work
|
|
|
|
**Verification:**
|
|
- Scripts that produce machine-readable output still do so
|
|
- `uv run pytest tests/scripts/ -q` passes
|
|
|
|
---
|
|
|
|
- [ ] U6. **Update CODE_STYLE.md and add lint rule**
|
|
|
|
**Goal:** Prevent new print() calls from being introduced.
|
|
|
|
**Requirements:** R4
|
|
|
|
**Dependencies:** U1–U5
|
|
|
|
**Files:**
|
|
- Modify: `CODE_STYLE.md`
|
|
- Modify: `.pre-commit-config.yaml` (add ruff rule for print)
|
|
|
|
**Approach:**
|
|
- Add "Use logging, not print" section to CODE_STYLE.md
|
|
- Add ruff rule: `T201` (print found) to enforce
|
|
|
|
**Test expectation:** none — documentation and config change.
|
|
|
|
**Verification:**
|
|
- `ruff check .` fails if any new print() is added
|
|
|
|
---
|
|
|
|
## System-Wide Impact
|
|
|
|
- **Interaction graph:** All modules that previously printed to stdout now use logging handlers
|
|
- **Error propagation:** Logging does not change exception flow, but error messages are now timestamped and leveled
|
|
- **State lifecycle risks:** None — logging is side-effect-only
|
|
- **Unchanged invariants:** All existing behavior preserved; only output channel changes
|
|
|
|
## Risks & Dependencies
|
|
|
|
| Risk | Mitigation |
|
|
|------|------------|
|
|
| Missing a print() call | Use `grep -rn "print(" --include="*.py"` as final check |
|
|
| Streamlit UI breaks from missing prints | Identify and convert intentional UI prints to `st.info` first |
|
|
| Tests that assert on stdout break | Update to use `caplog` fixture |
|
|
| Scripts that pipe their own output | Keep result prints; only migrate diagnostic prints |
|
|
|
|
## Documentation / Operational Notes
|
|
|
|
- Update CODE_STYLE.md logging section
|
|
- Consider adding a logging configuration section to ARCHITECTURE.md
|
|
|
|
## Sources & References
|
|
|
|
- CODE_STYLE.md logging guidance
|
|
- Python logging docs: https://docs.python.org/3/library/logging.html
|
|
- Existing prints: `grep -rn "print(" --include="*.py" .`
|
|
|