6.8 KiB
| date | topic | status |
|---|---|---|
| 2026-03-23 | Test refactor: replace monkeypatching with real implementations | validated |
Test Refactor — Remove Monkeypatching, Use Real Implementations
Problem Statement
The new test suite (added during the motion content enrichment hardening sprint) relies on:
sys.modulesinjection of a fakeduckdbmodulemonkeypatch.setattrto replace pipeline functions wholesale- Exception-swallowing smoke tests that verify nothing meaningful
These tests give false confidence. They patch away the very code paths we want to exercise. The goal is to replace them with tests that run real production code paths using in-memory DuckDB and injected fake callables.
Constraints
- Do NOT modify
app.pyorscheduler.py - Keep the module-level
db = MotionDatabase()singleton indatabase.py— do not remove it (production code depends on it) - Production-code changes must be minimal and backwards-compatible (all existing callers continue to work with default parameters)
- Tests run with
.venv/bin/python -m pytest -q - No
print()in library modules — uselogging.getLogger(__name__)
Approach: Dependency Injection + In-Memory DuckDB
Thread an optional db: MotionDatabase = None parameter through the three pipeline functions that currently hard-import the singleton. Pass MotionDatabase(":memory:") in tests. Replace fake ai_provider calls with a real FakeEmbedder callable injected as a parameter.
Rejected alternatives:
- Repository/Protocol pattern — major refactor of database.py and all callers; YAGNI for current test coverage needs
- tmp_path real DB files — slower, requires full schema migration in fixtures, still needs singleton patching
Architecture
Production Changes (minimal, backwards-compatible)
pipeline/ai_provider_wrapper.py
- Add
db: MotionDatabase = Noneparam toget_embeddings_with_retry - Add
embedder=Noneparam (callable replacingget_embeddings_batch) - If
None, fall back to imported singleton / real provider
pipeline/text_pipeline.py
ensure_text_embeddings: already acceptsdb_path; also acceptdb: MotionDatabase = Noneas override (takes precedence overdb_path)ensure_text_embeddings_for_ids: samedboverride param_select_text: acceptdb: MotionDatabase(already does); whendb.db_path == ":memory:"the function returns[](no motions seeded at that level) — this is fine for retry-path tests
similarity/compute.py
compute_similarities: already constructsMotionDatabase(db_path)locally; adddb: MotionDatabase = Noneoverride param
Test Infrastructure (tests/conftest.py)
New fixtures added alongside existing ones:
mem_db fixture (function-scoped)
- Creates
MotionDatabase(":memory:")— triggers_init_database()which creates all tables/sequences in-memory - Seeds a minimal set of motions rows for tests that need them
- Returned to test; closed after test
fake_embedder fixture
- Returns a
FakeEmbedderinstance — a real callable class with signature(texts, model=None, batch_size=50) -> list[list[float]] - Returns deterministic embeddings:
[[0.1] * 1536]per text - Accepts
fail_ids: list[int]kwarg to simulate per-item permanent failures (raisesRuntimeErrorfor those items)
Components
FakeEmbedder (tests/conftest.py)
- Real class, not a Mock
__call__(texts, model=None, batch_size=50)— returns list of float vectorsfail_idsset at construction time: any call where the batch index is infail_idsraisesRuntimeError- Tracks
call_countfor assertions
mem_db fixture
MotionDatabase(":memory:")— schema initialized by__init__- Optionally seeds motions via
db.insert_motion(...)for tests that need real data - No filesystem side effects
Data Flow
Retry test (replaces test_rerun_embeddings_retry.py)
mem_dbfixture creates in-memory DBFakeEmbedder(fail_ids=[101, 102])created- Call
ensure_text_embeddings(db=mem_db, embedder=fake_embedder) - Wrapper internally retries failed items
- Assert
fake_embedder.call_count > 1(retries happened) - Assert
failed_idsreturned contains[101, 102]
Audit event test (replaces test_database_audit.py)
mem_dbfixture — in-memory DB withaudit_eventstable- Call
mem_db.append_audit_event(None, "test_action", ...) - Query
audit_eventstable directly: assert row exists - No filesystem writes
Similarity filter test (replaces test_similarity_compute_filter.py)
mem_dbseeded with 2 motions with identical short titles ("Aangenomen.")- Embeddings seeded for both motions (identical vectors → score 1.0)
- Call
compute_similarities(vector_type="fused", window_id=None, db=mem_db) - Assert
store_similarity_batchstored 0 pairs (filtered out)
Embedding failure + audit event test (new, test_ai_provider_wrapper.py)
mem_dbfixtureFakeEmbedder(fail_ids=[5])— motion 5 permanently fails- Call
get_embeddings_with_retry(texts=[...], motion_ids=[5], db=mem_db, embedder=fake_embedder) - Assert return contains failure sentinel for motion 5
- Query
mem_dbaudit_events table: assert row with action="embedding_failed", target_id=5
Error Handling Strategy
_select_textwith in-memory DB returns[](no rows) — callers treat empty list as "no motions to embed", which is correctFakeEmbedder.fail_idsraisesRuntimeErrorto exercise the retry/backoff path without sleeping — the wrapper's retry loop catchesException, so this exercises the real retry path- Audit event writes go to the in-memory
audit_eventstable; the JSON ledger fallback path is tested separately if needed
Testing Strategy
All tests use .venv/bin/python -m pytest -q. No sys.modules tricks. No monkeypatch.setattr on production module attributes.
| Test file | What it exercises |
|---|---|
test_database_audit.py |
append_audit_event → in-memory DB row |
test_ai_provider_wrapper.py |
Empty input; transient retry; permanent fail + audit event |
test_rerun_embeddings_retry.py |
Real ensure_text_embeddings → retry path → ensure_text_embeddings_for_ids called |
test_similarity_compute_filter.py |
Real compute_similarities → filter removes perfect-score identical-title pairs |
Open Questions
- Does
MotionDatabase(":memory:").__init__successfully create theaudit_eventstable? (Needs verification — if_init_databaseconditionally skips it, the fixture needs to run it explicitly.) - Does
_select_textneed the DB connection passed directly rather than reconstructed fromdb.db_path? For:memory:DBs, a newduckdb.connect(":memory:")opens an empty DB, not the seeded one. Resolution:ensure_text_embeddingsshould pass the seeded connection rather than the path, OR accept that_select_textreturns[]for in-memory DBs (sufficient for retry-path tests).