# motion content enrichment — implementation plan

Goal: Implement the prioritized incremental hardening from the design (2026-03-23) so the SyncFeed → embedding → fusion → similarity pipeline is more robust, observable, and testable. Break the work into small, independent micro-tasks (one file plus its test per task) so many implementers can work in parallel.

Design doc: thoughts/shared/designs/2026-03-23-motion-content-enrichment-next-steps-design.md

## Architecture summary (what I'll implement)

- Add a small audit API on MotionDatabase so code can record per-item failures in a stable place (falling back to a ledger file if DuckDB is not present).
- Add a dedicated ai_provider retry/fallback wrapper that:
  - retries failed batches (exponential backoff),
  - on persistent failure retries missing items with smaller batch sizes,
  - returns aligned embedding results (None for failed items),
  - records persistent failures to audit_events (using MotionDatabase.append_audit_event).
- Wire the text embedding pipeline to use the wrapper and return failed ids (so the rerun script can retry them).
- Add a `--max-body-workers` CLI option to scripts/sync_motion_content.py, reduce the default to 10, and add per-request retries.
- Add `--retry-missing` to scripts/rerun_embeddings.py: rerun missing failed items with smaller batches.
- Add a DB-side safety filter in similarity.compute to avoid inserting trivial 1.0 matches for very short identical titles.
- Add a small QA script, scripts/qa_similarity.py, that samples windows/motions and writes a short JSON ledger for manual review.
- Add focused unit tests for the new behaviours (ai retry wrapper, DB audit append, sync body fetch retries, rerun retry mode, similarity filter, QA script).

## Decisions / gap filling (why these concrete choices)

- Audit recording: implement MotionDatabase.append_audit_event, which writes to the audit_events table if present and otherwise appends to thoughts/ledgers/audit_events.json. Rationale: the migration SQL is a commented placeholder; making the DB write optional keeps tests and CI safe, and writing to ledgers is actionable and durable for triage.
- ai retry backoff params: default retries=3, initial_backoff=0.5s, jitter ±10%, fallback smaller_batch_size = max(1, batch_size // 2). Rationale: conservative defaults that map to the design and are implementable/testable.
- fetch_body_text retries: 3 attempts per ext_id with small exponential backoff (0.5s initial). Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) to limit pool size and avoid pool warnings. Default max workers lowered to 10.
- Interface changes: ensure_text_embeddings will return an extended result with failed_ids as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). I will update rerun_embeddings and its tests accordingly. Rationale: rerun needs the failed ids, and propagating them as a return value is the simplest, most testable approach.
- All new code uses logging.getLogger(__name__) (no print in library modules) to obey constraints.
- Tests will use monkeypatching/mocks to avoid network/DB dependencies.

## Dependency graph (high level)

- Batch 1 (foundation, parallel): tasks 1.1–1.4 (no interdependencies except where noted).
- Batch 2 (core, parallel): tasks 2.1–2.3 (depend on Batch 1).
- Batch 3 (safety & QA, parallel): task 3.1 (depends on Batch 2 and Batch 1).

```
Batch 1 (parallel): 1.1, 1.2, 1.3, 1.4
Batch 2 (parallel): 2.1, 2.2, 2.3 [depends on Batch 1]
Batch 3 (parallel): 3.1 [depends on Batch 2]
```

---

## Batch 1: Foundation (parallel - 4 implementers)

All tasks in this batch have no external dependencies except where noted.

### Task 1.1: MotionDatabase.append_audit_event

**Owner:** implementer (author)
**Estimate:** 2 hours
**Depends:** none

**Description:** Add an append_audit_event(...) helper to database.MotionDatabase. The method attempts to INSERT a row into an audit_events table (if the table exists). If DuckDB is not available or the table does not exist, it appends the event to a JSON file under thoughts/ledgers/audit_events.json. This provides a stable place to record per-item failures without forcing a migration to run during tests/CI.

**File:** `database.py` (modify: add method)
**Test:** `tests/test_database_audit.py` (new)

Implementation notes (decisions):
- Signature: append_audit_event(actor_id: str | None, action: str, target_type: str | None = None, target_id: str | None = None, metadata: dict | None = None) -> bool
- Behavior (see the sketch after this task):
  - If duckdb is None: append the event to thoughts/ledgers/audit_events.json as a list of event objects (create the file/dir as needed).
  - If duckdb is present: run "INSERT INTO audit_events (...)" wrapped in try/except; if the table is missing or the INSERT fails, fall back to writing the ledger file.
  - Do not raise; log at appropriate levels and return True if the event was recorded somewhere, False otherwise.
- Use uuid.uuid4() for id and a UTC timestamp for created_at.
- Use logging.getLogger(__name__) for messages.

Test (complete list):
- tests/test_database_audit.py
  - Case A (duckdb=None emulation): monkeypatch database.duckdb = None; ensure the ledger file is created and its content contains the event.
  - Case B (duckdb present but insertion raises): monkeypatch duckdb.connect to a MagicMock that raises on execute; verify fallback to the ledger file.
  - Verify the method returns True when written to the ledger, and that the JSON is valid.

Verify:
- pytest -q tests/test_database_audit.py

Commit message suggestion:
- feat(db): add append_audit_event helper to MotionDatabase (ledger fallback)
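
For concreteness, here is a minimal sketch of the fallback logic. It is written as a free function over an optional DuckDB connection so it stays self-contained; in `database.py` it would be a method on MotionDatabase with the connection coming from the instance, and the audit_events column list is an assumption until the migration lands.

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger(__name__)
LEDGER_PATH = Path("thoughts/ledgers/audit_events.json")


def append_audit_event(con, actor_id, action, target_type=None,
                       target_id=None, metadata=None) -> bool:
    """Record an audit event in DuckDB if possible, else in the ledger file."""
    event = {
        "id": str(uuid.uuid4()),
        "actor_id": actor_id,
        "action": action,
        "target_type": target_type,
        "target_id": target_id,
        "metadata": metadata or {},
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    if con is not None:  # con is None when duckdb is unavailable
        try:
            # Column list is assumed; align it with the real migration.
            con.execute(
                "INSERT INTO audit_events "
                "(id, actor_id, action, target_type, target_id, metadata, created_at) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                [event["id"], actor_id, action, target_type, target_id,
                 json.dumps(event["metadata"]), event["created_at"]],
            )
            return True
        except Exception:  # table missing, INSERT failed, ...
            logger.warning("audit insert failed; falling back to ledger", exc_info=True)
    try:
        LEDGER_PATH.parent.mkdir(parents=True, exist_ok=True)
        events = json.loads(LEDGER_PATH.read_text()) if LEDGER_PATH.exists() else []
        events.append(event)
        LEDGER_PATH.write_text(json.dumps(events, indent=2))
        return True
    except OSError:
        logger.error("could not record audit event %s", event["id"], exc_info=True)
        return False
```

The important property is that the helper never raises: a DB failure degrades to the ledger file, and a ledger write failure degrades to returning False.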
---

### Task 1.2: ai provider retry/fallback wrapper

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.1 (uses MotionDatabase.append_audit_event)

**Description:** Add a small module that wraps ai_provider.get_embeddings_batch to provide robust retries and fallback to smaller batch sizes. The wrapper returns a list of embeddings aligned with the inputs; for items that permanently fail it returns None in place and records an audit event via MotionDatabase.append_audit_event.

**File:** `pipeline/ai_provider_wrapper.py` (new)
**Test:** `tests/test_ai_provider_wrapper.py` (new)

Implementation details:
- Provide get_embeddings_with_retry(texts: list[str], motion_ids: list[int] | None = None, model: str | None = None, batch_size: int = 50, retries: int = 3) -> list[Optional[list[float]]]
- Approach (see the sketch after this task):
  - Iterate inputs in chunks of batch_size.
  - For each chunk:
    - Try ai_provider.get_embeddings_batch(chunk, model=model, batch_size=batch_size) up to `retries` times with exponential backoff (initial_backoff=0.5s, jitter).
    - If a chunk keeps failing, split it into subchunks (smaller_batch_size = max(1, batch_size // 2)) and retry the subchunks with the same logic.
    - If an individual text still fails, mark the corresponding index result as None and record an audit event via MotionDatabase.append_audit_event with action='embedding_failed' and metadata including the model, exception message, and attempt count.
  - Return a results list of the same length as the inputs (embedding lists or None).
- Use MotionDatabase(db_path=...) only if a db_path is provided in env/config or via an optional parameter — by default use database.db (the existing module-level db instance) to call append_audit_event.
- Keep the function pure enough to be unit-tested by monkeypatching ai_provider.get_embeddings_batch and MotionDatabase.append_audit_event.

Test cases:
- a successful batch returns embeddings aligned to the inputs
- a simulated transient failure where the first attempt fails and the second succeeds (retry observed)
- a persistent chunk failure triggers fallback to smaller chunks, and audit events are eventually appended for the failing items (verify append_audit_event is called with the expected metadata)
- tests use monkeypatch to stub ai_provider.get_embeddings_batch behavior and MotionDatabase.append_audit_event

Verify:
- pytest -q tests/test_ai_provider_wrapper.py

Commit:
- feat(pipeline): ai provider retry/fallback wrapper
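
A minimal sketch of the chunk retry-and-split logic, assuming ai_provider.get_embeddings_batch raises on failure and returns one vector per input text; `ai_provider` and `database` are the project modules named above, and the exact backoff arithmetic is illustrative:

```python
import logging
import random
import time
from typing import Optional

import ai_provider  # project module, assumed importable as in the plan
import database     # project module providing the module-level `db` instance

logger = logging.getLogger(__name__)


def get_embeddings_with_retry(texts, motion_ids=None, model=None,
                              batch_size=50, retries=3,
                              initial_backoff=0.5) -> list[Optional[list[float]]]:
    """Return embeddings aligned with `texts`; None marks permanent failures."""
    results: list[Optional[list[float]]] = [None] * len(texts)
    ids = motion_ids if motion_ids is not None else list(range(len(texts)))

    def embed(indices: list[int], size: int) -> None:
        chunk = [texts[i] for i in indices]
        last_exc: Exception | None = None
        for attempt in range(1, retries + 1):
            try:
                embeddings = ai_provider.get_embeddings_batch(
                    chunk, model=model, batch_size=size)
                for i, emb in zip(indices, embeddings):
                    results[i] = emb
                return
            except Exception as exc:
                last_exc = exc
                backoff = initial_backoff * (2 ** (attempt - 1))
                time.sleep(backoff * random.uniform(0.9, 1.1))  # ±10% jitter
        if size > 1:
            # Chunk kept failing: split into halves and retry each half.
            half = max(1, size // 2)
            for start in range(0, len(indices), half):
                embed(indices[start:start + half], half)
        else:
            # Single item exhausted all retries: record it and give up.
            database.db.append_audit_event(
                actor_id=None, action="embedding_failed",
                target_type="motion", target_id=str(ids[indices[0]]),
                metadata={"model": model, "error": str(last_exc),
                          "attempts": retries},
            )

    for start in range(0, len(texts), batch_size):
        embed(list(range(start, min(start + batch_size, len(texts)))), batch_size)
    return results
```

Because the split recursion bottoms out at size 1, every input either receives an embedding or exactly one audit event, so the results list stays aligned with the inputs by construction.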
---

### Task 1.3: QA script — scripts/qa_similarity.py

**Owner:** implementer
**Estimate:** 2 hours
**Depends:** none

**Description:** Add a small script that samples N motions across windows, runs a similarity lookup for each sampled motion, asserts simple heuristics (e.g., the top 5 are not all score == 1.0 except for identical IDs), and writes a JSON summary to thoughts/ledgers/qa_similarity_{timestamp}.json. The script is meant to be run manually or in CI as a quick QA check.

**File:** `scripts/qa_similarity.py` (new)
**Test:** `tests/test_qa_similarity.py` (new)

Implementation notes (see the sketch after this task):
- CLI: --db-path, --sample-size (default 50), --top-k (default 5)
- The implementation uses MotionDatabase to select a small set of motions and similarity.get_cached_similarities (or MotionDatabase.get_cached_similarities) to evaluate neighbors.
- The script returns a dict summary, which is also written to a uniquely named JSON file under thoughts/ledgers/.
- For tests, monkeypatch MotionDatabase to return deterministic samples and similarities; verify the script produces the expected JSON summary and returns reasonable pass/fail flags.

Verify:
- pytest -q tests/test_qa_similarity.py
- Run manually: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10

Commit:
- feat(scripts): add QA similarity sampling script and ledger writer
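
A sketch of the sampling core the script could wrap with argparse. The neighbor rows are assumed to come back as dicts with target_id and score keys, and db.sample_motions is a hypothetical stand-in for whatever MotionDatabase query the implementer chooses:

```python
import json
import time
from pathlib import Path


def run_qa(db, sample_size: int = 50, top_k: int = 5) -> dict:
    """Sample motions, sanity-check their top-k neighbors, write a JSON ledger."""
    failures = []
    sampled = db.sample_motions(sample_size)  # hypothetical sampling helper
    for motion in sampled:
        neighbors = db.get_cached_similarities(motion["id"], top_k=top_k)
        # Heuristic: the top-k should not all be perfect 1.0 matches
        # unless they point back at the same motion id.
        suspicious = [n for n in neighbors
                      if n["score"] == 1.0 and n["target_id"] != motion["id"]]
        if neighbors and len(suspicious) == len(neighbors):
            failures.append({"motion_id": motion["id"], "neighbors": neighbors})
    summary = {
        "sampled": len(sampled),
        "passed": len(sampled) - len(failures),
        "failed": len(failures),
        "failures": failures,
    }
    out = Path(f"thoughts/ledgers/qa_similarity_{int(time.time())}.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary, indent=2))
    return summary
```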
---

### Task 1.4: sync_motion_content — reduce concurrency, add per-ext_id retry, add CLI flag

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.1 (write failures to audit via MotionDatabase.append_audit_event)

**Description:** Harden the body text fetcher:
- Add a CLI flag `--max-body-workers` (reduce the default to 10).
- Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) when creating the requests.Session in sync_motion_content.
- Implement per-ext_id retry in _fetch_body_text: try up to 3 times with exponential backoff on network errors/5xx/429.
- When a body_text fetch permanently fails, call MotionDatabase.append_audit_event(action='body_fetch_failed', target_type='document', target_id=ext_id, metadata=...) so the failure is recorded.

**File:** `scripts/sync_motion_content.py` (modify)
**Test:** `tests/test_sync_motion_content.py` (new)

Implementation details (see the sketch after this task):
- Add parser.add_argument("--max-body-workers", type=int, default=10, help=...) in the CLI.
- When creating the session, mount an HTTPAdapter with pool_maxsize equal to max_body_workers via session.mount("https://", adapter).
- Modify _fetch_body_text(ext_id, session) to attempt up to 3 tries and return None on exhaustion; log appropriately; call db.append_audit_event on permanent failure (db from database.db).
- Update fetch_body_texts to pass the max_workers param through as already implemented, but set the default constant MAX_BODY_WORKERS to 10 at the top of the file.

Test plan:
- Test that _fetch_body_text retries: monkeypatch session.get to fail first (raise requests.ConnectionError) and succeed second; verify the text is returned and that exactly the expected number of attempts occurred.
- Test the permanent-failure case: monkeypatch session.get to always raise and verify MotionDatabase.append_audit_event was called (monkeypatch database.db.append_audit_event).
- Test that fetch_body_texts respects the max_workers param by monkeypatching ThreadPoolExecutor to observe the max_workers argument (or call it with a small set and assert the function returns mapped results).

Verify:
- pytest -q tests/test_sync_motion_content.py
- Manual run: python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10

Commit:
- feat(sync): add per-ext_id retries and --max-body-workers flag (default 10), record failures to audit
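
A sketch of the retry loop and session setup; the document URL template is a placeholder, and the audit call goes through the module-level database.db as described above:

```python
import logging
import time

import requests
from requests.adapters import HTTPAdapter

import database  # project module providing the module-level `db` instance

logger = logging.getLogger(__name__)
BODY_URL_TEMPLATE = "https://example.org/documents/{ext_id}"  # placeholder


def make_session(max_body_workers: int = 10) -> requests.Session:
    """Build a session whose connection pool matches the worker count."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=max_body_workers,
                          pool_maxsize=max_body_workers)
    session.mount("https://", adapter)
    return session


def _fetch_body_text(ext_id: str, session: requests.Session,
                     attempts: int = 3, backoff: float = 0.5):
    """Fetch one body text, retrying on network errors, 5xx, and 429."""
    for attempt in range(1, attempts + 1):
        try:
            resp = session.get(BODY_URL_TEMPLATE.format(ext_id=ext_id), timeout=30)
            if resp.ok:
                return resp.text
            if resp.status_code != 429 and resp.status_code < 500:
                logger.warning("body fetch %s: non-retryable status %d",
                               ext_id, resp.status_code)
                break  # other 4xx will not improve on retry
            logger.warning("body fetch %s attempt %d/%d: status %d",
                           ext_id, attempt, attempts, resp.status_code)
        except requests.RequestException as exc:
            logger.warning("body fetch %s attempt %d/%d failed: %s",
                           ext_id, attempt, attempts, exc)
        if attempt < attempts:
            time.sleep(backoff * (2 ** (attempt - 1)))  # exponential backoff
    database.db.append_audit_event(  # record the permanent failure
        actor_id=None, action="body_fetch_failed",
        target_type="document", target_id=ext_id,
        metadata={"attempts": attempts},
    )
    return None
```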
---

## Batch 2: Core modules (parallel - 3 implementers)

These tasks depend on Batch 1 (the ai wrapper and audit method must be present).

### Task 2.1: text_pipeline — use ai wrapper & return failed_ids

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.2 (ai_provider_wrapper) and 1.1 (audit)

**Description:** Modify pipeline/text_pipeline.py to call the new ai_provider_wrapper.get_embeddings_with_retry instead of ai_provider.get_embeddings_batch. Extend ensure_text_embeddings to collect the indexes/ids of motions that failed to get embeddings and return them as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). Keep logging behavior similar, but include a log line reporting the failed_ids for the run.

**File:** `pipeline/text_pipeline.py` (modify)
**Test:** `tests/test_text_pipeline_retry.py` (new)

Implementation details:
- Replace the ai_provider.get_embeddings_batch(batch_texts, ...) call with wrapper.get_embeddings_with_retry(batch_texts, batch_ids, model=model, batch_size=batch_size, retries=3).
- The wrapper returns a list aligned with batch_texts containing either an embedding list or None. For each None, increment errors and append the motion_id to failed_ids.
- At the end of ensure_text_embeddings, return stored, skipped_existing, skipped_no_text, errors, failed_ids.
- Update the docstring accordingly.
- Keep the existing counting and logging; existing callers will be updated in Task 2.2.

Test plan:
- Unit test that ensure_text_embeddings:
  - increments stored as expected when the wrapper returns embeddings for all batch items.
  - includes the motion_ids in failed_ids and reflects them in the errors count when the wrapper returns None for some items.
- Use monkeypatch to stub pipeline.ai_provider_wrapper.get_embeddings_with_retry and database.db.store_embedding.

Verify:
- pytest -q tests/test_text_pipeline_retry.py

Commit:
- feat(pipeline): use ai_provider wrapper for robust embeddings and return failed ids

---

### Task 2.2: rerun_embeddings — add --retry-missing mode and wire re-run

**Owner:** implementer
**Estimate:** 2.5 hours
**Depends:** 2.1 (ensure_text_embeddings new return)

**Description:** Add a CLI flag `--retry-missing` to scripts/rerun_embeddings.py. When set, after the main ensure_text_embeddings call, if the returned `failed_ids` list is non-empty, re-run embedding for just those failed motion ids using a smaller batch size (e.g., half). Implementation choice: add a new helper in text_pipeline, ensure_text_embeddings_for_ids(...), that embeds only a provided list of motion ids, and call it here. (The alternative of adding an optional motion_ids parameter to ensure_text_embeddings itself was considered; the dedicated helper keeps the change minimal.) The script should record audit events for items that still fail after the retry.

**File:** `scripts/rerun_embeddings.py` (modify)
**Test:** `tests/test_rerun_embeddings.py` (modify — existing test)

Implementation notes:
- Add parser.add_argument("--retry-missing", action="store_true", help=...).
- After the first ensure_text_embeddings call, expect a 5-tuple. If retry_missing is set and failed_ids is non-empty, run a second short pass: text_pipeline.ensure_text_embeddings_for_ids(db_path=db_path, ids=failed_ids, model=model, batch_size=max(1, batch_size // 2)).
- Update tests/test_rerun_embeddings.py to monkeypatch the new text_pipeline helper and simulate the first call returning failed_ids and the second call resolving them; assert the rerun is invoked accordingly and the summary contains the expected fields.

Test changes:
- Update tests/test_rerun_embeddings.py to reflect that text_pipeline.ensure_text_embeddings returns five values and to simulate --retry-missing behavior.
- Keep the existing expectations in the test (extended to cover failed_ids handling).

Verify:
- pytest -q tests/test_rerun_embeddings.py
- Manual run: python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing

Commit:
- feat(scripts): add --retry-missing to rerun_embeddings and retry failed items with smaller batches

---

### Task 2.3: similarity.compute — DB-side safety filter to avoid trivial 1.0 matches

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** none (reads existing DB)

**Description:** Add a small DB-side filter, applied before inserting similarity rows, that drops suspicious 1.0 matches between different motions when the titles are extremely short (heuristic: identical titles shorter than 12 characters). Add diagnostic logging for filtered pairs.

**File:** `similarity/compute.py` (modify)
**Test:** `tests/test_similarity_compute_filter.py` (new)

Implementation details (see the sketch after this task):
- After building rows_to_insert (a list of dicts with source/target ids and score):
  - If a score is 1.0 (or very near 1.0, with a tolerance of e.g. > 0.999999), fetch titles for the set of involved ids in a single query: SELECT id, title FROM motions WHERE id IN (...).
  - For each candidate row with a perfect/near-perfect score, if the motion titles are equal and len(title.strip()) < 12, skip the insertion and log at debug/info that the pair was filtered due to a trivially short identical title.
- The threshold 12 is chosen conservatively (document it in the commit).
- Keep the inserted count and return behavior unchanged.
- Make sure DB connections are opened and closed per method.

Test plan:
- Construct a minimal in-memory or duckdb-mocked scenario where two different motion ids have an identical short title and their vectors produce 1.0 similarity. Monkeypatch duckdb.connect to return rows such that compute_similarities produces rows_to_insert including a 1.0 score. Verify store_similarity_batch is not called for that row (monkeypatch MotionDatabase.store_similarity_batch or spy on db.store_similarity_batch calls).

Verify:
- pytest -q tests/test_similarity_compute_filter.py

Commit:
- fix(similarity): filter trivial 1.0 matches for very-short identical titles
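
A sketch of the filter, assuming rows_to_insert holds dicts keyed source_id/target_id/score (adjust to the real row shape in compute.py) and that it runs just before store_similarity_batch:

```python
import logging

logger = logging.getLogger(__name__)

EPSILON = 1e-6       # tolerance for "near-perfect" scores
MIN_TITLE_LEN = 12   # heuristic threshold from the task description


def filter_trivial_matches(con, rows_to_insert):
    """Drop near-1.0 pairs whose motions share an identical, very short title."""
    suspect = [r for r in rows_to_insert if r["score"] > 1.0 - EPSILON]
    if not suspect:
        return rows_to_insert
    # One query for all titles involved in suspect pairs.
    ids = {r["source_id"] for r in suspect} | {r["target_id"] for r in suspect}
    placeholders = ", ".join("?" for _ in ids)
    titles = dict(con.execute(
        f"SELECT id, title FROM motions WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall())
    kept = []
    for row in rows_to_insert:
        if row["score"] > 1.0 - EPSILON and row["source_id"] != row["target_id"]:
            src = (titles.get(row["source_id"]) or "").strip()
            tgt = (titles.get(row["target_id"]) or "").strip()
            if src and src == tgt and len(src) < MIN_TITLE_LEN:
                logger.debug("filtered trivial match %s/%s (title=%r)",
                             row["source_id"], row["target_id"], src)
                continue
        kept.append(row)
    return kept
```

The filter only queries titles when at least one suspect row exists, so the common case adds no extra DB round trip.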
---

## Batch 3: Observability / Integration (parallel - 1-2 implementers)

These are small finishing tasks (audit/ledgers, small extras).

### Task 3.1: Tests & CI adjustments, docs, ledger examples

**Owner:** reviewer (PR reviewer)
**Estimate:** 2 hours
**Depends:** all tasks above (1.1–2.3)

**Description:** After the code is in, run the full test suite, fix any flaky tests, and add a short README note in thoughts/ledgers/ about how to run the QA script and how the audit_events fallback works. Add a small example ledger created by the QA script if helpful.

**Files:** (changes/additions)
- `thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md` (this plan — created)
- `thoughts/ledgers/README_motion_enrichment.md` (new, optional)
- No dedicated unit test for this task; it's a reviewer/integration task.

Verification:
- Run the full tests: pytest
- Run the QA script locally: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Inspect thoughts/ledgers/qa_similarity_*.json and the audit_events ledger file.

Commit:
- docs(ledgers): document QA and audit fallback behavior

---

## Test / Verification summary (per-task commands)

- Task 1.1
  - pytest -q tests/test_database_audit.py
- Task 1.2
  - pytest -q tests/test_ai_provider_wrapper.py
- Task 1.3
  - pytest -q tests/test_qa_similarity.py
  - python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Task 1.4
  - pytest -q tests/test_sync_motion_content.py
  - python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10 --skip-body-text (dry run)
- Task 2.1
  - pytest -q tests/test_text_pipeline_retry.py
- Task 2.2
  - pytest -q tests/test_rerun_embeddings.py
  - python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing
- Task 2.3
  - pytest -q tests/test_similarity_compute_filter.py

Full suite verification:
- pytest -q

---

If you want, I can now:
- generate the apply_patch to create the files and tests described (one patch containing all files), or
- create the plan file only (this document was requested) — I have it ready at: thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md

Which would you like next?