# motion content enrichment — implementation plan

Goal: Implement the prioritized incremental hardening from the design (2026-03-23) so the SyncFeed → embedding → fusion → similarity pipeline is more robust, observable, and testable. Break the work into small, independent micro-tasks (one file plus its test per task) so many implementers can work in parallel.

Design doc: thoughts/shared/designs/2026-03-23-motion-content-enrichment-next-steps-design.md

## Architecture summary (what I'll implement)

- Add a small audit API on MotionDatabase so code can record per-item failures in a stable place (falling back to a ledger file if DuckDB is not present).
- Add a dedicated ai_provider retry/fallback wrapper that:
  - retries failed batches (exponential backoff),
  - on persistent failure retries missing items with smaller batch sizes,
  - returns aligned embedding results (None for failed items),
  - records persistent failures to audit_events (using MotionDatabase.append_audit_event).
- Wire the text embedding pipeline to use the wrapper and return failed ids (so the rerun script can retry them).
- Add a `--max-body-workers` CLI option to scripts/sync_motion_content.py, reduce the default to 10, and add per-request retries.
- Add `--retry-missing` to scripts/rerun_embeddings.py: rerun missing failed items with smaller batches.
- Add a DB-side safety filter in similarity.compute to avoid inserting trivial 1.0 matches for very short identical titles.
- Add a small QA script, scripts/qa_similarity.py, that samples windows/motions and writes a short JSON ledger for manual review.
- Add focused unit tests for the new behaviours (ai retry wrapper, DB audit append, sync body fetch retries, rerun retry mode, similarity filter, QA script).

## Decisions / gap filling (why these concrete choices)

- Audit recording: implement MotionDatabase.append_audit_event, which writes to the audit_events table if present and otherwise appends to thoughts/ledgers/audit_events.json. Rationale: the migration SQL is a commented placeholder; making the DB write optional keeps tests and CI safe, and writing to ledgers is actionable and durable for triage.
- ai retry backoff params: default retries=3, initial_backoff=0.5s, jitter ±10%, fallback smaller_batch_size = max(1, batch_size // 2). Rationale: conservative defaults that map to the design and are implementable/testable.
- fetch_body_text retries: 3 attempts per ext_id with small exponential backoff (0.5s initial). Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) to limit pool size and avoid pool warnings. Default max workers lowered to 10.
- Interface changes: ensure_text_embeddings will return an extended result with failed_ids as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). I will update rerun_embeddings and its tests accordingly. Rationale: rerun needs the failed ids, and propagating them as a return value is the simplest, most testable approach.
- All new code uses logging.getLogger(__name__) (no print in library modules) to obey constraints.
- Tests will use monkeypatching/mocks to avoid network/DB dependencies.

## Dependency graph (high level)

- Batch 1 (foundation, parallel): tasks 1.1–1.4 (no interdependencies except where noted).
- Batch 2 (core, parallel): tasks 2.1–2.3 (depend on Batch 1).
- Batch 3 (safety & QA, parallel): task 3.1 (depends on Batch 2 and Batch 1).

```
Batch 1 (parallel): 1.1, 1.2, 1.3, 1.4
Batch 2 (parallel): 2.1, 2.2, 2.3 [depends on Batch 1]
Batch 3 (parallel): 3.1 [depends on Batch 2]
```

---

## Batch 1: Foundation (parallel - 4 implementers)

All tasks in this batch have no external dependencies except where noted.

### Task 1.1: MotionDatabase.append_audit_event

**Owner:** implementer (author)
**Estimate:** 2 hours
**Depends:** none

**Description:** Add an append_audit_event(...) helper to database.MotionDatabase. The method attempts to INSERT a row into an audit_events table (if the table exists). If DuckDB is not available or the table does not exist, it appends the event to a JSON file under thoughts/ledgers/audit_events.json. This provides a stable place to record per-item failures without forcing a migration to run during tests/CI.

**File:** `database.py` (modify: add method)
**Test:** `tests/test_database_audit.py` (new)

Implementation notes (decisions):
- Signature: append_audit_event(actor_id: str | None, action: str, target_type: str | None = None, target_id: str | None = None, metadata: dict | None = None) -> bool
- Behavior (see the sketch after this task):
  - If duckdb is None: append the event to thoughts/ledgers/audit_events.json as a list of event objects (create the file/dir as needed).
  - If duckdb is present: run "INSERT INTO audit_events (...)" wrapped in try/except; if the table is missing or the INSERT fails, fall back to writing the ledger file.
  - Do not raise; log at appropriate levels and return True if the event was recorded somewhere, False otherwise.
- Use uuid.uuid4() for id and a UTC timestamp for created_at.
- Use logging.getLogger(__name__) for messages.

Test (complete list):
- tests/test_database_audit.py
  - Case A (duckdb=None emulation): monkeypatch database.duckdb = None; ensure the ledger file is created and its content contains the event.
  - Case B (duckdb present but insertion raises): monkeypatch duckdb.connect to a MagicMock that raises on execute; verify fallback to the ledger file.
  - Verify the method returns True when written to the ledger, and that the JSON is valid.

Verify:
- pytest -q tests/test_database_audit.py

Commit message suggestion:
- feat(db): add append_audit_event helper to MotionDatabase (ledger fallback)
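
For concreteness, here is a minimal sketch of the fallback logic. It is written as a free function over an optional DuckDB connection so it stays self-contained; in `database.py` it would be a method on MotionDatabase with the connection coming from the instance, and the audit_events column list is an assumption until the migration lands.

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger(__name__)
LEDGER_PATH = Path("thoughts/ledgers/audit_events.json")


def append_audit_event(con, actor_id, action, target_type=None,
                       target_id=None, metadata=None) -> bool:
    """Record an audit event in DuckDB if possible, else in the ledger file."""
    event = {
        "id": str(uuid.uuid4()),
        "actor_id": actor_id,
        "action": action,
        "target_type": target_type,
        "target_id": target_id,
        "metadata": metadata or {},
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    if con is not None:  # con is None when duckdb is unavailable
        try:
            # Column list is assumed; align it with the real migration.
            con.execute(
                "INSERT INTO audit_events "
                "(id, actor_id, action, target_type, target_id, metadata, created_at) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                [event["id"], actor_id, action, target_type, target_id,
                 json.dumps(event["metadata"]), event["created_at"]],
            )
            return True
        except Exception:  # table missing, INSERT failed, ...
            logger.warning("audit insert failed; falling back to ledger", exc_info=True)
    try:
        LEDGER_PATH.parent.mkdir(parents=True, exist_ok=True)
        events = json.loads(LEDGER_PATH.read_text()) if LEDGER_PATH.exists() else []
        events.append(event)
        LEDGER_PATH.write_text(json.dumps(events, indent=2))
        return True
    except OSError:
        logger.error("could not record audit event %s", event["id"], exc_info=True)
        return False
```

The important property is that the helper never raises: a DB failure degrades to the ledger file, and a ledger write failure degrades to returning False.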
---

### Task 1.2: ai provider retry/fallback wrapper

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.1 (uses MotionDatabase.append_audit_event)

**Description:** Add a small module that wraps ai_provider.get_embeddings_batch to provide robust retries and fallback to smaller batch sizes. The wrapper returns a list of embeddings aligned with the inputs; for items that permanently fail it returns None in place and records an audit event via MotionDatabase.append_audit_event.

**File:** `pipeline/ai_provider_wrapper.py` (new)
**Test:** `tests/test_ai_provider_wrapper.py` (new)

Implementation details:
- Provide get_embeddings_with_retry(texts: list[str], motion_ids: list[int] | None = None, model: str | None = None, batch_size: int = 50, retries: int = 3) -> list[Optional[list[float]]]
- Approach (see the sketch after this task):
  - Iterate inputs in chunks of batch_size.
  - For each chunk:
    - Try ai_provider.get_embeddings_batch(chunk, model=model, batch_size=batch_size) up to `retries` times with exponential backoff (initial_backoff=0.5s, jitter).
    - If a chunk keeps failing, split it into subchunks (smaller_batch_size = max(1, batch_size // 2)) and retry the subchunks with the same logic.
    - If an individual text still fails, mark the corresponding index result as None and record an audit event via MotionDatabase.append_audit_event with action='embedding_failed' and metadata including the model, exception message, and attempt count.
  - Return a results list of the same length as the inputs (embedding lists or None).
- Use MotionDatabase(db_path=...) only if a db_path is provided in env/config or via an optional parameter — by default use database.db (the existing module-level db instance) to call append_audit_event.
- Keep the function pure enough to be unit-tested by monkeypatching ai_provider.get_embeddings_batch and MotionDatabase.append_audit_event.

Test cases:
- a successful batch returns embeddings aligned to the inputs
- a simulated transient failure where the first attempt fails and the second succeeds (retry observed)
- a persistent chunk failure triggers fallback to smaller chunks, and audit events are eventually appended for the failing items (verify append_audit_event is called with the expected metadata)
- tests use monkeypatch to stub ai_provider.get_embeddings_batch behavior and MotionDatabase.append_audit_event

Verify:
- pytest -q tests/test_ai_provider_wrapper.py

Commit:
- feat(pipeline): ai provider retry/fallback wrapper
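
A minimal sketch of the chunk retry-and-split logic, assuming ai_provider.get_embeddings_batch raises on failure and returns one vector per input text; `ai_provider` and `database` are the project modules named above, and the exact backoff arithmetic is illustrative:

```python
import logging
import random
import time
from typing import Optional

import ai_provider  # project module, assumed importable as in the plan
import database     # project module providing the module-level `db` instance

logger = logging.getLogger(__name__)


def get_embeddings_with_retry(texts, motion_ids=None, model=None,
                              batch_size=50, retries=3,
                              initial_backoff=0.5) -> list[Optional[list[float]]]:
    """Return embeddings aligned with `texts`; None marks permanent failures."""
    results: list[Optional[list[float]]] = [None] * len(texts)
    ids = motion_ids if motion_ids is not None else list(range(len(texts)))

    def embed(indices: list[int], size: int) -> None:
        chunk = [texts[i] for i in indices]
        last_exc: Exception | None = None
        for attempt in range(1, retries + 1):
            try:
                embeddings = ai_provider.get_embeddings_batch(
                    chunk, model=model, batch_size=size)
                for i, emb in zip(indices, embeddings):
                    results[i] = emb
                return
            except Exception as exc:
                last_exc = exc
                backoff = initial_backoff * (2 ** (attempt - 1))
                time.sleep(backoff * random.uniform(0.9, 1.1))  # ±10% jitter
        if size > 1:
            # Chunk kept failing: split into halves and retry each half.
            half = max(1, size // 2)
            for start in range(0, len(indices), half):
                embed(indices[start:start + half], half)
        else:
            # Single item exhausted all retries: record it and give up.
            database.db.append_audit_event(
                actor_id=None, action="embedding_failed",
                target_type="motion", target_id=str(ids[indices[0]]),
                metadata={"model": model, "error": str(last_exc),
                          "attempts": retries},
            )

    for start in range(0, len(texts), batch_size):
        embed(list(range(start, min(start + batch_size, len(texts)))), batch_size)
    return results
```

Because the split recursion bottoms out at size 1, every input either receives an embedding or exactly one audit event, so the results list stays aligned with the inputs by construction.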
---

### Task 1.3: QA script — scripts/qa_similarity.py

**Owner:** implementer
**Estimate:** 2 hours
**Depends:** none

**Description:** Add a small script that samples N motions across windows, runs a similarity lookup for each sampled motion, asserts simple heuristics (e.g., the top 5 are not all score == 1.0 except for identical IDs), and writes a JSON summary to thoughts/ledgers/qa_similarity_{timestamp}.json. The script is meant to be run manually or in CI as a quick QA check.

**File:** `scripts/qa_similarity.py` (new)
**Test:** `tests/test_qa_similarity.py` (new)

Implementation notes (see the sketch after this task):
- CLI: --db-path, --sample-size (default 50), --top-k (default 5)
- The implementation uses MotionDatabase to select a small set of motions and similarity.get_cached_similarities (or MotionDatabase.get_cached_similarities) to evaluate neighbors.
- The script returns a dict summary, which is also written to a uniquely named JSON file under thoughts/ledgers/.
- For tests, monkeypatch MotionDatabase to return deterministic samples and similarities; verify the script produces the expected JSON summary and returns reasonable pass/fail flags.

Verify:
- pytest -q tests/test_qa_similarity.py
- Run manually: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10

Commit:
- feat(scripts): add QA similarity sampling script and ledger writer
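
A sketch of the sampling core the script could wrap with argparse. The neighbor rows are assumed to come back as dicts with target_id and score keys, and db.sample_motions is a hypothetical stand-in for whatever MotionDatabase query the implementer chooses:

```python
import json
import time
from pathlib import Path


def run_qa(db, sample_size: int = 50, top_k: int = 5) -> dict:
    """Sample motions, sanity-check their top-k neighbors, write a JSON ledger."""
    failures = []
    sampled = db.sample_motions(sample_size)  # hypothetical sampling helper
    for motion in sampled:
        neighbors = db.get_cached_similarities(motion["id"], top_k=top_k)
        # Heuristic: the top-k should not all be perfect 1.0 matches
        # unless they point back at the same motion id.
        suspicious = [n for n in neighbors
                      if n["score"] == 1.0 and n["target_id"] != motion["id"]]
        if neighbors and len(suspicious) == len(neighbors):
            failures.append({"motion_id": motion["id"], "neighbors": neighbors})
    summary = {
        "sampled": len(sampled),
        "passed": len(sampled) - len(failures),
        "failed": len(failures),
        "failures": failures,
    }
    out = Path(f"thoughts/ledgers/qa_similarity_{int(time.time())}.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary, indent=2))
    return summary
```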
---

### Task 1.4: sync_motion_content — reduce concurrency, add per-ext_id retry, add CLI flag

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.1 (write failures to audit via MotionDatabase.append_audit_event)

**Description:** Harden the body text fetcher:
- Add a CLI flag `--max-body-workers` (reduce the default to 10).
- Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) when creating the requests.Session in sync_motion_content.
- Implement per-ext_id retry in _fetch_body_text: try up to 3 times with exponential backoff on network errors/5xx/429.
- When a body_text fetch permanently fails, call MotionDatabase.append_audit_event(action='body_fetch_failed', target_type='document', target_id=ext_id, metadata=...) so the failure is recorded.

**File:** `scripts/sync_motion_content.py` (modify)
**Test:** `tests/test_sync_motion_content.py` (new)

Implementation details (see the sketch after this task):
- Add parser.add_argument("--max-body-workers", type=int, default=10, help=...) in the CLI.
- When creating the session, mount an HTTPAdapter with pool_maxsize equal to max_body_workers via session.mount("https://", adapter).
- Modify _fetch_body_text(ext_id, session) to attempt up to 3 tries and return None on exhaustion; log appropriately; call db.append_audit_event on permanent failure (db from database.db).
- Update fetch_body_texts to pass the max_workers param through as already implemented, but set the default constant MAX_BODY_WORKERS to 10 at the top of the file.

Test plan:
- Test that _fetch_body_text retries: monkeypatch session.get to fail first (raise requests.ConnectionError) and succeed second; verify the text is returned and that exactly the expected number of attempts occurred.
- Test the permanent-failure case: monkeypatch session.get to always raise and verify MotionDatabase.append_audit_event was called (monkeypatch database.db.append_audit_event).
- Test that fetch_body_texts respects the max_workers param by monkeypatching ThreadPoolExecutor to observe the max_workers argument (or call it with a small set and assert the function returns mapped results).

Verify:
- pytest -q tests/test_sync_motion_content.py
- Manual run: python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10

Commit:
- feat(sync): add per-ext_id retries and --max-body-workers flag (default 10), record failures to audit
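
A sketch of the retry loop and session setup; the document URL template is a placeholder, and the audit call goes through the module-level database.db as described above:

```python
import logging
import time

import requests
from requests.adapters import HTTPAdapter

import database  # project module providing the module-level `db` instance

logger = logging.getLogger(__name__)
BODY_URL_TEMPLATE = "https://example.org/documents/{ext_id}"  # placeholder


def make_session(max_body_workers: int = 10) -> requests.Session:
    """Build a session whose connection pool matches the worker count."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=max_body_workers,
                          pool_maxsize=max_body_workers)
    session.mount("https://", adapter)
    return session


def _fetch_body_text(ext_id: str, session: requests.Session,
                     attempts: int = 3, backoff: float = 0.5):
    """Fetch one body text, retrying on network errors, 5xx, and 429."""
    for attempt in range(1, attempts + 1):
        try:
            resp = session.get(BODY_URL_TEMPLATE.format(ext_id=ext_id), timeout=30)
            if resp.ok:
                return resp.text
            if resp.status_code != 429 and resp.status_code < 500:
                logger.warning("body fetch %s: non-retryable status %d",
                               ext_id, resp.status_code)
                break  # other 4xx will not improve on retry
            logger.warning("body fetch %s attempt %d/%d: status %d",
                           ext_id, attempt, attempts, resp.status_code)
        except requests.RequestException as exc:
            logger.warning("body fetch %s attempt %d/%d failed: %s",
                           ext_id, attempt, attempts, exc)
        if attempt < attempts:
            time.sleep(backoff * (2 ** (attempt - 1)))  # exponential backoff
    database.db.append_audit_event(  # record the permanent failure
        actor_id=None, action="body_fetch_failed",
        target_type="document", target_id=ext_id,
        metadata={"attempts": attempts},
    )
    return None
```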
---

## Batch 2: Core modules (parallel - 3 implementers)

These tasks depend on Batch 1 (the ai wrapper and audit method must be present).

### Task 2.1: text_pipeline — use ai wrapper & return failed_ids

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** 1.2 (ai_provider_wrapper) and 1.1 (audit)

**Description:** Modify pipeline/text_pipeline.py to call the new ai_provider_wrapper.get_embeddings_with_retry instead of ai_provider.get_embeddings_batch. Extend ensure_text_embeddings to collect the indexes/ids of motions that failed to get embeddings and return them as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). Keep logging behavior similar, but include a log line reporting the failed_ids for the run.

**File:** `pipeline/text_pipeline.py` (modify)
**Test:** `tests/test_text_pipeline_retry.py` (new)

Implementation details:
- Replace the ai_provider.get_embeddings_batch(batch_texts, ...) call with wrapper.get_embeddings_with_retry(batch_texts, batch_ids, model=model, batch_size=batch_size, retries=3).
- The wrapper returns a list aligned with batch_texts containing either an embedding list or None. For each None, increment errors and append the motion_id to failed_ids.
- At the end of ensure_text_embeddings, return stored, skipped_existing, skipped_no_text, errors, failed_ids.
- Update the docstring accordingly.
- Keep the existing counting and logging; existing callers will be updated in Task 2.2.

Test plan:
- Unit test that ensure_text_embeddings:
  - increments stored as expected when the wrapper returns embeddings for all batch items.
  - includes the motion_ids in failed_ids and reflects them in the errors count when the wrapper returns None for some items.
- Use monkeypatch to stub pipeline.ai_provider_wrapper.get_embeddings_with_retry and database.db.store_embedding.

Verify:
- pytest -q tests/test_text_pipeline_retry.py

Commit:
- feat(pipeline): use ai_provider wrapper for robust embeddings and return failed ids

---

### Task 2.2: rerun_embeddings — add --retry-missing mode and wire re-run

**Owner:** implementer
**Estimate:** 2.5 hours
**Depends:** 2.1 (ensure_text_embeddings new return)

**Description:** Add a CLI flag `--retry-missing` to scripts/rerun_embeddings.py. When set, after the main ensure_text_embeddings call, if the returned `failed_ids` list is non-empty, re-run embedding for just those failed motion ids using a smaller batch size (e.g., half). Implementation choice: add a new helper in text_pipeline, ensure_text_embeddings_for_ids(...), that embeds only a provided list of motion ids, and call it here. (The alternative of adding an optional motion_ids parameter to ensure_text_embeddings itself was considered; the dedicated helper keeps the change minimal.) The script should record audit events for items that still fail after the retry.

**File:** `scripts/rerun_embeddings.py` (modify)
**Test:** `tests/test_rerun_embeddings.py` (modify — existing test)

Implementation notes:
- Add parser.add_argument("--retry-missing", action="store_true", help=...).
- After the first ensure_text_embeddings call, expect a 5-tuple. If retry_missing is set and failed_ids is non-empty, run a second short pass: text_pipeline.ensure_text_embeddings_for_ids(db_path=db_path, ids=failed_ids, model=model, batch_size=max(1, batch_size // 2)).
- Update tests/test_rerun_embeddings.py to monkeypatch the new text_pipeline helper and simulate the first call returning failed_ids and the second call resolving them; assert the rerun is invoked accordingly and the summary contains the expected fields.

Test changes:
- Update tests/test_rerun_embeddings.py to reflect that text_pipeline.ensure_text_embeddings returns five values and to simulate --retry-missing behavior.
- Keep the existing expectations in the test (extended to cover failed_ids handling).

Verify:
- pytest -q tests/test_rerun_embeddings.py
- Manual run: python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing

Commit:
- feat(scripts): add --retry-missing to rerun_embeddings and retry failed items with smaller batches

---

### Task 2.3: similarity.compute — DB-side safety filter to avoid trivial 1.0 matches

**Owner:** implementer
**Estimate:** 3 hours
**Depends:** none (reads existing DB)

**Description:** Add a small DB-side filter, applied before inserting similarity rows, that drops suspicious 1.0 matches between different motions when the titles are extremely short (heuristic: identical titles shorter than 12 characters). Add diagnostic logging for filtered pairs.

**File:** `similarity/compute.py` (modify)
**Test:** `tests/test_similarity_compute_filter.py` (new)

Implementation details (see the sketch after this task):
- After building rows_to_insert (a list of dicts with source/target ids and score):
  - If a score is 1.0 (or very near 1.0, with a tolerance of e.g. > 0.999999), fetch titles for the set of involved ids in a single query: SELECT id, title FROM motions WHERE id IN (...).
  - For each candidate row with a perfect/near-perfect score, if the motion titles are equal and len(title.strip()) < 12, skip the insertion and log at debug/info that the pair was filtered due to a trivially short identical title.
- The threshold 12 is chosen conservatively (document it in the commit).
- Keep the inserted count and return behavior unchanged.
- Make sure DB connections are opened and closed per method.

Test plan:
- Construct a minimal in-memory or duckdb-mocked scenario where two different motion ids have an identical short title and their vectors produce 1.0 similarity. Monkeypatch duckdb.connect to return rows such that compute_similarities produces rows_to_insert including a 1.0 score. Verify store_similarity_batch is not called for that row (monkeypatch MotionDatabase.store_similarity_batch or spy on db.store_similarity_batch calls).

Verify:
- pytest -q tests/test_similarity_compute_filter.py

Commit:
- fix(similarity): filter trivial 1.0 matches for very-short identical titles
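
A sketch of the filter, assuming rows_to_insert holds dicts keyed source_id/target_id/score (adjust to the real row shape in compute.py) and that it runs just before store_similarity_batch:

```python
import logging

logger = logging.getLogger(__name__)

EPSILON = 1e-6       # tolerance for "near-perfect" scores
MIN_TITLE_LEN = 12   # heuristic threshold from the task description


def filter_trivial_matches(con, rows_to_insert):
    """Drop near-1.0 pairs whose motions share an identical, very short title."""
    suspect = [r for r in rows_to_insert if r["score"] > 1.0 - EPSILON]
    if not suspect:
        return rows_to_insert
    # One query for all titles involved in suspect pairs.
    ids = {r["source_id"] for r in suspect} | {r["target_id"] for r in suspect}
    placeholders = ", ".join("?" for _ in ids)
    titles = dict(con.execute(
        f"SELECT id, title FROM motions WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall())
    kept = []
    for row in rows_to_insert:
        if row["score"] > 1.0 - EPSILON and row["source_id"] != row["target_id"]:
            src = (titles.get(row["source_id"]) or "").strip()
            tgt = (titles.get(row["target_id"]) or "").strip()
            if src and src == tgt and len(src) < MIN_TITLE_LEN:
                logger.debug("filtered trivial match %s/%s (title=%r)",
                             row["source_id"], row["target_id"], src)
                continue
        kept.append(row)
    return kept
```

The filter only queries titles when at least one suspect row exists, so the common case adds no extra DB round trip.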
---

## Batch 3: Observability / Integration (parallel - 1-2 implementers)

These are small finishing tasks (audit/ledgers, small extras).

### Task 3.1: Tests & CI adjustments, docs, ledger examples

**Owner:** reviewer (PR reviewer)
**Estimate:** 2 hours
**Depends:** all tasks above (1.1–2.3)

**Description:** After the code is in, run the full test suite, fix any flaky tests, and add a short README note in thoughts/ledgers/ about how to run the QA script and how the audit_events fallback works. Add a small example ledger created by the QA script if helpful.

**Files:** (changes/additions)
- `thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md` (this plan — created)
- `thoughts/ledgers/README_motion_enrichment.md` (new, optional)
- No dedicated unit test for this task; it's a reviewer/integration task.

Verification:
- Run the full tests: pytest
- Run the QA script locally: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Inspect thoughts/ledgers/qa_similarity_*.json and the audit_events ledger file.

Commit:
- docs(ledgers): document QA and audit fallback behavior

---

## Test / Verification summary (per-task commands)

- Task 1.1
  - pytest -q tests/test_database_audit.py
- Task 1.2
  - pytest -q tests/test_ai_provider_wrapper.py
- Task 1.3
  - pytest -q tests/test_qa_similarity.py
  - python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Task 1.4
  - pytest -q tests/test_sync_motion_content.py
  - python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10 --skip-body-text (dry run)
- Task 2.1
  - pytest -q tests/test_text_pipeline_retry.py
- Task 2.2
  - pytest -q tests/test_rerun_embeddings.py
  - python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing
- Task 2.3
  - pytest -q tests/test_similarity_compute_filter.py

Full suite verification:
- pytest -q

---

If you want, I can now:
- generate the apply_patch to create the files and tests described (one patch containing all files), or
- create the plan file only (this document was requested) — I have it ready at: thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md

Which would you like next?