motion content enrichment — implementation plan
Goal: Implement the prioritized incremental hardening from the design (2026-03-23) so the SyncFeed → embedding → fusion → similarity pipeline is more robust, observable, and testable. Break the work into small, independent micro-tasks (one file + its test per task) so many implementers can work in parallel.
Design doc: thoughts/shared/designs/2026-03-23-motion-content-enrichment-next-steps-design.md
Architecture summary (what I'll implement)
- Add a small audit API on MotionDatabase so code can record per-item failures in a stable place (or fall back to a ledger file if DuckDB is not present).
- Add a dedicated ai_provider retry/fallback wrapper that:
- retries failed batches (exponential backoff),
- on persistent failure retries missing items with smaller batch sizes,
- returns aligned embedding results (None for failed items),
- records persistent failures to audit_events (using MotionDatabase.append_audit_event).
- Wire text embedding pipeline to use the wrapper and return failed ids (so rerun script can retry them).
- Add a `--max-body-workers` CLI option to scripts/sync_motion_content.py, reduce the default to 10, and add per-request retries.
- Add `--retry-missing` to scripts/rerun_embeddings.py: rerun missing failed items with smaller batches.
- Add a DB-side safety filter in similarity.compute to avoid inserting trivial 1.0 matches for very-short identical titles.
- Add a small QA script scripts/qa_similarity.py that samples windows/motions and writes a short JSON ledger for manual review.
- Add focused unit tests for the new behaviours (ai retry wrapper, DB audit append, sync body fetch retries, rerun retry mode, similarity filter, QA script).
Decisions / gap filling (why these concrete choices)
- Audit recording: implement MotionDatabase.append_audit_event that writes to audit_events table if present, else appends to thoughts/ledgers/audit_events.json. Rationale: migration SQL is a commented placeholder; making DB write optional keeps tests and CI safe; writing to ledgers is actionable and durable for triage.
- ai retry backoff params: default retries=3, initial_backoff=0.5s, jitter ±10%, fallback smaller_batch_size = max(1, batch_size // 2). Rationale: conservative defaults that map to design and are implementable/testable.
- fetch_body_text retries: 3 attempts per ext_id with small exponential backoff (0.5s). Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) to limit pool size and avoid pool warnings. Default max workers lowered to 10.
- Interface changes: ensure_text_embeddings will return an extended result with failed_ids as a 5th element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). I will update rerun_embeddings and its tests accordingly. Rationale: rerun needs failed ids; propagating as return value is simplest and testable.
- All new code uses logging.getLogger(__name__) (no print in library modules) to obey constraints.
- Tests will use monkeypatching/mocks to avoid network/DB dependencies.
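The retry defaults above (retries=3, initial_backoff=0.5s, ±10% jitter, halved fallback batches) can be sketched as two small helpers; the names are illustrative, not the final API:

```python
import random

def backoff_delay(attempt: int, initial: float = 0.5, jitter: float = 0.1) -> float:
    """Delay before retry N (0-based): 0.5s, 1.0s, 2.0s, ... with +/-10% jitter."""
    return initial * (2 ** attempt) * (1 + random.uniform(-jitter, jitter))

def fallback_batch_size(batch_size: int) -> int:
    """Halve the batch on persistent failure, never dropping below 1."""
    return max(1, batch_size // 2)
```

Keeping these as pure functions makes the backoff schedule trivially unit-testable without sleeping in tests.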
Dependency graph (high level)
Batch 1 (foundation, parallel): tasks 1.1–1.4 (no interdeps except where noted).
Batch 2 (core, parallel): tasks 2.1–2.3 (depend on Batch 1).
Batch 3 (safety & QA, parallel): task 3.1 (depends on Batch 2 and Batch 1).
Batch 1: Foundation (parallel - 4 implementers)
All tasks in this batch have NO (external) dependencies except where noted.
Task 1.1: MotionDatabase.append_audit_event
Owner: implementer (author)
Estimate: 2 hours
Depends: none
Description: Add an append_audit_event(...) helper to database.MotionDatabase. This method will attempt to INSERT a row into an audit_events table (if the table exists). If DuckDB is not available or the table does not exist, append the event to a JSON file under thoughts/ledgers/audit_events.json. This provides a stable place to record per-item failures without forcing a migration to run during tests/CI.
File: database.py (modify: add method)
Test: tests/test_database_audit.py (new)
Implementation notes (decisions):
- Signature: append_audit_event(actor_id: str | None, action: str, target_type: str | None = None, target_id: str | None = None, metadata: dict | None = None) -> bool
- Behavior:
- If duckdb is None: write (append) to thoughts/ledgers/audit_events.json as list of event objects (create file/dir as needed).
- If duckdb present: run "INSERT INTO audit_events (... )" wrapped in try/except; if table missing or INSERT fails, fall back to writing to the ledger file.
- Do not raise; log at appropriate levels and return True if recorded somewhere, False otherwise.
- Use uuid.uuid4() for id and UTC timestamp for created_at.
- Use logging.getLogger(__name__) for messages.
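A minimal sketch of the helper under the decisions above. The audit_events column list is an assumption, and the DuckDB connection and ledger path are injected here so the fallback path can be exercised without DuckDB; the real method would live on MotionDatabase and use its own connection:

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger(__name__)
DEFAULT_LEDGER = Path("thoughts/ledgers/audit_events.json")

def append_audit_event(actor_id, action, target_type=None, target_id=None,
                       metadata=None, conn=None,
                       ledger_path: Path = DEFAULT_LEDGER) -> bool:
    """Record an audit event to DuckDB if possible, else to a JSON ledger.

    Never raises; returns True if the event was recorded somewhere.
    `conn` is an open DuckDB connection (or None when duckdb is unavailable).
    """
    event = {
        "id": str(uuid.uuid4()),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "action": action,
        "target_type": target_type,
        "target_id": target_id,
        "metadata": metadata or {},
    }
    if conn is not None:
        try:
            conn.execute(
                "INSERT INTO audit_events "
                "(id, created_at, actor_id, action, target_type, target_id, metadata) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (event["id"], event["created_at"], actor_id, action,
                 target_type, target_id, json.dumps(event["metadata"])),
            )
            return True
        except Exception:
            # table missing or INSERT failed: fall through to the ledger file
            logger.warning("audit_events INSERT failed; falling back to ledger",
                           exc_info=True)
    try:
        ledger_path.parent.mkdir(parents=True, exist_ok=True)
        events = json.loads(ledger_path.read_text()) if ledger_path.exists() else []
        events.append(event)
        ledger_path.write_text(json.dumps(events, indent=2))
        return True
    except OSError:
        logger.error("could not write audit ledger %s", ledger_path, exc_info=True)
        return False
```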
Test (complete list):
- tests/test_database_audit.py
- Case A (duckdb=None emulation): monkeypatch database.duckdb = None; verify the ledger file is created and its content contains the event.
- Case B (duckdb present but table insertion raises): monkeypatch duckdb.connect to a MagicMock that raises on execute -> verify fallback to ledger file.
- Verify method returns True when written to ledger, and that JSON is valid.
Verify:
- pytest -q tests/test_database_audit.py
Commit message suggestion:
- feat(db): add append_audit_event helper to MotionDatabase (ledger fallback)
Task 1.2: ai provider retry/fallback wrapper
Owner: implementer
Estimate: 3 hours
Depends: 1.1 (uses MotionDatabase.append_audit_event)
Description: Add a small module that wraps ai_provider.get_embeddings_batch to provide robust retries and fallback to smaller batch sizes. The wrapper returns a list of embeddings aligned with inputs; for items that permanently fail we return None in-place and record an audit event via MotionDatabase.append_audit_event.
File: pipeline/ai_provider_wrapper.py (new)
Test: tests/test_ai_provider_wrapper.py (new)
Implementation details:
- Provide function get_embeddings_with_retry(texts: list[str], motion_ids: list[int] | None = None, model: str | None = None, batch_size: int = 50, retries: int = 3) -> list[Optional[list[float]]]
- Approach:
- Iterate inputs in chunks of batch_size.
- For each chunk:
- Try ai_provider.get_embeddings_batch(chunk, model=model, batch_size=batch_size) up to `retries` times with exponential backoff (initial_backoff=0.5s, jitter).
- If a chunk continuously fails, split the chunk into subchunks (smaller_batch_size = max(1, batch_size // 2)) and retry the subchunks with the same logic.
- If an individual text still fails, mark the corresponding index result as None and record an audit event via MotionDatabase.append_audit_event with action='embedding_failed' and metadata including model, exception message, and attempts.
- Return a results list of the same length as inputs (embedding lists or None).
- Use MotionDatabase(db_path=...) only if a db_path is provided in env/config or via optional parameter — by default use database.db (existing module-level db instance) to call append_audit_event.
- Keep function pure enough to be unit-tested by monkeypatching ai_provider.get_embeddings_batch and MotionDatabase.append_audit_event.
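A simplified sketch of the wrapper logic. The provider call and failure hook are injected as callables so the sketch is self-contained and network-free; the real module would call ai_provider.get_embeddings_batch and database.db.append_audit_event directly and also accept model/motion_ids as described above:

```python
import logging
import random
import time
from typing import Callable, Optional

logger = logging.getLogger(__name__)

def get_embeddings_with_retry(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 50,
    retries: int = 3,
    initial_backoff: float = 0.5,
    on_failure: Optional[Callable[[int, str], None]] = None,
) -> list[Optional[list[float]]]:
    """Return embeddings aligned with texts; None marks items that permanently failed."""
    results: list[Optional[list[float]]] = [None] * len(texts)

    def try_chunk(start: int, chunk: list[str], size: int) -> None:
        for attempt in range(retries):
            try:
                results[start:start + len(chunk)] = embed_batch(chunk)
                return
            except Exception:
                logger.warning("chunk at %d failed (attempt %d/%d)",
                               start, attempt + 1, retries)
                if attempt + 1 < retries:
                    # exponential backoff with +/-10% jitter
                    time.sleep(initial_backoff * (2 ** attempt)
                               * (1 + random.uniform(-0.1, 0.1)))
        if size > 1:
            # persistent failure: split into smaller sub-chunks and retry those
            half = max(1, size // 2)
            for i in range(0, len(chunk), half):
                try_chunk(start + i, chunk[i:i + half], half)
        elif on_failure is not None:
            # single item still failing: leave None in place and report it
            on_failure(start, chunk[0])

    for i in range(0, len(texts), batch_size):
        try_chunk(i, texts[i:i + batch_size], batch_size)
    return results
```

Injecting `embed_batch` and `on_failure` is exactly what makes the monkeypatch-based tests below straightforward.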
Test cases:
- test successful batch returns embeddings aligned to inputs
- test simulated transient failure where first attempt fails and second succeeds (observed retry)
- test persistent chunk failure triggers fallback to smaller chunks and eventual audit appended for failing items (verify append_audit_event called with expected metadata)
- tests use monkeypatch to stub ai_provider.get_embeddings_batch behavior and MotionDatabase.append_audit_event
Verify:
- pytest -q tests/test_ai_provider_wrapper.py
Commit:
- feat(pipeline): ai provider retry/fallback wrapper
Task 1.3: QA script — scripts/qa_similarity.py
Owner: implementer
Estimate: 2 hours
Depends: none
Description: Add a small script that samples N motions across windows, runs similarity lookup for each sampled motion, asserts simple heuristics (e.g., top-5 are not all score==1.0 except identical IDs), and writes a JSON summary into thoughts/ledgers/qa_similarity_{timestamp}.json. This script is meant to be run manually/CI for a quick QA check.
File: scripts/qa_similarity.py (new)
Test: tests/test_qa_similarity.py (new)
Implementation notes:
- CLI: --db-path, --sample-size (default 50), --top-k (default 5)
- Implementation uses MotionDatabase to select a small set of motions and similarity.get_cached_similarities (or MotionDatabase.get_cached_similarities) to evaluate neighbors.
- The script returns a dict summary which is also written to a uniquely named JSON under thoughts/ledgers/.
- For tests, monkeypatch MotionDatabase to return deterministic samples and similarities; verify the script produces the expected JSON summary and returns reasonable pass/fail flags.
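The core pass/fail heuristic can be kept as a pure function (name hypothetical), which is what makes deterministic testing easy:

```python
def neighbors_look_sane(motion_id: int,
                        neighbors: list[tuple[int, float]]) -> bool:
    """Heuristic check: fail if every non-self neighbor scores a perfect 1.0.

    `neighbors` is a list of (motion_id, score) pairs, e.g. the top-5 from
    the cached similarity lookup for the sampled motion.
    """
    other_scores = [score for nid, score in neighbors if nid != motion_id]
    return not other_scores or any(score < 1.0 for score in other_scores)
```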
Verify:
- pytest -q tests/test_qa_similarity.py
- Run manually: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
Commit:
- feat(scripts): add QA similarity sampling script and ledger writer
Task 1.4: sync_motion_content — reduce concurrency, add per-ext_id retry, add CLI flag
Owner: implementer
Estimate: 3 hours
Depends: 1.1 (write failures to audit via MotionDatabase.append_audit_event)
Description: Harden the body text fetcher:
- Add a CLI flag `--max-body-workers` (default reduced to 10).
- Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) when creating the requests.Session in sync_motion_content.
- Implement per-ext_id retry in _fetch_body_text: try up to 3 times with exponential backoff on network errors/5xx/429.
- When a body_text fetch permanently fails, call MotionDatabase.append_audit_event(action='body_fetch_failed', target_type='document', target_id=ext_id, metadata=...) so failures are recorded.
File: scripts/sync_motion_content.py (modify)
Test: tests/test_sync_motion_content.py (new)
Implementation details:
- Add parser.add_argument("--max-body-workers", type=int, default=10, help=...) in CLI
- When creating the session: mount an HTTPAdapter with pool_maxsize equal to max_body_workers, i.e. session.mount("https://", requests.adapters.HTTPAdapter(pool_maxsize=max_body_workers)).
- Modify _fetch_body_text(ext_id, session) to attempt up to 3 tries and return None on exhaustion; log appropriately; call db.append_audit_event when permanently failing (db from database.db).
- Update fetch_body_texts to pass max_workers param through as already implemented, but default constant MAX_BODY_WORKERS should be set to 10 at top of file.
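The retry loop in _fetch_body_text might look like the following sketch. The HTTP call is injected as a callable so the example stays network-free; the real function would wrap session.get and translate connection errors, 5xx responses, and 429 into the retryable case:

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger(__name__)

class RetryableError(Exception):
    """Network error, 5xx, or 429 - worth retrying."""

def fetch_body_text(ext_id: str,
                    get: Callable[[str], str],
                    attempts: int = 3,
                    backoff: float = 0.5) -> Optional[str]:
    """Try up to `attempts` times with exponential backoff; None on exhaustion.

    On exhaustion the real code would also call
    db.append_audit_event(action='body_fetch_failed', ...).
    """
    for attempt in range(attempts):
        try:
            return get(ext_id)
        except RetryableError:
            logger.warning("fetch of %s failed (attempt %d/%d)",
                           ext_id, attempt + 1, attempts)
            if attempt + 1 < attempts:
                time.sleep(backoff * (2 ** attempt))
    return None
```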
Test plan:
- Test that _fetch_body_text retries: monkeypatch session.get to fail on the first call (raise requests.ConnectionError) and succeed on the second; verify the expected text is returned and that exactly the expected number of attempts occurred.
- Test permanent failure case: monkeypatch session.get to always raise and verify MotionDatabase.append_audit_event was called (monkeypatch database.db.append_audit_event).
- Test fetch_body_texts respects max_workers param by running small set and monkeypatching ThreadPoolExecutor to observe max_workers argument (or call with small size and assert function returns mapped results).
Verify:
- pytest -q tests/test_sync_motion_content.py
- Manual run: python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10
Commit:
- feat(sync): add per-ext_id retries and --max-body-workers flag (defaults to 10), record failures to audit
Batch 2: Core modules (parallel - 3 implementers)
These tasks depend on Batch 1 (ai wrapper and audit method must be present).
Task 2.1: text_pipeline — use ai wrapper & return failed_ids
Owner: implementer
Estimate: 3 hours
Depends: 1.2 (ai_provider_wrapper) and 1.1 (audit)
Description: Modify pipeline/text_pipeline.py to call the new ai_provider_wrapper.get_embeddings_with_retry instead of ai_provider.get_embeddings_batch. Extend ensure_text_embeddings to collect indexes/ids of motions which failed to get embeddings and return them as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). Keep logging behavior similar but include a log line reporting failed_ids for the run.
File: pipeline/text_pipeline.py (modify)
Test: tests/test_text_pipeline_retry.py (new)
Implementation details:
- Replace the ai_provider.get_embeddings_batch(batch_texts, ...) call with wrapper.get_embeddings_with_retry(batch_texts, batch_ids, model=model, batch_size=batch_size, retries=3).
- The wrapper returns list aligned with batch_texts containing either embedding list or None. For each None, increment errors and append motion_id to failed_ids.
- At the end of ensure_text_embeddings, return stored, skipped_existing, skipped_no_text, errors, failed_ids.
- Also ensure docstring updated.
- Keep existing counting and logging; existing callers will be updated in Task 2.2.
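The bookkeeping for the new failed_ids return value reduces to an index-aligned walk over the wrapper's output; a sketch, with a hypothetical helper name:

```python
from typing import Optional

def tally_batch(batch_ids: list[int],
                embeddings: list[Optional[list[float]]]) -> tuple[int, int, list[int]]:
    """Split the wrapper's aligned output into (stored, errors, failed_ids)."""
    stored, errors, failed_ids = 0, 0, []
    for motion_id, emb in zip(batch_ids, embeddings):
        if emb is None:
            errors += 1
            failed_ids.append(motion_id)
        else:
            stored += 1  # the real code would call db.store_embedding(motion_id, emb) here
    return stored, errors, failed_ids
```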
Test plan:
- Unit test that ensure_text_embeddings:
- when wrapper returns embeddings for all batch items, stored increments as expected.
- when wrapper returns None for some items, those motion_ids included in failed_ids and errors counts reflect them.
- Use monkeypatch to stub pipeline.ai_provider_wrapper.get_embeddings_with_retry and database.db.store_embedding.
Verify:
- pytest -q tests/test_text_pipeline_retry.py
Commit:
- feat(pipeline): use ai_provider wrapper for robust embeddings and return failed ids
Task 2.2: rerun_embeddings — add --retry-missing mode and wire re-run
Owner: implementer
Estimate: 2.5 hours
Depends: 2.1 (ensure_text_embeddings new return)
Description: Add a CLI flag --retry-missing to scripts/rerun_embeddings.py. When set, after the main ensure_text_embeddings call, if the returned failed_ids list is non-empty, re-run embedding for just those motion ids with a smaller batch size (half) via a new text_pipeline helper, ensure_text_embeddings_for_ids (see implementation notes). The script should record audit events for items that still fail after the retry.
File: scripts/rerun_embeddings.py (modify)
Test: tests/test_rerun_embeddings.py (modify — existing test)
Implementation notes:
- Add parser.add_argument("--retry-missing", action="store_true", help=...).
- After the first ensure_text_embeddings call, expect a 5-tuple. If retry_missing is set and failed_ids is non-empty, run a second short pass: text_pipeline.ensure_text_embeddings_for_ids(db_path=db_path, ids=failed_ids, model=model, batch_size=max(1, batch_size // 2)), a new helper added to text_pipeline for this purpose (preferred over threading an optional motion_ids parameter through ensure_text_embeddings).
- Update tests/test_rerun_embeddings.py to monkeypatch the new text_pipeline helper and simulate that first call returns failed_ids and second call resolves them; assert rerun called accordingly and summary contains expected fields.
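The second pass reduces to a thin wrapper around the new helper; sketched here with ensure_text_embeddings_for_ids injected as a callable that returns the ids still failing:

```python
from typing import Callable

def retry_missing(failed_ids: list[int], batch_size: int,
                  embed_for_ids: Callable[[list[int], int], list[int]]) -> list[int]:
    """Re-run embeddings for failed ids at half the batch size.

    `embed_for_ids` stands in for the proposed
    text_pipeline.ensure_text_embeddings_for_ids; it returns the ids that
    still failed. An empty input short-circuits to [].
    """
    if not failed_ids:
        return []
    smaller_batch = max(1, batch_size // 2)
    return embed_for_ids(failed_ids, smaller_batch)
```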
Test changes:
- Update tests/test_rerun_embeddings.py to reflect that text_pipeline.ensure_text_embeddings returns five values and to simulate --retry-missing behavior.
- Keep the existing expectations in the test (we will extend them to include failed_ids handling).
Verify:
- pytest -q tests/test_rerun_embeddings.py
- Manual run: python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing
Commit:
- feat(scripts): add --retry-missing to rerun_embeddings and retry failed items with smaller batches
Task 2.3: similarity.compute — DB-side safety filter to avoid trivial 1.0 matches
Owner: implementer
Estimate: 3 hours
Depends: none (reads existing DB)
Description: Add a small DB-side filter before inserting similarity rows that filters out suspicious 1.0 matches between different motions when the titles are extremely short (heuristic: identical titles with length < 12 characters). Add diagnostic logging for filtered pairs.
File: similarity/compute.py (modify)
Test: tests/test_similarity_compute_filter.py (new)
Implementation details:
- After building rows_to_insert (list of dicts with source/target ids & score), perform:
- If score == 1.0 (or very near 1.0 with tolerance e.g., > 0.999999), fetch titles for the set of involved ids (single query: SELECT id, title FROM motions WHERE id IN (...)).
- For each candidate row with perfect/near-perfect score, if motion titles are equal and len(title.strip()) < 12, skip insertion and log debug/info that pair was filtered due to trivial short identical title.
- The threshold of 12 is chosen conservatively (document the choice in the commit message).
- Keep inserted count and return behavior unchanged.
- Make sure DB connections are opened/closed per method.
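The filter itself is a pure function over the candidate rows plus a title lookup; a sketch, with the row shape assumed from the description above:

```python
import logging

logger = logging.getLogger(__name__)

def filter_trivial_matches(rows: list[dict], titles: dict[int, str],
                           tolerance: float = 0.999999,
                           min_title_len: int = 12) -> list[dict]:
    """Drop near-1.0 pairs between different motions with identical short titles."""
    kept = []
    for row in rows:
        if row["score"] > tolerance:
            a = titles.get(row["source_id"], "").strip()
            b = titles.get(row["target_id"], "").strip()
            if a and a == b and len(a) < min_title_len:
                logger.info("filtered trivial match %s -> %s (title=%r)",
                            row["source_id"], row["target_id"], a)
                continue  # trivial short-title match: skip insertion
        kept.append(row)
    return kept
```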
Test plan:
- Construct a minimal in-memory or duckdb-mocked scenario where two different motion ids have identical short title and their vectors produce 1.0 similarity. Monkeypatch duckdb.connect to return rows such that compute_similarities will produce rows_to_insert including a 1.0. Verify store_similarity_batch is not called for that row (monkeypatch MotionDatabase.store_similarity_batch or spy on db.store_similarity_batch calls).
Verify:
- pytest -q tests/test_similarity_compute_filter.py
Commit:
- fix(similarity): filter trivial 1.0 matches for very-short identical titles
Batch 3: Observability / Integration (parallel - 1-2 implementers)
These are small finishing tasks (audit/ledgers, small extras).
Task 3.1: Tests & CI adjustments, docs, ledger examples
Owner: reviewer (PR reviewer)
Estimate: 2 hours
Depends: all tasks above (1.1–2.3)
Description: After the code is in, run full test suite, fix any flaky tests, add short README note in thoughts/ledgers/ about how to run QA script and how audit_events fallback works. Add a small example ledger created by QA script if helpful.
Files: (changes/additions)
- thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md (this plan — created)
- thoughts/ledgers/README_motion_enrichment.md (new, optional)
- No dedicated unit test for this task; it's a reviewer/integration task.
Verification:
- Run full tests: pytest
- Run QA script locally: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Inspect thoughts/ledgers/qa_similarity_*.json and audit_events ledger file.
Commit:
- docs(ledgers): document QA and audit fallback behavior
Test / Verification summary (per-task commands)
- Task 1.1
- pytest -q tests/test_database_audit.py
- Task 1.2
- pytest -q tests/test_ai_provider_wrapper.py
- Task 1.3
- pytest -q tests/test_qa_similarity.py
- python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
- Task 1.4
- pytest -q tests/test_sync_motion_content.py
- python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10 --skip-body-text (dry run)
- Task 2.1
- pytest -q tests/test_text_pipeline_retry.py
- Task 2.2
- pytest -q tests/test_rerun_embeddings.py
- python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing
- Task 2.3
- pytest -q tests/test_similarity_compute_filter.py
Full suite verification:
- pytest -q