Motion content enrichment — implementation plan

Goal: Implement the prioritized incremental hardening from the design (2026-03-23) so the SyncFeed → embedding → fusion → similarity pipeline is more robust, observable, and testable. Break the work into small, independent micro-tasks (one file + its test per task) so many implementers can work in parallel.

Design doc: thoughts/shared/designs/2026-03-23-motion-content-enrichment-next-steps-design.md

Architecture summary (what I'll implement)

  • Add a small audit API on MotionDatabase so code can record per-item failures in a stable place (or fall back to a ledger file if DuckDB is not present).
  • Add a dedicated ai_provider retry/fallback wrapper that:
    • retries failed batches (exponential backoff),
    • on persistent failure retries missing items with smaller batch sizes,
    • returns aligned embedding results (None for failed items),
    • records persistent failures to audit_events (using MotionDatabase.append_audit_event).
  • Wire text embedding pipeline to use the wrapper and return failed ids (so rerun script can retry them).
  • Add a --max-body-workers CLI option to scripts/sync_motion_content.py, reduce the default to 10, and add per-request retries.
  • Add --retry-missing to scripts/rerun_embeddings.py: rerun missing failed items with smaller batches.
  • Add a DB-side safety filter in similarity.compute to avoid inserting trivial 1.0 matches for very-short identical titles.
  • Add a small QA script scripts/qa_similarity.py that samples windows/motions and writes a short JSON ledger for manual review.
  • Add focused unit tests for the new behaviours (ai retry wrapper, DB audit append, sync body fetch retries, rerun retry mode, similarity filter, QA script).

Decisions / gap filling (why these concrete choices)

  • Audit recording: implement MotionDatabase.append_audit_event that writes to the audit_events table if present, else appends to thoughts/ledgers/audit_events.json. Rationale: the migration SQL is a commented placeholder; making the DB write optional keeps tests and CI safe, and writing to ledgers is actionable and durable for triage.
  • ai retry backoff params: default retries=3, initial_backoff=0.5s, jitter ±10%, fallback smaller_batch_size = max(1, batch_size // 2). Rationale: conservative defaults that map to the design and are implementable/testable (see the backoff sketch after this list).
  • fetch_body_text retries: 3 attempts per ext_id with small exponential backoff (0.5s). Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) to limit pool size and avoid pool warnings. Default max workers lowered to 10.
  • Interface changes: ensure_text_embeddings will return an extended result with failed_ids as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). I will update rerun_embeddings and its tests accordingly. Rationale: rerun needs the failed ids; propagating them as a return value is simplest and testable.
  • All new code uses logging.getLogger(__name__) (no print in library modules) to obey constraints.
  • Tests will use monkeypatching/mocks to avoid network/DB dependencies.
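
For concreteness, a minimal sketch of the backoff computation these defaults imply (backoff_delay is an illustrative name, not an existing helper):

```python
import random

def backoff_delay(attempt: int, initial: float = 0.5, jitter: float = 0.1) -> float:
    # Delay before retry number `attempt` (0-based): 0.5s, 1.0s, 2.0s, ...
    # with ±10% multiplicative jitter.
    return initial * (2 ** attempt) * (1 + random.uniform(-jitter, jitter))
```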

Dependency graph (high level)

Batch 1 (foundation, parallel): tasks 1.1, 1.2, 1.3, 1.4 (no interdependencies except where noted).
Batch 2 (core, parallel): tasks 2.1, 2.2, 2.3 (depend on Batch 1).
Batch 3 (safety & QA): task 3.1 (depends on Batches 1 and 2).

Batch 1: Foundation (parallel - 4 implementers)

Tasks in this batch have no external dependencies except where noted (1.2 and 1.4 use the audit helper from 1.1).

Task 1.1: MotionDatabase.append_audit_event

Owner: implementer (author)
Estimate: 2 hours
Depends: none
Description: Add an append_audit_event(...) helper to database.MotionDatabase. This method will attempt to INSERT a row into an audit_events table (if the table exists). If DuckDB is not available or the table does not exist, append the event to a JSON file under thoughts/ledgers/audit_events.json. This provides a stable place to record per-item failures without forcing a migration to run during tests/CI.

File: database.py (modify: add method)
Test: tests/test_database_audit.py (new)

Implementation notes (decisions):

  • Signature: append_audit_event(actor_id: str | None, action: str, target_type: str | None = None, target_id: str | None = None, metadata: dict | None = None) -> bool
  • Behavior:
    • If duckdb is None: append the event to thoughts/ledgers/audit_events.json as a list of event objects (create the file/dir as needed).
    • If duckdb present: run "INSERT INTO audit_events (... )" wrapped in try/except; if table missing or INSERT fails, fall back to writing to the ledger file.
    • Do not raise; log at appropriate levels and return True if recorded somewhere, False otherwise.
  • Use uuid.uuid4() for id and UTC timestamp for created_at.
  • Use logging.getLogger(__name__) for messages.
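
A minimal sketch of the helper under these decisions; the audit_events column list is an assumption (the migration SQL is still a placeholder), and LEDGER_PATH is an illustrative module-level constant:

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from pathlib import Path

try:
    import duckdb  # optional at runtime, mirrors database.py's guarded import
except ImportError:
    duckdb = None

logger = logging.getLogger(__name__)

LEDGER_PATH = Path("thoughts/ledgers/audit_events.json")  # fallback location

class MotionDatabase:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def append_audit_event(self, actor_id=None, action="", target_type=None,
                           target_id=None, metadata=None) -> bool:
        event = {
            "id": str(uuid.uuid4()),
            "actor_id": actor_id,
            "action": action,
            "target_type": target_type,
            "target_id": target_id,
            "metadata": metadata or {},
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        if duckdb is not None:
            try:
                conn = duckdb.connect(self.db_path)
                try:
                    # Column list assumed; adjust once the migration lands.
                    conn.execute(
                        "INSERT INTO audit_events (id, actor_id, action, target_type,"
                        " target_id, metadata, created_at) VALUES (?, ?, ?, ?, ?, ?, ?)",
                        [event["id"], actor_id, action, target_type, target_id,
                         json.dumps(event["metadata"]), event["created_at"]],
                    )
                    return True
                finally:
                    conn.close()
            except Exception:
                logger.warning("audit insert failed; falling back to ledger",
                               exc_info=True)
        try:
            LEDGER_PATH.parent.mkdir(parents=True, exist_ok=True)
            events = json.loads(LEDGER_PATH.read_text()) if LEDGER_PATH.exists() else []
            events.append(event)
            LEDGER_PATH.write_text(json.dumps(events, indent=2))
            return True
        except OSError:
            logger.error("could not record audit event %r", action, exc_info=True)
            return False
```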

Test (complete list):

  • tests/test_database_audit.py
    • Case A (duckdb=None emulation): monkeypatch database.duckdb = None; ensure the ledger file is created and contains the event.
    • Case B (duckdb present but table insertion raises): monkeypatch duckdb.connect to a MagicMock that raises on execute -> verify fallback to ledger file.
    • Verify method returns True when written to ledger, and that JSON is valid.
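
A sketch of Case A, assuming the ledger path is exposed as a module-level LEDGER_PATH constant as in the sketch above (the exact attribute name depends on the final implementation):

```python
import json

def test_append_audit_event_falls_back_to_ledger(tmp_path, monkeypatch):
    import database

    # Emulate a missing DuckDB and redirect the ledger into the tmp dir.
    monkeypatch.setattr(database, "duckdb", None)
    ledger = tmp_path / "audit_events.json"
    monkeypatch.setattr(database, "LEDGER_PATH", ledger, raising=False)

    db = database.MotionDatabase(db_path=str(tmp_path / "motions.db"))
    ok = db.append_audit_event(action="embedding_failed", target_type="motion",
                               target_id="42", metadata={"model": "test"})

    assert ok is True
    events = json.loads(ledger.read_text())
    assert events[0]["action"] == "embedding_failed"
```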

Verify:

  • pytest -q tests/test_database_audit.py

Commit message suggestion:

  • feat(db): add append_audit_event helper to MotionDatabase (ledger fallback)

Task 1.2: ai provider retry/fallback wrapper

Owner: implementer
Estimate: 3 hours
Depends: 1.1 (uses MotionDatabase.append_audit_event)
Description: Add a small module that wraps ai_provider.get_embeddings_batch to provide robust retries and fallback to smaller batch sizes. The wrapper returns a list of embeddings aligned with inputs; for items that permanently fail we return None in-place and record an audit event via MotionDatabase.append_audit_event.

File: pipeline/ai_provider_wrapper.py (new)
Test: tests/test_ai_provider_wrapper.py (new)

Implementation details:

  • Provide function get_embeddings_with_retry(texts: list[str], motion_ids: list[int] | None = None, model: str | None = None, batch_size: int = 50, retries: int = 3) -> list[list[float] | None]
  • Approach:
    • Iterate inputs in chunks of batch_size.
    • For each chunk:
      • Try ai_provider.get_embeddings_batch(chunk, model=model, batch_size=batch_size) up to retries times with exponential backoff (initial_backoff=0.5s, jitter).
      • If a chunk continuously fails, split the chunk into subchunks (smaller_batch_size = max(1, batch_size // 2)) and retry the subchunks with the same logic.
      • If an individual text still fails, mark the corresponding index result as None and record an audit event via MotionDatabase.append_audit_event with action='embedding_failed' and metadata including model, exception message, and attempts.
    • Return a results list of the same length as inputs (embedding lists or None).
  • Instantiate MotionDatabase(db_path=...) only if a db_path is provided via env/config or an optional parameter; by default, call append_audit_event on database.db (the existing module-level instance).
  • Keep function pure enough to be unit-tested by monkeypatching ai_provider.get_embeddings_batch and MotionDatabase.append_audit_event.
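
A sketch of the intended control flow, assuming ai_provider.get_embeddings_batch and database.db as named in this plan; the chunking state, helper names, and audit metadata shape are illustrative:

```python
import logging
import random
import time

import ai_provider          # existing provider module named in this plan
from database import db     # existing module-level MotionDatabase instance

logger = logging.getLogger(__name__)

def _call_with_retry(chunk, model, retries, initial_backoff=0.5):
    """Try one batch call up to `retries` times with exponential backoff + jitter."""
    for attempt in range(retries):
        try:
            return ai_provider.get_embeddings_batch(chunk, model=model,
                                                    batch_size=len(chunk))
        except Exception:
            logger.warning("embedding batch failed (attempt %d/%d)",
                           attempt + 1, retries, exc_info=True)
            if attempt + 1 < retries:
                delay = initial_backoff * (2 ** attempt)
                time.sleep(delay * (1 + random.uniform(-0.1, 0.1)))
    return None

def get_embeddings_with_retry(texts, motion_ids=None, model=None,
                              batch_size=50, retries=3):
    """Return embeddings aligned with `texts`; None marks a permanent failure."""
    results: list[list[float] | None] = [None] * len(texts)
    ids = motion_ids if motion_ids is not None else [None] * len(texts)
    pending = list(range(len(texts)))   # indices still lacking an embedding
    size = batch_size
    while pending:
        failing: list[int] = []
        for start in range(0, len(pending), size):
            chunk = pending[start:start + size]
            embeddings = _call_with_retry([texts[i] for i in chunk], model, retries)
            if embeddings is None:
                failing.extend(chunk)
            else:
                # Assumes the provider returns one embedding per input text.
                for i, emb in zip(chunk, embeddings):
                    results[i] = emb
        if not failing:
            break
        if size == 1:
            # Permanent failures: leave None in place and record audit events.
            for i in failing:
                db.append_audit_event(
                    action="embedding_failed", target_type="motion",
                    target_id=str(ids[i]) if ids[i] is not None else None,
                    metadata={"model": model, "attempts": retries})
            break
        pending, size = failing, max(1, size // 2)
    return results
```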

Test cases:

  • test successful batch returns embeddings aligned to inputs
  • test simulated transient failure where first attempt fails and second succeeds (observed retry)
  • test persistent chunk failure triggers fallback to smaller chunks and eventual audit appended for failing items (verify append_audit_event called with expected metadata)
  • tests use monkeypatch to stub ai_provider.get_embeddings_batch behavior and MotionDatabase.append_audit_event

Verify:

  • pytest -q tests/test_ai_provider_wrapper.py

Commit:

  • feat(pipeline): ai provider retry/fallback wrapper

Task 1.3: QA script — scripts/qa_similarity.py

Owner: implementer
Estimate: 2 hours
Depends: none
Description: Add a small script that samples N motions across windows, runs similarity lookup for each sampled motion, asserts simple heuristics (e.g., top-5 are not all score==1.0 except identical IDs), and writes a JSON summary into thoughts/ledgers/qa_similarity_{timestamp}.json. This script is meant to be run manually/CI for a quick QA check.

File: scripts/qa_similarity.py (new)
Test: tests/test_qa_similarity.py (new)

Implementation notes:

  • CLI: --db-path, --sample-size (default 50), --top-k (default 5)
  • Implementation uses MotionDatabase to select a small set of motions and similarity.get_cached_similarities (or MotionDatabase.get_cached_similarities) to evaluate neighbors.
  • The script returns a dict summary which is also written to a uniquely named JSON file under thoughts/ledgers/.
  • For tests, monkeypatch MotionDatabase to return deterministic samples and similarities; verify the script produces the expected JSON summary and returns reasonable pass/fail flags.
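
A sketch of the script's core, assuming MotionDatabase exposes get_cached_similarities as noted above; sample_motions and the row shapes (target_id, score keys) are hypothetical placeholders:

```python
import argparse
import json
import logging
import time
from pathlib import Path

from database import MotionDatabase

logger = logging.getLogger(__name__)

def run_qa(db_path: str, sample_size: int = 50, top_k: int = 5) -> dict:
    db = MotionDatabase(db_path=db_path)
    sampled = db.sample_motions(sample_size)   # hypothetical sampling helper
    flagged = []
    for motion in sampled:
        neighbors = db.get_cached_similarities(motion["id"], top_k=top_k)
        others = [n for n in neighbors if n["target_id"] != motion["id"]]
        # Heuristic from the description: top-k should not be all score==1.0.
        if others and all(n["score"] >= 1.0 for n in others):
            flagged.append({"motion_id": motion["id"], "neighbors": others})
    summary = {"sampled": len(sampled), "flagged": len(flagged),
               "passed": not flagged, "details": flagged}
    out = Path("thoughts/ledgers") / f"qa_similarity_{int(time.time())}.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary, indent=2))
    logger.info("QA summary written to %s", out)
    return summary

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--db-path", required=True)
    parser.add_argument("--sample-size", type=int, default=50)
    parser.add_argument("--top-k", type=int, default=5)
    args = parser.parse_args()
    run_qa(args.db_path, args.sample_size, args.top_k)
```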

Verify:

  • pytest -q tests/test_qa_similarity.py
  • Run manually: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10

Commit:

  • feat(scripts): add QA similarity sampling script and ledger writer

Task 1.4: sync_motion_content — reduce concurrency, add per-ext_id retry, add CLI flag

Owner: implementer
Estimate: 3 hours
Depends: 1.1 (write failures to audit via MotionDatabase.append_audit_event)
Description: Harden the body text fetcher:

  • Add CLI flag --max-body-workers (reducing the default to 10).
  • Use requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10) when creating the requests.Session in sync_motion_content.
  • Implement per-ext_id retry in _fetch_body_text: try up to 3 times with exponential backoff on network errors/5xx/429.
  • When a body_text fetch permanently fails, call MotionDatabase.append_audit_event(action='body_fetch_failed', target_type='document', target_id=ext_id, metadata=...) so failures are recorded.

File: scripts/sync_motion_content.py (modify)
Test: tests/test_sync_motion_content.py (new)

Implementation details:

  • Add parser.add_argument("--max-body-workers", type=int, default=10, help=...) in CLI
  • When creating the session, mount an HTTPAdapter with pool_connections and pool_maxsize equal to max_body_workers via session.mount("https://", adapter) (requests' supported API for registering adapters), as shown in the sketch below.
  • Modify _fetch_body_text(ext_id, session) to attempt up to 3 tries and return None on exhaustion; log appropriately; call db.append_audit_event when permanently failing (db from database.db).
  • Update fetch_body_texts to pass the max_workers param through (already implemented), and set the default constant MAX_BODY_WORKERS to 10 at the top of the file.
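
A sketch of the session setup and retry loop; the document endpoint URL is a placeholder (the real one lives in the existing script), and the audit call follows Task 1.1:

```python
import logging
import time

import requests
from requests.adapters import HTTPAdapter

from database import db  # existing module-level MotionDatabase instance

logger = logging.getLogger(__name__)

MAX_BODY_WORKERS = 10

def make_session(max_body_workers: int = MAX_BODY_WORKERS) -> requests.Session:
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=max_body_workers,
                          pool_maxsize=max_body_workers)
    session.mount("https://", adapter)  # bound the pool to the worker count
    return session

def _fetch_body_text(ext_id, session, retries=3, initial_backoff=0.5):
    url = f"https://example.invalid/documents/{ext_id}"  # placeholder endpoint
    for attempt in range(retries):
        try:
            resp = session.get(url, timeout=30)
            if resp.status_code == 429 or resp.status_code >= 500:
                raise requests.HTTPError(f"retryable status {resp.status_code}")
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            logger.warning("body fetch failed for %s (attempt %d/%d)",
                           ext_id, attempt + 1, retries, exc_info=True)
            if attempt + 1 < retries:
                time.sleep(initial_backoff * (2 ** attempt))
    # Exhausted retries: record the failure and return None.
    db.append_audit_event(action="body_fetch_failed", target_type="document",
                          target_id=ext_id, metadata={"attempts": retries})
    return None
```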

Test plan:

  • Test that _fetch_body_text retries: monkeypatch session.get to fail first (raise requests.ConnectionError) and succeed second; verify the expected text is returned and that exactly two attempts occurred.
  • Test permanent failure case: monkeypatch session.get to always raise and verify MotionDatabase.append_audit_event was called (monkeypatch database.db.append_audit_event).
  • Test fetch_body_texts respects max_workers param by running small set and monkeypatching ThreadPoolExecutor to observe max_workers argument (or call with small size and assert function returns mapped results).

Verify:

  • pytest -q tests/test_sync_motion_content.py
  • Manual run: python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10

Commit:

  • feat(sync): add per-ext_id retries and --max-body-workers flag (defaults to 10), record failures to audit

Batch 2: Core modules (parallel - 3 implementers)

These tasks depend on Batch 1 (ai wrapper and audit method must be present).

Task 2.1: text_pipeline — use ai wrapper & return failed_ids

Owner: implementer
Estimate: 3 hours
Depends: 1.2 (ai_provider_wrapper) and 1.1 (audit)
Description: Modify pipeline/text_pipeline.py to call the new ai_provider_wrapper.get_embeddings_with_retry instead of ai_provider.get_embeddings_batch. Extend ensure_text_embeddings to collect indexes/ids of motions which failed to get embeddings and return them as a fifth element: (stored, skipped_existing, skipped_no_text, errors, failed_ids). Keep logging behavior similar but include a log line reporting failed_ids for the run.

File: pipeline/text_pipeline.py (modify)
Test: tests/test_text_pipeline_retry.py (new)

Implementation details:

  • Replace the ai_provider.get_embeddings_batch(batch_texts, ...) call with wrapper.get_embeddings_with_retry(batch_texts, batch_ids, model=model, batch_size=batch_size, retries=3).
  • The wrapper returns a list aligned with batch_texts containing either an embedding list or None. For each None, increment errors and append the motion_id to failed_ids.
  • At the end of ensure_text_embeddings, return stored, skipped_existing, skipped_no_text, errors, failed_ids.
  • Also ensure docstring updated.
  • Keep existing counting and logging; existing callers will be updated in Task 2.2.
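
A helper-shaped sketch of the changed batch handling (the real change may stay inline in ensure_text_embeddings); db.store_embedding's exact signature is assumed:

```python
import logging

from database import db
from pipeline import ai_provider_wrapper

logger = logging.getLogger(__name__)

def _embed_batch(batch_ids, batch_texts, model, batch_size):
    """Embed one batch; returns (stored_count, failed_ids)."""
    stored = 0
    failed_ids: list[int] = []
    embeddings = ai_provider_wrapper.get_embeddings_with_retry(
        batch_texts, batch_ids, model=model, batch_size=batch_size, retries=3)
    for motion_id, embedding in zip(batch_ids, embeddings):
        if embedding is None:
            failed_ids.append(motion_id)   # caller counts these as errors
            continue
        db.store_embedding(motion_id, embedding)  # existing persistence call
        stored += 1
    if failed_ids:
        logger.warning("embedding failed for motion ids: %s", failed_ids)
    return stored, failed_ids
```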

Test plan:

  • Unit test that ensure_text_embeddings:
    • when wrapper returns embeddings for all batch items, stored increments as expected.
    • when wrapper returns None for some items, those motion_ids included in failed_ids and errors counts reflect them.
  • Use monkeypatch to stub pipeline.ai_provider_wrapper.get_embeddings_with_retry and database.db.store_embedding.

Verify:

  • pytest -q tests/test_text_pipeline_retry.py

Commit:

  • feat(pipeline): use ai_provider wrapper for robust embeddings and return failed ids

Task 2.2: rerun_embeddings — add --retry-missing mode and wire re-run

Owner: implementer
Estimate: 2.5 hours
Depends: 2.1 (ensure_text_embeddings new return)
Description: Add a CLI flag --retry-missing to scripts/rerun_embeddings.py. When set, after the main ensure_text_embeddings call, if the returned failed_ids list is non-empty, re-run embedding for just those failed motion ids with a smaller batch size (half the original) via a new text_pipeline helper, ensure_text_embeddings_for_ids (see implementation notes). The script should record audit events for items that still fail after the retry.

File: scripts/rerun_embeddings.py (modify)
Test: tests/test_rerun_embeddings.py (modify — existing test)

Implementation notes:

  • Add parser.add_argument("--retry-missing", action="store_true", help=...).
  • After the first ensure_text_embeddings call, expect a 5-tuple. If retry_missing is set and failed_ids is non-empty, run a second short pass: text_pipeline.ensure_text_embeddings_for_ids(db_path=db_path, ids=failed_ids, model=model, batch_size=max(1, batch_size // 2)). Implementation choice: add this new helper to text_pipeline (restricting the existing embedding path to a provided list of motion ids) rather than overloading ensure_text_embeddings, and use it here (sketched after this list).
  • Update tests/test_rerun_embeddings.py to monkeypatch the new text_pipeline helper and simulate that first call returns failed_ids and second call resolves them; assert rerun called accordingly and summary contains expected fields.
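
A sketch of the retry pass, assuming ensure_text_embeddings accepts db_path/model/batch_size keywords; ensure_text_embeddings_for_ids is the new helper this task adds:

```python
import logging

from pipeline import text_pipeline

logger = logging.getLogger(__name__)

def run(db_path, model, batch_size, retry_missing):
    stored, skipped_existing, skipped_no_text, errors, failed_ids = (
        text_pipeline.ensure_text_embeddings(
            db_path=db_path, model=model, batch_size=batch_size))
    if retry_missing and failed_ids:
        logger.info("retrying %d failed motions with smaller batches",
                    len(failed_ids))
        # ensure_text_embeddings_for_ids is the new helper this task adds.
        text_pipeline.ensure_text_embeddings_for_ids(
            db_path=db_path, ids=failed_ids, model=model,
            batch_size=max(1, batch_size // 2))
    return stored, skipped_existing, skipped_no_text, errors, failed_ids
```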

Test changes:

  • Update tests/test_rerun_embeddings.py to reflect that text_pipeline.ensure_text_embeddings returns five values and to simulate --retry-missing behavior.
  • Keep the existing expectations in the test (we will extend them to include failed_ids handling).

Verify:

  • pytest -q tests/test_rerun_embeddings.py
  • Manual run: python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing

Commit:

  • feat(scripts): add --retry-missing to rerun_embeddings and retry failed items with smaller batches

Task 2.3: similarity.compute — DB-side safety filter to avoid trivial 1.0 matches

Owner: implementer
Estimate: 3 hours
Depends: none (reads existing DB)
Description: Add a small DB-side filter before inserting similarity rows that filters out suspicious 1.0 matches between different motions when the titles are extremely short (heuristic: identical titles with length < 12 characters). Add diagnostic logging for filtered pairs.

File: similarity/compute.py (modify)
Test: tests/test_similarity_compute_filter.py (new)

Implementation details:

  • After building rows_to_insert (list of dicts with source/target ids & score), perform:
    • If score == 1.0 (or near-perfect within a tolerance, e.g., score > 0.999999), fetch titles for the set of involved ids (single query: SELECT id, title FROM motions WHERE id IN (...)).
    • For each candidate row with a perfect/near-perfect score, if the motion titles are equal and len(title.strip()) < 12, skip insertion and log at debug/info that the pair was filtered due to a trivial short identical title.
  • The threshold of 12 is chosen conservatively (document it in the commit message).
  • Keep inserted count and return behavior unchanged.
  • Make sure DB connections are opened/closed per method.
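
A sketch of the filter step over rows_to_insert before insertion; the dict key names (source_id, target_id, score) are assumptions about the existing row shape:

```python
import logging

logger = logging.getLogger(__name__)

NEAR_PERFECT = 0.999999
MIN_TITLE_LEN = 12

def filter_trivial_matches(rows_to_insert, conn):
    """Drop near-perfect pairs whose motions share a very short identical title."""
    suspect = [r for r in rows_to_insert if r["score"] > NEAR_PERFECT]
    if not suspect:
        return rows_to_insert
    ids = {r["source_id"] for r in suspect} | {r["target_id"] for r in suspect}
    placeholders = ", ".join("?" for _ in ids)
    titles = dict(conn.execute(
        f"SELECT id, title FROM motions WHERE id IN ({placeholders})",
        list(ids)).fetchall())
    kept = []
    for row in rows_to_insert:
        if row["score"] > NEAR_PERFECT:
            s = (titles.get(row["source_id"]) or "").strip()
            t = (titles.get(row["target_id"]) or "").strip()
            if s == t and len(s) < MIN_TITLE_LEN:
                logger.debug("filtered trivial match %s -> %s (title=%r)",
                             row["source_id"], row["target_id"], s)
                continue
        kept.append(row)
    return kept
```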

Test plan:

  • Construct a minimal in-memory or duckdb-mocked scenario where two different motion ids have an identical short title and their vectors produce a 1.0 similarity.
  • Monkeypatch duckdb.connect to return rows such that compute_similarities produces rows_to_insert including a 1.0 score.
  • Verify store_similarity_batch is not called for that row (monkeypatch MotionDatabase.store_similarity_batch or spy on db.store_similarity_batch calls).

Verify:

  • pytest -q tests/test_similarity_compute_filter.py

Commit:

  • fix(similarity): filter trivial 1.0 matches for very-short identical titles

Batch 3: Observability / Integration (parallel - 1-2 implementers)

These are small finishing tasks (audit/ledgers, small extras).

Task 3.1: Tests & CI adjustments, docs, ledger examples

Owner: reviewer (PR reviewer)
Estimate: 2 hours
Depends: all tasks above (1.1–2.3)
Description: After the code is in, run the full test suite, fix any flaky tests, and add a short README note in thoughts/ledgers/ explaining how to run the QA script and how the audit_events fallback works. Add a small example ledger created by the QA script if helpful.

Files: (changes/additions)

  • thoughts/shared/plans/2026-03-23-motion-content-enrichment-plan.md (this plan — created)
  • thoughts/ledgers/README_motion_enrichment.md (new, optional)
  • No dedicated unit test for this task; it's a reviewer/integration task.

Verification:

  • Run full tests: pytest
  • Run QA script locally: python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
  • Inspect thoughts/ledgers/qa_similarity_*.json and audit_events ledger file.

Commit:

  • docs(ledgers): document QA and audit fallback behavior

Test / Verification summary (per-task commands)

  • Task 1.1
    • pytest -q tests/test_database_audit.py
  • Task 1.2
    • pytest -q tests/test_ai_provider_wrapper.py
  • Task 1.3
    • pytest -q tests/test_qa_similarity.py
    • python scripts/qa_similarity.py --db-path data/motions.db --sample-size 10
  • Task 1.4
    • pytest -q tests/test_sync_motion_content.py
    • python scripts/sync_motion_content.py --db-path data/motions.db --max-body-workers 10 --skip-body-text (dry run)
  • Task 2.1
    • pytest -q tests/test_text_pipeline_retry.py
  • Task 2.2
    • pytest -q tests/test_rerun_embeddings.py
    • python scripts/rerun_embeddings.py --db-path data/motions.db --retry-missing
  • Task 2.3
    • pytest -q tests/test_similarity_compute_filter.py

Full suite verification:

  • pytest -q
