Guided Policy Explorer — Implementation Plan

Goal: Implement the Guided Policy Explorer MVP that reuses existing motions, layman summaries, embeddings and session votes to provide an Explore landing, Motion detail view, cached related motions (similarity cache), and accompanying background jobs and admin tooling.

Design: thoughts/shared/designs/2026-03-21-motions-guided-explorer-design.md

Dependency Graph

Batch 1 (parallel): 1.1, 1.2, 1.3, 1.4, 1.5 [foundation - migrations, types, migration test helper, admin docs]
Batch 2 (parallel): 2.1, 2.2, 2.3, 2.4 [core - similarity service, cache repo, audit repo, embeddings worker]
Batch 3 (parallel): 3.1, 3.2, 3.3, 3.4 [components - clusterer worker, CLI, API, Streamlit page; 3.2 needs the run_batch interface from 3.1]
Batch 4: 4.1 [integration test - depends on 2.x & 3.x]

Notes on planning choices

The design requires a similarity cache and a small in-process nearest-neighbor search for the MVP. I'm implementing this as follows: store precomputed top-N neighbor lists (IDs + scores) in a small SQL table, and compute neighbors in each batch job by scanning all embeddings in memory (brute force, so a full recompute compares every pair of motions). Reason: this avoids an external vector DB and keeps the implementation simple and testable.
The design requires robust embedding generation. I'll implement exponential-backoff retry logic with a configurable retry count and timeouts in embeddings_worker; tests will monkeypatch ai_provider to simulate failures.
Migration tests: the design asks for migration tests, but the migration SQL content is omitted per instructions. The tests will assert that the migration files are present and follow the naming convention, and will skip applying SQL unless a TEST_DB_URL env var is provided. This keeps CI safe while still giving test coverage and a way for developers to verify the SQL.

Batch 1: Foundation (parallel - 5 implementers)

All tasks in this batch have NO dependencies and run simultaneously.

Task 1.1: Add similarity cache migration (placeholder)

Title: Migration: add similarity_cache table
Description: Add a migration file that creates a similarity cache table storing the precomputed related-motion list for each motion (motion_id, neighbors_json, computed_at). The SQL content is intentionally left out per instructions; the file is a placeholder that CI and tests will detect.
Files: migrations/2026-03-22-add-similarity-cache.sql
Tests: tests/migrations/test_2026_03_22_add_similarity_cache.py
Estimated: 1.0h
Priority: high
Depends: none
Acceptance criteria:
The migration file exists at migrations/2026-03-22-add-similarity-cache.sql.
The migration test runs and passes in default mode, where it only checks the filename and header. If TEST_DB_URL is set, the test applies the SQL and must not error (the SQL may be empty, so the test accepts either a no-op or valid SQL). The DB-apply step is skipped when TEST_DB_URL is unset.
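The default-mode check can be sketched as two small helpers that the migration tests share. This is a minimal sketch under stated assumptions: the filename pattern (YYYY-MM-DD-kebab-case.sql) is inferred from the two planned filenames, the "header" rule (first line is an SQL `--` comment) is hypothetical, and the names `check_migration_file` and `should_apply_sql` are illustrative only.

```python
import os
import re
from collections.abc import Mapping
from pathlib import Path
from typing import Optional

# Assumed convention, inferred from the planned filenames: YYYY-MM-DD-kebab-case.sql
MIGRATION_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.sql$")
# Hypothetical header rule: the first line of the file is an SQL comment.
HEADER_PREFIX = "--"


def check_migration_file(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    if not path.is_file():
        return [f"missing: {path}"]
    problems = []
    if not MIGRATION_NAME.match(path.name):
        problems.append(f"bad filename: {path.name}")
    first_line = path.read_text(encoding="utf-8").splitlines()[:1]
    if not first_line or not first_line[0].startswith(HEADER_PREFIX):
        problems.append("missing '--' header comment")
    return problems


def should_apply_sql(env: Optional[Mapping[str, str]] = None) -> bool:
    """Apply SQL only when TEST_DB_URL is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("TEST_DB_URL"))
```

In the pytest module, the apply step could then be guarded with `@pytest.mark.skipif(not should_apply_sql(), reason="TEST_DB_URL not set")`, so the filename/header test always runs and the DB test only runs when a database is configured.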

Task 1.2: Add audit/events migration (placeholder)

Title: Migration: add audit_events table
Description: Add a migration placeholder that creates an append-only audit/events table for user events (vote, bookmark, flag). The actual SQL is omitted.
Files: migrations/2026-03-22-add-audit-events.sql
Tests: tests/migrations/test_2026_03_22_add_audit_events.py
Estimated: 1.0h
Priority: high
Depends: none
Acceptance criteria:
migrations/2026-03-22-add-audit-events.sql exists.
The migration test verifies the filename and is safe to run in CI (it skips the DB apply step unless TEST_DB_URL is provided).

Task 1.3: Shared types for motions & similarity entries

Title: Types: motion and similarity types
Description: Add a small types module that centralizes the typed dataclasses and aliases used by the similarity and cache modules (MotionId, an Embedding vector alias, SimilarityNeighbor). This reduces coupling and makes tests easier to write.
Files: src/types/motion_types.py
Tests: tests/types/test_motion_types.py
Estimated: 1.5h
Priority: medium
Depends: none
Acceptance criteria:
src/types/motion_types.py defines the MotionId, Embedding and SimilarityNeighbor types plus basic helpers (e.g., serializing and deserializing neighbor lists). Tests validate a JSON round-trip of neighbors.
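A minimal sketch of what the types module might contain, assuming integer motion IDs and a JSON layout of `[{"motion_id": ..., "score": ...}, ...]` for neighbor lists; both choices are assumptions, as are the helper names `serialize_neighbors` and `deserialize_neighbors`.

```python
import json
from dataclasses import asdict, dataclass

# Assumption: motion ids are integers; switch the alias if the schema uses strings.
MotionId = int
Embedding = list[float]


@dataclass(frozen=True)
class SimilarityNeighbor:
    motion_id: MotionId
    score: float  # cosine similarity, in [-1, 1]


def serialize_neighbors(neighbors: list[SimilarityNeighbor]) -> str:
    """Encode neighbors as a JSON array of {"motion_id", "score"} objects."""
    return json.dumps([asdict(n) for n in neighbors])


def deserialize_neighbors(raw: str) -> list[SimilarityNeighbor]:
    """Inverse of serialize_neighbors; coerces types defensively."""
    return [SimilarityNeighbor(int(d["motion_id"]), float(d["score"])) for d in json.loads(raw)]
```

Making SimilarityNeighbor frozen gives value equality and hashability, which keeps the round-trip test a single `==` comparison.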

Task 1.4: CI migration test helper

Title: Test helper: migration test utils
Description: Add a small test helper that other migration tests can share. It provides a pytest fixture that reads TEST_DB_URL, yields a DB connection (or None when the variable is unset), and marks tests accordingly.
Files: tests/utils/migration_fixtures.py
Tests: tests/migrations/test_migration_fixtures_smoke.py
Estimated: 1.0h
Priority: medium
Depends: none
Acceptance criteria:
migration_fixtures.py provides a test_db fixture. The smoke test asserts that the fixture yields None when TEST_DB_URL is unset and a connection-like object when it is set.
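The fixture logic can be kept as a plain generator so it is testable without pytest. This sketch assumes TEST_DB_URL only ever points at SQLite in the SQLAlchemy-style form `sqlite:///<path>` (including `sqlite:///:memory:`); other databases would need their own driver. The name `open_test_db` is hypothetical.

```python
import os
import sqlite3
from collections.abc import Iterator, Mapping
from typing import Optional

SQLITE_PREFIX = "sqlite:///"


def open_test_db(env: Optional[Mapping[str, str]] = None) -> Iterator[Optional[sqlite3.Connection]]:
    """Yield a connection for TEST_DB_URL, or None when it is unset.

    Assumption: only sqlite:///<path> URLs are supported.
    """
    env = os.environ if env is None else env
    url = env.get("TEST_DB_URL")
    if not url:
        yield None
        return
    if not url.startswith(SQLITE_PREFIX):
        raise ValueError(f"unsupported TEST_DB_URL scheme: {url}")
    conn = sqlite3.connect(url[len(SQLITE_PREFIX):])
    try:
        yield conn
    finally:
        conn.close()  # runs when the fixture is torn down
```

The pytest fixture itself can then be a thin wrapper: decorate a `test_db()` function with `@pytest.fixture` and `yield from open_test_db()`. Tests that need a real database call `pytest.skip(...)` when they receive None.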

Task 1.5: Add README admin docs for recomputing

Title: Docs: admin CLI usage and migration notes
Description: Add a short markdown doc describing the admin CLI, the migration filenames, and how to run the recompute/clusterer jobs locally for development.
Files: docs/admin/recompute_similarity.md
Tests: none (doc only)
Estimated: 0.5h
Priority: low
Depends: none
Acceptance criteria:
docs/admin/recompute_similarity.md exists and documents the commands and these env vars: TEST_DB_URL, AI_PROVIDER_MOCK, SIMILARITY_TOP_N.
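For the doc to describe behavior precisely, it helps if the three env vars are read in one place. A hypothetical settings helper is sketched below; the names `ExplorerSettings` and `load_settings` are illustrative, the default of 10 comes from the Open Questions section, and treating "1", "true" and "yes" as enabling the mock is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

DEFAULT_TOP_N = 10  # plan default for SIMILARITY_TOP_N


@dataclass(frozen=True)
class ExplorerSettings:
    test_db_url: Optional[str]
    ai_provider_mock: bool
    similarity_top_n: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ExplorerSettings:
    env = os.environ if env is None else env
    top_n = int(env.get("SIMILARITY_TOP_N", DEFAULT_TOP_N))
    if top_n < 1:
        raise ValueError("SIMILARITY_TOP_N must be >= 1")
    return ExplorerSettings(
        test_db_url=env.get("TEST_DB_URL") or None,  # empty string counts as unset
        # Assumption: "1", "true" and "yes" (any case) enable the mock provider.
        ai_provider_mock=env.get("AI_PROVIDER_MOCK", "").strip().lower() in {"1", "true", "yes"},
        similarity_top_n=top_n,
    )
```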

Batch 2: Core Modules (parallel - 4 implementers)

Depends: Batch 1

Task 2.1: Similarity service (in-process search + utility)

Title: Similarity service implementation
Description: A new service that, given motion embeddings, computes cosine similarity and returns the top-N neighbors. It also exposes a convenience function that computes the neighbors of a single motion and returns a list of (motion_id, score) pairs. It is pure Python and testable in memory.
Files: src/services/similarity_service.py
Tests: tests/services/test_similarity_service.py
Estimated: 5.0h
Priority: high
Depends: 1.3
Acceptance criteria:
similarity_service.py exposes compute_neighbors(embedding: Embedding, all_embeddings: dict[MotionId, Embedding], top_n: int) -> list[SimilarityNeighbor]. The query motion must not appear in its own neighbor list (it always scores 1.0 against itself), so callers remove it from all_embeddings or the service provides a way to exclude it.
Unit tests cover exact small matrices and edge cases (empty input, identical embeddings). All tests pass with pytest tests/services/test_similarity_service.py.
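A minimal sketch of brute-force cosine top-N, assuming embeddings fit in memory as plain lists. The `exclude_id` parameter is a hypothetical addition for dropping the query motion; returning 0.0 for zero-norm vectors and breaking ties by ascending motion_id are assumptions chosen to make the edge-case tests deterministic.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional

MotionId = int  # assumed alias; Task 1.3 owns the real definition
Embedding = list[float]


@dataclass(frozen=True)
class SimilarityNeighbor:
    motion_id: MotionId
    score: float


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def compute_neighbors(
    embedding: Embedding,
    all_embeddings: dict[MotionId, Embedding],
    top_n: int,
    exclude_id: Optional[MotionId] = None,
) -> list[SimilarityNeighbor]:
    """Score every candidate and keep the best top_n.

    Ties are broken by ascending motion_id so results are deterministic.
    """
    if top_n <= 0:
        return []
    # (score, -id, id): nlargest prefers higher scores, then smaller ids.
    scored = (
        (cosine(embedding, vec), -mid, mid)
        for mid, vec in all_embeddings.items()
        if mid != exclude_id
    )
    best = heapq.nlargest(top_n, scored)
    return [SimilarityNeighbor(mid, score) for score, _, mid in best]
```

`heapq.nlargest` keeps only top_n items while scanning, so memory stays O(top_n) per query even though every candidate is scored.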

Task 2.2: DB repo for similarity cache

Title: Repo: similarity_cache read/write
Description: Provide a small repository abstraction that reads and writes cached neighbor lists (serialized as JSON) to the DB. Keep DB interactions minimal and testable with sqlite in-memory.
Files: src/db/similarity_cache_repo.py
Tests: tests/db/test_similarity_cache_repo.py
Estimated: 4.0h
Priority: high
Depends: 1.1, 1.3
Acceptance criteria:
similarity_cache_repo provides get_cached_neighbors(motion_id) -> Optional[list[SimilarityNeighbor]] and upsert_cached_neighbors(motion_id, neighbors, computed_at).
Unit tests run against sqlite in-memory and assert correct serialization and deserialization.
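A minimal sketch of the repo against the standard-library sqlite3 module. Assumptions: the connection is passed in explicitly (dependency injection, per the Assumptions section), the schema below is a hypothetical stand-in for the placeholder migration in Task 1.1, and neighbors are plain (motion_id, score) tuples rather than SimilarityNeighbor to keep the block self-contained.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema standing in for the placeholder migration in Task 1.1.
SCHEMA = """
CREATE TABLE IF NOT EXISTS similarity_cache (
    motion_id      INTEGER PRIMARY KEY,
    neighbors_json TEXT NOT NULL,
    computed_at    TEXT NOT NULL
)
"""


def upsert_cached_neighbors(conn: sqlite3.Connection, motion_id: int,
                            neighbors: list[tuple[int, float]], computed_at: datetime) -> None:
    payload = json.dumps([{"motion_id": m, "score": s} for m, s in neighbors])
    with conn:  # commits on success, rolls back on error
        # INSERT OR REPLACE overwrites the existing row for this motion_id.
        conn.execute(
            "INSERT OR REPLACE INTO similarity_cache (motion_id, neighbors_json, computed_at) "
            "VALUES (?, ?, ?)",
            (motion_id, payload, computed_at.isoformat()),
        )


def get_cached_neighbors(conn: sqlite3.Connection, motion_id: int) -> Optional[list[tuple[int, float]]]:
    row = conn.execute(
        "SELECT neighbors_json FROM similarity_cache WHERE motion_id = ?", (motion_id,)
    ).fetchone()
    if row is None:
        return None  # no cache entry: distinct from an empty neighbor list
    return [(d["motion_id"], d["score"]) for d in json.loads(row[0])]
```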

Task 2.3: Audit/events repository

Title: Repo: audit_events append-only writer
Description: A small repo that appends audit events (user_id, session_id, motion_id, event_type, payload JSON, created_at). It provides an append_event function used by the UI and session logic.
Files: src/db/audit_repo.py
Tests: tests/db/test_audit_repo.py
Estimated: 3.0h
Priority: medium
Depends: 1.2
Acceptance criteria:
In tests, append_event writes a row to sqlite in-memory, and reading it back verifies the fields and the presence of created_at. Functions are fully typed and handle JSON payloads.
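A minimal sketch of an append-only writer with sqlite3. The schema is a hypothetical stand-in for the Task 1.2 migration; treating user_id as a nullable string (anonymous sessions) and rejecting event types other than vote, bookmark and flag are assumptions. "Append-only" here simply means the module exposes no update or delete function.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical schema; the real one comes from the Task 1.2 migration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    TEXT,
    session_id TEXT NOT NULL,
    motion_id  INTEGER,
    event_type TEXT NOT NULL,
    payload    TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""
EVENT_TYPES = {"vote", "bookmark", "flag"}


def append_event(conn: sqlite3.Connection, *, session_id: str, event_type: str,
                 motion_id: Optional[int] = None, user_id: Optional[str] = None,
                 payload: Optional[dict[str, Any]] = None) -> int:
    """Insert one event and return its row id."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event_type: {event_type}")
    created_at = datetime.now(timezone.utc).isoformat()  # UTC, ISO-8601 text
    with conn:
        cur = conn.execute(
            "INSERT INTO audit_events (user_id, session_id, motion_id, event_type, payload, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (user_id, session_id, motion_id, event_type, json.dumps(payload or {}), created_at),
        )
    return cur.lastrowid
```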

Task 2.4: Embeddings worker helper (retries/backoff)

Title: Worker: robust embedding generator
Description: Add a worker helper that ensures an embedding exists for a motion. It calls ai_provider.get_embedding with retry/backoff and writes the embedding through an abstracted DB function (put_embedding_fn, dependency-injected in tests). The module contains no long-running loop; it is a single-run helper called by the scheduler.
Files: src/ai/embeddings_worker.py
Tests: tests/ai/test_embeddings_worker.py
Estimated: 4.0h
Priority: high
Depends: 1.3
Acceptance criteria:
embeddings_worker.explain_and_embed(motion_id, text, put_embedding_fn) calls ai_provider and retries on simulated transient errors. Tests monkeypatch ai_provider to fail twice and then succeed, and verify that put_embedding_fn is called exactly once with a vector-like object.
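The retry behavior can be sketched as exponential backoff with a capped delay. Assumptions: `embed_with_retry` and `TransientProviderError` are hypothetical names (the plan's entry point is explain_and_embed); the provider call is injected as `get_embedding_fn` so tests need no monkeypatching, whereas the real module would default it to ai_provider.get_embedding; request timeouts are assumed to be enforced by the provider client and surfaced as a transient error. `sleep` is injectable so tests run instantly.

```python
import time
from collections.abc import Callable, Sequence


class TransientProviderError(Exception):
    """Hypothetical error type for retryable provider failures (timeouts, rate limits)."""


def embed_with_retry(
    motion_id: int,
    text: str,
    put_embedding_fn: Callable[[int, Sequence[float]], None],
    get_embedding_fn: Callable[[str], Sequence[float]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence[float]:
    """Fetch an embedding with exponential backoff, then store it exactly once."""
    for attempt in range(max_attempts):
        try:
            vector = get_embedding_fn(text)
        except TransientProviderError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error to the scheduler
            sleep(min(max_delay, base_delay * 2 ** attempt))  # 0.5s, 1s, 2s, ...
        else:
            put_embedding_fn(motion_id, vector)  # only reached on success
            return vector
    raise ValueError("max_attempts must be >= 1")
```

Non-transient exceptions propagate immediately instead of being retried. Production retry loops often add random jitter to the delay to avoid synchronized retries; it is left out here so the delays are deterministic in tests.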

Batch 3: Components (parallel - 4 implementers)

Depends: Batch 2

Task 3.1: Clusterer scheduled job

Title: Worker: clusterer job that computes and writes caches
Description: A background job module that loads all embeddings, computes the top-N neighbors of each motion with similarity_service, and writes cache rows through similarity_cache_repo. It is designed to be runnable from the CLI. To keep dev runs safe, it caps the work per run with a batch_size parameter (the maximum number of motions processed in one run).
Files: src/workers/clusterer.py
Tests: tests/workers/test_clusterer.py
Estimated: 6.0h
Priority: high
Depends: 2.1, 2.2, 2.4
Acceptance criteria:
clusterer.run_batch(batch_size, top_n, load_embeddings_fn, upsert_cache_fn) exists and can be unit-tested by injecting small in-memory embeddings and verifying that upsert_cache_fn is called with the expected neighbor lists.
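A minimal sketch of run_batch with injected loader and writer. Assumptions: it processes motions in ascending id order and treats `batch_size=None` as "all motions" (the plan does not say which motions a limited run should pick; stale caches first would be a natural refinement); cosine similarity is computed inline for self-containment, where the real module would call similarity_service.compute_neighbors; and the upsert callback takes (motion_id, neighbors, computed_at) like the Task 2.2 signature.

```python
import math
from collections.abc import Callable
from datetime import datetime, timezone
from typing import Optional


def _cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def run_batch(
    batch_size: Optional[int],
    top_n: int,
    load_embeddings_fn: Callable[[], dict[int, list[float]]],
    upsert_cache_fn: Callable[[int, list[tuple[int, float]], datetime], None],
) -> int:
    """Recompute cached neighbors for up to batch_size motions; return how many were written."""
    embeddings = load_embeddings_fn()
    computed_at = datetime.now(timezone.utc)  # one timestamp for the whole run
    processed = 0
    for motion_id in sorted(embeddings)[:batch_size]:  # slice with None means all
        query = embeddings[motion_id]
        scored = [
            (_cosine(query, vec), other_id)
            for other_id, vec in embeddings.items()
            if other_id != motion_id  # never list a motion as its own neighbor
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
        upsert_cache_fn(motion_id, [(oid, s) for s, oid in scored[:top_n]], computed_at)
        processed += 1
    return processed
```

Each motion is compared against every other, so a full run costs O(N²) similarity computations; batch_size bounds how many query motions one run handles, not the size of the candidate set.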

Task 3.2: Admin CLI: recompute-similarity

Title: CLI: recompute similarity and options
Description: A small CLI script (click or argparse) that triggers the clusterer job, either as a full run or a limited one. The CLI accepts --top-n, --batch-size and --dry-run flags. Tests monkeypatch clusterer.run_batch.
Files: src/cli/recompute_similarity.py
Tests: tests/cli/test_recompute_similarity.py
Estimated: 2.5h
Priority: medium
Depends: 3.1
Acceptance criteria:
The CLI parses the flags and calls clusterer.run_batch with the parsed arguments. Tests assert that the correct arguments are passed and that --dry-run does not call run_batch.
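A minimal argparse sketch. Assumptions: `run_batch` is passed in as a callable that already has the DB loader and writer bound (for example with functools.partial over clusterer.run_batch), which lets tests inject a fake instead of monkeypatching; an omitted --batch-size means a full run; the --top-n default falls back to SIMILARITY_TOP_N, then 10.

```python
import argparse
import os
from collections.abc import Callable, Sequence
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="recompute-similarity",
                                     description="Recompute cached related motions.")
    parser.add_argument("--top-n", type=int, default=int(os.environ.get("SIMILARITY_TOP_N", "10")),
                        help="neighbors to cache per motion")
    parser.add_argument("--batch-size", type=int, default=None,
                        help="max motions per run (default: all)")
    parser.add_argument("--dry-run", action="store_true",
                        help="print what would run without calling the clusterer")
    return parser


def main(argv: Optional[Sequence[str]], run_batch: Callable[..., int]) -> int:
    """Parse flags and trigger run_batch unless --dry-run is given; returns an exit code."""
    args = build_parser().parse_args(argv)  # argv=None falls back to sys.argv
    if args.dry_run:
        print(f"dry run: would recompute with top_n={args.top_n}, batch_size={args.batch_size}")
        return 0
    processed = run_batch(batch_size=args.batch_size, top_n=args.top_n)
    print(f"recomputed {processed} motions")
    return 0
```

argparse converts `--top-n` and `--batch-size` to the attributes `top_n` and `batch_size`, so the parsed names line up directly with run_batch's keyword arguments.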

Task 3.3: HTTP API endpoint for compute-on-demand / cached

Title: API: similarity endpoint
Description: A small, framework-agnostic handler module that returns the cached related motions for a motion_id. If the cache entry is missing and the query parameter compute=true is set, it calls the similarity service to compute neighbors on demand (without persisting them) and returns those. Keeping the handler independent of Flask, FastAPI or WSGI lets it be wired into the existing web framework; tests call the handler function directly.
Files: src/api/similarity_api.py
Tests: tests/api/test_similarity_api.py
Estimated: 3.5h
Priority: medium
Depends: 2.1, 2.2
Acceptance criteria:
The handler get_related(motion_id, load_embedding_fn, load_all_embeddings_fn, cache_repo, compute=False) returns cached neighbors when present and computes them on demand when compute=True. Tests cover both code paths.
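A minimal sketch of the handler's control flow. Assumptions: the response is a plain dict with a hypothetical `source` field ("cache", "computed" or "none") so callers can tell the paths apart; `top_n` is an extra keyword; a motion without an embedding returns an empty list rather than an error; and the route adapter is assumed to turn the `compute=true` query string into a bool before calling the handler.

```python
import math
from collections.abc import Callable
from typing import Any, Optional, Protocol


class CacheRepo(Protocol):
    def get_cached_neighbors(self, motion_id: int) -> Optional[list[tuple[int, float]]]: ...


def _cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def get_related(
    motion_id: int,
    load_embedding_fn: Callable[[int], Optional[list[float]]],
    load_all_embeddings_fn: Callable[[], dict[int, list[float]]],
    cache_repo: CacheRepo,
    compute: bool = False,
    top_n: int = 10,
) -> dict[str, Any]:
    cached = cache_repo.get_cached_neighbors(motion_id)
    if cached is not None:
        return {"motion_id": motion_id, "source": "cache", "related": cached}
    query = load_embedding_fn(motion_id) if compute else None
    if query is None:  # not asked to compute, or no embedding to compare
        return {"motion_id": motion_id, "source": "none", "related": []}
    scored = sorted(
        ((other, _cosine(query, vec))
         for other, vec in load_all_embeddings_fn().items() if other != motion_id),
        key=lambda pair: (-pair[1], pair[0]),  # best score first, ties by id
    )
    # Computed on demand only; deliberately not written back to the cache.
    return {"motion_id": motion_id, "source": "computed", "related": scored[:top_n]}
```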

Task 3.4: Streamlit UI: Explore landing & Motion detail module

Title: UI: explore page and motion detail component
Description: Add a Streamlit helper module with functions that render the Explore landing and Motion detail sections. This MVP does not modify the existing app.py; instead it provides a module that app.py can import. The module exposes pure functions where possible to ease testing, and tests verify behavior by calling those functions with mocked DB/AI calls.
Files: src/ui/explore_page.py
Tests: tests/ui/test_explore_page.py
Estimated: 5.0h
Priority: medium
Depends: 2.2, 2.3, 2.4
Acceptance criteria:
explore_page.render_explore(session, load_curated_fn, load_cached_neighbors_fn) returns a data structure (not direct Streamlit calls) that app.py can choose to render. Tests assert the correct payload for a sample session, and that a motion without embeddings (and therefore without cached neighbors) is returned without a related-motions section instead of causing an error.
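A minimal sketch of render_explore returning plain data. Assumptions: curated motions arrive as dicts with `id`, `title` and `layman_explanation` keys; cached neighbors are (motion_id, score) tuples; `max_related` is a hypothetical display limit; and `session` is accepted only to match the planned signature, since the plan does not say what the landing reads from it.

```python
from collections.abc import Callable, Mapping
from typing import Any, Optional


def render_explore(
    session: Mapping[str, Any],
    load_curated_fn: Callable[[], list[dict[str, Any]]],
    load_cached_neighbors_fn: Callable[[int], Optional[list[tuple[int, float]]]],
    max_related: int = 3,
) -> dict[str, Any]:
    """Build the Explore payload; app.py decides how to draw it with Streamlit."""
    # session is unused in this sketch; it is kept for the planned signature.
    cards = []
    for m in load_curated_fn():
        card = {"id": m["id"], "title": m["title"],
                "layman_explanation": m.get("layman_explanation", "")}
        neighbors = load_cached_neighbors_fn(m["id"])
        if neighbors:  # None (no cache/embedding) or []: omit the section
            card["related"] = [{"id": nid, "score": round(score, 3)}
                               for nid, score in neighbors[:max_related]]
        cards.append(card)
    return {"section": "explore", "motions": cards}
```

Because the function returns data rather than calling Streamlit, app.py can render it with whatever widgets it prefers, and tests can compare dicts directly.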

Batch 4: Integration (1 implementer)

Depends: Batch 2 & 3

Task 4.1: Integration test: ingest → summarize → embed → cluster → UI read

Title: Integration test for the end-to-end MVP path
Description: Add an integration pytest that creates 3 synthetic motions, calls embeddings_worker (with a monkeypatched AI provider), runs the clusterer on the in-memory dataset, and asserts that similarity cache rows exist and that explore_page returns related motions. Use sqlite in-memory and monkeypatch ai_provider to return deterministic vectors.
Files: tests/integration/test_end_to_end_explore_flow.py
Tests: (this is the test file)
Estimated: 8.0h
Priority: high
Depends: 1.3, 2.1, 2.2, 2.4, 3.1, 3.4
Acceptance criteria:
pytest tests/integration/test_end_to_end_explore_flow.py passes locally with no external network calls when the AI provider is monkeypatched via the monkeypatch fixture. The test asserts that at least one neighbor exists for a motion and that the explore_page data includes it.

CI / Test instructions

Run unit tests: pytest --ignore=tests/integration (or the full suite: pytest)
Run a single module test: pytest tests/services/test_similarity_service.py::test_compute_neighbors_basic
Integration tests: pytest tests/integration/test_end_to_end_explore_flow.py

Monkeypatching AI provider in CI/local tests:

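Use the monkeypatch pytest fixture to patch src.ai.ai_provider.get_embedding and, if used, src.ai.ai_provider.summarize. Example in tests: monkeypatch.setattr('src.ai.ai_provider.get_embedding', fake_get_embedding). Patch the name where it is looked up: if a module imports it with from src.ai.ai_provider import get_embedding, patch that module's attribute (e.g. src.ai.embeddings_worker.get_embedding) instead.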
CI should set env var AI_PROVIDER_MOCK=1 for additional safety; tests will check this var and use mocks if present.
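A deterministic stand-in for the embedding call could look like the sketch below. `fake_get_embedding` matches the name used in the example above, but its body is an assumption: it hashes the text with SHA-256 and maps bytes to floats in [-1, 1], so the same text always yields the same vector without any network access (dim can be at most 32, the SHA-256 digest size).

```python
import hashlib


def fake_get_embedding(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for ai_provider.get_embedding: same text, same vector."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 digest length)")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map each byte 0..255 linearly onto -1.0..1.0.
    return [b / 127.5 - 1.0 for b in digest[:dim]]
```

Hash-derived vectors are stable but their similarities are arbitrary. When a test needs known neighbors (as the integration test in 4.1 does), a dict from motion text to hand-picked vectors is the clearer fake.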

Temp DB setup for tests:

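Unit tests should use sqlite in-memory connections (sqlite3.connect(":memory:"), i.e. "sqlite:///:memory:" in URL form). The test_db fixture in tests/utils/migration_fixtures.py serves the migration tests and yields None unless TEST_DB_URL is set.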
Migration tests: If TEST_DB_URL env var is set, the migration tests will attempt to apply SQL to that DB; otherwise they will run in dry-run / skip-apply mode and only validate filename and header.

Example pytest commands:

pytest -q
pytest -q tests/services/test_similarity_service.py -k compute_neighbors

Notes for CI pipeline:

Ensure Python dependencies include pytest, pytest-mock and any DB driver required (sqlite built-in is fine). No external AI keys required — tests must mock AI provider.

3-Sprint Schedule (2-week sprints)

Sprint 1 (Weeks 1–2) — Milestone 1: MVP foundation + core similarity

Goals: Add migrations, types, similarity service, similarity cache repo, audit repo, embeddings worker helper
Tasks: 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 2.4

Sprint 2 (Weeks 3–4) — Milestone 1 continued: background job, CLI, API, UI

Goals: Implement clusterer job, CLI, similarity API, explore_page UI module; initial integration smoke tests
Tasks: 3.1, 3.2, 3.3, 3.4, initial lightweight integration test scaffolding

Sprint 3 (Weeks 5–6) — Milestone 2 & 3: hardening, integration tests, docs

Goals: Full integration tests, migration tests, docs, logging hardening, small UX polish
Tasks: 4.1, docs improvements from 1.5, logging conversion across modules (follow-up small PRs as needed)

Notes:

Estimates assume 1 full-stack engineer + 1 reviewer. Sprint 1 is AMA-heavy; reviewer will focus on migrations and core algorithms. Sprint 2 focuses on wiring and UI; reviewer focuses on integration and UX. Sprint 3 finishes tests and polish.

Assumptions

The repository uses Python 3.10+ and pytest for tests. If different, adjust test fixtures accordingly.
Existing DB access helpers exist (a simple execute/connection helper). If not, tests use sqlite3 directly and repository code will accept a DB connection/cursor via dependency injection.
The project already has an ai_provider abstraction at src/ai/ai_provider.py with functions get_embedding(text) -> list[float] and summarize(text) -> str — tests will monkeypatch these. If the names differ, adapt imports when implementing.
Streamlit app remains app.py and can import src/ui/explore_page.py — I deliberately do not modify app.py in this plan to keep the change set minimal.
We will store embeddings as arrays in an embeddings table; similarity modules will load them via an injected loader function to keep unit tests pure.

Open Questions / Implementation Clarifications

Bookmarks persistence: design left bookmarks as open (session vs. persistent). For MVP we will record bookmark events in the audit_events table (append-only) and treat them as per-session by default. If persistent bookmarks required later, a new table/migration will be added.
Which web framework to wire the similarity_api into? The plan keeps handler framework-agnostic; we need guidance whether app uses Flask/FastAPI/Starlette to add the route. Implementer should wire into existing HTTP routing pattern.
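Embedding storage format: assume float arrays stored as JSON or as an array type in the DB. If the project stores them as a binary blob, adjust serialization in the embeddings loader and put_embedding_fn (and their tests) accordingly; similarity_cache_repo only stores neighbor lists and is unaffected.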
Acceptable top-N neighbor size for caches. Default SIMILARITY_TOP_N = 10; CLI and worker accept override. If product wants 50, increase later.
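To make the storage-format question concrete, the two options can be sketched side by side. The helper names are hypothetical, and the blob variant assumes little-endian float32 (4 bytes per dimension, compact but lossy compared with Python's float64), packed with the standard struct module.

```python
import json
import struct


def embedding_to_json(vec: list[float]) -> str:
    return json.dumps(vec)  # human-readable; full float64 precision


def embedding_from_json(raw: str) -> list[float]:
    return [float(x) for x in json.loads(raw)]


def embedding_to_blob(vec: list[float]) -> bytes:
    # "<" forces little-endian so blobs are portable across machines.
    return struct.pack(f"<{len(vec)}f", *vec)


def embedding_from_blob(raw: bytes) -> list[float]:
    if len(raw) % 4:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Whichever format is chosen, a round-trip test should use values that float32 represents exactly (such as 0.5 or -1.25), or compare with a tolerance.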

How a single implementer should proceed (step-by-step)

Start with Batch 1 tasks 1.1–1.4. Create the migration placeholders and the types module, then run the migration filename tests.
Implement similarity_service (2.1) and its unit tests. This is the critical algorithm that must be rock-solid.
Implement similarity_cache_repo (2.2) and audit_repo (2.3) using sqlite in-memory for tests. Run unit tests.
Implement embeddings_worker helper (2.4) and add tests that mock ai_provider. Ensure CI will not call real AI.
Implement clusterer (3.1) and test with in-memory data by injecting loader/upsert functions.
Add the admin CLI (3.2) to run the clusterer, and the short doc (1.5) describing how to run it locally.
Implement API handler (3.3) and UI helper (3.4). Tests should mock DB and AI as needed.
Finish with integration test (4.1) to stitch the pieces together. Iterate on bug fixes and reviewer feedback.

Acceptance criteria for the feature (MVP)

Explore landing exists and can present curated motions (using existing curated flag). Data payload returned by explore_page includes motion metadata and layman_explanation.
Motion detail returns layman_explanation, party-match snapshot (existing), and related motions computed from cached neighbor lists when available.
Background clusterer job can recompute cached neighbor lists and the CLI can trigger it.
Tests cover the core algorithm (similarity computation), cache repo serialization, the embeddings worker (with a mocked AI provider), and at least one end-to-end smoke integration test.

If anything in this plan should be narrowed further (for a smaller initial PR) I recommend focusing Sprint 1 + clusterer CLI (Tasks 1.x + 2.x + 3.1 + 3.2) and deferring UI wiring until clusterer and cache are validated.
