---
date: 2026-03-21
topic: "Reuse motions as a guided policy explorer"
status: draft
---

## Problem Statement

We want to repurpose existing "motions" data so it becomes a lightweight, discovery-driven way for users to explore policy positions and discover related content. This is not a full proposal system; it's a guided exploration and bookmarking flow that leverages our existing ingestion, summarization, embeddings, and session voting work.

**Why now:** We already ingest motions, generate layman explanations, compute embeddings, and store per-session votes. Reusing those building blocks gives high user value with modest effort.

## Constraints

**Non-negotiables and technical limits:**

- Use the existing database schema where possible (motions table, embeddings table, user_sessions). Do not require a new external vector DB for the MVP.
- Keep the Streamlit UI model (app.py) and session-based votes intact for the initial rollout.
- Avoid breaking migrations: rely on existing migrations and add new ones when necessary (no forced drops).
- Respect the current error-handling posture: network calls can fail; the system must degrade gracefully.

## Chosen Approach

I'm choosing a "Guided Policy Explorer" approach because it reuses the highest-value existing pieces (summaries, embeddings, session voting) and delivers a clear UX that fits the current codebase. This gives immediate product value with low risk.

**Core idea:** present curated short sessions and motion detail pages that combine the existing layman explanation, party-match results, and semantic "related motions" powered by stored embeddings.

Alternatives considered:

- "Motion-as-Proposal platform": full lifecycle (draft → comment → vote). Rejected for the MVP due to high complexity and data-model changes.
- "Motion Digest / Research Assistant": read-only pages and newsletters. Lower effort, but less interactive and reuses fewer of our current session features.
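The semantic "related motions" idea above can be sketched as plain cosine similarity over the stored embedding vectors. This is a minimal illustration, not the real implementation: the function name `top_k_related` and the in-memory `embeddings` dict are hypothetical stand-ins for reads from the existing embeddings table.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_related(motion_id: str,
                  embeddings: dict[str, list[float]],
                  k: int = 5) -> list[tuple[str, float]]:
    """Return the k most similar motions as (id, score), best first.

    `embeddings` maps motion id -> embedding vector; in the real system
    these would be read from the embeddings table, and the result would
    be written to the similarity cache table.
    """
    query = embeddings[motion_id]
    scores = [(other, cosine(query, vec))
              for other, vec in embeddings.items()
              if other != motion_id]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

An O(n) scan per motion like this is only viable at MVP scale, which is exactly why the design caches results per motion and keeps a migration path to a proper vector index open.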
## Architecture

High-level view (existing pieces in bold):

- Ingest: **api_client.py** + **scraper.py** gather motions and create motion records in the DB.
- Persist: **database.py** stores motions, embeddings, and user_sessions.
- Enrichment: **summarizer.py** + **ai_provider.py** generate layman explanations and embeddings.
- Background jobs: **scheduler.py** runs ingest, summarization, and periodic clustering.
- UI: **app.py**, the current Streamlit session flow — extend it with "Explore" and "Motion detail" pages.
- New: a small **clusterer / similarity API** to compute and cache related-motion lists per motion.

## Key Components & Responsibilities

- Motion Ingest (existing): keep ingest as-is; add metadata flags (e.g., curated, candidate).
- Motion Store (existing): motions table + embeddings table; add an **events/audit** table for user actions and important state transitions.
- Summarizer / Embedding Worker (existing): a scheduled job that ensures motions have layman_explanation and embeddings; add retry/backoff and logging.
- Similarity service (new): computes nearest neighbors in-process from stored vectors for the MVP and caches results in a small table. Swap to a vector index later if needed.
- Session & Voting (existing): continue using the user_sessions JSON blob for individual sessions; add optional event-log entries for each vote.
- UI (update): add an "Explore" landing page and a motion detail view with layman text, party-match snapshot, related motions, and bookmark/flag actions. Reuse Streamlit components.
- Admin tooling (new): migration scripts, a CLI to recompute embeddings/similarity, and an audit query helper.

## Data Flow

1. The ingest job (api_client/scraper) produces motion records and calls db.insert_motion.
2. The summarizer worker picks up motions without layman_explanation or embeddings, calls ai_provider, and writes layman_explanation + embeddings.
3. The clusterer/similarity job computes related-motion lists from stored embeddings and writes them to a cache table.
4. The UI "Explore" page shows curated motion lists; "Motion detail" reads the motion, layman_explanation, party-match snapshot, and cached related motions.
5. User vote actions update user_sessions and also append an event to the audit table for traceability.
6. Background analytics (optional) reuses user_events and embeddings for offline insights.

## Error Handling Strategy

- External calls: add retries with exponential backoff for the AI provider and external APIs. Failures set a marker (e.g., summary_missing) and the system continues.
- Missing embeddings: the UI gracefully disables "related motions" and offers "compute on demand".
- Idempotency: make insert_motion idempotent via a URL/external-id check at the DB layer; handle duplicates optimistically.
- Concurrency: avoid read-modify-write races by writing user events append-only and deriving session state from events when race-prone updates are detected.
- Observability: replace prints with structured logging (module-level logger) and add basic metrics for worker errors, API failures, and queue lag.

## Testing Strategy

- Unit tests: DB helpers (insert_motion, store_embedding, similarity cache), summarizer functions (mock ai_provider), and session vote logic.
- Migration tests: follow the existing pattern of applying migration SQL in a temp DB and asserting the schema.
- Integration tests: end-to-end ingest → summarize → embed → similarity → UI-read path in CI (use monkeypatch for AI calls).
- Load tests: simulate a few thousand embedding-search calls against the in-process search to validate performance assumptions for the MVP.
- Acceptance: confirm the UX flows: Explore session, Motion detail, Vote → party match, Related motions populated.

## High-level Plan & Estimates

Assumptions: one full-stack engineer (Python + Streamlit) and one part-time reviewer. All estimates are rough.

Milestone 0 — Validate & quick discovery (1 day)

- Locate the user's added markdown plan and extract exact requirements.
(I'm assuming the file exists in thoughts/shared; if not, validate by searching first.)

Milestone 1 — MVP (8–12 engineer days)

- Add a similarity cache table and migration.
- Summarizer: make embedding generation robust with retries and store the vectors.
- Clusterer job: compute and cache related motions.
- UI: Explore landing, Motion detail page, related-motions UI, bookmark/flag button.
- Add the event/audit table and write events on user votes and bookmarks.

Milestone 2 — Hardening & instrumentation (3–5 engineer days)

- Replace prints with structured logging across touched modules.
- Add migration tests and CI integration tests (mock AI).
- Add health metrics and basic alerting for worker failures.

Milestone 3 — Polish & UX feedback (3–5 engineer days)

- UX tweaks, performance tuning, a compute-on-demand fallback for embeddings, documentation, and the admin CLI.

Total MVP + polish: ~2–3 weeks of focused work.

## Risks & Mitigations

- Risk: naive in-process embedding search will not scale. Mitigation: cache nearest neighbors per motion and plan a migration path to a vector index.
- Risk: AI provider flakiness. Mitigation: retries, timeouts, and a clear UI fallback. Tests must mock the provider in CI.
- Risk: race conditions on session votes. Mitigation: an append-only event log; derive the authoritative session view from events when needed.
- Risk: schema drift and missing migrations. Mitigation: add migration tests and document required migrations in the repo.

## Open Questions

- Which exact user journeys do we want first (single-session discovery vs. persistent accounts/bookmarking)?
- Do we want bookmarks persisted globally or per-session only? (Privacy implications.)
- What's acceptable latency for "related motions" — precomputed nightly vs. near-real-time?
- Any policy/legal ban on storing full body_text or on long-term retention of user votes?
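The append-only event log proposed in Milestone 1 and in the race-condition mitigation above can be sketched as a reducer that replays ordered events into an authoritative session view. The `Event` fields and action names here are illustrative, not the final audit-table schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One append-only audit record (illustrative schema)."""
    session_id: str
    motion_id: str
    action: str   # "vote" or "bookmark"
    value: str    # e.g. "for" / "against" for votes, "" for bookmarks


def derive_session_state(events: list[Event]) -> dict:
    """Rebuild session state by replaying events in insertion order.

    A later vote on the same motion overwrites an earlier one, so
    concurrent writers can append freely without read-modify-write
    races on a shared JSON blob; the derived view stays consistent.
    """
    votes: dict[str, str] = {}
    bookmarks: set[str] = set()
    for ev in events:
        if ev.action == "vote":
            votes[ev.motion_id] = ev.value
        elif ev.action == "bookmark":
            bookmarks.add(ev.motion_id)
    return {"votes": votes, "bookmarks": sorted(bookmarks)}
```

In practice the user_sessions JSON blob would remain the fast path for reads, with this replay used only when a race-prone update is detected or for audit queries.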
---

I'm proceeding to create the design doc file at thoughts/shared/designs/2026-03-21-motions-guided-explorer-design.md and will spawn the implementation planner next. Interrupt now if you want changes to the approach or scope.