---
date: 2026-03-21
topic: "Reuse motions as a guided policy explorer"
status: draft
---
## Problem Statement
We want to repurpose existing "motions" data so it becomes a lightweight, discovery-driven way for users to explore policy positions and discover related content. This is not a full proposal system; it's a guided exploration and bookmarking flow that leverages our existing ingestion, summarization, embeddings, and session voting work.
**Why now:** We already ingest motions, generate layman explanations, compute embeddings, and store per-session votes. Reusing those building blocks gives high user value with modest effort.
## Constraints
**Non-negotiables and technical limits:**
- Use the existing database schema where possible (motions table, embeddings table, user_sessions). Do not require a new external vector DB for MVP.
- Keep the Streamlit UI model (app.py) and session-based votes intact for the initial rollout.
- Avoid breaking migrations: rely on existing migrations and add new ones when necessary (no forced drops).
- Respect current error-handling posture: network calls can fail; system must degrade gracefully.
## Chosen Approach
I'm choosing a "Guided Policy Explorer" approach because it reuses the highest-value existing pieces (summaries, embeddings, session voting) and delivers a clear UX that fits the current codebase. This gives immediate product value with low risk.
**Core idea:** present curated short sessions and motion detail pages that combine the existing layman explanation, party-match results, and semantic "related motions" powered by stored embeddings.
Alternatives considered:
- "Motion-as-Proposal platform": full lifecycle (draft → comment → vote). Rejected for MVP due to high complexity and data model changes.
- "Motion Digest / Research Assistant": read-only pages and newsletters. Lower effort, but less interactive and reuses fewer of our current session features.
## Architecture
High-level view (existing pieces in bold):
- Ingest: **api_client.py** + **scraper.py** gather motions and create motion records in the DB.
- Persist: **database.py** stores motions, embeddings, and user_sessions.
- Enrichment: **summarizer.py** + **ai_provider.py** generate layman explanations and embeddings.
- Background jobs: **scheduler.py** runs ingest, summarization, and periodic clustering.
- UI: **app.py** current Streamlit session flow — extend with "Explore" and "Motion detail" pages.
- New: small **clusterer / similarity API** to compute and cache related-motion lists per motion.
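The new similarity cache could be a small table added through the existing migration pattern. A minimal sketch, assuming SQLite and hypothetical column names (the final schema would go through the normal migration review):

```python
import sqlite3

# Hypothetical migration for the similarity cache; table and column
# names are illustrative, not the final schema.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS motion_similarity (
    motion_id   INTEGER NOT NULL,
    related_id  INTEGER NOT NULL,
    score       REAL    NOT NULL,  -- cosine similarity in [-1, 1]
    computed_at TEXT    NOT NULL,  -- ISO timestamp of the clusterer run
    PRIMARY KEY (motion_id, related_id)
);
CREATE INDEX IF NOT EXISTS idx_similarity_motion
    ON motion_similarity (motion_id, score);
"""

def apply_migration(conn: sqlite3.Connection) -> None:
    """Apply the cache-table migration; safe to re-run (no forced drops)."""
    conn.executescript(MIGRATION_SQL)
    conn.commit()
```

The `IF NOT EXISTS` guards keep the migration idempotent, which matches the "no forced drops" constraint above.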
## Key Components & Responsibilities
- Motion Ingest (existing): keep ingest as-is; add metadata flags (e.g., curated, candidate).
- Motion Store (existing): motions table + embeddings table; add an **events/audit** table for user actions and important state transitions.
- Summarizer / Embedding Worker (existing): scheduled job that ensures motions have layman_explanation and embeddings; add retry/backoff and logging.
- Similarity service (new): computes nearest neighbors using stored vectors in-process for MVP and caches results in a small table. Swap to a vector index later if needed.
- Session & Voting (existing): continue using user_sessions JSON blob for individual sessions; add optional event log entries for each vote.
- UI (update): add "Explore" landing, motion detail view with layman text, party-match snapshot, related motions, and bookmark/flag actions. Reuse Streamlit components.
- Admin tooling (new): migration scripts, a CLI to recompute embeddings/similarity, and an audit query helper.
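For MVP scale (thousands of motions), the similarity service's in-process search can be a brute-force cosine scan over stored vectors. A sketch under that assumption (function signature and the `vectors` mapping are illustrative, not existing code):

```python
import numpy as np

def top_k_related(motion_id: int, vectors: dict, k: int = 5):
    """Return the k most similar motions by cosine similarity.

    `vectors` maps motion_id -> embedding (np.ndarray). At MVP sizes
    a full scan is fast enough; results go into the cache table so
    the UI never pays this cost per request.
    """
    query = vectors[motion_id]
    ids = [m for m in vectors if m != motion_id]
    mat = np.stack([vectors[m] for m in ids])
    # Normalize rows and the query so dot products become cosine scores.
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = mat @ q
    order = np.argsort(scores)[::-1][:k]
    return [(ids[i], float(scores[i])) for i in order]
```

Swapping this for a real vector index later only changes the internals of this one function; callers and the cache table stay the same.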
## Data Flow
1. Ingest job (api_client/scraper) produces motion records and calls db.insert_motion.
2. Summarizer worker picks up motions without layman_explanation or embeddings, calls ai_provider, and writes layman_explanation + embeddings.
3. Clusterer/similarity job computes related-motion lists using stored embeddings and writes them to a cache table.
4. UI "Explore" shows curated motion lists; "Motion detail" reads motion, layman_explanation, party-match snapshot, and cached related motions.
5. User vote actions update user_sessions and also append an event to the audit table for traceability.
6. Background analytics (optional) reuses user_events and embeddings for offline insights.
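Step 2 of the flow above, the enrichment worker, can be sketched as follows. The `db` and `provider` objects stand in for `database.py` and `ai_provider.py`; all method names here are assumptions about that interface, not the real API:

```python
def enrich_pending_motions(db, provider, batch_size: int = 20):
    """Fill in layman_explanation and embeddings for motions lacking them.

    On provider failure the motion is marked (summary_missing) and
    skipped, so one flaky call never blocks the batch; the next
    scheduled run retries it.
    """
    pending = db.motions_missing_enrichment(limit=batch_size)
    for motion in pending:
        try:
            explanation = provider.layman_explanation(motion["body_text"])
            embedding = provider.embed(motion["body_text"])
        except Exception:
            db.mark_summary_missing(motion["id"])
            continue
        db.store_enrichment(motion["id"], explanation, embedding)
```

The mark-and-continue shape is what makes the "degrade gracefully" constraint hold: a provider outage leaves motions un-enriched but never crashes the job.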
## Error Handling Strategy
- External calls: add retries with exponential backoff for AI provider and external APIs. Failures set a marker (e.g., summary_missing) and the system continues.
- Missing embeddings: UI gracefully disables "related motions" and offers "compute on demand".
- Idempotency: make insert_motion idempotent by URL/external id check at DB layer; use optimistic handling for duplicates.
- Concurrency: avoid read-modify-write races by writing user events (append-only) and deriving session state from events when race-prone updates are detected.
- Observability: replace prints with structured logging (module-level logger) and add basic metrics for worker errors, API failures, and queue lags.
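The retry-with-backoff policy for external calls could be a small shared helper, roughly like this (a sketch; attempt counts and delays are placeholder tuning values):

```python
import logging
import random
import time

logger = logging.getLogger(__name__)

def with_backoff(fn, *, attempts: int = 4, base_delay: float = 0.5):
    """Call fn(), retrying on exception with exponential backoff + jitter.

    The last failure is re-raised so callers can set their own marker
    (e.g. summary_missing) and continue.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            logger.warning("call failed (%s); retrying in %.1fs", exc, delay)
            time.sleep(delay)
```

Using a module-level `logging` logger here is also the pattern the observability bullet calls for in place of prints.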
## Testing Strategy
- Unit tests: DB helpers (insert_motion, store_embedding, similarity cache), summarizer functions (mock ai_provider), and session vote logic.
- Migration tests: follow the existing pattern of applying migration SQL in a temp DB and asserting schema.
- Integration tests: end-to-end ingest → summarize → embedding → similarity → UI-read path in CI (use monkeypatch for AI calls).
- Load tests: simulate a few thousand embeddings search calls against the in-process search to validate performance assumptions for MVP.
- Acceptance: confirm the UX flows end to end: Explore session, Motion detail, Vote → party match, Related motions populated.
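The "mock ai_provider" pattern for unit tests could look like this; the provider methods shown are assumptions about `ai_provider.py`'s interface, but the shape (no network in CI, assert on call counts and return values) is the point:

```python
from unittest.mock import MagicMock

def test_enrichment_uses_mocked_provider():
    """Summarizer-logic test pattern: the AI provider is a MagicMock,
    so CI never makes network calls. Method names are illustrative."""
    provider = MagicMock()
    provider.layman_explanation.return_value = "Plain-language summary"
    provider.embed.return_value = [0.1, 0.2, 0.3]

    # Stand-in for the real summarizer call path.
    explanation = provider.layman_explanation("full motion text")
    embedding = provider.embed("full motion text")

    assert explanation == "Plain-language summary"
    assert len(embedding) == 3
    provider.layman_explanation.assert_called_once()
```

In the real suite the mock would be injected via pytest's `monkeypatch`, matching the integration-test bullet above.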
## High-level Plan & Estimates
Assumptions: one full-stack engineer (Python + Streamlit) and one part-time reviewer. All estimates are rough.
Milestone 0 — Validate & quick discovery (1 day)
- Locate the user's added markdown plan and extract exact requirements. (I'm assuming the file exists under thoughts/shared; if not, validate by searching.)
Milestone 1 — MVP (8–12 engineer days)
- Add similarity cache table and migration.
- Summarizer: make embedding generation robust with retries and store vectors.
- Clusterer job: compute and cache related motions.
- UI: Explore landing, Motion detail page, related motion UI, bookmark/flag button.
- Add event/audit table and write events on user votes and bookmarks.
Milestone 2 — Hardening & instrumentation (3–5 engineer days)
- Replace prints with structured logging across touched modules.
- Add migration tests and CI integration tests (mock AI).
- Add health metrics & basic alerting for worker failures.
Milestone 3 — Polish & UX feedback (3–5 engineer days)
- UX tweaks, performance tuning, compute on-demand fallback for embeddings, documentation, admin CLI.
Total MVP + polish: ~2–3 weeks of focused work.
## Risks & Mitigations
- Risk: Naive in-process embedding search will not scale. Mitigation: cache nearest neighbors per motion and plan a migration path to a vector index.
- Risk: AI provider flakiness. Mitigation: retries, timeouts, and clear UI fallback. Tests must mock provider in CI.
- Risk: Race conditions on session votes. Mitigation: append-only event log and derive authoritative session view from events when needed.
- Risk: Schema drift and missing migrations. Mitigation: add migration tests and document required migrations in repo.
## Open Questions
- Which exact user journeys do we want first (single-session discover vs. persistent account/bookmarking)?
- Do we want bookmarks persisted globally or per-session only? (Privacy implications.)
- What's acceptable latency for "related motions" — precomputed nightly vs. near-real-time?
- Any policy/legal ban on storing full body_text or on long-term retention of user votes?
---
I'm proceeding to create the design doc file at thoughts/shared/designs/2026-03-21-motions-guided-explorer-design.md and will spawn the implementation planner next. Interrupt if you want changes to the approach or scope now.