date: 2026-03-21
topic: Reuse motions as a guided policy explorer
status: draft

Problem Statement

We want to repurpose existing "motions" data so it becomes a lightweight, discovery-driven way for users to explore policy positions and discover related content. This is not a full proposal system; it's a guided exploration and bookmarking flow that leverages our existing ingestion, summarization, embeddings, and session voting work.

Why now: We already ingest motions, generate layman explanations, compute embeddings, and store per-session votes. Reusing those building blocks gives high user value with modest effort.

Constraints

Non-negotiables and technical limits:

  • Use the existing database schema where possible (motions table, embeddings table, user_sessions). Do not require a new external vector DB for MVP.
  • Keep the Streamlit UI model (app.py) and session-based votes intact for the initial rollout.
  • Avoid breaking migrations: rely on existing migrations and add new ones when necessary (no forced drops).
  • Respect current error-handling posture: network calls can fail; system must degrade gracefully.

Chosen Approach

I'm choosing a "Guided Policy Explorer" approach because it reuses the highest-value existing pieces (summaries, embeddings, session voting) and delivers a clear UX that fits the current codebase. This gives immediate product value with low risk.

Core idea: present curated short sessions and motion detail pages that combine the existing layman explanation, party-match results, and semantic "related motions" powered by stored embeddings.

Alternatives considered:

  • "Motion-as-Proposal platform": full lifecycle (draft → comment → vote). Rejected for MVP due to high complexity and data model changes.
  • "Motion Digest / Research Assistant": read-only pages and newsletters. Lower effort, but less interactive and reuses fewer of our current session features.

Architecture

High-level view (all pieces exist today except the one marked "New"):

  • Ingest: api_client.py + scraper.py gather motions and create motion records in the DB.
  • Persist: database.py stores motions, embeddings, and user_sessions.
  • Enrichment: summarizer.py + ai_provider.py generate layman explanations and embeddings.
  • Background jobs: scheduler.py runs ingest, summarization, and periodic clustering.
  • UI: app.py current Streamlit session flow — extend with "Explore" and "Motion detail" pages.
  • New: small clusterer / similarity API to compute and cache related-motion lists per motion.

Key Components & Responsibilities

  • Motion Ingest (existing): keep ingest as-is; add metadata flags (e.g., curated, candidate).
  • Motion Store (existing): motions table + embeddings table; add an events/audit table for user actions and important state transitions.
  • Summarizer / Embedding Worker (existing): scheduled job that ensures motions have layman_explanation and embeddings; add retry/backoff and logging.
  • Similarity service (new): computes nearest neighbors using stored vectors in-process for MVP and caches results in a small table. Swap to a vector index later if needed.
  • Session & Voting (existing): continue using user_sessions JSON blob for individual sessions; add optional event log entries for each vote.
  • UI (update): add "Explore" landing, motion detail view with layman text, party-match snapshot, related motions, and bookmark/flag actions. Reuse Streamlit components.
  • Admin tooling (new): migration scripts, a CLI to recompute embeddings/similarity, and an audit query helper.
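
The in-process nearest-neighbor step of the similarity service could look roughly like the sketch below. It assumes embeddings are available as plain lists of floats keyed by motion id; the names cosine_similarity and top_k_related and the dict layout are hypothetical, not taken from the codebase. It uses brute-force cosine similarity, which is only reasonable at MVP scale.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_related(
    motion_id: int,
    embeddings: Dict[int, Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return up to k (other_id, score) pairs, most similar first.

    Returns an empty list when the motion has no embedding, so callers
    can degrade gracefully instead of raising.
    """
    if motion_id not in embeddings:
        return []
    query = embeddings[motion_id]
    scored = (
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != motion_id
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors for illustration only; real embeddings have many more dimensions.
example_embeddings = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
}
related = top_k_related(1, example_embeddings, k=2)
```

Because results are cached per motion, this scan would run in the background job rather than on page load, which keeps the later swap to a vector index local to one function.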

Data Flow

  1. Ingest job (api_client/scraper) produces motion records and calls db.insert_motion.
  2. Summarizer worker picks up motions without layman_explanation or embeddings, calls ai_provider, and writes layman_explanation + embeddings.
  3. Clusterer/similarity job computes related-motion lists using stored embeddings and writes them to a cache table.
  4. UI "Explore" shows curated motion lists; "Motion detail" reads motion, layman_explanation, party-match snapshot, and cached related motions.
  5. User vote actions update user_sessions and also append an event to the audit table for traceability.
  6. Background analytics (optional) reuses user_events and embeddings for offline insights.
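
Steps 3 and 4 hinge on the cache table. A minimal sketch of writing and reading it follows, assuming a SQLite-style database accessed through the standard sqlite3 module. The table name similarity_cache, its columns, and the helper names are hypothetical; the real schema would be defined by the new migration.

```python
import json
import sqlite3
from typing import List

# Hypothetical schema: one row per motion, neighbors stored as a JSON list of ids.
SCHEMA = """
CREATE TABLE IF NOT EXISTS similarity_cache (
    motion_id   INTEGER PRIMARY KEY,
    related_ids TEXT NOT NULL,
    computed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def write_related(conn: sqlite3.Connection, motion_id: int, related_ids: List[int]) -> None:
    """Insert or overwrite the cached neighbor list for one motion."""
    conn.execute(
        "INSERT OR REPLACE INTO similarity_cache (motion_id, related_ids) VALUES (?, ?)",
        (motion_id, json.dumps(related_ids)),
    )
    conn.commit()


def read_related(conn: sqlite3.Connection, motion_id: int) -> List[int]:
    """Return cached neighbor ids, or an empty list if nothing is cached yet."""
    row = conn.execute(
        "SELECT related_ids FROM similarity_cache WHERE motion_id = ?",
        (motion_id,),
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
write_related(conn, 1, [2, 3])
write_related(conn, 1, [3, 4])   # a later recomputation overwrites the old list
cached = read_related(conn, 1)
missing = read_related(conn, 42)  # no row yet: the UI can hide "related motions"
```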

Error Handling Strategy

  • External calls: add retries with exponential backoff for AI provider and external APIs. Failures set a marker (e.g., summary_missing) and the system continues.
  • Missing embeddings: UI gracefully disables "related motions" and offers "compute on demand".
  • Idempotency: make insert_motion idempotent with a URL/external-id uniqueness check at the DB layer; attempt the insert and treat a duplicate as a no-op rather than an error.
  • Concurrency: avoid read-modify-write races by writing user events (append-only) and deriving session state from events when race-prone updates are detected.
  • Observability: replace prints with structured logging (module-level logger) and add basic metrics for worker errors, API failures, and queue lags.
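
The retry-and-mark behavior for external calls could be sketched as follows. The helper name call_with_retries and its parameters are hypothetical, and the broad exception catch is a simplification; real code would likely catch only the provider's network and timeout errors.

```python
import logging
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func, retrying with exponential backoff plus jitter.

    Returns None after the final failure so the caller can set a marker
    (e.g., summary_missing) and carry on instead of crashing the worker.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # simplification: narrow this in real code
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                break
            delay = base_delay * (2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))
    return None


# Illustration: a stand-in for an AI provider call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_summary() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("provider unavailable")
    return "layman explanation"


result = call_with_retries(flaky_summary, sleep=lambda _seconds: None)
summary_missing = result is None
```

Passing sleep as a parameter lets tests skip the real delays, which fits the plan to mock the AI provider in CI.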

Testing Strategy

  • Unit tests: DB helpers (insert_motion, store_embedding, similarity cache), summarizer functions (mock ai_provider), and session vote logic.
  • Migration tests: follow the existing pattern of applying migration SQL in a temp DB and asserting schema.
  • Integration tests: end-to-end ingest → summarize → embedding → similarity → UI-read path in CI (use monkeypatch for AI calls).
  • Load tests: simulate a few thousand embeddings search calls against the in-process search to validate performance assumptions for MVP.
  • Acceptance: confirm UX flows: Explore session, Motion detail, Vote -> party match, Related motions populated.
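
A migration test in the spirit described above might look like this sketch. It assumes SQLite and uses an in-memory database as the temp DB; the migration SQL and the column set of user_events are hypothetical stand-ins for whatever the real migration file contains.

```python
import sqlite3
from typing import Set

# Hypothetical migration; the real SQL would be read from the repo's migration file.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS user_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    """Return the column names of a table using PRAGMA table_info."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def test_user_events_migration() -> None:
    conn = sqlite3.connect(":memory:")
    conn.executescript(MIGRATION_SQL)
    # Applying the migration a second time must not fail or drop anything.
    conn.executescript(MIGRATION_SQL)
    assert table_columns(conn, "user_events") == {
        "id", "session_id", "event_type", "payload", "created_at",
    }
    conn.close()


test_user_events_migration()
```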

High-level Plan & Estimates

Assumptions: one full-stack engineer (Python + Streamlit) and one part-time reviewer. All estimates are rough.

Milestone 0 — Validate & quick discovery (1 day)

  • Locate the user's markdown plan and extract the exact requirements. (Assumption: the file exists in thoughts/shared; if it does not, confirm by searching the repository.)

Milestone 1 — MVP (8–12 engineer days)

  • Add similarity cache table and migration.
  • Summarizer: make embedding generation robust with retries and store vectors.
  • Clusterer job: compute and cache related motions.
  • UI: Explore landing, Motion detail page, related motion UI, bookmark/flag button.
  • Add event/audit table and write events on user votes and bookmarks.

Milestone 2 — Hardening & instrumentation (3–5 engineer days)

  • Replace prints with structured logging across touched modules.
  • Add migration tests and CI integration tests (mock AI).
  • Add health metrics & basic alerting for worker failures.

Milestone 3 — Polish & UX feedback (3–5 engineer days)

  • UX tweaks, performance tuning, compute on-demand fallback for embeddings, documentation, admin CLI.

Total MVP + polish (Milestones 1 and 3): 11–17 engineer days, roughly 2–3 weeks of focused work. Hardening (Milestone 2) adds another 3–5 days.

Risks & Mitigations

  • Risk: Naive in-process embedding search will not scale. Mitigation: cache nearest neighbors per motion and plan a migration path to a vector index.
  • Risk: AI provider flakiness. Mitigation: retries, timeouts, and clear UI fallback. Tests must mock provider in CI.
  • Risk: Race conditions on session votes. Mitigation: append-only event log and derive authoritative session view from events when needed.
  • Risk: Schema drift and missing migrations. Mitigation: add migration tests and document required migrations in repo.
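
The race-condition mitigation relies on rebuilding a session's votes from the event log. A minimal sketch of that derivation follows, assuming each event carries a monotonically increasing sequence number; the VoteEvent fields, the vote values, and the derive_votes name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class VoteEvent:
    seq: int         # assumed monotonically increasing, e.g. an autoincrement id
    session_id: str
    motion_id: int
    vote: str        # illustrative values: "for", "against", "abstain"


def derive_votes(events: Iterable[VoteEvent], session_id: str) -> Dict[int, str]:
    """Replay events in order; the latest vote per motion wins."""
    votes: Dict[int, str] = {}
    for event in sorted(events, key=lambda e: e.seq):
        if event.session_id == session_id:
            votes[event.motion_id] = event.vote
    return votes


events = [
    VoteEvent(1, "s1", 10, "for"),
    VoteEvent(2, "s1", 11, "against"),
    VoteEvent(3, "s2", 10, "against"),
    VoteEvent(4, "s1", 10, "abstain"),  # s1 changes its mind on motion 10
]
session_votes = derive_votes(events, "s1")
```

Since writers only append, two concurrent votes cannot overwrite each other; the user_sessions blob then becomes a cache that can be rebuilt from the log whenever it is suspected to be stale.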

Open Questions

  • Which exact user journeys do we want first (single-session discovery vs. persistent accounts with bookmarking)?
  • Do we want bookmarks persisted globally or per-session only? (Privacy implications.)
  • What's acceptable latency for "related motions" — precomputed nightly vs. near-real-time?
  • Is there any policy or legal restriction on storing full body_text or on long-term retention of user votes?

I'm proceeding to create the design doc file at thoughts/shared/designs/2026-03-21-motions-guided-explorer-design.md and will spawn the implementation planner next. Interrupt if you want changes to the approach or scope now.