---
date: 2026-03-21
topic: "Reuse motions as a guided policy explorer"
status: draft
---

## Problem Statement

We want to repurpose existing "motions" data into a lightweight, guided way for users to explore policy positions and discover related content. This is not a full proposal system; it is a guided exploration and bookmarking flow that leverages our existing ingestion, summarization, embeddings, and session voting work.

**Why now:** We already ingest motions, generate layman explanations, compute embeddings, and store per-session votes. Reusing those building blocks gives high user value with modest effort.

## Constraints

**Non-negotiables and technical limits:**
- Use the existing database schema where possible (motions table, embeddings table, user_sessions). Do not require a new external vector DB for MVP.
- Keep the Streamlit UI model (app.py) and session-based votes intact for the initial rollout.
- Avoid breaking migrations: rely on existing migrations and add new ones when necessary (no forced drops).
- Respect current error-handling posture: network calls can fail; system must degrade gracefully.

## Chosen Approach

I'm choosing a "Guided Policy Explorer" approach because it reuses the highest-value existing pieces (summaries, embeddings, session voting) and delivers a clear UX that fits the current codebase. This gives immediate product value with low risk.

**Core idea:** present curated short sessions and motion detail pages that combine the existing layman explanation, party-match results, and semantic "related motions" powered by stored embeddings.

Alternatives considered:
- "Motion-as-Proposal platform": full lifecycle (draft → comment → vote). Rejected for MVP due to high complexity and data model changes.
- "Motion Digest / Research Assistant": read-only pages and newsletters. Lower effort, but less interactive and reuses fewer of our current session features.

## Architecture

High-level view (existing pieces in bold):
- Ingest: **api_client.py** + **scraper.py** gather motions and create motion records in the DB.
- Persist: **database.py** stores motions, embeddings, and user_sessions.
- Enrichment: **summarizer.py** + **ai_provider.py** generate layman explanations and embeddings.
- Background jobs: **scheduler.py** runs ingest, summarization, and periodic clustering.
- UI: **app.py** provides the current Streamlit session flow; extend it with "Explore" and "Motion detail" pages.
- New: small **clusterer / similarity API** to compute and cache related-motion lists per motion.

## Key Components & Responsibilities

- Motion Ingest (existing): keep ingest as-is; add metadata flags (e.g., curated, candidate).
- Motion Store (existing): motions table + embeddings table; add an **events/audit** table for user actions and important state transitions.
- Summarizer / Embedding Worker (existing): scheduled job that ensures motions have layman_explanation and embeddings; add retry/backoff and logging.
- Similarity service (new): computes nearest neighbors using stored vectors in-process for MVP and caches results in a small table. Swap to a vector index later if needed.
- Session & Voting (existing): continue using user_sessions JSON blob for individual sessions; add optional event log entries for each vote.
- UI (update): add "Explore" landing, motion detail view with layman text, party-match snapshot, related motions, and bookmark/flag actions. Reuse Streamlit components.
- Admin tooling (new): migration scripts, a CLI to recompute embeddings/similarity, and an audit query helper.
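
As a rough illustration of the in-process similarity service, the sketch below ranks motions by cosine similarity using only the standard library. The function names (`cosine_similarity`, `related_motions`) and the in-memory `dict` of vectors are assumptions made for this example; the real service would load vectors from the embeddings table.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_motions(
    motion_id: int,
    embeddings: Dict[int, List[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return up to k (other_id, score) pairs, most similar first.

    `embeddings` maps motion id -> vector (hypothetical in-memory shape).
    """
    query = embeddings.get(motion_id)
    if query is None:
        return []  # no embedding yet: the caller can hide "related motions"
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != motion_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0]}
    print(related_motions(1, vectors, k=2))
```

A brute-force scan like this is linear in the number of motions per query, which is why the results are meant to be cached rather than computed on every page view.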

## Data Flow

1. Ingest job (api_client/scraper) produces motion records and calls db.insert_motion.
2. Summarizer worker picks up motions without layman_explanation or embeddings, calls ai_provider, and writes layman_explanation + embeddings.
3. Clusterer/similarity job computes related-motion lists using stored embeddings and writes them to a cache table.
4. UI "Explore" shows curated motion lists; "Motion detail" reads motion, layman_explanation, party-match snapshot, and cached related motions.
5. User vote actions update user_sessions and also append an event to the audit table for traceability.
6. Background analytics (optional) reuses user_events and embeddings for offline insights.
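
Steps 3 and 4 could be backed by a cache table along these lines. This is a sketch under assumptions: it uses SQLite via the standard `sqlite3` module, and the table name `related_motions_cache`, its columns, and the helpers `store_related` / `load_related` are hypothetical, not names from the existing schema.

```python
import json
import sqlite3
from typing import List

# Hypothetical cache table; the real name and columns would come from a migration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS related_motions_cache (
    motion_id   INTEGER PRIMARY KEY,
    related_ids TEXT NOT NULL,  -- JSON array of motion ids, best match first
    computed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def store_related(conn: sqlite3.Connection, motion_id: int, related_ids: List[int]) -> None:
    """Write (or overwrite) the cached list so re-running the job is safe."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO related_motions_cache (motion_id, related_ids) "
            "VALUES (?, ?)",
            (motion_id, json.dumps(related_ids)),
        )


def load_related(conn: sqlite3.Connection, motion_id: int) -> List[int]:
    """Return cached related ids, or an empty list if nothing is cached yet."""
    row = conn.execute(
        "SELECT related_ids FROM related_motions_cache WHERE motion_id = ?",
        (motion_id,),
    ).fetchone()
    return json.loads(row[0]) if row else []


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_related(conn, 1, [2, 3])
    print(load_related(conn, 1))
    print(load_related(conn, 42))
```

Returning an empty list for an uncached motion lets the "Motion detail" page simply hide the related section instead of raising.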

## Error Handling Strategy

- External calls: add retries with exponential backoff for AI provider and external APIs. Failures set a marker (e.g., summary_missing) and the system continues.
- Missing embeddings: UI gracefully disables "related motions" and offers "compute on demand".
- Idempotency: make insert_motion idempotent by URL/external id check at DB layer; use optimistic handling for duplicates.
- Concurrency: avoid read-modify-write races by writing user events (append-only) and deriving session state from events when race-prone updates are detected.
- Observability: replace prints with structured logging (module-level logger) and add basic metrics for worker errors, API failures, and queue lags.
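
The retry-and-marker behaviour for external calls might look like the following minimal sketch. `call_with_retries` and `summarize_or_mark` are hypothetical helpers, the default attempt count and delay are placeholders, and the `summarize` callable stands in for whatever ai_provider actually exposes.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)  # module-level logger instead of print


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call func up to `attempts` times, doubling the delay after each failure.

    Returns None when every attempt fails so the caller can set a marker
    and carry on instead of crashing the worker.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real worker would catch narrower error types
            logger.warning("attempt %d/%d failed: %s", attempt + 1, attempts, exc)
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    return None


def summarize_or_mark(motion: dict, summarize: Callable[[str], str]) -> dict:
    """Store the summary on success, or flag the motion as summary_missing."""
    summary = call_with_retries(lambda: summarize(motion["body_text"]))
    if summary is None:
        motion["summary_missing"] = True
    else:
        motion["layman_explanation"] = summary
    return motion
```

Passing `sleep` as a parameter keeps unit tests fast: a test can record the requested delays instead of actually waiting.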

## Testing Strategy

- Unit tests: DB helpers (insert_motion, store_embedding, similarity cache), summarizer functions (mock ai_provider), and session vote logic.
- Migration tests: follow the existing pattern of applying migration SQL in a temp DB and asserting schema.
- Integration tests: end-to-end ingest → summarize → embedding → similarity → UI-read path in CI (use monkeypatch for AI calls).
- Load tests: simulate a few thousand embedding search calls against the in-process search to validate performance assumptions for MVP.
- Acceptance: confirm the UX flows: Explore session, Motion detail, Vote → party match, Related motions populated.
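
A migration test in the "apply SQL to a temp DB and assert schema" style could be sketched as below. It assumes SQLite and uses an inline, hypothetical migration for a `user_events` table; the column names are illustrative, and in the repo the SQL would be read from the actual migration file.

```python
import sqlite3

# Hypothetical migration SQL; column names are assumptions for this example.
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS user_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    motion_id  INTEGER NOT NULL,
    event_type TEXT NOT NULL,
    payload    TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_user_events_session ON user_events (session_id);
"""


def column_names(conn: sqlite3.Connection, table: str) -> set:
    """Return the column names of `table` (index 1 of each PRAGMA row)."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def test_user_events_migration() -> None:
    conn = sqlite3.connect(":memory:")  # throwaway DB, nothing touches disk
    conn.executescript(MIGRATION_SQL)
    conn.executescript(MIGRATION_SQL)  # applying twice must not fail
    expected = {"id", "session_id", "motion_id", "event_type", "payload", "created_at"}
    assert expected <= column_names(conn, "user_events")
    conn.close()


if __name__ == "__main__":
    test_user_events_migration()
    print("migration test passed")
```

Running the migration twice in the test is a cheap check that it is safe to re-apply, which matches the "no forced drops" constraint.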

## High-level Plan & Estimates

Assumptions: one full-stack engineer (Python + Streamlit) and one part-time reviewer. All estimates are rough.

Milestone 0: Validate & quick discovery (1 day)
- Locate the user's added markdown plan and extract the exact requirements. (I'm assuming the file exists in thoughts/shared; if not, we validated by searching.)

Milestone 1: MVP (8–12 engineer days)
- Add similarity cache table and migration.
- Summarizer: make embedding generation robust with retries and store vectors.
- Clusterer job: compute and cache related motions.
- UI: Explore landing, Motion detail page, related motion UI, bookmark/flag button.
- Add event/audit table and write events on user votes and bookmarks.

Milestone 2: Hardening & instrumentation (3–5 engineer days)
- Replace prints with structured logging across touched modules.
- Add migration tests and CI integration tests (mock AI).
- Add health metrics & basic alerting for worker failures.

Milestone 3: Polish & UX feedback (3–5 engineer days)
- UX tweaks, performance tuning, compute on-demand fallback for embeddings, documentation, admin CLI.

Total MVP + polish: ~2–3 weeks of focused work.

## Risks & Mitigations

- Risk: Naive in-process embedding search will not scale. Mitigation: cache nearest neighbors per motion and plan a migration path to a vector index.
- Risk: AI provider flakiness. Mitigation: retries, timeouts, and clear UI fallback. Tests must mock provider in CI.
- Risk: Race conditions on session votes. Mitigation: append-only event log and derive authoritative session view from events when needed.
- Risk: Schema drift and missing migrations. Mitigation: add migration tests and document required migrations in repo.
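
To make the append-only mitigation for vote races concrete, here is a small sketch of deriving a session's authoritative votes by replaying events. The `VoteEvent` shape, the `seq` ordering column, and the vote values are assumptions for illustration, not the existing schema.

```python
from typing import Dict, Iterable, NamedTuple


class VoteEvent(NamedTuple):
    seq: int         # monotonically increasing id assigned by the event table
    session_id: str
    motion_id: int
    vote: str        # e.g. "for", "against", "abstain" (values are assumptions)


def derive_session_votes(events: Iterable[VoteEvent], session_id: str) -> Dict[int, str]:
    """Replay events in seq order; the latest vote per motion wins."""
    votes: Dict[int, str] = {}
    for event in sorted(events, key=lambda e: e.seq):
        if event.session_id == session_id:
            votes[event.motion_id] = event.vote
    return votes


if __name__ == "__main__":
    log = [
        VoteEvent(1, "s1", 10, "for"),
        VoteEvent(2, "s2", 10, "abstain"),
        VoteEvent(3, "s1", 10, "against"),  # s1 changed their mind on motion 10
        VoteEvent(4, "s1", 11, "for"),
    ]
    print(derive_session_votes(log, "s1"))
```

Because writers only ever append, two concurrent votes cannot overwrite each other; the replay decides the outcome deterministically from the sequence numbers.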

## Open Questions

- Which exact user journeys do we want first (single-session discovery vs. persistent account/bookmarking)?
- Do we want bookmarks persisted globally or per-session only? (Privacy implications.)
- What's an acceptable latency for "related motions": precomputed nightly vs. near-real-time?
- Is there any policy/legal ban on storing full body_text or on long-term retention of user votes?

---

I'm proceeding to create the design doc file at thoughts/shared/designs/2026-03-21-motions-guided-explorer-design.md and will spawn the implementation planner next. Interrupt if you want changes to the approach or scope now.