Sven Geboers
eb73275f32
feat(mp-quiz): add MP quiz tab and DB helpers; add design and plan docs
1 month ago
Sven Geboers
b09e580f65
feat: motion content enrichment pipeline hardening
- ai_provider_wrapper: retry/fallback with exponential backoff, None sentinel for failed items
- text_pipeline: use wrapper, return 5-tuple (stored, skipped_existing, skipped_no_text, errors, failed_ids)
- similarity/compute: filter trivial 1.0 matches on identical short titles (<12 chars)
- rerun_embeddings: --retry-missing mode, calls ensure_text_embeddings_for_ids on failed ids
- sync_motion_content: per-ext_id retries, HTTPAdapter pool, --max-body-workers CLI flag, audit on failure
- qa_similarity script: samples motions, writes JSON ledger to thoughts/ledgers/
- All tests green: 61 passed, 2 skipped
1 month ago
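Note on the retry item above: a minimal sketch of per-item retry with exponential backoff and a None sentinel for items that never succeed. This is not the repository's ai_provider_wrapper; `embed_with_retry`, `embed_one`, and every parameter name are hypothetical, and the provider-fallback step is left out.

```python
import time
from typing import Callable, List, Optional, Sequence


def embed_with_retry(
    texts: Sequence[str],
    embed_one: Callable[[str], List[float]],  # hypothetical provider call
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> List[Optional[List[float]]]:
    """Embed each text independently; a failed item yields None, not an exception."""
    results: List[Optional[List[float]]] = []
    for text in texts:
        vector: Optional[List[float]] = None
        for attempt in range(max_attempts):
            try:
                vector = embed_one(text)
                break
            except Exception:
                # Back off base_delay, 2*base_delay, 4*base_delay, ...
                # but do not sleep after the final attempt.
                if attempt < max_attempts - 1:
                    sleep(base_delay * (2 ** attempt))
        results.append(vector)  # None marks an item that exhausted its attempts
    return results
```

Keeping the output aligned with the input lets a caller collect the ids of the None entries and retry only those later, which matches the failed_ids and --retry-missing idea described in this commit.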
Sven Geboers
aef7c45074
Refactor tests: replace sys.modules hacks with real DI + in-memory DB
- Add db=None, embedder=None params to ai_provider_wrapper, text_pipeline, compute_similarities
- New conftest.py: FakeEmbedder, mem_db (in-memory DuckDB), fake_embedder fixtures
- Rewrite test_ai_provider_wrapper (4 tests), test_rerun_embeddings_retry (2 tests), test_similarity_compute_filter (1 test) with real implementations
- Fix rerun_embeddings tests hanging on _get_all_windows by patching it alongside _clear_embeddings
- All 53 tests pass (2 skipped), 0 sys.modules hacks in refactored files
1 month ago
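Note on the dependency-injection change above: a minimal sketch of an optional `embedder=None` parameter paired with a deterministic fake, so tests need no sys.modules patching. Only the name FakeEmbedder comes from the commit message; its body here, `embed_titles`, and `_default_embedder` are hypothetical, and the in-memory DuckDB fixture is not shown.

```python
import hashlib
from typing import Dict, List, Optional, Protocol, Sequence


class Embedder(Protocol):
    def embed(self, text: str) -> List[float]: ...


class FakeEmbedder:
    """Deterministic stand-in: derives a small vector from a hash of the text."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.calls: List[str] = []  # records inputs so tests can assert on them

    def embed(self, text: str) -> List[float]:
        self.calls.append(text)
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]


def _default_embedder() -> Embedder:
    # Placeholder for the real provider; this sketch has none configured.
    raise RuntimeError("no real embedder configured in this sketch")


def embed_titles(
    titles: Sequence[str], embedder: Optional[Embedder] = None
) -> Dict[str, List[float]]:
    """Use the injected embedder when given, otherwise fall back to the default."""
    active = embedder if embedder is not None else _default_embedder()
    return {title: active.embed(title) for title in titles}
```

A test passes `embedder=FakeEmbedder()` explicitly, while production callers omit the argument and get the default.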
Sven Geboers
daa22c5e2b
feat: complete parliamentary embedding pipeline with full historical coverage
- Add fused (SVD + text) embedding pipeline for annual windows 2016-2026
- Fix store_fused_embedding duplicate bug: DELETE before INSERT (idempotent)
- Add --text-batch-size CLI flag to run_pipeline.py (default 200)
- Add explicit --start-date/--end-date to download_past_year.py
- Backfill mp_votes for all motions (party-level votes, 111k new rows)
- Add similarity cache recompute: 212k rows across 9 annual windows
- Improve ai_provider retry logic, text_pipeline batching
- Improve analysis/political_axis PCA handling and visualizations
- Add diagnostic/utility scripts: compare_svd, generate_compass, inspect_axis, etc.
- Untrack data/motions.db (3.6GB binary), add to .gitignore with outputs/
- Update continuity ledger with full session state
1 month ago
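Note on the store_fused_embedding fix above: a minimal sketch of the DELETE-before-INSERT pattern that makes a write idempotent. It uses the standard-library sqlite3 module as a stand-in for DuckDB, and the table layout (motion_id, window_id, vector) is an assumption for illustration, not the project's schema.

```python
import json
import sqlite3
from typing import List


def store_fused_embedding(
    conn: sqlite3.Connection, motion_id: int, window_id: str, vector: List[float]
) -> None:
    """Replace any existing row for (motion_id, window_id), then insert the new one."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "DELETE FROM fused_embeddings WHERE motion_id = ? AND window_id = ?",
            (motion_id, window_id),
        )
        conn.execute(
            "INSERT INTO fused_embeddings (motion_id, window_id, vector) VALUES (?, ?, ?)",
            (motion_id, window_id, json.dumps(vector)),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fused_embeddings (motion_id INTEGER, window_id TEXT, vector TEXT)"
)

# Storing the same key twice leaves one row holding the latest vector.
store_fused_embedding(conn, 1, "2016", [0.1, 0.2])
store_fused_embedding(conn, 1, "2016", [0.3, 0.4])
rows = conn.execute("SELECT vector FROM fused_embeddings").fetchall()
```

Because the delete and insert share one transaction, rerunning the pipeline over the same window cannot accumulate duplicate rows.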
Sven Geboers
a36e6cba4e
feat(pipeline): implement parliamentary embedding pipeline MVP
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline, svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md
1 month ago
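Note on the Procrustes fix above: a minimal sketch of why the scale value is not a multiplier. It assumes the bug involved the second return value of scipy.linalg.orthogonal_procrustes, which is the sum of the singular values of A^T B. The sketch computes that same quantity with NumPy only; `procrustes_align` and its variable names are hypothetical, not the repository's svd_pipeline code.

```python
import numpy as np


def procrustes_align(source: np.ndarray, target: np.ndarray):
    """Return (rotation, scale, multiplier) for aligning source onto target.

    rotation   : orthogonal R minimising ||source @ R - target||_F
    scale      : sum of singular values of source.T @ target (a norm-like value)
    multiplier : least-squares scalar s for ||s * source @ R - target||_F
    """
    u, singular_values, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    scale = singular_values.sum()
    # The usable scaling factor divides by the squared Frobenius norm of source.
    multiplier = scale / np.linalg.norm(source) ** 2
    return rotation, scale, multiplier


# Synthetic check: target is source rotated and stretched by 2.5.
rng = np.random.default_rng(0)
source = rng.normal(size=(20, 3))
true_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
target = 2.5 * source @ true_rotation

rotation, scale, multiplier = procrustes_align(source, target)
aligned = multiplier * source @ rotation          # matches target
misaligned = scale * source @ rotation            # treats the norm as a multiplier
```

Multiplying by `scale` directly inflates the aligned vectors by roughly the squared norm of the source matrix, which is the kind of error the commit describes.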