Sven Geboers
ebb663aa8f
docs: add test refactor implementation plan
1 month ago
Sven Geboers
07a89a207c
docs: add test refactor design - replace monkeypatching with DI + in-memory DuckDB
1 month ago
Sven Geboers
ce27dc6ac5
chore(ledgers): record fusion+similarity run summary and JSON details
1 month ago
Sven Geboers
2891e9ee70
feat: add StemAtlas Streamlit app, explorer, Docker deployment, blog charts
1 month ago
Sven Geboers
daa22c5e2b
feat: complete parliamentary embedding pipeline with full historical coverage
- Add fused (SVD + text) embedding pipeline for annual windows 2016-2026
- Fix store_fused_embedding duplicate bug: DELETE before INSERT (idempotent)
- Add --text-batch-size CLI flag to run_pipeline.py (default 200)
- Add explicit --start-date/--end-date to download_past_year.py
- Backfill mp_votes for all motions (party-level votes, 111k new rows)
- Add similarity cache recompute: 212k rows across 9 annual windows
- Improve ai_provider retry logic, text_pipeline batching
- Improve analysis/political_axis PCA handling and visualizations
- Add diagnostic/utility scripts: compare_svd, generate_compass, inspect_axis, etc.
- Untrack data/motions.db (3.6GB binary), add to .gitignore with outputs/
- Update continuity ledger with full session state
1 month ago
Sven Geboers
a248807e03
Add design: embedding-based motion similarity cache
Precomputed top-K similarity cache replacing the naive Python-scan
search_similar(). Also covers fixes for: embeddings table missing from
_init_database, fusion N+1 query, and ai_provider 429 retry.
1 month ago
Sven Geboers
a36e6cba4e
feat(pipeline): implement parliamentary embedding pipeline MVP
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline,
  svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md
1 month ago
Sven Geboers
c498c3467e
update plan: replace spike with confirmed FractieZetelPersoon fetch task
1 month ago
Sven Geboers
0bbda408fb
plan: parliamentary embedding pipeline MVP implementation plan
1 month ago
Sven Geboers
fd73da3752
design: parliamentary embedding pipeline (late fusion SVD + text)
1 month ago