Sven Geboers
9daa899885
fix: remove motion title truncation, add SVD JSON generation script
Removes the raw_title[:80] cap on expander labels so full titles show.
Adds scripts/generate_svd_json.py to regenerate top_svd_top_motions.json
from any SVD window after a recompute.
1 month ago
Sven Geboers
d1faf2b3e4
feat(mindmodel): add CLI wrapper, edge-case tests, and manifest schema tests
1 month ago
Sven Geboers
f77875ed54
feat(mindmodel): add CLI wrapper and tests
1 month ago
Sven Geboers
a74e6006f5
feat(mindmodel): add validator and tests
1 month ago
Sven Geboers
7bd7d0d18c
feat(mindmodel): add checks utilities and tests
1 month ago
Sven Geboers
2efd7ba3a0
feat(mindmodel): add manifest loader and tests
1 month ago
Sven Geboers
eb73275f32
feat(mp-quiz): add MP quiz tab and DB helpers; add design and plan docs
1 month ago
Sven Geboers
b09e580f65
feat: motion content enrichment pipeline hardening
- ai_provider_wrapper: retry/fallback with exponential backoff, None sentinel for failed items
- text_pipeline: use wrapper, return 5-tuple (stored, skipped_existing, skipped_no_text, errors, failed_ids)
- similarity/compute: filter trivial 1.0 matches on identical short titles (<12 chars)
- rerun_embeddings: --retry-missing mode, calls ensure_text_embeddings_for_ids on failed ids
- sync_motion_content: per-ext_id retries, HTTPAdapter pool, --max-body-workers CLI flag, audit on failure
- qa_similarity script: samples motions, writes JSON ledger to thoughts/ledgers/
- All tests green: 61 passed, 2 skipped
1 month ago
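
The retry/fallback behaviour listed for ai_provider_wrapper in b09e580f65 might look roughly like the sketch below. This is not the repository's code: `call_with_backoff`, `process_items`, and all parameter names and defaults are hypothetical, chosen only to illustrate exponential backoff with a None sentinel for items that still fail after the last attempt.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_with_backoff(
    fn: Callable[[T], R],
    item: T,
    max_attempts: int = 3,       # hypothetical default
    base_delay: float = 0.5,     # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[R]:
    """Call fn(item), waiting base_delay * 2**attempt between failed attempts.

    Returns None (the failure sentinel) once all attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn(item)
        except Exception:
            if attempt == max_attempts - 1:
                return None  # give up: the caller sees the sentinel
            sleep(base_delay * (2 ** attempt))
    return None  # only reached when max_attempts <= 0


def process_items(
    fn: Callable[[T], R], items: Sequence[T], **kwargs
) -> Tuple[List[Optional[R]], List[int]]:
    """Apply fn to every item; return the results and the indices that failed."""
    results = [call_with_backoff(fn, item, **kwargs) for item in items]
    failed = [i for i, r in enumerate(results) if r is None]
    return results, failed
```

A sentinel keeps the result list aligned with the input list, so failed positions can be collected and retried later (compare the failed_ids element of the 5-tuple and the --retry-missing mode above). The trade-off is that `fn` must never return None on success.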
Sven Geboers
2891e9ee70
feat: add StemAtlas Streamlit app, explorer, Docker deployment, blog charts
1 month ago
Sven Geboers
daa22c5e2b
feat: complete parliamentary embedding pipeline with full historical coverage
- Add fused (SVD + text) embedding pipeline for annual windows 2016-2026
- Fix store_fused_embedding duplicate bug: DELETE before INSERT (idempotent)
- Add --text-batch-size CLI flag to run_pipeline.py (default 200)
- Add explicit --start-date/--end-date to download_past_year.py
- Backfill mp_votes for all motions (party-level votes, 111k new rows)
- Add similarity cache recompute: 212k rows across 9 annual windows
- Improve ai_provider retry logic, text_pipeline batching
- Improve analysis/political_axis PCA handling and visualizations
- Add diagnostic/utility scripts: compare_svd, generate_compass, inspect_axis, etc.
- Untrack data/motions.db (3.6GB binary), add to .gitignore with outputs/
- Update continuity ledger with full session state
1 month ago
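
The "DELETE before INSERT (idempotent)" fix in daa22c5e2b can be illustrated with a small sketch. The project stores data in DuckDB; the example below uses the standard-library sqlite3 module as a stand-in so that it runs without extra dependencies. The table layout, the column names, and the function `store_fused_embedding_demo` are assumptions made for illustration, not the actual schema or signature.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical fused_embeddings layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fused_embeddings (motion_id INTEGER, window_id TEXT, vector TEXT)"
    )
    return conn


def store_fused_embedding_demo(
    conn: sqlite3.Connection, motion_id: int, window_id: str, vector_json: str
) -> None:
    """Replace any existing row for (motion_id, window_id), then insert the new one."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute(
            "DELETE FROM fused_embeddings WHERE motion_id = ? AND window_id = ?",
            (motion_id, window_id),
        )
        conn.execute(
            "INSERT INTO fused_embeddings (motion_id, window_id, vector) VALUES (?, ?, ?)",
            (motion_id, window_id, vector_json),
        )
```

Calling the function twice with the same key leaves exactly one row holding the latest vector, so a pipeline rerun does not accumulate duplicates.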
Sven Geboers
847b783877
fix(pipeline): fix API pagination, add skip_details fast path, bulk mp_votes insert
- _get_voting_records returns (records, besluit_meta) tuple; paginate via Besluit?expand=Stemming (469/mo vs 8400)
- get_motions(skip_details=True) bypasses per-motion detail chain (3 HTTP calls/motion)
- extract_mp_votes rewritten: bulk DataFrame insert (80k rows in 1.9s), includes party-level actors
- run_pipeline.py fixed: pass db_path not db, handle dict/int return types
- download_past_year.py: skip_details=True default, limit-per-chunk default 50000
1 month ago
Sven Geboers
a36e6cba4e
feat(pipeline): implement parliamentary embedding pipeline MVP
- Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings
- Extend database.py with 5 new helper methods and table init
- Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline,
  svd_pipeline (with Procrustes alignment), fusion
- Add full test suite (17 tests) covering all pipeline modules and migrations
- Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier
- Fix DuckDB date type handling in test assertions (datetime.date vs string)
- Remove duckdb.py shim; tests now run against real duckdb + scipy via uv
Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md
1 month ago