motief

Commit Graph

Author	SHA1	Message	Date
Sven Geboers	7bd7d0d18c	feat(mindmodel): add checks utilities and tests	3 months ago
Sven Geboers	2efd7ba3a0	feat(mindmodel): add manifest loader and tests	3 months ago
Sven Geboers	238d9e9ec2	test: add full quiz tab test suite and fix Geen stem normalization - Fix match_mps_for_votes: 'Geen stem'/'no vote' now normalize to None (skipped), not 'afwezig' — so unanswered questions don't inflate overlap count - Add 5 additional tests to test_match_mps.py: zero-overlap exclusion, empty input validation, Geen stem overlap skip, excluded motions respected - Add tests/test_explorer_quiz.py: builder import smoke test plus 3 real-DB end-to-end scenarios (unique match, indistinguishable MPs, discriminating question reduces candidate set) - Full suite: 73 passed, 2 skipped	3 months ago
Sven Geboers	eb73275f32	feat(mp-quiz): add MP quiz tab and DB helpers; add design and plan docs	3 months ago
Sven Geboers	b09e580f65	feat: motion content enrichment pipeline hardening - ai_provider_wrapper: retry/fallback with exponential backoff, None sentinel for failed items - text_pipeline: use wrapper, return 5-tuple (stored, skipped_existing, skipped_no_text, errors, failed_ids) - similarity/compute: filter trivial 1.0 matches on identical short titles (<12 chars) - rerun_embeddings: --retry-missing mode, calls ensure_text_embeddings_for_ids on failed ids - sync_motion_content: per-ext_id retries, HTTPAdapter pool, --max-body-workers CLI flag, audit on failure - qa_similarity script: samples motions, writes JSON ledger to thoughts/ledgers/ - All tests green: 61 passed, 2 skipped	3 months ago
Sven Geboers	aef7c45074	Refactor tests: replace sys.modules hacks with real DI + in-memory DB - Add db=None, embedder=None params to ai_provider_wrapper, text_pipeline, compute_similarities - New conftest.py: FakeEmbedder, mem_db (in-memory DuckDB), fake_embedder fixtures - Rewrite test_ai_provider_wrapper (4 tests), test_rerun_embeddings_retry (2 tests), test_similarity_compute_filter (1 test) with real implementations - Fix rerun_embeddings tests hanging on _get_all_windows by patching it alongside _clear_embeddings - All 53 tests pass (2 skipped), 0 sys.modules hacks in refactored files	3 months ago
Sven Geboers	b7350d8f87	test: rewrite test_database_audit using mem_db fixture, no disk writes required	3 months ago
Sven Geboers	e4f2c7ff59	fix: update integration test to unpack 5-tuple from ensure_text_embeddings	3 months ago
Sven Geboers	2891e9ee70	feat: add StemAtlas Streamlit app, explorer, Docker deployment, blog charts	3 months ago
Sven Geboers	daa22c5e2b	feat: complete parliamentary embedding pipeline with full historical coverage - Add fused (SVD + text) embedding pipeline for annual windows 2016-2026 - Fix store_fused_embedding duplicate bug: DELETE before INSERT (idempotent) - Add --text-batch-size CLI flag to run_pipeline.py (default 200) - Add explicit --start-date/--end-date to download_past_year.py - Backfill mp_votes for all motions (party-level votes, 111k new rows) - Add similarity cache recompute: 212k rows across 9 annual windows - Improve ai_provider retry logic, text_pipeline batching - Improve analysis/political_axis PCA handling and visualizations - Add diagnostic/utility scripts: compare_svd, generate_compass, inspect_axis, etc. - Untrack data/motions.db (3.6GB binary), add to .gitignore with outputs/ - Update continuity ledger with full session state	3 months ago
Sven Geboers	a78bee9b0a	feat(similarity): add precomputed similarity cache, fix fusion N+1, add 429 retry - Add similarity/ package (compute.py, lookup.py) with numpy-based pairwise cosine similarity and cached lookup - database.py: create embeddings + similarity_cache tables in _init_database(), add store_similarity_batch/get_cached_similarities/clear_similarity_cache helpers - pipeline/fusion.py: replace N+1 per-motion embedding SELECT with single bulk JOIN using DuckDB QUALIFY window function - ai_provider.py: retry HTTP 429 with Retry-After header support - migrations/2026-03-22-add-similarity-cache.sql: make executable - Add tests for similarity compute, db helpers, and 429 retry (34 pass, 2 skip)	3 months ago
Sven Geboers	3551a82f83	feat(analysis): add 2D political compass and 2D trajectories - compute_2d_axes (PCA + anchor) - compute_2d_trajectories - plot_political_compass, plot_2d_trajectories - unit test: tests/test_political_compass.py	3 months ago
Sven Geboers	aa2f66ac9f	feat(analysis): fetch real MP metadata, fix anchor axis for party-level actors - fetch_mp_metadata: use real OData URL with pagination (1200 records, 5 pages); uses Fractie.Afkorting not NaamNL for abbreviation matching; skips Verwijderd=true records - upsert_mp_metadata: keep most recent membership (prefer active over ended, then higher Van date) so current party affiliations are not overwritten by historical ones - compute_anchor_axis: anchor directly on party-level SVD entities (GroenLinks-PvdA etc.) before falling back to mp_metadata individual MP lookup - test_fetch_mp_metadata: fix mock for timeout kwarg + pagination + Afkorting field - Generated anchor axis HTML for 2025-Q2 through 2026-Q1 in outputs/	3 months ago
Sven Geboers	5ad83ef1be	fix(tests): update test_extract_mp_votes for party-level actor inclusion - New extract_mp_votes behavior inserts all actors (party + individual MPs), not only comma-name MPs. Test now validates both types and their party column. Also adds generated HTML visualizations (political axis x5 windows + trajectories).	3 months ago
Sven Geboers	f2a831dfcf	feat(pipeline): add orchestrator CLI, analysis modules, and ActorFractie ingestion - pipeline/run_pipeline.py: CLI orchestrator for all 5 pipeline phases with --dry-run, --skip-*, --window-size, --svd-k, --start/end-date flags - analysis/{political_axis,trajectory,clustering,visualize}.py: PCA/anchor ideological axis, MP drift trajectories, UMAP + KMeans clustering, Plotly HTML output - api_client.py: capture ActorFractie per individual MP vote (comma in ActorNaam) into mp_vote_parties dict on each motion - database.insert_motion: auto-insert mp_votes rows with party affiliation for newly ingested motions when mp_vote_parties is present - Add scikit-learn to pyproject.toml for KMeans clustering - tests/test_run_pipeline.py: window generation, dry-run, skip-all paths - tests/test_analysis.py: PCA axis, anchor axis, trajectory drift, KMeans - Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md	3 months ago
Sven Geboers	a36e6cba4e	feat(pipeline): implement parliamentary embedding pipeline MVP - Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings - Extend database.py with 5 new helper methods and table init - Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline, svd_pipeline (with Procrustes alignment), fusion - Add full test suite (17 tests) covering all pipeline modules and migrations - Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier - Fix DuckDB date type handling in test assertions (datetime.date vs string) - Remove duckdb.py shim; tests now run against real duckdb + scipy via uv - Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md	3 months ago

16 Commits (7bd7d0d18c1c0567abb7c0c6046724865005cb42)
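
Commit b09e580f65 describes a retry/fallback wrapper with exponential backoff that returns a None sentinel for failed items and collects the failed ids for a later pass. The repository's actual code is not shown on this page, so the following is only a minimal sketch of that general pattern. The names call_with_backoff and process_batch, the default of 3 attempts, and the 0.5 s base delay are all assumptions made for illustration, not the project's API.

```python
import time
from typing import Any, Callable, List, Optional, Sequence, Tuple


def call_with_backoff(
    fn: Callable[[Any], Any],
    item: Any,
    max_attempts: int = 3,       # assumed default, not taken from the repo
    base_delay: float = 0.5,     # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call fn(item), retrying on any exception with a doubling delay.

    Returns None (the failure sentinel) once all attempts are exhausted.
    Note: a fn that legitimately returns None is indistinguishable from
    a failure in this sketch.
    """
    for attempt in range(max_attempts):
        try:
            return fn(item)
        except Exception:
            if attempt == max_attempts - 1:
                return None  # give up: sentinel instead of raising
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    return None  # only reached when max_attempts <= 0


def process_batch(
    fn: Callable[[Any], Any],
    items: Sequence[Any],
    **retry_kwargs: Any,
) -> Tuple[List[Optional[Any]], List[Any]]:
    """Apply fn to every item; return (results, failed_items)."""
    results = [call_with_backoff(fn, it, **retry_kwargs) for it in items]
    failed = [it for it, res in zip(items, results) if res is None]
    return results, failed
```

Passing the sleep function as a parameter keeps the sketch testable without real waiting, and the returned list of failed items could feed something like the --retry-missing rerun mentioned in the same commit.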