motief

Commit Graph

Author	SHA1	Message	Date
Sven Geboers	afdfe298cd	fix: switch to Lasso regression for better axis stability - Replace Ridge with Lasso (L1) regression to concentrate weights on fewer dimensions, improving stability measurement - Default alpha changed to 0.1 (Lasso needs smaller values than Ridge) - Fix dimension alignment issues in semantic drift and centroid computation - Add dimension alignment in compute_semantic_drift and _generate_report Results with Lasso alpha=0.1: - 9/10 axes now stable (>0.7): [1, 2, 3, 4, 5, 7, 8, 9, 10] - Axis 6 reordered (0.25-0.5 range) - Axis 8 shows inflection points in 2016→2017→2018 - Overtone shift detected on all stable axes (1.3-1.9 range)	4 weeks ago
Sven Geboers	9bb7e8efad	feat: add overtone shift analysis and update report - Add compute_overtone_shift(): tracks semantic gravity movement across windows even when party ordering stays the same - Update _generate_report() with overtone shift section including dimension-level analysis and inflection point detection - Update methodology section to reflect new metrics - All 12 tests pass Key finding: no axes exceed 0.7 stability threshold — semantic features defining each SVD axis shift significantly across windows (0.06-0.51 range)	4 weeks ago
Sven Geboers	1c58429ab0	refactor: replace axis stability with Ridge regression weights - Replace Procrustes-based stability with Ridge regression on fused embeddings - For each SVD axis, fit Ridge: SVD_score ~ fused_embedding per window - Compare weight vectors via max(cosine similarity, Jaccard top-100) - Add --regression-alpha CLI argument (default 1.0) - Keep party-based fallback for windows with < 50 motions - Update tests for new regression-based approach Key finding: regression weights show moderate stability (0.06-0.51) but no axes exceed 0.7 threshold — semantic features defining each axis shift significantly across windows	4 weeks ago
Sven Geboers	50fafeecf3	feat: add motion semantic drift analysis script - Add scripts/motion_drift.py: analyzes SVD axis stability, semantic drift, and cross-ideological voting patterns across annual windows - Add analysis/motion_drift.py: core analysis functions with Procrustes alignment fallback using party-based sign consistency - Add matplotlib dependency for static chart generation - Add tests/test_motion_drift.py: 12 tests covering all analysis functions - Report output: markdown with embedded PNG charts Key findings from real data: - No axes are fully stable (>0.7) across 2019-2026 - All axes show moderate consistency (0.40-0.47) — stable within periods but flip between cabinet periods (2019/2022/2026 vs 2023/2024/2025) - Party voting analysis detects cross-ideological voting patterns	4 weeks ago
Sven Geboers	846e9cf67f	fix: import canonical parties from config, simplify theme consistency check	4 weeks ago
Sven Geboers	bad9cd758d	docs: add SVD theme divergence solution doc and validation hook	4 weeks ago
Sven Geboers	bfe37c6806	fix: align report generation with JSON output for positive/negative separation Bug: report_per_component used scored[:args.report_top_n] which took top N by score (all positive for components with only positive scores). JSON correctly separated positive and negative poles. Fix: Use same positive/negative separation logic for report as JSON.	4 weeks ago
Sven Geboers	33edb334c4	feat: implement exclusive SVD motion assignment with label review report - Each motion now assigned to exactly one component (highest absolute score) - Added --exclusive flag (default: True) for backward compatibility - Added markdown report generation with motion details for label review - Added --report-top-n for report size (default: 20 per component) - Updated JSON output with 'exclusive' flag for transparency	4 weeks ago
Sven Geboers	c9c59dd166	feat(diagnostics): enhance trajectory diagnostic script with real data mode	1 month ago
Sven Geboers	9f98dbae60	Add debug st.info before st.plotly_chart to diagnose invisible chart	1 month ago
Sven Geboers	88110b0aaa	Fix update_existing_motions: single write connection and module-level duckdb import Use one DuckDB write connection for the entire update loop instead of opening/closing per row, wrapped in try/finally for proper cleanup. Move 'import duckdb' to module level with other imports.	1 month ago
Sven Geboers	be8887f6f8	Add --skip-details, --update-existing flags to download_past_year.py with tests Enable backfilling body_text for existing motions that lack it (2016-2018 data). New extract_besluit_id() and update_existing_motions() helpers support the --update-existing mode, while --no-skip-details enables detail fetching during normal downloads. Includes 7 tests covering URL parsing, DB update flow, and argparse wiring.	1 month ago
Sven Geboers	9daa899885	fix: remove motion title truncation, add SVD JSON generation script Removes the raw_title[:80] cap on expander labels so full titles show. Adds scripts/generate_svd_json.py to regenerate top_svd_top_motions.json from any SVD window after a recompute.	1 month ago
Sven Geboers	d1faf2b3e4	feat(mindmodel): add CLI wrapper, edge-case tests, and manifest schema tests	1 month ago
Sven Geboers	f77875ed54	feat(mindmodel): add CLI wrapper and tests	1 month ago
Sven Geboers	a74e6006f5	feat(mindmodel): add validator and tests	1 month ago
Sven Geboers	7bd7d0d18c	feat(mindmodel): add checks utilities and tests	1 month ago
Sven Geboers	2efd7ba3a0	feat(mindmodel): add manifest loader and tests	1 month ago
Sven Geboers	eb73275f32	feat(mp-quiz): add MP quiz tab and DB helpers; add design and plan docs	1 month ago
Sven Geboers	b09e580f65	feat: motion content enrichment pipeline hardening - ai_provider_wrapper: retry/fallback with exponential backoff, None sentinel for failed items - text_pipeline: use wrapper, return 5-tuple (stored, skipped_existing, skipped_no_text, errors, failed_ids) - similarity/compute: filter trivial 1.0 matches on identical short titles (<12 chars) - rerun_embeddings: --retry-missing mode, calls ensure_text_embeddings_for_ids on failed ids - sync_motion_content: per-ext_id retries, HTTPAdapter pool, --max-body-workers CLI flag, audit on failure - qa_similarity script: samples motions, writes JSON ledger to thoughts/ledgers/ - All tests green: 61 passed, 2 skipped	1 month ago
Sven Geboers	2891e9ee70	feat: add StemAtlas Streamlit app, explorer, Docker deployment, blog charts	1 month ago
Sven Geboers	daa22c5e2b	feat: complete parliamentary embedding pipeline with full historical coverage - Add fused (SVD + text) embedding pipeline for annual windows 2016-2026 - Fix store_fused_embedding duplicate bug: DELETE before INSERT (idempotent) - Add --text-batch-size CLI flag to run_pipeline.py (default 200) - Add explicit --start-date/--end-date to download_past_year.py - Backfill mp_votes for all motions (party-level votes, 111k new rows) - Add similarity cache recompute: 212k rows across 9 annual windows - Improve ai_provider retry logic, text_pipeline batching - Improve analysis/political_axis PCA handling and visualizations - Add diagnostic/utility scripts: compare_svd, generate_compass, inspect_axis, etc. - Untrack data/motions.db (3.6GB binary), add to .gitignore with outputs/ - Update continuity ledger with full session state	1 month ago
Sven Geboers	847b783877	fix(pipeline): fix API pagination, add skip_details fast path, bulk mp_votes insert - _get_voting_records returns (records, besluit_meta) tuple; paginate via Besluit?expand=Stemming (469/mo vs 8400) - get_motions(skip_details=True) bypasses per-motion detail chain (3 HTTP calls/motion) - extract_mp_votes rewritten: bulk DataFrame insert (80k rows in 1.9s), includes party-level actors - run_pipeline.py fixed: pass db_path not db, handle dict/int return types - download_past_year.py: skip_details=True default, limit-per-chunk default 50000	1 month ago
Sven Geboers	a36e6cba4e	feat(pipeline): implement parliamentary embedding pipeline MVP - Add 4 migration files: mp_votes, mp_metadata, svd_vectors, fused_embeddings - Extend database.py with 5 new helper methods and table init - Add pipeline/ package: extract_mp_votes, fetch_mp_metadata, text_pipeline, svd_pipeline (with Procrustes alignment), fusion - Add full test suite (17 tests) covering all pipeline modules and migrations - Fix Procrustes alignment bug: scipy scale is a norm value, not a multiplier - Fix DuckDB date type handling in test assertions (datetime.date vs string) - Remove duckdb.py shim; tests now run against real duckdb + scipy via uv Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md	1 month ago

24 Commits (afdfe298cd70030dafb7ac8b5b8f9a6c6deaee3c)