motief

Commit Graph

Author

SHA1

Message

Date

Sven Geboers

847b783877

fix(pipeline): fix API pagination, add skip_details fast path, bulk mp_votes insert

- _get_voting_records returns (records, besluit_meta) tuple; paginate via Besluit?expand=Stemming (469/mo vs 8400)
- get_motions(skip_details=True) bypasses per-motion detail chain (3 HTTP calls/motion)
- extract_mp_votes rewritten: bulk DataFrame insert (80k rows in 1.9s), includes party-level actors
- run_pipeline.py fixed: pass db_path not db, handle dict/int return types
- download_past_year.py: skip_details=True default, limit-per-chunk default 50000

3 months ago
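
Note on the commit above: the pagination and bulk-insert changes can be illustrated with a minimal sketch. It is not the repository's code. It assumes OData-style pages (a `value` list plus an optional `@odata.nextLink`), uses hypothetical field names `Id` and `Soort` alongside the `ActorNaam` field named in the message, and swaps in the standard library's `sqlite3.executemany` for the DataFrame insert the commit describes. The `mp_votes` column layout and the `FAKE_API` fixture are invented for illustration.

```python
import sqlite3
from typing import Callable, Iterable, Iterator, Optional

Page = dict  # assumed shape: {"value": [...], "@odata.nextLink": "<next url>"}


def iter_pages(fetch: Callable[[str], Page], first_url: str) -> Iterator[Page]:
    """Yield pages until a response carries no next link."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get("@odata.nextLink")


def flatten_votes(pages: Iterable[Page]) -> list[tuple[str, str, str]]:
    """Flatten decision -> votes nesting into (decision_id, actor, vote) rows."""
    rows = []
    for page in pages:
        for besluit in page.get("value", []):
            for stemming in besluit.get("Stemming", []):
                # "Id" and "Soort" are assumed field names, not confirmed by the commit.
                rows.append((besluit["Id"], stemming["ActorNaam"], stemming["Soort"]))
    return rows


def bulk_insert(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> int:
    """Insert all rows in one transaction instead of one statement per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mp_votes (decision_id TEXT, actor TEXT, vote TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO mp_votes VALUES (?, ?, ?)", rows)
    return len(rows)


# Invented two-page fixture standing in for the real HTTP API.
FAKE_API = {
    "page1": {
        "value": [
            {
                "Id": "b1",
                "Stemming": [
                    {"ActorNaam": "Jansen, A.", "Soort": "Voor"},
                    {"ActorNaam": "VVD", "Soort": "Tegen"},
                ],
            }
        ],
        "@odata.nextLink": "page2",
    },
    "page2": {
        "value": [
            {"Id": "b2", "Stemming": [{"ActorNaam": "De Vries, B.", "Soort": "Voor"}]}
        ]
    },
}

conn = sqlite3.connect(":memory:")
rows = flatten_votes(iter_pages(FAKE_API.__getitem__, "page1"))
inserted = bulk_insert(conn, rows)
```

Passing the fetch function in as an argument keeps the paging loop testable without network access. In a real client it would wrap an HTTP GET that returns the decoded JSON body.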

Sven Geboers

f2a831dfcf

feat(pipeline): add orchestrator CLI, analysis modules, and ActorFractie ingestion

- pipeline/run_pipeline.py: CLI orchestrator for all 5 pipeline phases with
  --dry-run, --skip-*, --window-size, --svd-k, --start/end-date flags
- analysis/{political_axis,trajectory,clustering,visualize}.py: PCA/anchor
  ideological axis, MP drift trajectories, UMAP + KMeans clustering, Plotly HTML output
- api_client.py: capture ActorFractie per individual MP vote (comma in ActorNaam)
  into mp_vote_parties dict on each motion
- database.insert_motion: auto-insert mp_votes rows with party affiliation for
  newly ingested motions when mp_vote_parties is present
- Add scikit-learn to pyproject.toml for KMeans clustering
- tests/test_run_pipeline.py: window generation, dry-run, skip-all paths
- tests/test_analysis.py: PCA axis, anchor axis, trajectory drift, KMeans

Ref: thoughts/shared/plans/2026-03-21-parliamentary-embedding-pipeline-plan.md

3 months ago
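
Note on the commit above: the orchestrator flags and the "window generation" covered by the tests might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of `pipeline/run_pipeline.py`. The defaults, the choice of days as the window unit, the half-open window convention, and the `generate_windows` helper name are all hypothetical. The `--skip-*` flags are omitted because the commit message does not name the individual phases.

```python
import argparse
from datetime import date, timedelta


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags listed in the commit message."""
    parser = argparse.ArgumentParser(description="Pipeline orchestrator (sketch)")
    parser.add_argument("--dry-run", action="store_true", help="print the plan, do no work")
    # Unit (days) and default values below are assumptions for illustration.
    parser.add_argument("--window-size", type=int, default=90, help="window length in days")
    parser.add_argument("--svd-k", type=int, default=50, help="number of SVD components")
    parser.add_argument("--start-date", type=date.fromisoformat, required=True)
    parser.add_argument("--end-date", type=date.fromisoformat, required=True)
    return parser


def generate_windows(start: date, end: date, size_days: int) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive half-open windows of at most size_days."""
    if size_days <= 0:
        raise ValueError("size_days must be positive")
    windows = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=size_days), end)
        windows.append((cursor, upper))
        cursor = upper
    return windows


args = build_parser().parse_args(
    ["--dry-run", "--start-date", "2025-01-01", "--end-date", "2025-03-01", "--window-size", "30"]
)
windows = generate_windows(args.start_date, args.end_date, args.window_size)

if args.dry_run:
    for lower, upper in windows:
        print(f"would process {lower} .. {upper} with svd_k={args.svd_k}")
```

The final window is clipped to the end date, so the windows cover the requested range with no gaps or overlaps.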

2 Commits (847b783877b02234829741a085efd858b1e517d8)