- Acceptance: Spike result documented; test asserts field extraction from mocked response; full fetch (2.2b) scheduled or fallback heuristic decided
- Name normalization: reconstruct ActorNaam format from Persoon fields: `"{Tussenvoegsel} {Achternaam}, {Initialen}".strip()` (must match keys in voting_results JSON, e.g. "Yesilgöz-Zegerius, D.")
- Persoon.Id stored as source_id (GUID) for deduplication
- Stores via MotionDatabase.upsert_mp_metadata; idempotent on re-run
- Test: tests/test_fetch_mp_metadata.py — monkeypatch requests.get with canned FractieZetelPersoon+Persoon response; assert name normalization and DB rows
- Hours: 3.5 | Priority: highest | Depends: 1.3
- Acceptance: mp_metadata rows correct; name normalization tested for tussenvoegsel variants; TotEnMet=null handled correctly; re-run idempotent
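The name normalization and the TotEnMet handling above can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names `normalize_actor_naam` and `seat_active_on` are hypothetical, the format string is taken from the task description, and treating `TotEnMet=null` as "seat still held" is an assumption that should be checked against real FractieZetelPersoon data.

```python
from datetime import date
from typing import Optional


def normalize_actor_naam(achternaam: str, initialen: str,
                         tussenvoegsel: Optional[str] = None) -> str:
    """Rebuild the ActorNaam-style key using the format string from the plan."""
    prefix = (tussenvoegsel or "").strip()  # the field may be null in the response
    # The outer strip() removes the leading space left by an empty prefix.
    return f"{prefix} {achternaam.strip()}, {initialen.strip()}".strip()


def seat_active_on(van: date, tot_en_met: Optional[date], day: date) -> bool:
    """Assumption: TotEnMet=None means the tenure window is open-ended."""
    return van <= day and (tot_en_met is None or day <= tot_en_met)


if __name__ == "__main__":
    print(normalize_actor_naam("Yesilgöz-Zegerius", "D."))
    print(seat_active_on(date(2021, 3, 31), None, date(2023, 1, 1)))
```

Keeping both helpers as pure functions lets tests/test_fetch_mp_metadata.py cover the tussenvoegsel variants and the null case without touching the mocked HTTP layer.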
### Task 2.3: pipeline/text_pipeline.py
- Ensure every motion has a text embedding; delegates to existing ai_provider.get_embedding
@@ -95,7 +100,7 @@ Batch 3 (parallel): 3.1, 3.2 [integration & CI - depends on batch 2]
2. Use existing ai_provider.get_embedding for text embeddings — no new model calls
3. SVD k enforced dynamically (k < min(n_mps, n_motions)); tests cover this path
4. Procrustes rotation matrices NOT persisted in MVP (aligned vectors stored directly)
5. mp_metadata: fetch from OData FractieZetelPersoon endpoint (confirmed available, which supersedes the earlier plan to try an OData spike first and fall back to a majority-party heuristic); Van/TotEnMet give tenure windows
6. Default quarterly time windows, but parameterized for Annual validation in Sprint 2
7. All new helpers go into existing database.py MotionDatabase class (not a new module)
8. Analysis/visualization (UMAP, Plotly plots) is a follow-up sprint, NOT included here
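The dynamic bound on the SVD rank in decision 3 can be illustrated with a small helper. This is a sketch under assumptions: the name `effective_k` is hypothetical, and raising an error for matrices too small to factorize is one possible policy rather than a decision recorded in this plan.

```python
def effective_k(requested_k: int, n_mps: int, n_motions: int) -> int:
    """Clamp the requested rank so that k < min(n_mps, n_motions) always holds."""
    upper = min(n_mps, n_motions) - 1  # largest k that satisfies the strict bound
    if upper < 1:
        raise ValueError(
            f"vote matrix {n_mps}x{n_motions} is too small for a truncated SVD"
        )
    return max(1, min(requested_k, upper))


if __name__ == "__main__":
    print(effective_k(50, 150, 3000))  # requested rank fits, stays unchanged
    print(effective_k(50, 20, 3000))   # sparse window, rank is reduced
```

Small time windows are where the clamp matters most, since a quarter with few active MPs or few motions can fall below the default k of 50.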
## Open questions
1. [RESOLVED] Does OData expose party affiliation + tenure dates? Yes: FractieZetelPersoon is confirmed available with Van/TotEnMet tenure dates; Stemming.ActorFractie gives the party for each individual vote; name normalization from Persoon.Achternaam + Initialen + Tussenvoegsel is confirmed feasible
2. Should Procrustes rotation matrices be persisted? (MVP: no; revisit after)
3. Time-window granularity: annual first for stability validation, then quarterly?
4. Production k value for SVD: default 50 but must be validated against real data sizes