| title | type | status | date | origin |
|---|---|---|---|---|
| Motion semantic drift analysis over time | feat | active | 2026-04-05 | docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md |
Motion Semantic Drift Analysis Over Time
Overview
Add a new analysis script that tracks how the semantic content of motions on each SVD axis evolves across annual windows (2016-2024). The script produces a markdown report with charts showing axis stability, semantic drift timelines, party voting trajectories, and cross-ideological voting patterns. This is Phase 1 (script + report); a future phase will integrate this into the Streamlit explorer.
Problem Frame
The SVD explorer shows where parties and motions sit on axes at a point in time, but doesn't reveal how the semantic content evolves. Users can't answer: did "right-wing" motions become more extreme over time? Are the SVD axes themselves stable across windows? Do left-wing parties increasingly vote for right-wing motions? (see origin: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md)
Requirements Trace
- R1. Compute cosine similarity between SVD component vectors (or motion projection patterns) across all annual windows
- R2. Generate a stability heatmap showing which axes are comparable across time
- R3. Detect axis reordering across windows
- R4. Flag unstable axes
- R5. For each stable axis, compute average fused embedding centroid of top N motions per window
- R6. Track semantic drift using cosine distance between consecutive window centroids
- R7. Identify inflection points where drift accelerated (threshold-based)
- R8. Show example motions before/after inflection points
- R9. For each party, compute voting centroid per window along each stable axis
- R10. Track party trajectories over time
- R11. Detect cross-ideological voting patterns
- R12. Show concrete examples of parties voting against ideological alignment
- R13. Script produces markdown report with embedded charts
- R14. Report includes: stability heatmap, drift timelines, party trajectories, inflection analysis
- R15. Script is parameterized: `--db`, `--windows`, `--top-n`, `--output`
Scope Boundaries
- Annual windows only (2016-2024); quarterly windows too sparse
- Script + report only — no UI/explorer integration in this phase
- No statistical significance testing beyond basic change-point detection
- SVD component vectors (V^T matrix) not currently stored — must be added to pipeline or computed indirectly
Context & Research
Relevant Code and Patterns
scripts/generate_svd_json.py— script structure pattern:main(argv) -> int, argparse, ROOT path setup, loggerscripts/svd_diagnostics.py— generates markdown + JSON report from SVD analysisanalysis/explorer_data.py— DuckDB data loading patterns (read_only, try/finally, vector parsing),load_mp_vectors_by_party_for_window()for date-aware party normalizationanalysis/trajectory.py— existing cross-window drift computation using_procrustes_align_windows()pipeline/svd_pipeline.py— SVD computation; V^T available asVtvariable before scalingtests/test_analysis.py— test patterns:tmp_pathfixture,_setup_svd_vectors()helper, class-based testsanalysis/config.py—CANONICAL_RIGHT/CANONICAL_LEFTfor cross-ideological voting detection
Key Technical Decisions
- matplotlib for static charts: no matplotlib usage exists in the codebase yet, so this introduces a new dependency. Alternative: Plotly static image export (already in stack). Decision: use matplotlib for markdown-embedded PNGs; it is simpler for static reports.
- V^T storage via dedicated entity_type: store the raw V^T matrix as an `entity_type='vt_matrix'` row in `svd_vectors`. Historical windows won't have V^T, so the motion-ranking correlation fallback is the primary approach for this phase.
- Axis stability via motion projection patterns with Procrustes alignment: since V^T may not be available for historical windows, compute axis stability indirectly. First apply Procrustes alignment (reuse `_procrustes_align_windows()` from `analysis/trajectory.py`) to motion vectors across windows, then correlate top-N motion rankings per component. This handles SVD sign ambiguity and rotation.
- Threshold-based change-point detection: a simple drift-rate threshold (no new dependencies). Flag an inflection when consecutive drift exceeds 2× the median drift rate.
- Stability threshold: cosine similarity > 0.7 classifies an axis as stable. The threshold is parameterized via `--stability-threshold` (default 0.7), and the distribution of similarity values is reported in the output for sensitivity assessment.
- Cross-ideological voting: use `CANONICAL_RIGHT` from `analysis.config` to identify right-wing motions (high positive loading on axis 1), then detect left-wing parties voting "voor" on those motions. Axis polarity is determined per window using canonical party scores, not global constants.
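
Storing V^T as a single `entity_type='vt_matrix'` row needs a payload that round-trips cleanly. A minimal sketch, assuming the row holds JSON text and that recording the matrix shape next to the flattened values is acceptable; `vt_to_json` and `vt_from_json` are hypothetical helpers, not part of the existing pipeline or the `svd_vectors` schema.

```python
import json


def vt_to_json(vt: list[list[float]]) -> str:
    """Serialize a k x n V^T matrix (k components, n columns) together with its shape."""
    if not vt:
        raise ValueError("V^T matrix is empty")
    n_cols = len(vt[0])
    if any(len(row) != n_cols for row in vt):
        raise ValueError("V^T rows have inconsistent lengths")
    # Store the shape explicitly so the reader can validate the flat payload.
    flat = [value for row in vt for value in row]
    return json.dumps({"shape": [len(vt), n_cols], "data": flat})


def vt_from_json(payload: str) -> list[list[float]]:
    """Rebuild the k x n V^T matrix from a payload written by vt_to_json."""
    obj = json.loads(payload)
    k, n = obj["shape"]
    data = obj["data"]
    if len(data) != k * n:
        raise ValueError(f"expected {k * n} values, got {len(data)}")
    return [data[i * n:(i + 1) * n] for i in range(k)]
```

The explicit shape lets a loader reject truncated rows instead of silently mis-slicing them; one such row per window would be written alongside the existing vectors.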
Open Questions
Resolved During Planning
- Charting library: matplotlib for static PNG embedding in markdown. Add to `pyproject.toml`.
- Change-point detection: simple threshold on drift rate (2× median). No new dependencies.
- Party-motion linkage: use the `mp_votes` table (party voted "voor" on motion). This measures voting alignment, not sponsorship.
- Axis stability approach: two-tier: (a) if V^T is available, use cosine similarity; (b) fallback: Procrustes-align motion vectors, then correlate top-N motion rankings per component across windows.
- Top N for centroids: default N=20, parameterized via `--top-n`. Test during execution.
Deferred to Implementation
- Exact optimal N for top motions per axis — will test N=10, 20, 50 during execution and pick the one with clearest signal
- Cross-ideological voting threshold — provisional: party voting "voor" on motions where canonical opposite-wing parties have high absolute loadings; will calibrate against baseline
High-Level Technical Design
This illustrates the intended approach and is directional guidance for review, not implementation specification.
┌─────────────────────────────────────────────────────────────────┐
│ scripts/motion_drift.py │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Load Data │
│ ├── fused_embeddings (per window, per motion) │
│ ├── svd_vectors (motion projections per window) │
│ ├── mp_votes (party voting records) │
│ └── motions (text for examples) │
│ │
│ 2. Axis Stability │
│ ├── Procrustes-align motion vectors across windows │
│ ├── Option A: cosine similarity of V^T vectors (if stored) │
│ └── Option B: correlate top-N motion rankings per component │
│ └── Output: stability heatmap (window × component matrix) │
│ │
│ 3. Semantic Drift │
│ ├── For each stable axis: │
│ │ ├── Get top N motions by |loading| per window │
│ │ ├── Compute fused embedding centroid per window │
│ │ └── Cosine distance between consecutive windows │
│ └── Output: drift timeline per axis + inflection points │
│ │
│ 4. Party Voting Analysis │
│ ├── For each party (with date-aware name normalization): │
│ │ ├── Get motions party voted "voor" on per window │
│ │ └── Compute voting centroid along each stable axis │
│ ├── Cross-ideological detection (per-window axis polarity): │
│ │ ├── Left parties voting "voor" on right-wing motions │
│ │ └── Right parties voting "voor" on left-wing motions │
│ └── Output: party trajectory plots + cross-voting examples │
│ │
│ 5. Report Generation │
│ ├── Markdown with embedded matplotlib PNGs │
│ ├── Axis stability heatmap │
│ ├── Semantic drift timelines │
│ ├── Party trajectory plots │
│ └── Inflection point analysis with motion examples │
└─────────────────────────────────────────────────────────────────┘
Implementation Units
- Unit 1: Add matplotlib dependency and script scaffolding
Goal: Set up the new script with proper structure and dependencies.
Requirements: R15
Dependencies: None
Files:
- Modify: `pyproject.toml` (add matplotlib)
- Create: `scripts/motion_drift.py`
- Test: `tests/test_motion_drift.py`
Approach:
- Add `matplotlib>=3.8` to `pyproject.toml` dependencies
- Create `scripts/motion_drift.py` following the established script pattern: `main(argv) -> int`, argparse with `--db`, `--windows`, `--top-n`, `--output`, ROOT path setup, module logger
- Add schema validation at startup: check for the required tables (`svd_vectors`, `fused_embeddings`, `mp_votes`, `motions`)
- Create a minimal `tests/test_motion_drift.py` with an import test, an argument-parsing test, and a schema-validation test using an in-memory DuckDB fixture
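
A minimal sketch of the scaffolding, covering argument parsing, the missing-database check, and schema validation. The default `--output` path, the space-separated format for `--windows`, and the helper names `build_parser`, `missing_tables` and `list_tables` are assumptions; `--stability-threshold` comes from the Key Technical Decisions above. ROOT path setup is omitted, and a real script would end with the usual `raise SystemExit(main())` entry-point guard.

```python
# Hypothetical skeleton for scripts/motion_drift.py: argument parsing and schema check only.
import argparse
import logging
from pathlib import Path
from typing import Iterable, Optional

logger = logging.getLogger(__name__)

# Tables the analysis reads from, as listed in the schema-validation step.
REQUIRED_TABLES = ("svd_vectors", "fused_embeddings", "mp_votes", "motions")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Motion semantic drift analysis over annual windows.")
    parser.add_argument("--db", default="data/motions.db", help="path to the DuckDB database")
    # Assumed format: space-separated window labels (e.g. 2019 2020); None means all annual windows.
    parser.add_argument("--windows", nargs="+", default=None, help="window labels to analyse")
    parser.add_argument("--top-n", type=int, default=20, help="top motions per axis used for centroids")
    parser.add_argument("--stability-threshold", type=float, default=0.7,
                        help="cosine similarity above which an axis counts as stable")
    # Assumed default output location.
    parser.add_argument("--output", default="output/motion_drift", help="report output directory")
    return parser


def missing_tables(existing_tables: Iterable[str]) -> list[str]:
    """Return required tables absent from the database, in REQUIRED_TABLES order."""
    present = set(existing_tables)
    return [name for name in REQUIRED_TABLES if name not in present]


def list_tables(db_path: Path) -> set[str]:
    """List table names in a DuckDB file, opened read-only."""
    import duckdb  # imported lazily so --help and the path check do not need duckdb

    con = duckdb.connect(str(db_path), read_only=True)
    try:
        rows = con.execute("SELECT table_name FROM information_schema.tables").fetchall()
    finally:
        con.close()
    return {row[0] for row in rows}


def main(argv: Optional[list[str]] = None) -> int:
    args = build_parser().parse_args(argv)  # --help prints usage and raises SystemExit(0)
    db_path = Path(args.db)
    if not db_path.exists():
        logger.error("Database not found: %s", db_path)
        return 1
    missing = missing_tables(list_tables(db_path))
    if missing:
        logger.error("Database %s is missing required tables: %s", db_path, ", ".join(missing))
        return 1
    Path(args.output).mkdir(parents=True, exist_ok=True)
    # Axis stability, drift, party analysis and report generation would be called here.
    return 0
```

Keeping `missing_tables` separate from the DuckDB query lets the schema-validation test run on a plain set of names, or on the names read back from tables created in an in-memory DuckDB connection.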
Patterns to follow:
- `scripts/generate_svd_json.py`: script structure, argparse, entry point
- `scripts/svd_diagnostics.py`: report generation pattern
- `tests/test_analysis.py`: `tmp_path` fixture, `_setup_svd_vectors()` helper
Test scenarios:
- Happy path: `main(["--help"])` exits with code 0 and prints usage
- Happy path: `main(["--db", "data/motions.db", "--output", "/tmp/test"])` runs without error
- Edge case: `main(["--db", "nonexistent.db"])` handles a missing database gracefully (exit code 1)
- Edge case: a database with missing tables produces a clear error message
Verification:
- `uv run python scripts/motion_drift.py --help` shows all arguments
- `uv run python -m pytest tests/test_motion_drift.py -q` passes
- Unit 2: Axis stability analysis
Goal: Compute axis stability across annual windows and generate stability heatmap.
Requirements: R1, R2, R3, R4
Dependencies: Unit 1
Files:
- Create: `analysis/motion_drift.py` (core analysis module)
- Modify: `scripts/motion_drift.py` (call axis stability)
- Test: `tests/test_motion_drift.py`
Approach:
- Create `analysis/motion_drift.py` with a `compute_axis_stability(db_path, windows)` function
- Two-tier approach:
  - Try loading V^T from `svd_vectors` where `entity_type='vt_matrix'` (if stored by the pipeline)
  - Fallback: apply Procrustes alignment to motion vectors across windows (reuse `_procrustes_align_windows()` from `analysis/trajectory.py`), then for each window get the top N motions per component by absolute score and compute pairwise cosine similarity of the motion ranking vectors
- Generate the stability heatmap as a matplotlib figure (window × component matrix, color-coded by similarity)
- Return a stability report: which axes are stable (similarity above `--stability-threshold`, default 0.7), which are reordered (high similarity to a different component index), and which are unstable (low similarity to any component)
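
A minimal NumPy sketch of the comparison and classification steps, assuming each window's scores are available as an (entities × components) array whose rows are the same entities in both windows. `procrustes_rotation`, `component_similarity` and `classify_axes` are hypothetical names; this does not reproduce the existing `_procrustes_align_windows()` signature. Because an unrestricted orthogonal rotation can also map one component onto another, the reordering check here runs on the unrotated similarity matrix and takes absolute values to tolerate sign flips; whether to rotate before that check is a choice to settle during implementation.

```python
import numpy as np


def procrustes_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||source @ R - target||_F; rows must be shared entities."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def component_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """k x k matrix of cosine similarities between the columns (components) of a and b."""
    a_unit = a / np.linalg.norm(a, axis=0, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=0, keepdims=True)
    return a_unit.T @ b_unit


def classify_axes(sim: np.ndarray, threshold: float = 0.7) -> dict[int, str]:
    """Label each component of the earlier window as stable, reordered or unstable."""
    strength = np.abs(sim)  # absolute value tolerates SVD sign flips
    labels: dict[int, str] = {}
    for i in range(sim.shape[0]):
        best = int(np.argmax(strength[i]))  # best-matching component in the later window
        if strength[i, best] <= threshold:
            labels[i] = "unstable"
        elif best == i:
            labels[i] = "stable"
        else:
            labels[i] = "reordered"
    return labels
```

Running `classify_axes` on every pair of windows gives the per-component labels, and the diagonal of each `component_similarity` matrix supplies the values for the window × component heatmap.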
Patterns to follow:
- `analysis/explorer_data.py`: DuckDB loading patterns, vector parsing
- `analysis/trajectory.py`: `_procrustes_align_windows()` for cross-window comparison
Test scenarios:
- Happy path: `compute_axis_stability` returns a stability matrix for 3+ windows with synthetic data
- Happy path: stability matrix is symmetric and values are in [-1, 1]
- Happy path: Procrustes alignment corrects sign flips between windows
- Edge case: single window returns an empty stability report (no comparison possible)
- Edge case: windows with no motion vectors handled gracefully (warning logged, skipped)
- Integration: run against real `data/motions.db` annual windows, verify the heatmap is generated
Verification:
- Stability heatmap PNG generated with correct dimensions (windows × components)
- Stability report identifies at least some axes as stable (similarity > 0.7)
- Unit 3: Semantic drift analysis
Goal: Compute semantic drift timelines for stable axes and detect inflection points.
Requirements: R5, R6, R7, R8
Dependencies: Unit 2 (needs stable axis list)
Files:
- Modify: `analysis/motion_drift.py` (add drift functions)
- Modify: `scripts/motion_drift.py` (call drift analysis)
- Test: `tests/test_motion_drift.py`
Approach:
- Add a `compute_semantic_drift(db_path, stable_axes, windows, top_n)` function
- For each stable axis:
  - Get the top N motions per window by absolute SVD loading
  - Compute the average fused embedding centroid per window
  - Compute the cosine distance between consecutive window centroids
- Detect inflection points: where the drift rate exceeds 2× the median drift rate
- For each inflection point, extract example motions (top 3 before/after by loading)
- Generate a drift timeline plot per axis (line chart with inflection point markers)
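
A sketch of the per-axis drift computation, assuming one window's inputs are a dict of motion id to loading on the axis and a dict of motion id to fused embedding vector. The function names are hypothetical. Ranking by absolute loading, as specified above, pools both poles of the axis into one centroid; separate centroids for the positive and negative poles would be an alternative worth comparing when testing N.

```python
import numpy as np


def top_n_centroid(loadings: dict[str, float], embeddings: dict[str, np.ndarray], top_n: int) -> np.ndarray:
    """Mean fused embedding of the top-N motions by absolute loading on one axis."""
    candidates = [m for m in loadings if m in embeddings]  # skip motions without an embedding
    ranked = sorted(candidates, key=lambda m: abs(loadings[m]), reverse=True)
    chosen = ranked[:top_n]
    if not chosen:
        raise ValueError("no motions with both a loading and an embedding")
    return np.mean([embeddings[m] for m in chosen], axis=0)


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity, so values lie in [0, 2]."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def drift_series(centroids: list[np.ndarray]) -> list[float]:
    """Cosine distance between consecutive window centroids (one value per transition)."""
    return [cosine_distance(a, b) for a, b in zip(centroids, centroids[1:])]


def inflection_points(drift: list[float], factor: float = 2.0) -> list[int]:
    """Indices i where drift[i] (window i -> i+1) exceeds factor x the median drift.

    With a median of 0, any positive drift is flagged.
    """
    if not drift:
        return []
    median = float(np.median(drift))
    return [i for i, d in enumerate(drift) if d > factor * median]
```

With two windows there is a single drift value, which equals its own median and so can never be flagged, matching the edge case listed under the test scenarios.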
Patterns to follow:
- `analysis/trajectory.py`: `compute_trajectories()` for cross-window drift computation
- `scripts/svd_diagnostics.py`: markdown report generation
Test scenarios:
- Happy path: `compute_semantic_drift` returns a drift series for each stable axis
- Happy path: drift values are in [0, 2] (cosine distance range)
- Happy path: inflection points detected when synthetic data has abrupt change
- Edge case: axis with only 2 windows returns drift but no inflection points
- Edge case: axis with monotonic drift returns no inflection points
- Integration: run against real data, verify drift timelines are plausible
Verification:
- Drift timeline PNG generated per stable axis
- Inflection points (if any) are marked on the timeline, with motion examples in the report
- Unit 4: Party voting analysis
Goal: Compute party voting centroids and detect cross-ideological voting patterns.
Requirements: R9, R10, R11, R12
Dependencies: Unit 2 (needs stable axis list)
Files:
- Modify: `analysis/motion_drift.py` (add party analysis functions)
- Modify: `scripts/motion_drift.py` (call party analysis)
- Test: `tests/test_motion_drift.py`
Approach:
- Add a `compute_party_voting(db_path, stable_axes, windows)` function
- For each party:
  - Query `mp_votes` for motions the party voted "voor" on per window, using date-aware party name normalization (reuse the `load_mp_vectors_by_party_for_window()` pattern from `analysis/explorer_data.py`)
  - For each motion, get its SVD scores from `svd_vectors`
  - Compute the unweighted mean score along each stable axis (voting centroid)
- Track party trajectories: plot the party centroid position per window along each axis
- Detect cross-ideological voting:
  - For each window, independently determine axis polarity by checking where canonical right-wing parties (`CANONICAL_RIGHT`) score on each axis
  - Identify "right-wing" motions (high positive loading on the axis where PVV/FVD/JA21/SGP score high after the polarity check)
  - Find left-wing parties (SP, PvdA, GL, etc.) voting "voor" on right-wing motions
  - Compute the cross-voting rate per party per window
  - Detect trends: is cross-voting increasing or decreasing over time?
- Generate party trajectory plots and a cross-voting summary table
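
A per-axis, per-window sketch of the party steps, assuming party scores and motion scores on one axis are available as plain dicts and that a party's "voor" motions have already been resolved from `mp_votes` into a set of motion ids. The names, the `min_loading` cut-off for "right-wing" motions, the definition of cross-voting rate (share of right-wing motions the party supported) and the least-squares trend are all assumptions for illustration, not settled choices.

```python
import math
from statistics import mean


def axis_polarity(party_scores: dict[str, float], right_parties: set[str]) -> int:
    """+1 if canonical right-wing parties sit on the positive side of this window's axis, else -1."""
    values = [score for party, score in party_scores.items() if party in right_parties]
    if not values:
        raise ValueError("no canonical right-wing party present in this window")
    return 1 if mean(values) >= 0 else -1


def voting_centroid(voor_motions: set[str], motion_scores: dict[str, float]) -> float:
    """Unweighted mean axis score of the motions a party voted 'voor' on (NaN if none)."""
    values = [motion_scores[m] for m in voor_motions if m in motion_scores]
    return mean(values) if values else math.nan


def cross_voting_rate(voor_motions: set[str], motion_scores: dict[str, float],
                      polarity: int, min_loading: float) -> float:
    """Share of right-wing motions (polarity-corrected score >= min_loading) the party supported."""
    right_wing = {m for m, score in motion_scores.items() if polarity * score >= min_loading}
    if not right_wing:
        return math.nan
    return len(right_wing & voor_motions) / len(right_wing)


def trend_slope(rates: list[float]) -> float:
    """Least-squares slope of rate against window index, skipping NaN windows."""
    points = [(i, r) for i, r in enumerate(rates) if not math.isnan(r)]
    if len(points) < 2:
        return math.nan
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    num = sum((x - x_bar) * (y - y_bar) for x, y in points)
    den = sum((x - x_bar) ** 2 for x, _ in points)
    return num / den
```

Because `axis_polarity` is recomputed per window, a sign flip in one year's SVD does not turn right-wing motions into left-wing ones; a positive `trend_slope` would indicate rising cross-voting for that party.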
Patterns to follow:
- `analysis/config.py`: `CANONICAL_RIGHT` / `CANONICAL_LEFT` for party classification
- `analysis/explorer_data.py`: `mp_votes` query patterns, `load_mp_vectors_by_party_for_window()` for party normalization
Test scenarios:
- Happy path: `compute_party_voting` returns voting centroids for parties with sufficient data
- Happy path: cross-ideological voting detected when synthetic data has a left party voting on right motions
- Happy path: party name normalization maps historical names (GL, PvdA → GroenLinks-PvdA) correctly
- Edge case: party with no "voor" votes in a window handled gracefully (centroid = NaN, skipped)
- Edge case: window with no voting data handled gracefully
- Integration: run against real data, verify party trajectories are plausible
Verification:
- Party trajectory PNG generated showing party movement across windows
- Cross-voting summary table in the report with at least one example
- Unit 5: Report generation
Goal: Assemble all analysis outputs into a markdown report with embedded charts.
Requirements: R13, R14, R15
Dependencies: Units 2, 3, 4
Files:
- Modify: `scripts/motion_drift.py` (orchestrate report generation)
- Test: `tests/test_motion_drift.py`
Approach:
- Add a `_generate_report(output_dir, stability_result, drift_result, party_result)` function
- Generate markdown with sections:
  - Summary (key findings, number of stable axes, inflection points, cross-voting trends)
  - Axis Stability (heatmap + interpretation)
  - Semantic Drift (timeline per axis + inflection point analysis with motion examples)
  - Party Voting Analysis (trajectory plots + cross-voting summary + examples)
  - Methodology (brief description of approach, parameters used)
- Save all matplotlib figures as PNGs in the output directory
- Embed PNGs in markdown using relative paths
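
A stdlib-only sketch of the assembly step, assuming the figures have already been written into the output directory (for example with matplotlib's `Figure.savefig`). The `(heading, body, images)` section tuple, the report title, and the names `write_report` and `drift_sections` are hypothetical; they stand in for whatever `_generate_report()` ends up receiving.

```python
from pathlib import Path

# (heading, markdown body, PNG filenames relative to the output directory)
Section = tuple[str, str, list[str]]


def write_report(output_dir, sections: list[Section], title: str = "Motion Semantic Drift Report") -> Path:
    """Write <output_dir>/report.md, embedding each section's PNGs by relative path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if it is missing
    lines = [f"# {title}", ""]
    for heading, body, images in sections:
        lines += [f"## {heading}", "", body, ""]
        for name in images:
            if not (out / name).is_file():
                raise FileNotFoundError(f"figure {name!r} not found in {out}")
            # Relative links keep the report portable together with its PNGs.
            lines += [f"![{Path(name).stem}]({name})", ""]
    report = out / "report.md"
    report.write_text("\n".join(lines), encoding="utf-8")
    return report


def drift_sections(stable_axes: list[int], figures: dict[int, str]) -> list[Section]:
    """Per-axis drift sections, or a single note when no axis passed the stability check."""
    if not stable_axes:
        note = "No axis met the stability threshold; drift and party sections were skipped."
        return [("Semantic Drift", note, [])]
    return [(f"Semantic Drift: axis {axis}", f"Drift timeline for axis {axis}.", [figures[axis]])
            for axis in stable_axes]
```

Failing loudly on a missing PNG turns the "all referenced PNG files exist" verification into a property of report generation itself rather than a separate check.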
Patterns to follow:
- `scripts/svd_diagnostics.py`: markdown report structure
- `scripts/generate_svd_json.py`: `_generate_markdown_report()` function
Test scenarios:
- Happy path: report generated with all sections and embedded images
- Happy path: all PNG files exist in output directory
- Edge case: no stable axes → report notes this and skips drift/party sections
- Edge case: output directory creation when it doesn't exist
Verification:
- `output/report.md` exists and contains all expected sections
- All referenced PNG files exist in the output directory
- Report is readable in a markdown viewer
System-Wide Impact
- Interaction graph: New script reads from existing DuckDB tables; no writes to production data. Pipeline change needed to store V^T matrix (optional, for future windows).
- Unchanged invariants: SVD computation unchanged. Explorer unchanged. Existing analysis modules unchanged.
- New dependency: `matplotlib` added to `pyproject.toml`. First use of matplotlib in the codebase.
Risks & Dependencies
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| matplotlib introduces new dependency burden | Low | Low | Already common library; well-maintained. Alternative: use Plotly static export if team prefers single viz stack. |
| V^T matrix not available for historical windows | High | Medium | Fallback to Procrustes-aligned motion ranking correlation (works with existing data). Store V^T going forward. |
| Sparse data in early windows (2016-2018: 124-162 motions) | Medium | Medium | Script warns about low-coverage windows; analysis focuses on 2019+ where data is richer. |
| Cross-ideological voting detection threshold too sensitive/insensitive | Medium | Low | Threshold is parameterized; will calibrate during execution against baseline drift rates. |
| Script exceeds 2-minute runtime on full dataset | Low | Low | JSON parsing of fused embeddings is the bottleneck. Will batch-load and cache if needed. |
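
The runtime mitigation above (batch-load and cache) can be sketched as a per-window cache that parses the fused-embedding JSON once and keeps a dense matrix plus a motion-id index. This assumes embeddings are stored as JSON arrays and that float32 precision is acceptable; `EmbeddingCache` and the `fetch_rows` callable (for example, one batched DuckDB query per window) are hypothetical.

```python
import json
from typing import Callable, Iterable

import numpy as np

Rows = Iterable[tuple[str, str]]  # (motion_id, embedding JSON text)


class EmbeddingCache:
    """Parse each window's fused-embedding JSON once and keep it as a dense float32 matrix."""

    def __init__(self, fetch_rows: Callable[[str], Rows]):
        # fetch_rows(window) -> iterable of (motion_id, json_text); a hypothetical batched loader.
        self._fetch_rows = fetch_rows
        self._cache: dict[str, tuple[dict[str, int], np.ndarray]] = {}

    def get(self, window: str) -> tuple[dict[str, int], np.ndarray]:
        """Return (motion_id -> row index, embedding matrix) for a window, parsing at most once."""
        if window not in self._cache:
            ids, vectors = [], []
            for motion_id, text in self._fetch_rows(window):
                ids.append(motion_id)
                vectors.append(json.loads(text))
            index = {motion_id: row for row, motion_id in enumerate(ids)}
            self._cache[window] = (index, np.asarray(vectors, dtype=np.float32))
        return self._cache[window]
```

Drift, centroid and example-extraction steps can then share one cache instance, so each window's JSON is decoded once per run.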
Documentation / Operational Notes
- New script: `scripts/motion_drift.py`, with usage documented in the module docstring
- New analysis module: `analysis/motion_drift.py`, with functions documented in docstrings
- Report output: markdown with embedded PNGs, shareable without running the script
- Future: integrate analysis into Streamlit explorer tab (separate plan)
Sources & References
- Origin document: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md
- Related code: `scripts/generate_svd_json.py`, `scripts/svd_diagnostics.py`, `analysis/trajectory.py`, `analysis/explorer_data.py`
- Party sets: `analysis/config.py` (`CANONICAL_RIGHT`, `CANONICAL_LEFT`)