title	type	status	date	origin
Refine axis stability with regression weights and overtone shift	refactor	active	2026-04-05	docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md

Refine Axis Stability with Regression Weights and Overtone Shift

Overview

Replace the current axis stability computation (party-based sign consistency) with a regression-based approach that measures whether the semantic features defining each SVD axis remain stable across windows. Add overtone shift analysis to detect when motion content changes even if party ordering stays the same.

Problem Frame

The current stability metric only checks whether left/right parties score on the expected side of each axis. This misses two important questions:

Axis stability: Does axis 1 capture the same underlying theme in 2019 and 2024? (e.g., "social vs individual" should be stable even if specific motions change)
Overtone shift: Are motions on axis 1 becoming more about migration and less about economics over time, even if PVV still scores higher than SP?

The current approach found zero stable axes because it measured party sign consistency, not semantic stability.

Requirements Trace

R1. Compute semantic stability via Ridge regression weights across windows (replaces party sign consistency)
R2. Generate stability heatmap showing which axes are semantically comparable across time
R3. Detect axis reordering — cases where axis N in window A ≈ axis M in window B
R4. Flag unstable axes where semantic signature changes significantly
R5. For each stable axis, compute semantic gravity (weighted mean fused embedding) per window
R6. Track overtone shift: how semantic gravity moves across windows
R7. Identify inflection points where overtone shift accelerated
R8. Show example motions and top shifting dimensions at inflection points
R9-R12. Party voting analysis (unchanged from existing implementation)
R13-R15. Output and parameterization (unchanged)

Scope Boundaries

Refine existing scripts/motion_drift.py — no new script
Keep party voting analysis and report generation (already working)
Annual windows only; quarterly too sparse
Ridge regression with scikit-learn (already in dependencies)

Context & Research

Existing Code

scripts/motion_drift.py — current implementation with party-based fallback stability
analysis/clustering.py — UMAP + KMeans infrastructure (not directly used but shows pattern)
scikit-learn>=1.8.0 — already in pyproject.toml, provides Ridge

Key Technical Decisions

Ridge regression per axis per window: Fit SVD_score ~ fused_embedding for each axis. The weight vector (2610 dims) is the semantic signature. Compare via cosine similarity across windows.
Semantic gravity for overtone shift: Weighted mean fused embedding of all motions, weighted by absolute SVD score on the axis. Track how gravity moves across windows.
Top-K dimensions for interpretation: Extract top-50 dimensions by absolute regression weight. Project gravity onto these to identify which semantic features are shifting.
Party-based fallback kept: For windows with too few motions for regression (< 50), fall back to party sign consistency.
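A minimal sketch of the first decision (regression weights as a semantic signature, compared by cosine similarity), run on synthetic data rather than the real fused embeddings. The helper names `axis_signature`, `cosine_similarity` and `make_window`, and the toy sizes, are illustrative assumptions and not part of scripts/motion_drift.py.

```python
import numpy as np
from sklearn.linear_model import Ridge


def axis_signature(X, y, alpha=1.0):
    """Fit score ~ embedding for one axis; the coefficients are the signature."""
    return Ridge(alpha=alpha).fit(X, y).coef_


def cosine_similarity(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


rng = np.random.default_rng(0)
n_motions, n_dims = 300, 40  # toy sizes; the plan's embeddings have 2610 dims


def make_window(direction):
    """Synthetic window whose axis scores follow `direction` plus noise."""
    X = rng.normal(size=(n_motions, n_dims))
    y = X @ direction + 0.1 * rng.normal(size=n_motions)
    return X, y


theme = rng.normal(size=n_dims)      # shared theme for both "years"
unrelated = rng.normal(size=n_dims)  # a different theme

w_2019 = axis_signature(*make_window(theme))
w_2024 = axis_signature(*make_window(theme))
w_other = axis_signature(*make_window(unrelated))

same_theme = cosine_similarity(w_2019, w_2024)
different_theme = cosine_similarity(w_2019, w_other)
print(f"same theme: {same_theme:.2f}, different theme: {different_theme:.2f}")
```

If the SVD is fitted separately per window (an assumption about the existing pipeline), axis signs are arbitrary, so a similarity near -1 may describe the same axis with a flipped sign; comparing absolute values is one way to account for that.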

Open Questions

Resolved During Planning

Regression type: Ridge (L2 regularization). The penalty limits overfitting on 2610-dim vectors but does not rule it out (see Risks & Dependencies); Ridge is already available via scikit-learn.
Alpha (regularization strength): Default 1.0, parameterized via --regression-alpha. Will test 0.1, 1.0, 10.0 during execution.
Top-K dimensions for interpretation: K=50 — enough to capture semantic signal without noise.
Overtone shift metric: Cosine distance between semantic gravity points across consecutive windows. Threshold for inflection: 2× median shift rate.
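The shift metric and the inflection rule above could look roughly like this. The function names are hypothetical, and the handling of empty input is an assumption the plan does not spell out.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 when either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return 1.0 - float(a @ b / denom)


def shift_series(gravity_by_window):
    """Cosine distance between each pair of consecutive gravity vectors."""
    pairs = zip(gravity_by_window, gravity_by_window[1:])
    return [cosine_distance(a, b) for a, b in pairs]


def inflection_indices(shifts, factor=2.0):
    """Indices i where shift i (window i to i+1) exceeds factor x the median shift."""
    if len(shifts) == 0:
        return []
    threshold = factor * float(np.median(shifts))
    return [i for i, s in enumerate(shifts) if s > threshold]


shifts = [0.02, 0.03, 0.02, 0.15, 0.03]  # made-up values; median is 0.03
flagged = inflection_indices(shifts)     # threshold is 0.06
print(flagged)
```

With a single shift value the median equals that value, so nothing is flagged. If the median is zero, any positive shift is flagged, which may need a floor in practice.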

Deferred to Implementation

Optimal alpha for Ridge regression — will test against real data and pick value that gives most interpretable weight vectors
Whether to normalize fused embeddings before regression (likely yes, since SVD dims are ~1-100 scale and text dims are ~0-1). Unit 1 currently assumes yes, via StandardScaler.

High-Level Technical Design

This illustrates the intended approach and is directional guidance for review, not implementation specification.

┌─────────────────────────────────────────────────────────────────┐
│              Refined Axis Stability + Overtone Shift             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Per-Axis Ridge Regression (per window)                       │
│     ├── For each SVD axis k:                                     │
│     │   X = fused_embeddings (n_motions × 2610)                 │
│     │   y = SVD scores on axis k (n_motions)                    │
│     │   w_k = Ridge.fit(X, y).coef_  (2610-dim weight vector)   │
│     └── Output: weight_vectors[window][axis]                     │
│                                                                  │
│  2. Stability Matrix                                              │
│     ├── For each axis k, compute cosine similarity of w_k        │
│     │   across all window pairs                                  │
│     └── Output: stability_matrix[window][window][axis]           │
│                                                                  │
│  3. Overtone Shift                                                │
│     ├── For each axis k and window:                              │
│     │   gravity_k = weighted_mean(fused_embeddings,              │
│     │              weights=abs(SVD_scores_k))                    │
│     │   shift_k = cosine_distance(gravity_k[t], gravity_k[t+1])  │
│     └── Output: shift_series[axis] = [shift values per window]   │
│                                                                  │
│  4. Interpretation                                                │
│     ├── Top-50 dimensions per axis (by |weight|)                 │
│     ├── Project gravity onto top dimensions to see shifts        │
│     └── Report: "Axis 1 stable (0.82), overtone shift (0.45)    │
│              — migration framing gained +0.31, economic -0.22"   │
└─────────────────────────────────────────────────────────────────┘
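Step 3 of the diagram, the semantic gravity point, can be sketched as a weighted mean. `semantic_gravity` is a hypothetical name, and returning `None` when all scores are zero is an assumption about how to treat that case.

```python
import numpy as np


def semantic_gravity(embeddings, scores):
    """Mean embedding weighted by |SVD score|; None if every score is zero."""
    weights = np.abs(np.asarray(scores, dtype=float))
    if weights.sum() == 0:
        return None  # np.average would raise ZeroDivisionError here
    return np.average(embeddings, axis=0, weights=weights)


# Three toy motions in a 2-dim embedding space.
embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
scores = np.array([2.0, -2.0, 0.0])  # third motion is neutral on this axis

gravity = semantic_gravity(embeddings, scores)
print(gravity)
```

Because the weights are absolute values, motions at both poles of the axis pull on the same gravity point, and a motion with score 0 contributes nothing.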

Implementation Units

Unit 1: Add Ridge regression-based stability computation

Goal: Replace compute_axis_stability() with regression-based version.

Requirements: R1, R2, R3, R4

Dependencies: None (replaces existing function)

Files:

Modify: scripts/motion_drift.py (replace compute_axis_stability)
Modify: tests/test_motion_drift.py (update stability tests)

Approach:

New compute_axis_stability() function:
- For each window, load motion scores + fused embeddings
- For each axis k (1-10), fit Ridge regression: score_k ~ fused_embedding
- Normalize features before fitting (StandardScaler on fused embeddings)
- Extract weight vector w_k (2610 dims)
- Compute pairwise cosine similarity of w_k across windows
- Return stability matrix, stable/reordered/unstable axes
Keep _compute_stability_fallback() for windows with < 50 motions
Add --regression-alpha CLI argument (default 1.0)
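One possible shape for the stability matrix and the stable/reordered/unstable classification. The 0.5 cut-off, the use of absolute similarity, and the names `stability_matrix`, `cross_axis_similarity` and `classify_axes` are all assumptions for illustration; the plan does not fix them.

```python
import numpy as np


def _unit(W):
    """Scale vectors along the last axis to unit length (zero vectors stay zero)."""
    norms = np.linalg.norm(W, axis=-1, keepdims=True)
    return W / np.where(norms == 0, 1.0, norms)


def stability_matrix(weights):
    """weights: (n_windows, n_axes, n_dims) -> (n_windows, n_windows, n_axes)."""
    U = _unit(weights)
    return np.einsum("akd,bkd->abk", U, U)


def cross_axis_similarity(W_a, W_b):
    """Entry [n, m] compares axis n in window A with axis m in window B."""
    return _unit(W_a) @ _unit(W_b).T


def classify_axes(W_a, W_b, threshold=0.5):
    """Label each axis of window A relative to window B."""
    sim = np.abs(cross_axis_similarity(W_a, W_b))  # abs: ignore sign flips
    labels = {}
    for n in range(sim.shape[0]):
        best = int(np.argmax(sim[n]))
        if sim[n, n] >= threshold:
            labels[n] = ("stable", n)
        elif sim[n, best] >= threshold:
            labels[n] = ("reordered", best)
        else:
            labels[n] = ("unstable", None)
    return labels


basis = np.eye(4)
W_a = basis[[0, 1, 2]]  # toy weight vectors for window A (3 axes, 4 dims)
W_b = basis[[0, 2, 3]]  # window B: axis 0 kept, old axis 2 now sits at index 1

labels = classify_axes(W_a, W_b)
S = stability_matrix(np.stack([W_a, W_b]))
print(labels, S.shape)
```

The same-axis comparison feeds the heatmap, while the cross-axis matrix is what makes reordering detectable.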

Patterns to follow:

sklearn.linear_model.Ridge — standard usage: Ridge(alpha=alpha).fit(X, y)
sklearn.preprocessing.StandardScaler — normalize features before regression
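The two patterns can be combined in a pipeline and wired to the planned `--regression-alpha` option. This is a sketch: `build_parser` and `fit_axis_weights` are hypothetical names, and fitting all axes in one multi-output call is a convenience choice rather than something the plan prescribes.

```python
import argparse

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_parser():
    """Parser holding only the new option; the real CLI has more arguments."""
    parser = argparse.ArgumentParser(description="motion drift (sketch)")
    parser.add_argument(
        "--regression-alpha",
        type=float,
        default=1.0,
        help="Ridge regularization strength",
    )
    return parser


def fit_axis_weights(X, Y, alpha):
    """Y has one column per SVD axis; returns weights of shape (n_axes, n_dims)."""
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X, Y)
    return model.named_steps["ridge"].coef_


args = build_parser().parse_args(["--regression-alpha", "10"])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # toy embeddings
Y = rng.normal(size=(100, 10))  # toy scores for 10 axes
weights = fit_axis_weights(X, Y, args.regression_alpha)
print(weights.shape)
```

Ridge fits each target column independently under a shared alpha, so one multi-output fit per window yields the same weight vectors as ten separate fits. The coefficients refer to the standardized features, not the raw embedding values.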

Test scenarios:

Happy path: regression produces weight vectors with cosine similarity in [-1, 1]
Happy path: synthetic data with known semantic signatures recovers stable axes
Edge case: window with < 50 motions falls back to party-based method
Edge case: all motions have same score on axis (degenerate case)
Integration: run against real data, verify stability values are non-zero
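The two edge cases could be guarded before fitting. With Ridge's default intercept, a constant target leaves nothing for the weights to explain, so the weight vector comes out essentially zero and its cosine similarity is undefined. The status strings and the name `fit_or_flag` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

MIN_MOTIONS = 50  # fallback threshold stated in the plan


def fit_or_flag(X, y, alpha=1.0, min_motions=MIN_MOTIONS):
    """Return (status, weights); weights is None unless status == "ok"."""
    y = np.asarray(y, dtype=float)
    if len(y) < min_motions:
        return "fallback", None    # caller uses party sign consistency
    if np.allclose(y, y[0]):
        return "degenerate", None  # constant scores: nothing to regress on
    return "ok", Ridge(alpha=alpha).fit(X, y).coef_


rng = np.random.default_rng(1)
X_small = rng.normal(size=(30, 5))
X_full = rng.normal(size=(60, 5))

sparse = fit_or_flag(X_small, rng.normal(size=30))
constant = fit_or_flag(X_full, np.full(60, 3.0))
normal = fit_or_flag(X_full, X_full[:, 0])
print(sparse[0], constant[0], normal[0])
```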

Verification:

Stability matrix has correct shape (n_windows × n_windows × n_components)
At least some axes show stability > 0.5 on real data
Fallback triggers correctly for sparse windows

Unit 2: Add overtone shift analysis

Goal: Compute semantic gravity trajectories and detect overtone shifts.

Requirements: R5, R6, R7, R8

Dependencies: Unit 1 (needs regression weight vectors for top-K dimension interpretation; shift computation itself is independent)

Files:

Create: compute_overtone_shift() function in scripts/motion_drift.py
Modify: scripts/motion_drift.py (call overtone shift in main)
Modify: tests/test_motion_drift.py (add overtone shift tests)

Approach:

New compute_overtone_shift(db_path, stable_axes, windows, top_k=50) function:
- For each stable axis and window:
  - Load motion scores and fused embeddings
  - Compute semantic gravity: weighted mean of fused embeddings, weights = abs(SVD scores)
  - Extract top-K dimensions by absolute regression weight
  - Project gravity onto top-K dimensions
- Compute cosine distance between consecutive window gravity points
- Detect inflection points: shift > 2× median shift rate
- For each inflection, identify top shifting dimensions and example motions
Return shift series, inflection points, dimension-level analysis
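The top-K selection and the dimension-level comparison might be sketched as follows. Both function names are hypothetical, and which window's weight vector defines the top-K set is left open by the plan, so the example simply takes one vector as given.

```python
import numpy as np


def top_k_dimensions(weight_vector, k=50):
    """Indices of the k largest |weights|, strongest first."""
    k = min(k, weight_vector.size)
    return np.argsort(-np.abs(weight_vector))[:k]


def top_shifting_dimensions(gravity_prev, gravity_next, dims, n=5):
    """Rank the selected dimensions by |change in gravity| between two windows."""
    delta = gravity_next[dims] - gravity_prev[dims]
    order = np.argsort(-np.abs(delta))[:n]
    rows = []
    for i in order:
        direction = "up" if delta[i] > 0 else "down" if delta[i] < 0 else "flat"
        rows.append((int(dims[i]), direction, float(abs(delta[i]))))
    return rows


weights = np.array([0.1, -3.0, 0.5, 2.0, 0.0, -0.2])  # toy regression weights
dims = top_k_dimensions(weights, k=3)

gravity_prev = np.array([0.0, 0.2, 0.5, 0.1, 0.0, 0.0])
gravity_next = np.array([0.0, 0.5, 0.4, 0.1, 0.9, 0.0])
rows = top_shifting_dimensions(gravity_prev, gravity_next, dims, n=2)
print(dims, rows)
```

Dimension 4 moves the most in this toy data but is ignored because its regression weight is zero, which is the intended effect of restricting the analysis to the top-K dimensions.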

Test scenarios:

Happy path: overtone shift returns shift series for each stable axis
Happy path: synthetic data with known shift detects inflection point
Edge case: axis with only 2 windows returns shift but no inflection points
Edge case: steady shift (roughly the same rate between all consecutive windows) returns no inflection points
Integration: run against real data, verify shift values are plausible

Verification:

Shift series has correct length (n_windows - 1 per axis)
Inflection points (if any) include dimension-level analysis
Top shifting dimensions are reported with direction and magnitude

Unit 3: Update report generation with new metrics

Goal: Update report to show both stability and overtone shift per axis.

Requirements: R13, R14

Dependencies: Units 1, 2

Files:

Modify: scripts/motion_drift.py (_generate_report function)
Modify: tests/test_motion_drift.py (update report tests)

Approach:

Update _generate_report() to include:
- Stability heatmap (regression weight similarity)
- Overtone shift timeline per axis (line chart with inflection markers)
- For each stable axis: stability score + overtone shift magnitude
- Top shifting dimensions table: dimension index, direction, magnitude
- Example motions at inflection points
Keep existing party voting analysis section unchanged
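As one way to render the top shifting dimensions table in the markdown report, a small formatter could take rows of (dimension index, direction, magnitude). The function name, column titles and empty-case message are assumptions, not the existing `_generate_report()` output.

```python
def shifting_dimensions_table(rows):
    """Render (dimension_index, direction, magnitude) rows as a markdown table."""
    if not rows:
        return "No shifting dimensions to report."
    lines = [
        "| Dimension | Direction | Magnitude |",
        "| ---: | :--- | ---: |",
    ]
    for dim, direction, magnitude in rows:
        lines.append(f"| {dim} | {direction} | {magnitude:.3f} |")
    return "\n".join(lines)


table = shifting_dimensions_table([(1, "up", 0.3), (2, "down", 0.1)])
print(table)
```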

Test scenarios:

Happy path: report includes both stability and overtone shift sections
Happy path: all charts generated and embedded
Edge case: no stable axes → report notes this, skips overtone shift

Verification:

Report contains stability heatmap, shift timelines, and dimension analysis
All PNG files exist in output directory

System-Wide Impact

Interaction graph: Replaces compute_axis_stability(); its API is unchanged, so the caller (the main function) needs no changes
Unchanged invariants: Party voting analysis, report structure, existing CLI arguments (the only addition is --regression-alpha)
New dependency: None — scikit-learn already in dependencies

Risks & Dependencies

Risk	Likelihood	Impact	Mitigation
Ridge regression overfits with 2610 features	Medium	Medium	Use Ridge (L2 regularization), test multiple alpha values, validate with cross-validation
Fused embeddings have different dimensions across windows	Low	Low	Already handled — truncate to min dimension
Regression takes too long on full dataset	Medium	Low	9 windows × 10 axes = 90 Ridge fits. Each fit on ~3000×2610 matrix ~0.1s with sklearn. Total ~9s. Acceptable.
Weight vectors are hard to interpret	Medium	Low	Focus on top-50 dimensions, report direction and magnitude clearly
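The cross-validation mitigation in the first row could be checked with a sketch like this, using the three alpha values named under Open Questions. The data is synthetic and `alpha_cv_scores` is a hypothetical helper; the plan selects alpha by interpretability, so these scores would only serve as a sanity check against overfitting.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def alpha_cv_scores(X, y, alphas=(0.1, 1.0, 10.0), folds=5):
    """Mean cross-validated R^2 for each candidate alpha."""
    return {
        alpha: float(
            cross_val_score(Ridge(alpha=alpha), X, y, cv=folds, scoring="r2").mean()
        )
        for alpha in alphas
    }


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))  # toy stand-in for fused embeddings
y = X @ rng.normal(size=30) + rng.normal(size=200)

scores = alpha_cv_scores(X, y)
for alpha, score in scores.items():
    print(f"alpha={alpha}: mean R^2 = {score:.3f}")
```

A large gap between training fit and these held-out scores on real data would suggest the weight vectors are fitting noise and that a larger alpha is worth trying.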

Documentation / Operational Notes

Updated script: scripts/motion_drift.py — new stability metric, new overtone shift analysis
Report output: markdown with stability heatmap, shift timelines, dimension analysis
Existing report sections (party voting) unchanged

Sources & References

Origin document: docs/brainstorms/2026-04-05-motion-semantic-drift-over-time-requirements.md
Related code: scripts/motion_drift.py (existing implementation), analysis/clustering.py (UMAP/KMeans patterns)
Ridge regression: sklearn.linear_model.Ridge
