| title | type | status | date |
| --- | --- | --- | --- |
| Scripts Directory Audit and Cleanup Plan | refactor | active | 2026-05-01 |

Scripts Directory Audit and Cleanup Plan

Overview

The inventory below covers 19 Python files in scripts/ (~6,000 lines in total). Many are one-off diagnostics, research utilities, or data backfill scripts from early pipeline development. Several are no longer needed, some write outputs to directories that have since been deleted, and a few overlap in functionality. This plan establishes a clear taxonomy and a cleanup path.


Current Inventory

| Script | Lines | Last Commit | References | Status | Notes |
| --- | --- | --- | --- | --- | --- |
| download_past_year.py | 295 | 2026-04-30 | 11 | Keep | Active data ingestion |
| health_check.py | 98 | 2026-05-01 | 21 | Keep | Active health check CLI |
| validate_svd_themes.py | 343 | 2026-04-30 | 13 | Keep | Active validation |
| generate_svd_json.py | 594 | 2026-04-13 | 12 | Keep | Generates thoughts/explorer/top_svd_top_motions.json |
| motion_drift.py | 1,207 | 2026-04-05 | 42 | Keep | Referenced in active plans |
| sync_motion_content.py | 704 | 2026-03-23 | 8 | Keep | Content enrichment pipeline |
| rerun_embeddings.py | 233 | 2026-03-23 | 15 | Keep | Embedding rebuild utility |
| derive_svd_labels.py | 423 | 2026-04-13 | 5 | Keep | SVD label derivation |
| diagnose_trajectories_cli.py | 234 | 2026-03-31 | 5 | Keep | Diagnostic utility |
| svd_diagnostics.py | 214 | 2026-03-22 | 9 | Keep | SVD diagnostics |
| recompute_svd.py | 172 | 2026-04-16 | 2 | Archive | One-off recompute |
| semantic_gravity_examples.py | 286 | 2026-04-05 | 6 | Archive | Research script |
| qa_similarity.py | 150 | 2026-03-23 | 4 | Archive | QA script (references deleted thoughts/ledgers/) |
| fill_mp_votes_parties.py | 277 | 2026-03-22 | 2 | Archive | Backfill script |
| inspect_axis.py | 137 | 2026-03-22 | 3 | Archive | Diagnostic |
| compare_svd_exclude_parties.py | 204 | 2026-03-22 | 1 | Archive | Diagnostic |
| generate_compass.py | 157 | 2026-03-22 | 2 | Archive | Generates to deleted outputs/ |
| compute_test_batch.py | 128 | 2026-03-20 | 3 | Archive | Test batch |
| generate_extra_charts.py | 172 | 2026-03-22 | 0 | Delete | Generates to deleted outputs/, 0 references |
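
An inventory like this can be regenerated instead of maintained by hand. The following is a minimal sketch, not the method used to build the table above: it assumes "References" means the number of other files that mention a script's module name, which this plan does not actually specify. The function names, the file suffixes scanned, and the skipped directories are all hypothetical.

```python
from pathlib import Path

# Assumption: directories that should never count toward references.
SKIP_DIRS = {".git", ".venv", "node_modules", "__pycache__"}


def count_lines(path: Path) -> int:
    """Return the number of lines in a text file."""
    return len(path.read_text(encoding="utf-8", errors="replace").splitlines())


def count_references(script: Path, repo_root: Path,
                     suffixes=(".py", ".md", ".toml")) -> int:
    """Count files other than the script itself that mention its module name.

    Assumption: a plain substring match is a good-enough proxy for
    "referenced". It will over-count short or generic names.
    """
    hits = 0
    for path in repo_root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if SKIP_DIRS & set(path.relative_to(repo_root).parts):
            continue
        if path == script:
            continue  # a script does not reference itself
        if script.stem in path.read_text(encoding="utf-8", errors="replace"):
            hits += 1
    return hits


def build_inventory(repo_root: Path) -> list[dict]:
    """One row per top-level script, sorted by file name."""
    rows = []
    for script in sorted((repo_root / "scripts").glob("*.py")):
        rows.append({
            "script": script.name,
            "lines": count_lines(script),
            "references": count_references(script, repo_root),
        })
    return rows
```

Calling `build_inventory(Path("."))` from the repository root would return one dictionary per script. The "Last Commit" column is left out on purpose, since it needs git history rather than the working tree.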

Categorization Rules

Keep (10 scripts)

Scripts that are:

  • Imported or invoked by active code/tests
  • Referenced in active plans (docs/plans/)
  • Run regularly as part of pipeline or diagnostics
  • Updated recently (April 2026+)

Archive (8 scripts)

Scripts that:

  • Are one-off diagnostics or backfill utilities
  • Are research/exploration scripts with no active plan references
  • Have been superseded by pipeline code but are kept for historical reference
  • Generate outputs to outputs/ (deleted) or thoughts/ledgers/ (deleted)

Archive location: scripts/archive/ — not imported, not tested, preserved for reference.

Delete (1 script)

Scripts that:

  • Are completely orphaned (0 references)
  • Have been superseded and offer no unique value
  • Generate outputs to non-existent directories
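
The rules above can be read as a small decision function. The sketch below is one possible encoding, not the procedure that produced the inventory: it treats any Keep signal as sufficient, deletes only scripts with zero references, and archives the rest. Recency is deliberately left out as a Keep signal, because the inventory archives recompute_svd.py despite its 2026-04-16 commit. The `ScriptInfo` fields and the `categorize` name are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    KEEP = "keep"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class ScriptInfo:
    # All fields are illustrative; the plan does not define a schema.
    name: str
    references: int
    used_by_code_or_tests: bool = False
    in_active_plan: bool = False
    run_regularly: bool = False


def categorize(info: ScriptInfo) -> Category:
    """Apply the Keep / Archive / Delete rules in priority order."""
    # Any active use keeps the script in scripts/.
    if info.used_by_code_or_tests or info.in_active_plan or info.run_regularly:
        return Category.KEEP
    # Nothing points at it anywhere: safe to delete outright.
    if info.references == 0:
        return Category.DELETE
    # Still mentioned somewhere, but not actively used: archive.
    return Category.ARCHIVE


examples = [
    ScriptInfo("motion_drift.py", references=42, in_active_plan=True),
    ScriptInfo("generate_compass.py", references=2),
    ScriptInfo("generate_extra_charts.py", references=0),
]
results = {info.name: categorize(info) for info in examples}
```

With the reference counts taken from the inventory, `results` maps the three examples to `KEEP`, `ARCHIVE`, and `DELETE` respectively, matching the statuses listed there.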

Implementation Units

  • U1. Create scripts/archive/ directory

    • Files: scripts/archive/ (new directory)
    • Verification: Directory exists
  • U2. Move archive scripts to scripts/archive/

    • Files to move:
      • scripts/recompute_svd.py
      • scripts/semantic_gravity_examples.py
      • scripts/qa_similarity.py
      • scripts/fill_mp_votes_parties.py
      • scripts/inspect_axis.py
      • scripts/compare_svd_exclude_parties.py
      • scripts/generate_compass.py
      • scripts/compute_test_batch.py
    • Verification: Scripts are in scripts/archive/, not in scripts/
  • U3. Delete orphaned scripts

    • Files to delete:
      • scripts/generate_extra_charts.py
    • Verification: File no longer exists
  • U4. Update .gitignore for archive

    • Add: scripts/archive/ (optional; only if we don't want to track archived scripts)
    • Otherwise, add a README in scripts/archive/ explaining its purpose
    • Verification: scripts/archive/ is either listed in .gitignore or contains a README
  • U5. Run test suite

    • Command: uv run pytest tests/ -q
    • Verification: All tests pass, no import errors from moved scripts
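
Units U1 through U3 could be scripted so the change is repeatable and can be previewed first. The following is a rough sketch under stated assumptions: it works on the plain filesystem with `shutil` and `pathlib` rather than through `git mv`, and the `apply_cleanup` function and its `dry_run` flag are hypothetical names, not existing tooling in this repo. The file lists are copied from U2 and U3.

```python
import shutil
from pathlib import Path

# File lists copied from U2 and U3 of this plan.
ARCHIVE = [
    "recompute_svd.py",
    "semantic_gravity_examples.py",
    "qa_similarity.py",
    "fill_mp_votes_parties.py",
    "inspect_axis.py",
    "compare_svd_exclude_parties.py",
    "generate_compass.py",
    "compute_test_batch.py",
]
DELETE = ["generate_extra_charts.py"]


def apply_cleanup(scripts_dir: Path, dry_run: bool = True) -> list[str]:
    """Return a log of actions; touch the filesystem only when dry_run is False."""
    log = []
    archive_dir = scripts_dir / "archive"

    # U1: create scripts/archive/
    if not dry_run:
        archive_dir.mkdir(exist_ok=True)
    log.append(f"mkdir {archive_dir}")

    # U2: move archive candidates
    for name in ARCHIVE:
        src = scripts_dir / name
        if not src.exists():
            log.append(f"skip (missing) {src}")
            continue
        if not dry_run:
            shutil.move(str(src), str(archive_dir / name))
        log.append(f"move {src} -> {archive_dir / name}")

    # U3: delete orphaned scripts
    for name in DELETE:
        target = scripts_dir / name
        if not target.exists():
            log.append(f"skip (missing) {target}")
            continue
        if not dry_run:
            target.unlink()
        log.append(f"delete {target}")

    return log
```

Running `apply_cleanup(Path("scripts"))` with the default `dry_run=True` only returns the planned actions, which can be reviewed before re-running with `dry_run=False` and then executing U5.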

Risks

| Risk | Mitigation |
| --- | --- |
| A test imports an archived script | Check all test imports before moving |
| A plan references an archived script | Plans already checked; none reference archive candidates exclusively |
| Future need for archived script | Git history preserves everything; archive is just convenience |

Post-Cleanup State

scripts/
├── archive/              # 8 archived scripts (reference only)
│   ├── compare_svd_exclude_parties.py
│   ├── compute_test_batch.py
│   ├── fill_mp_votes_parties.py
│   ├── generate_compass.py
│   ├── inspect_axis.py
│   ├── qa_similarity.py
│   ├── recompute_svd.py
│   └── semantic_gravity_examples.py
├── download_past_year.py
├── health_check.py
├── derive_svd_labels.py
├── diagnose_trajectories_cli.py
├── generate_svd_json.py
├── motion_drift.py
├── rerun_embeddings.py
├── sync_motion_content.py
├── svd_diagnostics.py
└── validate_svd_themes.py

Result: 10 active scripts, 8 archived, 1 deleted. ~1,700 lines removed from the active directory.