| title | type | status | date |
|---|---|---|---|
| Scripts Directory Audit and Cleanup Plan | refactor | active | 2026-05-01 |
Scripts Directory Audit and Cleanup Plan
Overview
The scripts/ directory contains 19 Python files (~6,000 lines total, per the inventory below). Many are one-off diagnostics, research utilities, or data backfill scripts from early pipeline development. Several are no longer needed, some write outputs to directories that have since been deleted, and a few overlap in functionality. This plan establishes a clear taxonomy and cleanup path.
Current Inventory
| Script | Lines | Last Commit | References | Status |
|---|---|---|---|---|
| download_past_year.py | 295 | 2026-04-30 | 11 | Keep: Active data ingestion |
| health_check.py | 98 | 2026-05-01 | 21 | Keep: Active health check CLI |
| validate_svd_themes.py | 343 | 2026-04-30 | 13 | Keep: Active validation |
| generate_svd_json.py | 594 | 2026-04-13 | 12 | Keep: Generates thoughts/explorer/top_svd_top_motions.json |
| motion_drift.py | 1,207 | 2026-04-05 | 42 | Keep: Referenced in active plans |
| sync_motion_content.py | 704 | 2026-03-23 | 8 | Keep: Content enrichment pipeline |
| rerun_embeddings.py | 233 | 2026-03-23 | 15 | Keep: Embedding rebuild utility |
| derive_svd_labels.py | 423 | 2026-04-13 | 5 | Keep: SVD label derivation |
| diagnose_trajectories_cli.py | 234 | 2026-03-31 | 5 | Keep: Diagnostic utility |
| svd_diagnostics.py | 214 | 2026-03-22 | 9 | Keep: SVD diagnostics |
| recompute_svd.py | 172 | 2026-04-16 | 2 | Archive: One-off recompute |
| semantic_gravity_examples.py | 286 | 2026-04-05 | 6 | Archive: Research script |
| qa_similarity.py | 150 | 2026-03-23 | 4 | Archive: QA script (references deleted thoughts/ledgers/) |
| fill_mp_votes_parties.py | 277 | 2026-03-22 | 2 | Archive: Backfill script |
| inspect_axis.py | 137 | 2026-03-22 | 3 | Archive: Diagnostic |
| compare_svd_exclude_parties.py | 204 | 2026-03-22 | 1 | Archive: Diagnostic |
| generate_compass.py | 157 | 2026-03-22 | 2 | Archive: Generates to deleted outputs/ |
| compute_test_batch.py | 128 | 2026-03-20 | 3 | Archive: Test batch |
| generate_extra_charts.py | 172 | 2026-03-22 | 0 | Delete: Generates to deleted outputs/, 0 references |
Categorization Rules
Keep (10 scripts)
Scripts that are:
- Imported or invoked by active code/tests
- Referenced in active plans (docs/plans/)
- Run regularly as part of pipeline or diagnostics
- Updated recently (April 2026+)
Archive (8 scripts)
Scripts that are:
- One-off diagnostics or backfill utilities
- Research/exploration scripts with no active plan references
- Superseded by pipeline code but kept for historical reference
- Generate outputs to outputs/ (deleted) or thoughts/ledgers/ (deleted)
Archive location: scripts/archive/ — not imported, not tested, preserved for reference.
Delete (1 script)
Scripts that are:
- Completely orphaned (0 references)
- Superseded with no unique value
- Generate outputs to non-existent directories
Implementation Units
- U1. Create scripts/archive/ directory
  - Files: scripts/archive/ (new directory)
  - Verification: Directory exists
- U2. Move archive scripts to scripts/archive/
  - Files to move:
    - scripts/recompute_svd.py
    - scripts/semantic_gravity_examples.py
    - scripts/qa_similarity.py
    - scripts/fill_mp_votes_parties.py
    - scripts/inspect_axis.py
    - scripts/compare_svd_exclude_parties.py
    - scripts/generate_compass.py
    - scripts/compute_test_batch.py
  - Verification: Scripts are in scripts/archive/, not in scripts/
- U3. Delete orphaned scripts
  - Files to delete:
    - scripts/generate_extra_charts.py
  - Verification: File no longer exists
- U4. Update .gitignore for archive
  - Add: scripts/archive/ (optional, if we don't want to track archived scripts)
  - Or add README in archive explaining purpose
  - Verification: Archive is handled appropriately
- U5. Run test suite
  - Command: uv run pytest tests/ -q
  - Verification: All tests pass, no import errors from moved scripts
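Units U1 through U3 could be scripted roughly as follows. This is a minimal sketch, not part of the plan itself: the function name apply_cleanup, the dry_run flag, and the two constant lists are hypothetical, and the file names are taken from the units above.

```python
from pathlib import Path
import shutil

# File names copied from U2 and U3; the constant names are illustrative.
ARCHIVE_CANDIDATES = [
    "recompute_svd.py",
    "semantic_gravity_examples.py",
    "qa_similarity.py",
    "fill_mp_votes_parties.py",
    "inspect_axis.py",
    "compare_svd_exclude_parties.py",
    "generate_compass.py",
    "compute_test_batch.py",
]
DELETE_CANDIDATES = ["generate_extra_charts.py"]


def apply_cleanup(scripts_dir: Path, dry_run: bool = True) -> list[str]:
    """Return the planned actions; perform them only when dry_run is False."""
    actions: list[str] = []
    archive_dir = scripts_dir / "archive"

    # U1: create the archive directory (no-op if it already exists).
    if not dry_run:
        archive_dir.mkdir(exist_ok=True)

    # U2: move each archive candidate that is still in scripts/.
    for name in ARCHIVE_CANDIDATES:
        src = scripts_dir / name
        if src.exists():
            actions.append(f"move {name} -> archive/{name}")
            if not dry_run:
                shutil.move(str(src), str(archive_dir / name))

    # U3: delete orphaned scripts.
    for name in DELETE_CANDIDATES:
        src = scripts_dir / name
        if src.exists():
            actions.append(f"delete {name}")
            if not dry_run:
                src.unlink()

    return actions


if __name__ == "__main__":
    # Default to a dry run so the plan can be reviewed before anything moves.
    for action in apply_cleanup(Path("scripts"), dry_run=True):
        print(action)
```

In a Git checkout, git mv and git rm would achieve the same result while also staging the change; the sketch only touches the working tree.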
Risks
| Risk | Mitigation |
|---|---|
| A test imports an archived script | Check all test imports before moving |
| A plan references an archived script | Plans already checked — none reference archive candidates exclusively |
| Future need for archived script | Git history preserves everything; archive is just convenience |
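The first mitigation (checking test imports before moving) could be automated along these lines. This is a sketch under assumptions: the helper names imported_modules and find_archived_imports are hypothetical, and it only detects static import statements. Scripts invoked through subprocess calls or imported by string name would need a separate text search.

```python
import ast
from pathlib import Path

# Module names of the archive candidates (file names without ".py").
ARCHIVE_MODULES = {
    "recompute_svd",
    "semantic_gravity_examples",
    "qa_similarity",
    "fill_mp_votes_parties",
    "inspect_axis",
    "compare_svd_exclude_parties",
    "generate_compass",
    "compute_test_batch",
}


def imported_modules(source: str) -> set[str]:
    """Collect dotted names from every import statement in the source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                names.add(node.module)
            # "from scripts import recompute_svd" carries the module
            # name in the alias list, so record those names as well.
            names.update(alias.name for alias in node.names)
    return names


def find_archived_imports(tests_dir: Path) -> dict[str, set[str]]:
    """Map each test file to the archive candidates it imports."""
    hits: dict[str, set[str]] = {}
    for path in sorted(tests_dir.rglob("*.py")):
        modules = imported_modules(path.read_text(encoding="utf-8"))
        # Compare every dotted component, so both "scripts.inspect_axis"
        # and a bare "inspect_axis" are caught (deliberately conservative).
        parts = {part for name in modules for part in name.split(".")}
        used = parts & ARCHIVE_MODULES
        if used:
            hits[str(path)] = used
    return hits


if __name__ == "__main__":
    for test_file, modules in find_archived_imports(Path("tests")).items():
        print(f"{test_file}: {', '.join(sorted(modules))}")
```

An empty result suggests no test statically imports an archive candidate; the test suite run in U5 remains the final check.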
Post-Cleanup State
scripts/
├── archive/ # 8 archived scripts (reference only)
│ ├── compare_svd_exclude_parties.py
│ ├── compute_test_batch.py
│ ├── fill_mp_votes_parties.py
│ ├── generate_compass.py
│ ├── inspect_axis.py
│ ├── qa_similarity.py
│ ├── recompute_svd.py
│ └── semantic_gravity_examples.py
├── download_past_year.py
├── health_check.py
├── derive_svd_labels.py
├── diagnose_trajectories_cli.py
├── generate_svd_json.py
├── motion_drift.py
├── rerun_embeddings.py
├── sync_motion_content.py
├── svd_diagnostics.py
└── validate_svd_themes.py
Result: 10 active scripts + 8 archived. ~1,700 lines removed from active directory.
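The "~1,700 lines removed" figure can be checked against the per-script line counts in the inventory. The snippet below is only a worked sum; the dictionary names are illustrative.

```python
# Line counts copied from the inventory table.
ARCHIVED = {
    "recompute_svd.py": 172,
    "semantic_gravity_examples.py": 286,
    "qa_similarity.py": 150,
    "fill_mp_votes_parties.py": 277,
    "inspect_axis.py": 137,
    "compare_svd_exclude_parties.py": 204,
    "generate_compass.py": 157,
    "compute_test_batch.py": 128,
}
DELETED = {"generate_extra_charts.py": 172}

archived_lines = sum(ARCHIVED.values())  # 1,511 lines moved to archive/
deleted_lines = sum(DELETED.values())    # 172 lines deleted outright
removed_lines = archived_lines + deleted_lines

print(f"{len(ARCHIVED)} archived, {len(DELETED)} deleted")
print(f"{removed_lines:,} lines leave the active directory")  # 1,683
```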