---
title: Scripts Directory Audit and Cleanup Plan
type: refactor
status: active
date: 2026-05-01
---

# Scripts Directory Audit and Cleanup Plan

## Overview

The `scripts/` directory contains 19 Python files (~6,000 lines total). Many are one-off diagnostics, research utilities, or data backfill scripts from early pipeline development. Several are no longer needed, some write outputs to directories that have since been deleted, and a few overlap in functionality. This plan establishes a clear taxonomy and cleanup path.

---

## Current Inventory

| Script | Lines | Last Commit | References | Status | Notes |
|--------|-------|-------------|------------|--------|-------|
| `download_past_year.py` | 295 | 2026-04-30 | 11 | **Keep** | Active data ingestion |
| `health_check.py` | 98 | 2026-05-01 | 21 | **Keep** | Active health check CLI |
| `validate_svd_themes.py` | 343 | 2026-04-30 | 13 | **Keep** | Active validation |
| `generate_svd_json.py` | 594 | 2026-04-13 | 12 | **Keep** | Generates `thoughts/explorer/top_svd_top_motions.json` |
| `motion_drift.py` | 1,207 | 2026-04-05 | 42 | **Keep** | Referenced in active plans |
| `sync_motion_content.py` | 704 | 2026-03-23 | 8 | **Keep** | Content enrichment pipeline |
| `rerun_embeddings.py` | 233 | 2026-03-23 | 15 | **Keep** | Embedding rebuild utility |
| `derive_svd_labels.py` | 423 | 2026-04-13 | 5 | **Keep** | SVD label derivation |
| `diagnose_trajectories_cli.py` | 234 | 2026-03-31 | 5 | **Keep** | Diagnostic utility |
| `svd_diagnostics.py` | 214 | 2026-03-22 | 9 | **Keep** | SVD diagnostics |
| `recompute_svd.py` | 172 | 2026-04-16 | 2 | **Archive** | One-off recompute |
| `semantic_gravity_examples.py` | 286 | 2026-04-05 | 6 | **Archive** | Research script |
| `qa_similarity.py` | 150 | 2026-03-23 | 4 | **Archive** | QA script (references deleted `thoughts/ledgers/`) |
| `fill_mp_votes_parties.py` | 277 | 2026-03-22 | 2 | **Archive** | Backfill script |
| `inspect_axis.py` | 137 | 2026-03-22 | 3 | **Archive** | Diagnostic |
| `compare_svd_exclude_parties.py` | 204 | 2026-03-22 | 1 | **Archive** | Diagnostic |
| `generate_compass.py` | 157 | 2026-03-22 | 2 | **Archive** | Writes to deleted `outputs/` |
| `compute_test_batch.py` | 128 | 2026-03-20 | 3 | **Archive** | Test batch |
| `generate_extra_charts.py` | 172 | 2026-03-22 | 0 | **Delete** | Writes to deleted `outputs/`; 0 references |

---

## Categorization Rules

### Keep (10 scripts)

Scripts that are:

- Imported or invoked by active code/tests
- Referenced in active plans (`docs/plans/`)
- Run regularly as part of the pipeline or diagnostics
- Updated recently (April 2026 or later)

### Archive (8 scripts)

Scripts that are:

- One-off diagnostics or backfill utilities
- Research/exploration scripts with no active plan references
- Superseded by pipeline code but kept for historical reference
- Tied to output directories that have been deleted (`outputs/`, `thoughts/ledgers/`)

**Archive location:** `scripts/archive/`. Not imported, not tested, preserved for reference.

### Delete (1 script)

Scripts that are:

- Completely orphaned (0 references)
- Superseded with no unique value
- Tied to output directories that no longer exist

---

## Implementation Units

- [ ] U1. **Create `scripts/archive/` directory**
  - Files: `scripts/archive/` (new directory)
  - Verification: Directory exists
- [ ] U2. **Move archive scripts to `scripts/archive/`**
  - Files to move:
    - `scripts/recompute_svd.py`
    - `scripts/semantic_gravity_examples.py`
    - `scripts/qa_similarity.py`
    - `scripts/fill_mp_votes_parties.py`
    - `scripts/inspect_axis.py`
    - `scripts/compare_svd_exclude_parties.py`
    - `scripts/generate_compass.py`
    - `scripts/compute_test_batch.py`
  - Verification: Scripts are in `scripts/archive/`, not in `scripts/`
- [ ] U3. **Delete orphaned scripts**
  - Files to delete:
    - `scripts/generate_extra_charts.py`
  - Verification: File no longer exists
- [ ] U4. **Update `.gitignore` or document the archive**
  - Option A: Add `scripts/archive/` to `.gitignore` (only if archived scripts should not be tracked). `.gitignore` does not untrack files git already tracks, so the moved files would also need `git rm -r --cached scripts/archive/`.
  - Option B: Add a README in `scripts/archive/` explaining its purpose.
  - Verification: The chosen option is in place (README exists, or the archive is both ignored and untracked)
- [ ] U5. **Run test suite**
  - Command: `uv run pytest tests/ -q`
  - Verification: All tests pass, with no import errors from moved scripts

---

## Risks

| Risk | Mitigation |
|------|------------|
| A test imports an archived script | Check all test imports before moving |
| A plan references an archived script | Plans already checked; none reference archive candidates exclusively |
| Future need for an archived script | Git history preserves everything; the archive is just a convenience |

---

## Post-Cleanup State

```
scripts/
├── archive/                    # 8 archived scripts (reference only)
│   ├── compare_svd_exclude_parties.py
│   ├── compute_test_batch.py
│   ├── fill_mp_votes_parties.py
│   ├── generate_compass.py
│   ├── inspect_axis.py
│   ├── qa_similarity.py
│   ├── recompute_svd.py
│   └── semantic_gravity_examples.py
├── download_past_year.py
├── health_check.py
├── derive_svd_labels.py
├── diagnose_trajectories_cli.py
├── generate_svd_json.py
├── motion_drift.py
├── rerun_embeddings.py
├── sync_motion_content.py
├── svd_diagnostics.py
└── validate_svd_themes.py
```

**Result:** 10 active scripts and 8 archived. About 1,700 lines (1,511 archived plus 172 deleted) leave the active directory.
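The first mitigation in the Risks table, checking test imports before moving anything, can be partly automated. Below is a minimal sketch, assuming the candidates are imported by module name (for example `import scripts.inspect_axis` or `from scripts import inspect_axis`). `CANDIDATES`, `imported_components`, and `find_importers` are hypothetical names, not existing project code. Because it uses the standard-library `ast` module, it only sees real import statements; scripts launched through `subprocess`, shell commands, or mentions in `docs/plans/` still need a plain text search.

```python
import ast
from pathlib import Path

# Hypothetical set: stems of the scripts that U2 archives and U3 deletes.
CANDIDATES = frozenset({
    "recompute_svd",
    "semantic_gravity_examples",
    "qa_similarity",
    "fill_mp_votes_parties",
    "inspect_axis",
    "compare_svd_exclude_parties",
    "generate_compass",
    "compute_test_batch",
    "generate_extra_charts",
})


def imported_components(source: str) -> set[str]:
    """Collect every dotted-name component that appears in an import statement.

    Deliberately over-approximate: `from pkg import name` records both `pkg`
    and `name`, so a function that shares a script's name is also reported.
    """
    components: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as `from . import x`.
            base = [node.module] if node.module else []
            modules = base + [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            components.update(module.split("."))
    return components


def find_importers(
    roots: list[Path], candidates: frozenset[str] = CANDIDATES
) -> dict[str, list[str]]:
    """Map each candidate stem to the .py files under `roots` that import it."""
    hits: dict[str, list[str]] = {}
    for root in roots:
        for path in sorted(root.rglob("*.py")):
            try:
                found = imported_components(path.read_text(encoding="utf-8"))
            except (SyntaxError, ValueError):  # ValueError covers decode errors
                print(f"warning: could not parse {path}; check it by hand")
                continue
            for stem in sorted(found & candidates):
                hits.setdefault(stem, []).append(str(path))
    return hits
```

Calling `find_importers([Path("tests"), Path("scripts")])` from the repository root returns an empty dict when nothing imports a candidate; any entry it returns names a file to fix before U2. Matching on name components errs toward false positives, which is the safer failure mode for a pre-move check.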
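The Verification lines for U1 through U3 can also be checked mechanically. This is a minimal sketch, assuming it runs from the repository root against `scripts/`; `ARCHIVED`, `DELETED`, and `layout_problems` are hypothetical names, with the file lists copied from U2 and U3.

```python
from pathlib import Path

# Hypothetical constants: file names copied from U2 (archive) and U3 (delete).
ARCHIVED = (
    "recompute_svd.py",
    "semantic_gravity_examples.py",
    "qa_similarity.py",
    "fill_mp_votes_parties.py",
    "inspect_axis.py",
    "compare_svd_exclude_parties.py",
    "generate_compass.py",
    "compute_test_batch.py",
)
DELETED = ("generate_extra_charts.py",)


def layout_problems(scripts_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means U1-U3 verify cleanly."""
    problems: list[str] = []
    archive = scripts_dir / "archive"
    # U1: the archive directory must exist.
    if not archive.is_dir():
        problems.append(f"missing directory: {archive}")
    # U2: each archived script lives in archive/ and no longer in scripts/.
    for name in ARCHIVED:
        if not (archive / name).is_file():
            problems.append(f"not archived: {name}")
        if (scripts_dir / name).exists():
            problems.append(f"still in scripts/: {name}")
    # U3: deleted scripts must not survive in either location.
    for name in DELETED:
        for path in (scripts_dir / name, archive / name):
            if path.exists():
                problems.append(f"should be deleted: {path}")
    return problems
```

An empty result from `layout_problems(Path("scripts"))` means the archived and deleted files match the Post-Cleanup State tree; it does not cover the active scripts or U4 and U5.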