---
title: Scripts Directory Audit and Cleanup Plan
type: refactor
status: active
date: 2026-05-01
---
# Scripts Directory Audit and Cleanup Plan
## Overview
The `scripts/` directory contains 19 Python files (~6,000 lines total, per the inventory below). Many are one-off diagnostics, research utilities, or data backfill scripts from early pipeline development. Several are no longer needed, some write outputs to now-deleted directories, and a few overlap in functionality. This plan establishes a clear taxonomy and cleanup path.

---
## Current Inventory
| Script | Lines | Last Commit | References | Status |
|--------|-------|-------------|------------|--------|
| `download_past_year.py` | 295 | 2026-04-30 | 11 | **Keep** — Active data ingestion |
| `health_check.py` | 98 | 2026-05-01 | 21 | **Keep** — Active health check CLI |
| `validate_svd_themes.py` | 343 | 2026-04-30 | 13 | **Keep** — Active validation |
| `generate_svd_json.py` | 594 | 2026-04-13 | 12 | **Keep** — Generates `thoughts/explorer/top_svd_top_motions.json` |
| `motion_drift.py` | 1,207 | 2026-04-05 | 42 | **Keep** — Referenced in active plans |
| `sync_motion_content.py` | 704 | 2026-03-23 | 8 | **Keep** — Content enrichment pipeline |
| `rerun_embeddings.py` | 233 | 2026-03-23 | 15 | **Keep** — Embedding rebuild utility |
| `derive_svd_labels.py` | 423 | 2026-04-13 | 5 | **Keep** — SVD label derivation |
| `diagnose_trajectories_cli.py` | 234 | 2026-03-31 | 5 | **Keep** — Diagnostic utility |
| `svd_diagnostics.py` | 214 | 2026-03-22 | 9 | **Keep** — SVD diagnostics |
| `recompute_svd.py` | 172 | 2026-04-16 | 2 | **Archive** — One-off recompute |
| `semantic_gravity_examples.py` | 286 | 2026-04-05 | 6 | **Archive** — Research script |
| `qa_similarity.py` | 150 | 2026-03-23 | 4 | **Archive** — QA script (references deleted `thoughts/ledgers/`) |
| `fill_mp_votes_parties.py` | 277 | 2026-03-22 | 2 | **Archive** — Backfill script |
| `inspect_axis.py` | 137 | 2026-03-22 | 3 | **Archive** — Diagnostic |
| `compare_svd_exclude_parties.py` | 204 | 2026-03-22 | 1 | **Archive** — Diagnostic |
| `generate_compass.py` | 157 | 2026-03-22 | 2 | **Archive** — Generates to deleted `outputs/` |
| `compute_test_batch.py` | 128 | 2026-03-20 | 3 | **Archive** — Test batch |
| `generate_extra_charts.py` | 172 | 2026-03-22 | 0 | **Delete** — Generates to deleted `outputs/`, 0 references |
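
The plan does not say how the References column was computed. As a rough sketch only, one way to reproduce a comparable number is to count mentions of each script's stem across text files in the repository. The helper name `count_references` and the list of file suffixes are assumptions for illustration, and a plain substring count may over-count stems that appear inside longer names.

```python
from pathlib import Path

# Hypothetical: file types treated as places where a script may be referenced.
SEARCH_SUFFIXES = {".py", ".md", ".toml", ".yml", ".yaml", ".sh"}


def count_references(repo_root: Path, script_name: str) -> int:
    """Count mentions of a script's stem in text files under repo_root.

    The script's own file and anything under .git are skipped, so a
    script mentioning itself does not inflate the count.
    """
    stem = Path(script_name).stem
    total = 0
    for path in repo_root.rglob("*"):
        if not path.is_file() or path.suffix not in SEARCH_SUFFIXES:
            continue
        if ".git" in path.parts or path.name == script_name:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        total += text.count(stem)
    return total
```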
---
## Categorization Rules
### Keep (10 scripts)
Scripts that meet at least one of these criteria:
- Imported or invoked by active code/tests
- Referenced in active plans (`docs/plans/`)
- Run regularly as part of pipeline or diagnostics
- Updated recently (April 2026+)
### Archive (8 scripts)
Scripts that are any of the following:
- One-off diagnostics or backfill utilities
- Research/exploration scripts with no active plan references
- Superseded by pipeline code but kept for historical reference
- Dependent on deleted directories (`outputs/` or `thoughts/ledgers/`)

**Archive location:** `scripts/archive/`. Archived scripts are not imported or tested; they are preserved for reference only.
### Delete (1 script)
Scripts that are:
- Completely orphaned (0 references)
- Superseded with no unique value
- Generating outputs to non-existent directories
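
The rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the procedure used to fill the inventory: the `ScriptInfo` fields, the `actively_used` flag (a human judgment covering imports, invocations, and plan references), and the precedence of keep over delete over archive are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptInfo:
    """Hypothetical summary of one inventory row."""
    name: str
    references: int
    actively_used: bool          # imported, invoked, or cited in an active plan
    writes_to_deleted_dir: bool  # targets outputs/ or thoughts/ledgers/


def categorize(info: ScriptInfo) -> str:
    """Return 'keep', 'delete', or 'archive' for a script.

    Assumed precedence: active use wins; only fully orphaned scripts
    that also target a deleted directory are deleted; the rest archive.
    """
    if info.actively_used:
        return "keep"
    if info.references == 0 and info.writes_to_deleted_dir:
        return "delete"
    return "archive"
```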
---
## Implementation Units
- [ ] U1. **Create `scripts/archive/` directory**
  - Files: `scripts/archive/` (new directory)
  - Verification: Directory exists
- [ ] U2. **Move archive scripts to `scripts/archive/`**
  - Files to move:
    - `scripts/recompute_svd.py`
    - `scripts/semantic_gravity_examples.py`
    - `scripts/qa_similarity.py`
    - `scripts/fill_mp_votes_parties.py`
    - `scripts/inspect_axis.py`
    - `scripts/compare_svd_exclude_parties.py`
    - `scripts/generate_compass.py`
    - `scripts/compute_test_batch.py`
  - Verification: Scripts are in `scripts/archive/`, not in `scripts/`
- [ ] U3. **Delete orphaned scripts**
  - Files to delete:
    - `scripts/generate_extra_charts.py`
  - Verification: File no longer exists
- [ ] U4. **Update `.gitignore` for archive**
  - Add: `scripts/archive/` (optional, only if we don't want to track archived scripts)
  - Or add a README in the archive explaining its purpose
  - Verification: Archive is handled appropriately
- [ ] U5. **Run test suite**
  - Command: `uv run pytest tests/ -q`
  - Verification: All tests pass, no import errors from moved scripts
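
Units U1 to U3 could be scripted along the following lines. This is a minimal sketch with a dry-run default; the function name `apply_cleanup` is hypothetical, and in a real checkout `git mv` and `git rm` may be preferable to plain filesystem operations so the moves are staged directly.

```python
import shutil
from pathlib import Path

# File lists copied from U2 and U3.
ARCHIVE = [
    "recompute_svd.py",
    "semantic_gravity_examples.py",
    "qa_similarity.py",
    "fill_mp_votes_parties.py",
    "inspect_axis.py",
    "compare_svd_exclude_parties.py",
    "generate_compass.py",
    "compute_test_batch.py",
]
DELETE = ["generate_extra_charts.py"]


def apply_cleanup(scripts_dir: Path, dry_run: bool = True) -> list[str]:
    """Return the planned actions; perform them only when dry_run is False."""
    archive_dir = scripts_dir / "archive"
    actions: list[str] = []
    if not dry_run:
        archive_dir.mkdir(exist_ok=True)  # U1
    for name in ARCHIVE:  # U2
        src = scripts_dir / name
        if src.exists():
            actions.append(f"move {name} -> archive/{name}")
            if not dry_run:
                shutil.move(str(src), str(archive_dir / name))
    for name in DELETE:  # U3
        src = scripts_dir / name
        if src.exists():
            actions.append(f"delete {name}")
            if not dry_run:
                src.unlink()
    return actions
```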
---
## Risks
| Risk | Mitigation |
|------|-----------|
| A test imports an archived script | Check all test imports before moving |
| A plan references an archived script | Plans already checked — none reference archive candidates exclusively |
| Future need for archived script | Git history preserves everything; archive is just convenience |
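
The first mitigation, checking test imports before moving anything, can be approximated with a static scan. The sketch below uses the standard-library `ast` module; the function name `find_script_imports` is hypothetical, and the scan will not catch scripts invoked through `subprocess` or loaded dynamically.

```python
import ast
from pathlib import Path


def find_script_imports(tests_dir: Path, candidates: set[str]) -> dict[str, list[str]]:
    """Map each test file to the candidate module stems it imports."""
    hits: dict[str, list[str]] = {}
    for path in sorted(tests_dir.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        found: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                # Check the module path and the imported names, so that
                # `from scripts import inspect_axis` is caught as well.
                names = [node.module or ""] + [alias.name for alias in node.names]
            else:
                continue
            for dotted in names:
                found.update(part for part in dotted.split(".") if part in candidates)
        if found:
            hits[str(path.relative_to(tests_dir))] = sorted(found)
    return hits
```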
---
## Post-Cleanup State
```
scripts/
├── archive/ # 8 archived scripts (reference only)
│ ├── compare_svd_exclude_parties.py
│ ├── compute_test_batch.py
│ ├── fill_mp_votes_parties.py
│ ├── generate_compass.py
│ ├── inspect_axis.py
│ ├── qa_similarity.py
│ ├── recompute_svd.py
│ └── semantic_gravity_examples.py
├── derive_svd_labels.py
├── diagnose_trajectories_cli.py
├── download_past_year.py
├── generate_svd_json.py
├── health_check.py
├── motion_drift.py
├── rerun_embeddings.py
├── svd_diagnostics.py
├── sync_motion_content.py
└── validate_svd_themes.py
```
**Result:** 10 active scripts, 8 archived, 1 deleted. ~1,700 lines (1,511 archived, 172 deleted) removed from the active directory.