---
title: Scripts Directory Audit and Cleanup Plan
type: refactor
status: active
date: 2026-05-01
---

# Scripts Directory Audit and Cleanup Plan

## Overview

The `scripts/` directory contains 20 Python files (~4,900 lines total). Many are one-off diagnostics, research utilities, or data backfill scripts from early pipeline development. Several are no longer needed, some generate outputs to now-deleted directories, and a few have overlapping functionality. This plan establishes a clear taxonomy and cleanup path.

---

## Current Inventory

| Script | Lines | Last Commit | References | Status |
|--------|-------|-------------|------------|--------|
| `download_past_year.py` | 295 | 2026-04-30 | 11 | **Keep**: Active data ingestion |
| `health_check.py` | 98 | 2026-05-01 | 21 | **Keep**: Active health check CLI |
| `validate_svd_themes.py` | 343 | 2026-04-30 | 13 | **Keep**: Active validation |
| `generate_svd_json.py` | 594 | 2026-04-13 | 12 | **Keep**: Generates `thoughts/explorer/top_svd_top_motions.json` |
| `motion_drift.py` | 1,207 | 2026-04-05 | 42 | **Keep**: Referenced in active plans |
| `sync_motion_content.py` | 704 | 2026-03-23 | 8 | **Keep**: Content enrichment pipeline |
| `rerun_embeddings.py` | 233 | 2026-03-23 | 15 | **Keep**: Embedding rebuild utility |
| `derive_svd_labels.py` | 423 | 2026-04-13 | 5 | **Keep**: SVD label derivation |
| `diagnose_trajectories_cli.py` | 234 | 2026-03-31 | 5 | **Keep**: Diagnostic utility |
| `svd_diagnostics.py` | 214 | 2026-03-22 | 9 | **Keep**: SVD diagnostics |
| `recompute_svd.py` | 172 | 2026-04-16 | 2 | **Archive**: One-off recompute |
| `semantic_gravity_examples.py` | 286 | 2026-04-05 | 6 | **Archive**: Research script |
| `qa_similarity.py` | 150 | 2026-03-23 | 4 | **Archive**: QA script (references deleted `thoughts/ledgers/`) |
| `fill_mp_votes_parties.py` | 277 | 2026-03-22 | 2 | **Archive**: Backfill script |
| `inspect_axis.py` | 137 | 2026-03-22 | 3 | **Archive**: Diagnostic |
| `compare_svd_exclude_parties.py` | 204 | 2026-03-22 | 1 | **Archive**: Diagnostic |
| `generate_compass.py` | 157 | 2026-03-22 | 2 | **Archive**: Generates to deleted `outputs/` |
| `compute_test_batch.py` | 128 | 2026-03-20 | 3 | **Archive**: Test batch |
| `generate_extra_charts.py` | 172 | 2026-03-22 | 0 | **Delete**: Generates to deleted `outputs/`, 0 references |

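The inventory columns can be regenerated rather than maintained by hand. The sketch below is one possible way to do that, assuming the repository root is the working directory and that a "reference" means any other text file mentioning the script's stem. The plan does not say how its References column was counted, so `count_references` and `build_inventory` are hypothetical helpers, not existing project code. A call such as `build_inventory(Path("scripts"), [Path("scripts"), Path("tests"), Path("docs")])` would produce one row per script; the search roots in that call are assumptions too.

```python
import subprocess
from pathlib import Path


def count_lines(path: Path) -> int:
    """Number of lines in a file (the 'Lines' column)."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def last_commit_date(path: Path) -> str:
    """Date (YYYY-MM-DD) of the last commit touching `path`, or '' if unknown.

    Uses the %cs format placeholder, which needs a reasonably recent Git.
    """
    try:
        result = subprocess.run(
            ["git", "log", "-1", "--format=%cs", "--", path.name],
            cwd=path.parent, capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:  # git is not installed or the directory is missing
        return ""
    return result.stdout.strip()


def count_references(script: Path, search_roots: list[Path]) -> int:
    """Count other text files under `search_roots` that mention the script's stem.

    ASSUMPTION: this definition of a 'reference' is illustrative only.
    """
    hits = 0
    for root in search_roots:
        for candidate in root.rglob("*"):
            if not candidate.is_file() or candidate.resolve() == script.resolve():
                continue
            try:
                text = candidate.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            if script.stem in text:
                hits += 1
    return hits


def build_inventory(scripts_dir: Path, search_roots: list[Path]) -> list[dict]:
    """One row per top-level *.py file in `scripts_dir`, sorted by name."""
    return [
        {
            "script": path.name,
            "lines": count_lines(path),
            "last_commit": last_commit_date(path),
            "references": count_references(path, search_roots),
        }
        for path in sorted(scripts_dir.glob("*.py"))
    ]
```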

---

## Categorization Rules

### Keep (10 scripts)
Scripts that are:
- Imported or invoked by active code/tests
- Referenced in active plans (`docs/plans/`)
- Run regularly as part of the pipeline or diagnostics
- Updated recently (April 2026 or later)

### Archive (8 scripts)
Scripts that:
- Are one-off diagnostics or backfill utilities
- Are research/exploration scripts with no active plan references
- Are superseded by pipeline code but kept for historical reference
- Generate outputs to `outputs/` (deleted) or `thoughts/ledgers/` (deleted)

**Archive location:** `scripts/archive/` (not imported, not tested, preserved for reference).

### Delete (1 script)
Scripts that:
- Are completely orphaned (0 references)
- Are superseded with no unique value
- Generate outputs to non-existent directories
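The three rule sets above do not state an order of precedence. One way to read them, shown here as a hedged sketch, is: any sign of active use wins, a script with zero references is deleted, and everything else is archived. Recency is left out on purpose, because the inventory archives `recompute_svd.py` despite its April commit. The `ScriptFacts` fields and the flag values in the example rows are assumptions made for illustration, not data from the audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptFacts:
    """Hypothetical per-script facts; the field names are illustrative only."""
    name: str
    references: int            # the References column of the inventory
    used_by_active_code: bool  # imported or invoked by active code/tests
    in_active_plan: bool       # referenced in an active plan
    run_regularly: bool        # part of the pipeline or routine diagnostics


def categorize(facts: ScriptFacts) -> str:
    """Return 'Keep', 'Archive', or 'Delete' under an ASSUMED rule ordering."""
    if facts.used_by_active_code or facts.in_active_plan or facts.run_regularly:
        return "Keep"
    if facts.references == 0:
        return "Delete"   # completely orphaned
    return "Archive"      # still mentioned somewhere, but not in active use


# Example rows: reference counts come from the inventory, flags are assumed.
EXAMPLES = [
    ScriptFacts("health_check.py", 21, True, False, True),
    ScriptFacts("qa_similarity.py", 4, False, False, False),
    ScriptFacts("generate_extra_charts.py", 0, False, False, False),
]
```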

---

## Implementation Units

- [ ] U1. **Create `scripts/archive/` directory**
  - Files: `scripts/archive/` (new directory)
  - Verification: Directory exists

- [ ] U2. **Move archive scripts to `scripts/archive/`**
  - Files to move:
    - `scripts/recompute_svd.py`
    - `scripts/semantic_gravity_examples.py`
    - `scripts/qa_similarity.py`
    - `scripts/fill_mp_votes_parties.py`
    - `scripts/inspect_axis.py`
    - `scripts/compare_svd_exclude_parties.py`
    - `scripts/generate_compass.py`
    - `scripts/compute_test_batch.py`
  - Verification: Scripts are in `scripts/archive/`, not in `scripts/`

- [ ] U3. **Delete orphaned scripts**
  - Files to delete:
    - `scripts/generate_extra_charts.py`
  - Verification: File no longer exists

- [ ] U4. **Update `.gitignore` for archive**
  - Add: `scripts/archive/` (optional, only if we don't want to track archived scripts)
  - Or add a README in the archive explaining its purpose
  - Verification: Either `scripts/archive/` is listed in `.gitignore` or a README exists in `scripts/archive/`

- [ ] U5. **Run test suite**
  - Command: `uv run pytest tests/ -q`
  - Verification: All tests pass, no import errors from moved scripts
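Units U1 through U3 are mechanical and can be scripted. The following is a minimal sketch, assuming plain filesystem moves (Git will normally detect them as renames once staged) and a hypothetical `apply_cleanup` helper that defaults to a dry run. The file names come from the lists above; nothing here is existing project code. Calling `apply_cleanup(Path("scripts"))` only reports the planned actions, and passing `dry_run=False` performs them.

```python
import shutil
from pathlib import Path

# Names taken from U2 and U3.
ARCHIVE = [
    "recompute_svd.py",
    "semantic_gravity_examples.py",
    "qa_similarity.py",
    "fill_mp_votes_parties.py",
    "inspect_axis.py",
    "compare_svd_exclude_parties.py",
    "generate_compass.py",
    "compute_test_batch.py",
]
DELETE = ["generate_extra_charts.py"]


def apply_cleanup(scripts_dir: Path, dry_run: bool = True) -> list[str]:
    """Carry out U1-U3 and return a log of actions.

    With dry_run=True nothing on disk changes; the log shows what would happen.
    """
    actions = []
    archive_dir = scripts_dir / "archive"

    # U1: create the archive directory.
    if not dry_run:
        archive_dir.mkdir(exist_ok=True)
    actions.append(f"mkdir {archive_dir}")

    # U2: move archive candidates.
    for name in ARCHIVE:
        source = scripts_dir / name
        if not source.exists():
            actions.append(f"skip {name} (missing)")
            continue
        if not dry_run:
            shutil.move(str(source), str(archive_dir / name))
        actions.append(f"move {name} -> archive/")

    # U3: delete orphaned scripts.
    for name in DELETE:
        source = scripts_dir / name
        if not source.exists():
            actions.append(f"skip {name} (missing)")
            continue
        if not dry_run:
            source.unlink()
        actions.append(f"delete {name}")

    return actions
```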

---

## Risks

| Risk | Mitigation |
|------|------------|
| A test imports an archived script | Check all test imports before moving |
| A plan references an archived script | Plans already checked; none reference archive candidates exclusively |
| Future need for an archived script | Git history preserves everything; the archive is just a convenience |
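The first mitigation, checking test imports before moving, can be automated with the standard `ast` module. This sketch assumes tests import scripts by module name (for example `from scripts.qa_similarity import ...`, an illustrative form rather than a confirmed one). It does not catch scripts launched through `subprocess` or referenced only as path strings. `find_blocking_imports` is a hypothetical helper; an empty result from `find_blocking_imports(Path("tests"))` would suggest U2 is safe with respect to imports.

```python
import ast
from pathlib import Path

# Module names (file stems) of the archive candidates listed in U2.
ARCHIVE_CANDIDATES = frozenset({
    "recompute_svd",
    "semantic_gravity_examples",
    "qa_similarity",
    "fill_mp_votes_parties",
    "inspect_axis",
    "compare_svd_exclude_parties",
    "generate_compass",
    "compute_test_batch",
})


def imported_modules(source: str) -> set[str]:
    """Every name that appears in an import statement, split on dots."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.update(alias.name.split("."))
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # None for 'from . import x'
                names.update(node.module.split("."))
            names.update(alias.name for alias in node.names)
    return names


def find_blocking_imports(tests_dir: Path, candidates=ARCHIVE_CANDIDATES) -> dict[str, set[str]]:
    """Map each test file (path relative to tests_dir) to the candidates it imports."""
    blocking: dict[str, set[str]] = {}
    for path in sorted(tests_dir.rglob("*.py")):
        try:
            found = imported_modules(path.read_text(encoding="utf-8")) & candidates
        except SyntaxError:
            continue  # unparsable file: review it manually
        if found:
            blocking[str(path.relative_to(tests_dir))] = found
    return blocking
```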

---

## Post-Cleanup State

```
scripts/
├── archive/                    # 8 archived scripts (reference only)
│   ├── compare_svd_exclude_parties.py
│   ├── compute_test_batch.py
│   ├── fill_mp_votes_parties.py
│   ├── generate_compass.py
│   ├── inspect_axis.py
│   ├── qa_similarity.py
│   ├── recompute_svd.py
│   └── semantic_gravity_examples.py
├── download_past_year.py
├── health_check.py
├── derive_svd_labels.py
├── diagnose_trajectories_cli.py
├── generate_svd_json.py
├── motion_drift.py
├── rerun_embeddings.py
├── sync_motion_content.py
├── svd_diagnostics.py
└── validate_svd_themes.py
```

**Result:** 10 active scripts plus 8 archived. About 1,700 lines leave the active directory (1,511 archived and 172 deleted).
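The target layout can also be asserted after the move. The following is a small sketch under the assumption that only `*.py` files matter and that the tree above is exhaustive; `layout_problems` is a hypothetical name, not an existing project function.

```python
from pathlib import Path

# Expected contents, copied from the tree above.
EXPECTED_ACTIVE = frozenset({
    "download_past_year.py", "health_check.py", "derive_svd_labels.py",
    "diagnose_trajectories_cli.py", "generate_svd_json.py", "motion_drift.py",
    "rerun_embeddings.py", "sync_motion_content.py", "svd_diagnostics.py",
    "validate_svd_themes.py",
})
EXPECTED_ARCHIVED = frozenset({
    "compare_svd_exclude_parties.py", "compute_test_batch.py",
    "fill_mp_votes_parties.py", "generate_compass.py", "inspect_axis.py",
    "qa_similarity.py", "recompute_svd.py", "semantic_gravity_examples.py",
})


def layout_problems(scripts_dir: Path) -> list[str]:
    """Return human-readable mismatches between disk and the expected layout."""
    archive_dir = scripts_dir / "archive"
    active = {p.name for p in scripts_dir.glob("*.py")}
    archived = {p.name for p in archive_dir.glob("*.py")} if archive_dir.is_dir() else set()

    problems: list[str] = []
    checks = [
        ("scripts/", EXPECTED_ACTIVE, active),
        ("scripts/archive/", EXPECTED_ARCHIVED, archived),
    ]
    for label, expected, found in checks:
        problems += [f"missing from {label}: {name}" for name in sorted(expected - found)]
        problems += [f"unexpected in {label}: {name}" for name in sorted(found - expected)]
    return problems
```

An empty list from `layout_problems(Path("scripts"))` means the directory matches the tree; running it after U5 gives a quick second check alongside the test suite.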