# Code Clusters / Organization

## Rules
- Code in the repository falls into the following clusters, as observed from the file layout:
  - UI / Streamlit: Home.py, pages/, app.py, explorer.py
  - Database & persistence: database.py, config.py
  - ETL / pipeline: pipeline/ (run_pipeline.py, svd_pipeline, text_pipeline, fusion)
  - AI provider & summarization: ai_provider.py, pipeline/..., analysis/
  - Similarity & caching: similarity/*, plus the similarity_cache table in the database
  - API client & scraping: api_client.py, pipeline/fetch_mp_metadata
  - Analysis & visualization: analysis/visualize.py, explorer.py
  - CLI & scheduler: scheduler.py, pipeline/run_pipeline.py
  - Tests & migrations: tests/ (pytest) and database reset helpers

## Examples

### Pipeline orchestrator (clusters: CLI & scheduler, ETL / pipeline)
```python
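from database import MotionDatabase  # repository module: database.py

db_path = "path/to/database"  # placeholder; the real path is not shown in this document
db = MotionDatabase(db_path)
# The orchestrator then runs its phases in order:
# fetch_mp_metadata, extract_mp_votes, compute SVD, ensure_text_embeddings, fuse_for_window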
```
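
The phase ordering above can be sketched as a small runner that calls each phase in sequence. This is a minimal illustration, not the code in pipeline/run_pipeline.py: `run_phases`, `make_stub_phase`, the shared `context` dict, and the `compute_svd` spelling are all assumptions made for the example, and the stub phases only record that they ran.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical phase signature: each phase receives a shared context dict.
Phase = Callable[[Dict[str, Any]], None]

# Phase names taken from the list above ("compute_svd" is an assumed spelling).
PHASE_NAMES = [
    "fetch_mp_metadata",
    "extract_mp_votes",
    "compute_svd",
    "ensure_text_embeddings",
    "fuse_for_window",
]


def make_stub_phase(name: str) -> Phase:
    """Build a placeholder phase that only records its name in the context."""
    def phase(context: Dict[str, Any]) -> None:
        context.setdefault("log", []).append(name)
    return phase


def run_phases(phases: List[Tuple[str, Phase]], context: Dict[str, Any]) -> List[str]:
    """Run each phase in order and return the names of the phases that completed."""
    completed: List[str] = []
    for name, phase in phases:
        phase(context)  # an exception here stops the run at the failing phase
        completed.append(name)
    return completed


if __name__ == "__main__":
    ctx: Dict[str, Any] = {}
    done = run_phases([(n, make_stub_phase(n)) for n in PHASE_NAMES], ctx)
    print(done)
```

Keeping the phases in an ordered list of `(name, callable)` pairs makes it easy to see where a new stage would slot in, which is the kind of guidance a contributor guide could point to.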

## Remediations
- Add a brief CONTRIBUTING.md that describes where to add new pipeline stages and how to run the tests locally. It should also note that duckdb is an optional dependency and that the tests can fall back to JSON.
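
One way such a contributor note could illustrate the optional-dependency pattern is sketched below. It is an assumption about how a JSON fallback might look, not a description of what database.py does: `HAS_DUCKDB`, `choose_backend`, `save_rows_json`, and `load_rows_json` are hypothetical names introduced only for this example.

```python
import importlib.util
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List

# True only when the optional duckdb package is importable in this environment.
HAS_DUCKDB = importlib.util.find_spec("duckdb") is not None


def choose_backend(prefer_duckdb: bool = True) -> str:
    """Return "duckdb" when it is wanted and installed, otherwise "json"."""
    return "duckdb" if (prefer_duckdb and HAS_DUCKDB) else "json"


def save_rows_json(path: Path, rows: List[Dict[str, Any]]) -> None:
    """Persist rows as a JSON array (hypothetical fallback storage)."""
    path.write_text(json.dumps(rows), encoding="utf-8")


def load_rows_json(path: Path) -> List[Dict[str, Any]]:
    """Load rows previously written by save_rows_json."""
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "rows.json"
        save_rows_json(store, [{"id": 1, "title": "example"}])
        print(choose_backend(), load_rows_json(store))
```

For tests that genuinely need duckdb, `pytest.importorskip("duckdb")` skips them cleanly on machines where the package is not installed.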

## Evidence pointers
- pipeline/run_pipeline.py: orchestrator and cluster boundaries
- ai_provider.py: AI adapter for embeddings and chat
- analysis/visualize.py: visualization cluster