# Code Clusters / Organization

## Rules

- The repository organizes code into the following clusters (as observed):
  - UI / Streamlit: Home.py, pages/, app.py, explorer.py
  - Database & persistence: database.py, config.py
  - ETL / pipeline: pipeline/ (run_pipeline.py, svd_pipeline, text_pipeline, fusion)
  - AI provider & summarization: ai_provider.py, pipeline/..., analysis/
  - Similarity & caching: similarity/*, plus the similarity_cache table in the database
  - API client & scraping: api_client.py, pipeline/fetch_mp_metadata
  - Analysis & visualization: analysis/visualize.py, explorer.py
  - CLI & scheduler: scheduler.py, pipeline/run_pipeline.py
  - Tests & migrations: tests/ (pytest) and database reset helpers

## Examples

### Pipeline orchestrator (clusters: ETL / pipeline, CLI & scheduler)

```python
from database import MotionDatabase
db_path = "data/motions.db"  # hypothetical path; substitute the project's configured database path
db = MotionDatabase(db_path)
# The orchestrator then runs these phases:
# fetch_mp_metadata, extract_mp_votes, compute SVD, ensure_text_embeddings, fuse_for_window
```

## Remediations

- Add a brief CONTRIBUTING.md that explains where to add new pipeline stages and how to run the tests locally. It should also document the optional duckdb dependency and the JSON fallback used in tests.

## Evidence pointers

- pipeline/run_pipeline.py: pipeline orchestrator; shows the cluster boundaries
- ai_provider.py: AI adapter for embeddings and chat
- analysis/visualize.py: visualization cluster
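
The cluster list under Rules can also be kept as a small lookup table, which makes it easy to answer "which cluster does this file belong to?" when deciding where new code goes. This is a minimal sketch, not part of the repository: `CLUSTERS` and `clusters_for` are hypothetical names, the glob patterns are an approximation of the observed layout, and the elided `pipeline/...` entry is left out rather than guessed. Some files legitimately belong to two clusters (for example explorer.py), so the lookup returns a list.

```python
from fnmatch import fnmatch

# Hypothetical transcription of the observed cluster list.
# Note: fnmatch's "*" also matches "/", so "pages/*" covers nested files too.
CLUSTERS: dict[str, list[str]] = {
    "UI / Streamlit": ["Home.py", "pages/*", "app.py", "explorer.py"],
    "Database & persistence": ["database.py", "config.py"],
    "ETL / pipeline": [
        "pipeline/run_pipeline.py",
        "pipeline/svd_pipeline*",
        "pipeline/text_pipeline*",
        "pipeline/fusion*",
    ],
    "AI provider & summarization": ["ai_provider.py", "analysis/*"],
    "Similarity & caching": ["similarity/*"],
    "API client & scraping": ["api_client.py", "pipeline/fetch_mp_metadata*"],
    "Analysis & visualization": ["analysis/visualize.py", "explorer.py"],
    "CLI & scheduler": ["scheduler.py", "pipeline/run_pipeline.py"],
    "Tests & migrations": ["tests/*"],
}


def clusters_for(path: str) -> list[str]:
    """Return every cluster whose patterns match `path`, in table order."""
    return [
        name
        for name, patterns in CLUSTERS.items()
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]
```

For example, `clusters_for("pipeline/run_pipeline.py")` returns both "ETL / pipeline" and "CLI & scheduler", which matches the orchestrator's dual role; a file that matches nothing returns an empty list, a hint that a new cluster (or a new pattern) may be needed.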
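
The orchestrator in the Examples section runs its phases in a fixed order. The sketch below shows one way such an ordered runner can be structured; it is an illustration only. The phase signature (a function taking and returning a shared context dict), the stub phases, the name `compute_svd` for the "compute SVD" step, and `run_pipeline`'s behavior are all assumptions, not the actual code in pipeline/run_pipeline.py.

```python
from typing import Callable

# Hypothetical phase signature: each phase receives and returns a shared context dict.
Phase = Callable[[dict], dict]


def make_stub(name: str) -> Phase:
    """Stand-in for a real phase; it only records that it ran."""
    def phase(ctx: dict) -> dict:
        ctx.setdefault("completed", []).append(name)
        return ctx
    return phase


# Phase order taken from the orchestrator comment; "compute_svd" is a hypothetical name.
PHASES: list[tuple[str, Phase]] = [
    (name, make_stub(name))
    for name in [
        "fetch_mp_metadata",
        "extract_mp_votes",
        "compute_svd",
        "ensure_text_embeddings",
        "fuse_for_window",
    ]
]


def run_pipeline(ctx: dict, phases: list[tuple[str, Phase]] = PHASES) -> dict:
    """Run phases in order; stop at the first failure and record which phase failed."""
    for name, phase in phases:
        try:
            ctx = phase(ctx)
        except Exception as exc:
            ctx["failed_phase"] = name
            ctx["error"] = repr(exc)
            break
    return ctx
```

Keeping the phases in an explicit ordered list means a new pipeline stage is added in exactly one place, which is the kind of guidance the suggested CONTRIBUTING.md could point to.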
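
The Remediations note mentions an optional duckdb dependency with a JSON fallback for tests. A common way to structure that is a guarded import plus two stores sharing one interface, as sketched below. This is an assumption about the pattern, not a description of database.py: `JsonStore`, `DuckStore`, `open_store`, and the `kv` table schema are hypothetical. The DuckDB calls used (`duckdb.connect`, `execute` with `?` parameters, `fetchone`) are part of DuckDB's Python API.

```python
import json
from pathlib import Path

try:  # duckdb is optional; fall back to a JSON file when it is not installed
    import duckdb
except ImportError:
    duckdb = None


class JsonStore:
    """Minimal key-value store persisted as a single JSON file (test fallback)."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.path.write_text(json.dumps(self.data))

    def get(self, key: str):
        return self.data.get(key)


class DuckStore:
    """Same interface backed by a DuckDB table (hypothetical schema)."""

    def __init__(self, path: Path):
        self.con = duckdb.connect(str(path))
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS kv (key VARCHAR PRIMARY KEY, value VARCHAR)"
        )

    def set(self, key: str, value: str) -> None:
        # Delete-then-insert acts as an upsert without relying on newer SQL syntax.
        self.con.execute("DELETE FROM kv WHERE key = ?", [key])
        self.con.execute("INSERT INTO kv VALUES (?, ?)", [key, value])

    def get(self, key: str):
        row = self.con.execute("SELECT value FROM kv WHERE key = ?", [key]).fetchone()
        return row[0] if row else None


def open_store(path: Path, prefer_duckdb: bool = True):
    """Use DuckDB when it is installed and preferred; otherwise use the JSON fallback."""
    if prefer_duckdb and duckdb is not None:
        return DuckStore(path)
    return JsonStore(path)
```

Tests can call `open_store(tmp_path / "store.json", prefer_duckdb=False)` so they run identically whether or not duckdb is installed.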