# Code Clusters / Organization
## Rules
- The repository organizes code into the following clusters (observed):
  - UI / Streamlit: Home.py, pages/, app.py, explorer.py
  - Database & persistence: database.py, config.py
  - ETL / pipeline: pipeline/ (run_pipeline.py, svd_pipeline, text_pipeline, fusion)
  - AI provider & summarization: ai_provider.py, pipeline/..., analysis/
  - Similarity & caching: similarity/*, similarity_cache table in DB
  - API client & scraping: api_client.py, pipeline/fetch_mp_metadata
  - Analysis & visualization: analysis/visualize.py, explorer.py
  - CLI & scheduler: scheduler.py, pipeline/run_pipeline.py
  - Tests & migrations: tests/ (pytest) and database reset helpers
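
The cluster list above can be written down as data, which makes overlaps visible (for example, explorer.py appears under two clusters). The following is a minimal sketch, not code from the repository: the `CLUSTERS` mapping and the `clusters_for` helper are hypothetical names, and the glob patterns are an approximation of the paths listed above.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical mapping of cluster name -> path patterns, transcribed from the list above.
# Patterns are approximations; "pipeline/..." in the AI cluster is left out because
# the source does not say which pipeline files it means.
CLUSTERS: Dict[str, List[str]] = {
    "UI / Streamlit": ["Home.py", "pages/*", "app.py", "explorer.py"],
    "Database & persistence": ["database.py", "config.py"],
    "ETL / pipeline": ["pipeline/*"],
    "AI provider & summarization": ["ai_provider.py", "analysis/*"],
    "Similarity & caching": ["similarity/*"],
    "API client & scraping": ["api_client.py", "pipeline/fetch_mp_metadata*"],
    "Analysis & visualization": ["analysis/visualize.py", "explorer.py"],
    "CLI & scheduler": ["scheduler.py", "pipeline/run_pipeline.py"],
    "Tests & migrations": ["tests/*"],
}


def clusters_for(path: str) -> List[str]:
    """Return every cluster whose patterns match the given repo-relative path."""
    return [
        name
        for name, patterns in CLUSTERS.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for candidate in ("explorer.py", "pipeline/run_pipeline.py", "README.md"):
        print(candidate, "->", clusters_for(candidate))
```

A file that matches more than one cluster, or none, is a hint that the boundary may deserve a note in the documentation.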
## Examples
### Pipeline orchestrator (cluster: CLI & pipeline)
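```python
from database import MotionDatabase

# Placeholder path for illustration; substitute the project's configured database location.
db_path = "motions.db"
db = MotionDatabase(db_path)

# The orchestrator then runs its phases in order:
# fetch_mp_metadata, extract_mp_votes, compute svd, ensure_text_embeddings, fuse_for_window
```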
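
To make the phase ordering concrete, here is a minimal, self-contained sketch of a sequential phase runner. It is an illustration under assumptions, not the contents of pipeline/run_pipeline.py: `run_phases` and `make_stub` are hypothetical helpers, `compute_svd` is an assumed spelling of the "compute svd" step, and the stub phases only record that they ran instead of touching the database.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Phase = Callable[[Context], None]

# Phase names taken from the comment in the example above ("compute_svd" is an assumed name).
PHASE_NAMES: List[str] = [
    "fetch_mp_metadata",
    "extract_mp_votes",
    "compute_svd",
    "ensure_text_embeddings",
    "fuse_for_window",
]


def make_stub(name: str) -> Phase:
    """Build a stand-in phase that only appends its name to context['log']."""
    def phase(context: Context) -> None:
        context.setdefault("log", []).append(name)
    return phase


def run_phases(phases: List[Tuple[str, Phase]], context: Context) -> List[str]:
    """Run phases strictly in order; an exception in one phase stops the run."""
    completed: List[str] = []
    for name, phase in phases:
        phase(context)
        completed.append(name)
    return completed


context: Context = {}
completed = run_phases([(name, make_stub(name)) for name in PHASE_NAMES], context)

if __name__ == "__main__":
    print("completed phases:", completed)
```

Keeping phases as a plain ordered list of (name, callable) pairs is one simple way to show where a new pipeline stage would slot in.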
## Remediations
- Add a brief CONTRIBUTING.md that describes where to add new pipeline stages and how to run tests locally. Include notes about the optional duckdb dependency and the JSON fallback for tests.
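
A CONTRIBUTING.md note on the optional dependency could include a small check like the one below. This is a hedged sketch only: `pick_storage_backend` is a hypothetical helper, and how the project actually switches between duckdb and its JSON fallback is not shown in this document.

```python
import importlib.util


def pick_storage_backend() -> str:
    """Return "duckdb" when the optional package is importable, otherwise "json"."""
    # find_spec looks the package up without importing it, so this is safe
    # to call even when duckdb is not installed.
    if importlib.util.find_spec("duckdb") is not None:
        return "duckdb"
    return "json"


if __name__ == "__main__":
    print("tests will use the", pick_storage_backend(), "backend")
```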
## Evidence pointers
- pipeline/run_pipeline.py: orchestrator and cluster boundaries (file: pipeline/run_pipeline.py)
- ai_provider.py: AI adapter for embeddings and chat (file: ai_provider.py)
- analysis/visualize.py: visualization cluster (file: analysis/visualize.py)