# Architecture overview and confidence levels
layers:
  - name: ui
    description: "Streamlit pages and app entrypoints (Home.py, pages/*)."
    confidence: high
  - name: ingestion
    description: "API client and scrapers (api_client.py, scraper.py)."
    confidence: high
  - name: processing
    description: "Pipelines for embeddings, SVD, and fusion (pipeline/*, similarity/*)."
    confidence: high
  - name: storage
    description: "DuckDB is the primary store; tests fall back to JSON when the duckdb package is not installed."
    confidence: high
  - name: ai_provider
    description: "Lightweight HTTP wrapper around OpenRouter/OpenAI-style backends (ai_provider.py)."
    confidence: medium
  - name: orchestration
    description: "Script-based orchestration (scripts/*.py), including rerun_embeddings and the scheduler."
    confidence: medium
organization:
  - Keep UI code separate from heavy compute; Streamlit script runs should not perform heavy computation inline (use a subprocess or the scheduler instead).
  - Pipelines are implemented as re-entrant functions that return summary dicts, which makes them easy to test and to invoke from subprocesses (e.g. svd_pipeline.compute_svd_for_window).
  - DB access is centralised in the MotionDatabase helper (database.py), which provides convenience methods such as store_fused_embedding and append_audit_event.
design_decisions:
  - Use DuckDB for fast local analytics storage; compute stages open read_only connections so that parallel workers can read the database concurrently.
  - Embeddings and the similarity cache are stored in DuckDB tables, with vectors serialised as JSON in their columns.
  - The ai_provider module uses requests with retry/backoff rather than a heavyweight SDK, to keep testing simple.
confidence_summary:
  overall_confidence: high
  notes: "Based on Phase 1 inspection of files across the repo; the design mapping is consistent with the sampled code."