# Architecture overview and confidence levels

layers:
  - name: ui
    description: "Streamlit pages and app entrypoints (Home.py, pages/*)."
    confidence: high
  - name: ingestion
    description: "API client and scrapers (api_client.py, scraper.py)."
    confidence: high
  - name: processing
    description: "Pipelines for embeddings, SVD, fusion (pipeline/*, similarity/*)."
    confidence: high
  - name: storage
    description: "DuckDB is the primary store; a JSON fallback is used in tests when duckdb is unavailable."
    confidence: high
  - name: ai_provider
    description: "Lightweight HTTP wrapper (ai_provider.py) around OpenRouter/OpenAI-style backends."
    confidence: medium
  - name: orchestration
    description: "Script-based orchestration (scripts/*.py), plus rerun_embeddings and the scheduler."
    confidence: medium

organization:
  - Keep UI code separate from heavy compute. Streamlit runs should not do heavy compute inline; offload it to a subprocess or a scheduled job.
  - Pipelines are implemented as re-entrant functions that return summary dicts, which makes them easy to test and to call from a subprocess (for example svd_pipeline.compute_svd_for_window).
  - DB access is centralised in the MotionDatabase helper (database.py), which provides convenience methods such as store_fused_embedding and append_audit_event.
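# Illustrative sketch for the organization notes above (an assumption, not code from the repo).
# A re-entrant stage returns a JSON-serialisable summary dict, and the UI runs it in a
# subprocess instead of inline. run_stage and scripts/run_stage.py are hypothetical names.
#
# ```python
# import json
# import subprocess
# import sys
#
# def run_stage(window_id: str) -> dict:
#     """Hypothetical stage; the heavy compute would happen here."""
#     return {"window_id": window_id, "status": "ok"}
#
# def run_stage_in_subprocess(window_id: str) -> dict:
#     """Run a hypothetical script that prints run_stage's summary as JSON on stdout."""
#     proc = subprocess.run(
#         [sys.executable, "scripts/run_stage.py", window_id],
#         capture_output=True, text=True, check=True,
#     )
#     return json.loads(proc.stdout)
# ```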

design_decisions:
  - Use DuckDB for fast local analytics storage; compute stages open read_only connections so that parallel workers can read concurrently.
  - Embeddings and the similarity cache are stored as JSON in the vector columns of DuckDB tables.
  - The ai_provider layer uses the requests library with retry and backoff instead of a heavy SDK, which keeps testing simple.
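# Illustrative sketch for the design decisions above (assumptions, not code from the repo).
# The table and column names (fused_embeddings, motion_id, vector) and both function names
# are hypothetical; the retry policy shown is one plausible choice, not the project's.
#
# ```python
# import json
# import time
# from typing import Optional
#
# import duckdb
# import requests
#
# def load_vector(db_path: str, motion_id: str) -> Optional[list]:
#     # read_only=True is intended to let several workers open the same file for reading.
#     con = duckdb.connect(db_path, read_only=True)
#     try:
#         row = con.execute(
#             "SELECT vector FROM fused_embeddings WHERE motion_id = ?", [motion_id]
#         ).fetchone()
#     finally:
#         con.close()
#     # Assumes the column holds a JSON-encoded array of floats.
#     return json.loads(row[0]) if row else None
#
# def post_with_backoff(url, payload, headers, attempts=3, base_delay=1.0):
#     for attempt in range(attempts):
#         try:
#             resp = requests.post(url, json=payload, headers=headers, timeout=30)
#             if resp.status_code not in (429, 500, 502, 503, 504):
#                 resp.raise_for_status()  # other HTTP errors are raised, not retried
#                 return resp.json()
#         except (requests.ConnectionError, requests.Timeout):
#             pass  # transient network failure; fall through to the retry delay
#         if attempt < attempts - 1:
#             time.sleep(base_delay * 2 ** attempt)  # exponential backoff
#     raise RuntimeError(f"request to {url} failed after {attempts} attempts")
# ```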

confidence_summary:
  overall_confidence: high
  notes: "Phase 1 input was files inspected across the repo; the design mapping is consistent with the code samples."