# System overview

This project is a Streamlit-based UI and data-processing pipeline. It computes text embeddings, performs SVD over member-of-parliament (MP) × motion voting matrices, fuses the resulting vector representations, and precomputes a similarity cache for quick lookup in the UI.
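
The paragraph above names the pipeline stages without showing how they connect. The following is a minimal sketch of one plausible shape, under stated assumptions: votes are encoded as +1 (for), -1 (against) and 0 (absent) in an MP × motion matrix; motion vectors come from a truncated SVD; fusion is a weighted concatenation of unit-normalized text and vote vectors; and the cache stores the top-k cosine neighbors per motion. The function names (`svd_embeddings`, `fuse`, `top_k_neighbors`) and the toy data are hypothetical and are not taken from pipeline/* or similarity/compute.py.

```python
import numpy as np


def svd_embeddings(votes: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Factor an MP x motion vote matrix into k-dim MP and motion vectors.

    Hypothetical encoding: +1 = for, -1 = against, 0 = absent.
    """
    # Center each motion column so the factors capture who disagrees,
    # not how lopsided the overall vote was.
    centered = votes - votes.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    mp_vecs = u[:, :k] * s[:k]        # one row per MP
    motion_vecs = vt[:k].T * s[:k]    # one row per motion
    return mp_vecs, motion_vecs


def l2_normalize(x: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)  # all-zero rows stay zero


def fuse(text_vecs: np.ndarray, vote_vecs: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Weighted concatenation of unit-normalized parts.

    For rows whose parts are both non-zero, the cosine of two fused rows
    equals alpha * text_cosine + (1 - alpha) * vote_cosine.
    """
    return np.hstack([
        np.sqrt(alpha) * l2_normalize(text_vecs),
        np.sqrt(1 - alpha) * l2_normalize(vote_vecs),
    ])


def top_k_neighbors(vectors: np.ndarray, k: int) -> dict[int, list[tuple[int, float]]]:
    """Precompute the k most cosine-similar rows for every row (self excluded)."""
    unit = l2_normalize(vectors)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # never return an item as its own neighbor
    order = np.argsort(-sims, axis=1)[:, :k]
    return {
        i: [(int(j), float(sims[i, j])) for j in order[i]]
        for i in range(sims.shape[0])
    }


# Hypothetical toy data: rows = MPs, columns = motions.
votes = np.array([
    [ 1,  1, -1,  1],
    [ 1,  1, -1, -1],
    [-1, -1,  1,  1],
    [-1, -1,  1, -1],
], dtype=float)
# Hypothetical 2-d "text embeddings", one row per motion.
text = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

_, motion_vecs = svd_embeddings(votes, k=2)
cache = top_k_neighbors(fuse(text, motion_vecs), k=1)
# Motions 0 and 1 have identical votes and text, so each is the other's nearest neighbor.
```

The square-root weights are a design choice that makes `alpha` a direct trade-off between text similarity and voting similarity in the cached cosine scores; a real pipeline may weight or combine the parts differently.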
Key subsystems:

- UI: Streamlit pages (Home.py, pages/*) exposing an interactive explorer and quizzes.
- Data ingestion: ingestion scripts and scraper/api_client.py (Tweede Kamer OData API).
- Processing pipelines: pipeline/* (text embeddings, SVD, fusion).
- Similarity layer: similarity/compute.py and similarity/lookup.py, which precompute and look up nearest neighbors.
- Storage: DuckDB (primary), with a JSON-file fallback for tests and for environments where duckdb is not installed.
- AI/Embedding provider: ai_provider.py, an HTTP wrapper around an OpenRouter/OpenAI-compatible API.
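
The storage bullet describes DuckDB as the primary backend with a JSON-file fallback. A minimal sketch of that fallback pattern, applied to the precomputed-neighbor cache, might look like the following. The `NeighborStore` class, the `neighbors` table and the JSON layout are hypothetical illustrations, not the project's actual schema; only the `duckdb.connect`, `execute`, `fetchall` and `close` calls belong to the real duckdb Python API.

```python
import json
import os
import tempfile

try:
    import duckdb  # primary backend when installed
except ImportError:  # e.g. test environments without duckdb
    duckdb = None


class NeighborStore:
    """Hypothetical neighbor cache: DuckDB when available, else a JSON file."""

    def __init__(self, path: str, prefer_duckdb: bool = True):
        self.path = path
        self.use_duckdb = prefer_duckdb and duckdb is not None
        if self.use_duckdb:
            self.con = duckdb.connect(path)
            self.con.execute(
                "CREATE TABLE IF NOT EXISTS neighbors "
                "(item_id INTEGER, neighbor_id INTEGER, score DOUBLE)"
            )
        elif not os.path.exists(path):
            self._write_json({})

    def _read_json(self) -> dict:
        with open(self.path, encoding="utf-8") as f:
            return json.load(f)

    def _write_json(self, data: dict) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def put(self, item_id: int, neighbors: list[tuple[int, float]]) -> None:
        """Replace the stored neighbor list for item_id."""
        if self.use_duckdb:
            self.con.execute("DELETE FROM neighbors WHERE item_id = ?", [item_id])
            for neighbor_id, score in neighbors:
                self.con.execute(
                    "INSERT INTO neighbors VALUES (?, ?, ?)", [item_id, neighbor_id, score]
                )
        else:
            data = self._read_json()
            data[str(item_id)] = [[n, s] for n, s in neighbors]  # JSON keys are strings
            self._write_json(data)

    def get(self, item_id: int) -> list[tuple[int, float]]:
        """Return neighbors of item_id, most similar first ([] if unknown)."""
        if self.use_duckdb:
            rows = self.con.execute(
                "SELECT neighbor_id, score FROM neighbors "
                "WHERE item_id = ? ORDER BY score DESC",
                [item_id],
            ).fetchall()
        else:
            rows = self._read_json().get(str(item_id), [])
        pairs = [(int(n), float(s)) for n, s in rows]
        return sorted(pairs, key=lambda p: p[1], reverse=True)

    def close(self) -> None:
        if self.use_duckdb:
            self.con.close()


# Usage: force the JSON fallback, as a test environment without duckdb would.
with tempfile.TemporaryDirectory() as tmp:
    store = NeighborStore(os.path.join(tmp, "neighbors.json"), prefer_duckdb=False)
    store.put(0, [(3, 0.2), (1, 0.9)])
    print(store.get(0))  # [(1, 0.9), (3, 0.2)]
```

Keeping both backends behind the same `put`/`get` interface lets callers stay unaware of which one is active, which is the property that makes such a fallback usable in tests.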
Operational notes:

- A Dockerfile is included; it exposes Streamlit's default port, 8501.
- Tests use pytest; CI runs on Drone (.drone.yml).
- The repository snapshot contains no lockfile. Add one (for example poetry.lock, or a fully pinned requirements.txt) for reproducible installs.