# Domain Glossary
## Rules
- Use consistent domain terms across code and DB: Motion, MP, Party, embedding, window, `svd_vector`, `fused_embedding`, `similarity_cache`, `session_id`.
## Terms
- Motion: a parliamentary motion, stored in the `motions` table. Evidence: `CREATE TABLE motions` in database.py (lines ~40-110)
- MP (Member of Parliament): an individual member whose votes are stored in the `mp_votes` table. Evidence: `CREATE TABLE mp_votes` in database.py
- Embedding: a text embedding, stored in the `embeddings` table; fused vectors are stored in `fused_embeddings`.
- SVD vector: a reduced-dimensional vector, stored in the `svd_vectors` table.
- Window: a time-window identifier (e.g., "2024-Q1") used across the SVD and fusion pipelines. Evidence: `_generate_windows` in pipeline/run_pipeline.py
- Controversy score: a derived field stored on motions as `controversy_score`. Evidence: `insert_motion` in database.py sets `controversy_score`
## Examples / Usage
- `pipeline.run_pipeline._generate_windows` produces the window ids used when storing `svd_vectors` and `fused_embeddings`. Evidence: pipeline/run_pipeline.py lines ~1-120
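
To make the window-id format concrete, here is a minimal sketch that assumes windows are calendar quarters labelled `YYYY-Qn`, as in the "2024-Q1" example. The helpers `quarter_window_id` and `generate_quarter_windows` are hypothetical names for illustration; the actual `_generate_windows` in pipeline/run_pipeline.py may use a different granularity or signature.

```python
from datetime import date


def quarter_window_id(day: date) -> str:
    """Return a window id such as "2024-Q1" for the quarter containing `day`.

    Assumption: windows are calendar quarters; the real pipeline may differ.
    """
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}-Q{quarter}"


def generate_quarter_windows(start: date, end: date) -> list[str]:
    """List the window ids of every quarter from `start` to `end`, inclusive."""
    if start > end:
        return []
    windows = []
    year, quarter = start.year, (start.month - 1) // 3 + 1
    end_key = (end.year, (end.month - 1) // 3 + 1)
    # Walk quarter by quarter, rolling over to the next year after Q4.
    while (year, quarter) <= end_key:
        windows.append(f"{year}-Q{quarter}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return windows


if __name__ == "__main__":
    print(generate_quarter_windows(date(2023, 11, 15), date(2024, 5, 1)))
```

Producing every id from a single helper keeps the string format identical wherever a window is written to `svd_vectors` or `fused_embeddings`.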
## Evidence pointers
- database.py: `motions`, `mp_votes`, `embeddings`, `fused_embeddings` tables
- pipeline/run_pipeline.py: window generation and pipeline phases
## Anti-patterns
- Inconsistent naming of domain terms across modules (e.g., `mp_vote_parties` vs `mp_votes` usage in database.insert_motion and pipeline extraction). Prefer canonical names matching DB columns and use small adapter functions when transitioning representations.
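
One possible shape for such an adapter is sketched below. It assumes records are plain dicts and that `mp_vote_parties` is a legacy key for the same data the DB calls `mp_votes`; both points are assumptions made for illustration, and `LEGACY_TO_CANONICAL` and `to_canonical` are hypothetical names that do not come from the codebase.

```python
# Hypothetical mapping from legacy key names to canonical, DB-aligned names.
LEGACY_TO_CANONICAL = {
    "mp_vote_parties": "mp_votes",
}


def to_canonical(record: dict) -> dict:
    """Return a copy of `record` with legacy keys renamed to canonical ones.

    Keys that are not listed in LEGACY_TO_CANONICAL pass through unchanged.
    """
    canonical = {}
    for key, value in record.items():
        new_key = LEGACY_TO_CANONICAL.get(key, key)
        if new_key in canonical:
            # Refuse to silently overwrite when both spellings are present.
            raise ValueError(f"duplicate key after renaming: {new_key!r}")
        canonical[new_key] = value
    return canonical


if __name__ == "__main__":
    print(to_canonical({"mp_vote_parties": [], "session_id": "s-1"}))
```

Calling the adapter once at the module boundary lets the rest of the code rely on the canonical names only, and the mapping can be deleted when the legacy spelling is gone.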