# Database Schema (DuckDB) — extracted DDL ## Rules - Use DuckDB for persistent storage when available; fallback to JSON files when duckdb is not installed (database.py). - Keep schema migrations additive (ALTER TABLE ADD COLUMN IF NOT EXISTS used in database.py). ## Examples (DDL snippets extracted from database.py) ### motions table ```sql CREATE TABLE IF NOT EXISTS motions ( id INTEGER DEFAULT nextval('motions_id_seq'), title TEXT NOT NULL, description TEXT, date DATE, policy_area TEXT, voting_results JSON, winning_margin FLOAT, controversy_score FLOAT, layman_explanation TEXT, externe_identifier TEXT, body_text TEXT, url TEXT UNIQUE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id) ) ``` ### mp_votes table ```sql CREATE TABLE IF NOT EXISTS mp_votes ( id INTEGER DEFAULT nextval('mp_votes_id_seq'), motion_id INTEGER NOT NULL, mp_name TEXT NOT NULL, party TEXT, vote TEXT NOT NULL, date DATE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id) ) ``` ### embeddings / fused_embeddings ```sql CREATE TABLE IF NOT EXISTS embeddings ( id INTEGER DEFAULT nextval('embeddings_id_seq'), motion_id INTEGER NOT NULL, model TEXT, vector JSON NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id) ) CREATE TABLE IF NOT EXISTS fused_embeddings ( id INTEGER DEFAULT nextval('fused_embeddings_id_seq'), motion_id INTEGER NOT NULL, window_id TEXT NOT NULL, vector JSON NOT NULL, svd_dims INTEGER NOT NULL, text_dims INTEGER NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id) ) ``` ## Anti-patterns - Broad try/except around duckdb import (database.py top) — acceptable for optional dependency but should log explicitly the missing dependency and document test behavior. ## Remediations - Add a simple migration/versioning table (schema_version) to track schema changes and apply migrations deterministically. - Add tests that exercise both duckdb-backed and JSON-fallback database paths. Evidence: database.py contains JSON fallback logic (lines ~1-80). ## Evidence pointers - database.py: DDL strings and sequences (file: database.py lines ~1-300 and further). See create table blocks for motions, mp_votes, embeddings, fused_embeddings.