--- title: DuckDB Access Pattern category: patterns --- # DuckDB Access Pattern ## Rules - Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers. - Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic. - If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle. - Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads). ## Examples ### database.py - Explicit connect/close for schema init ```python conn = duckdb.connect(self.db_path) ... conn.execute(""" CREATE TABLE IF NOT EXISTS fused_embeddings ( id INTEGER DEFAULT nextval('fused_embeddings_id_seq'), motion_id INTEGER NOT NULL, window_id TEXT NOT NULL, vector JSON NOT NULL, svd_dims INTEGER NOT NULL, text_dims INTEGER NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (id) ) """) conn.close() ``` ### pipeline/svd_pipeline.py - Read-only connection ```python conn = duckdb.connect(db_path, read_only=True) try: rows = conn.execute( "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?", (start_date, end_date), ).fetchall() finally: conn.close() ``` ### similarity/compute.py - Preferred 'with' context ```python try: import duckdb except Exception: logger.exception("duckdb import failed; cannot load vectors") return 0 with duckdb.connect(db.db_path) as conn: rows = conn.execute(query, params).fetchall() ``` ## Anti-Patterns ### Bad: Connection without closure ```python # BAD: connection may leak if exception occurs before explicit close conn = duckdb.connect(db_path) rows = conn.execute("SELECT ...").fetchall() # missing finally/close ``` **Remediation**: Use "with" context or ensure conn.close() in finally block. ### Bad: Parallel write connections **Problem**: Opening write connections from many parallel workers without coordination. **Remediation**: Open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.