name: duckdb_access
rules:
  - Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
  - Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
  - If a long-lived connection is created at module level, provide an explicit close() or ensure the operation is safe for Streamlit's lifecycle.
  - Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
examples:
  - path: database.py
    excerpt: |
      ```python
      conn = duckdb.connect(self.db_path)
      ...
      conn.execute("""
          CREATE TABLE IF NOT EXISTS fused_embeddings (
              id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
              motion_id INTEGER NOT NULL,
              window_id TEXT NOT NULL,
              vector JSON NOT NULL,
              svd_dims INTEGER NOT NULL,
              text_dims INTEGER NOT NULL,
              created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
              PRIMARY KEY (id)
          )
      """)
      conn.close()
      ```
    note: explicit connect/close used when initializing schema
  - path: pipeline/svd_pipeline.py
    excerpt: |
      ```python
      conn = duckdb.connect(db_path, read_only=True)
      try:
          rows = conn.execute(
              "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
              (start_date, end_date),
          ).fetchall()
      finally:
          conn.close()
      ```
    note: read_only connection used for compute-heavy worker
  - path: similarity/compute.py
    excerpt: |
      ```python
      try:
          import duckdb
      except Exception:
          logger.exception("duckdb import failed; cannot load vectors")
          return 0

      with duckdb.connect(db.db_path) as conn:
          rows = conn.execute(query, params).fetchall()
      ```
    note: preferred 'with' context for automatic close
anti_patterns:
  - Bad: creating a connection without closure in a long-running process
    remediation: use "with" context or ensure conn.close() in finally block
    example: |
      ```python
      # BAD: connection may leak if exception occurs before explicit close
      conn = duckdb.connect(db_path)
      rows = conn.execute("SELECT ...").fetchall()
      # missing finally/close
      ```
  - Bad: opening write connections from many parallel workers without coordination
    remediation: open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.
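    note: >
      A minimal sketch of the remediation above, not code taken from the
      project. The helpers write_rows and count_votes, the reduced mp_votes
      columns, and the temporary database path are hypothetical and exist only
      for illustration. It assumes the writer has closed its connection before
      any read-only connection is opened.
    example: |
      ```python
      import os
      import tempfile

      import duckdb


      def write_rows(db_path, rows):
          """Single writer: short-lived read-write connection (hypothetical helper)."""
          with duckdb.connect(db_path) as conn:
              # Toy subset of mp_votes columns, assumed for this sketch only.
              conn.execute(
                  "CREATE TABLE IF NOT EXISTS mp_votes "
                  "(motion_id INTEGER, mp_name TEXT, vote TEXT)"
              )
              conn.executemany("INSERT INTO mp_votes VALUES (?, ?, ?)", rows)
          # Leaving the "with" block closes the connection and releases the file.


      def count_votes(db_path, motion_id):
          """Compute worker: read-only connection, closed automatically."""
          with duckdb.connect(db_path, read_only=True) as conn:
              return conn.execute(
                  "SELECT count(*) FROM mp_votes WHERE motion_id = ?", [motion_id]
              ).fetchone()[0]


      # Throwaway database file in a temporary directory (illustrative path).
      db_path = os.path.join(tempfile.mkdtemp(), "votes.duckdb")
      write_rows(db_path, [(1, "A", "yes"), (1, "B", "no"), (2, "A", "yes")])
      print(count_votes(db_path, 1))
      ```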