You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
2.2 KiB
2.2 KiB
| title | category |
|---|---|
| DuckDB Access Pattern | patterns |
DuckDB Access Pattern
Rules
- Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
- Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
- If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle.
- Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
Examples
database.py - Explicit connect/close for schema init
conn = duckdb.connect(self.db_path)
...
conn.execute("""
CREATE TABLE IF NOT EXISTS fused_embeddings (
id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
motion_id INTEGER NOT NULL,
window_id TEXT NOT NULL,
vector JSON NOT NULL,
svd_dims INTEGER NOT NULL,
text_dims INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id)
)
""")
conn.close()
pipeline/svd_pipeline.py - Read-only connection
conn = duckdb.connect(db_path, read_only=True)
try:
rows = conn.execute(
"SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
(start_date, end_date),
).fetchall()
finally:
conn.close()
similarity/compute.py - Preferred 'with' context
try:
import duckdb
except Exception:
logger.exception("duckdb import failed; cannot load vectors")
return 0
with duckdb.connect(db.db_path) as conn:
rows = conn.execute(query, params).fetchall()
Anti-Patterns
Bad: Connection without closure
# BAD: connection may leak if exception occurs before explicit close
conn = duckdb.connect(db_path)
rows = conn.execute("SELECT ...").fetchall()
# missing finally/close
Remediation: Use "with" context or ensure conn.close() in finally block.
Bad: Parallel write connections
Problem: Opening write connections from many parallel workers without coordination.
Remediation: Open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.