You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
79 lines
2.2 KiB
79 lines
2.2 KiB
---
|
|
title: DuckDB Access Pattern
|
|
category: patterns
|
|
---
|
|
# DuckDB Access Pattern
|
|
|
|
## Rules
|
|
|
|
- Prefer using read_only=True for compute-only subprocesses (e.g., SVD compute) to allow concurrent readers.
|
|
- Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
|
|
- If a long-lived connection is created at module level, provide explicit close() or ensure operation is safe for Streamlit's lifecycle.
|
|
- Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).
|
|
|
|
## Examples
|
|
|
|
### database.py - Explicit connect/close for schema init
|
|
|
|
```python
|
|
conn = duckdb.connect(self.db_path)
|
|
...
|
|
conn.execute("""
|
|
CREATE TABLE IF NOT EXISTS fused_embeddings (
|
|
id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
|
|
motion_id INTEGER NOT NULL,
|
|
window_id TEXT NOT NULL,
|
|
vector JSON NOT NULL,
|
|
svd_dims INTEGER NOT NULL,
|
|
text_dims INTEGER NOT NULL,
|
|
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
|
|
PRIMARY KEY (id)
|
|
)
|
|
""")
|
|
conn.close()
|
|
```
|
|
|
|
### pipeline/svd_pipeline.py - Read-only connection
|
|
|
|
```python
|
|
conn = duckdb.connect(db_path, read_only=True)
|
|
try:
|
|
rows = conn.execute(
|
|
"SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
|
|
(start_date, end_date),
|
|
).fetchall()
|
|
finally:
|
|
conn.close()
|
|
```
|
|
|
|
### similarity/compute.py - Preferred 'with' context
|
|
|
|
```python
|
|
try:
|
|
import duckdb
|
|
except Exception:
|
|
logger.exception("duckdb import failed; cannot load vectors")
|
|
return 0
|
|
|
|
with duckdb.connect(db.db_path) as conn:
|
|
rows = conn.execute(query, params).fetchall()
|
|
```
|
|
|
|
## Anti-Patterns
|
|
|
|
### Bad: Connection without closure
|
|
|
|
```python
|
|
# BAD: connection may leak if exception occurs before explicit close
|
|
conn = duckdb.connect(db_path)
|
|
rows = conn.execute("SELECT ...").fetchall()
|
|
# missing finally/close
|
|
```
|
|
|
|
**Remediation**: Use "with" context or ensure conn.close() in finally block.
|
|
|
|
### Bad: Parallel write connections
|
|
|
|
**Problem**: Opening write connections from many parallel workers without coordination.
|
|
|
|
**Remediation**: Open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.
|
|
|