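name: duckdb_access

rules:
  - Prefer read_only=True for compute-only subprocesses (e.g., SVD compute). DuckDB lets multiple processes open the same database file read-only at the same time, whereas only a single process at a time can open it read-write.
  - Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so that conn.close() runs automatically, even if the block raises.
  - If a long-lived connection is created at module level, provide an explicit close() path, or make sure it behaves correctly under Streamlit's lifecycle (Streamlit reruns the app script on every user interaction, so a connection opened in that script's body is reopened on each rerun).
  - Parameterize db_path in pipelines and create connections locally. Avoid global connections shared across threads; a DuckDB Python connection object is not thread-safe, so give each thread its own connection or its own conn.cursor().
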
examples:
  - path: database.py
    excerpt: |
      ```python
      conn = duckdb.connect(self.db_path)
      try:
          ...
          conn.execute("""
              CREATE TABLE IF NOT EXISTS fused_embeddings (
                  id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
                  motion_id INTEGER NOT NULL,
                  window_id TEXT NOT NULL,
                  vector JSON NOT NULL,
                  svd_dims INTEGER NOT NULL,
                  text_dims INTEGER NOT NULL,
                  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                  PRIMARY KEY (id)
              )
          """)
      finally:
          # Close even if one of the schema statements fails
          conn.close()
      ```
    note: explicit connect/close used when initializing schema; try/finally ensures close() runs even if a statement fails

  - path: pipeline/svd_pipeline.py
    excerpt: |
      ```python
      conn = duckdb.connect(db_path, read_only=True)
      try:
          rows = conn.execute(
              "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
              (start_date, end_date),
          ).fetchall()
      finally:
          conn.close()
      ```
    note: read_only connection used for compute-heavy worker

  - path: similarity/compute.py
    excerpt: |
      ```python
      try:
          import duckdb
      except Exception:
          logger.exception("duckdb import failed; cannot load vectors")
          return 0

      with duckdb.connect(db.db_path) as conn:
          rows = conn.execute(query, params).fetchall()
      ```
    note: preferred 'with' context for automatic close

anti_patterns:
  - Bad: creating a connection without closing it in a long-running process
    remediation: use a "with" context or call conn.close() in a finally block
    example: |
      ```python
      # BAD: the connection is never closed, and nothing would close it if execute() raised
      conn = duckdb.connect(db_path)
      rows = conn.execute("SELECT ...").fetchall()
      # missing finally/close
      ```
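  # A minimal sketch for the module-level rule above, under stated assumptions:
  # get_conn/close_conn and the lazily opened read_only connection are hypothetical,
  # and atexit is just one way to guarantee a shutdown-time close. The point is
  # only that a module-level connection gets an explicit close() path. Use such a
  # connection from a single thread (see the rule on connections crossing threads).
  - Bad: keeping a module-level connection open with no way to close it
    remediation: provide an explicit close() path, for example a close function that is also registered with atexit
    example: |
      ```python
      # GOOD (sketch): module-level connection with an explicit close path
      # (get_conn, close_conn and _conn are hypothetical names)
      import atexit

      import duckdb

      _conn = None  # module-level connection, opened lazily on first use


      def get_conn(db_path: str) -> duckdb.DuckDBPyConnection:
          """Return the shared read-only connection, opening it on first use."""
          global _conn
          if _conn is None:
              _conn = duckdb.connect(db_path, read_only=True)
          return _conn


      def close_conn() -> None:
          """Close the shared connection; safe to call more than once."""
          global _conn
          if _conn is not None:
              _conn.close()
              _conn = None


      # Also close at interpreter shutdown, in case callers never call close_conn()
      atexit.register(close_conn)
      ```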
  - Bad: opening write connections from many parallel workers without coordination
    remediation: open read_only connections in compute processes and centralize writes, either through short-lived write connections or through a single writer worker.
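    # A minimal sketch of this remediation, assuming the mp_votes table from the
    # svd_pipeline excerpt and a vote value of 'yes'; the motion_yes_counts table and
    # all function names are hypothetical. Workers only read (read_only=True) and
    # return plain Python data; one short-lived read-write connection then applies
    # every write, after all read-only connections have been closed.
    example: |
      ```python
      # GOOD (sketch): read-only workers, one centralized writer
      # (function names and the motion_yes_counts table are hypothetical)
      import duckdb


      def compute_worker(db_path: str, motion_id: int) -> tuple[int, int]:
          # Compute step: read-only connection; returns data instead of writing it
          with duckdb.connect(db_path, read_only=True) as conn:
              (yes_votes,) = conn.execute(
                  "SELECT count(*) FROM mp_votes WHERE motion_id = ? AND vote = 'yes'",
                  [motion_id],
              ).fetchone()
          return motion_id, yes_votes


      def write_results(db_path: str, results: list[tuple[int, int]]) -> None:
          # Single writer: one short-lived read-write connection applies all results
          with duckdb.connect(db_path) as conn:
              conn.execute(
                  "CREATE TABLE IF NOT EXISTS motion_yes_counts "
                  "(motion_id INTEGER, yes_votes INTEGER)"
              )
              conn.executemany("INSERT INTO motion_yes_counts VALUES (?, ?)", results)


      def run_pipeline(db_path: str, motion_ids: list[int]) -> None:
          # Called sequentially here; in a real pipeline each compute_worker call
          # could run in its own process, which is where read_only matters
          results = [compute_worker(db_path, m) for m in motion_ids]
          # Writes happen only after every read-only connection has been closed
          write_results(db_path, results)
      ```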
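  # A minimal sketch for the rule on not sharing global connections across threads,
  # assuming the mp_votes table from the svd_pipeline excerpt; the function names are
  # hypothetical. db_path is a parameter, the connection is created inside the call,
  # and each task gets its own conn.cursor() instead of sharing one connection object,
  # since DuckDB's Python connection object is not thread-safe.
  - Bad: sharing one global connection object across threads
    remediation: pass db_path in, create the connection locally, and give each thread its own cursor via conn.cursor()
    example: |
      ```python
      # GOOD (sketch): local connection, one cursor per task
      # (_count_votes and count_votes_parallel are hypothetical names)
      from concurrent.futures import ThreadPoolExecutor

      import duckdb


      def _count_votes(cursor: duckdb.DuckDBPyConnection, motion_id: int) -> int:
          # Each task uses only its own cursor and closes it when done
          with cursor:
              return cursor.execute(
                  "SELECT count(*) FROM mp_votes WHERE motion_id = ?", [motion_id]
              ).fetchone()[0]


      def count_votes_parallel(
          db_path: str, motion_ids: list[int], max_workers: int = 4
      ) -> list[int]:
          # The connection lives only for this call; there is no module-level global
          with duckdb.connect(db_path, read_only=True) as conn:
              cursors = [conn.cursor() for _ in motion_ids]
              with ThreadPoolExecutor(max_workers=max_workers) as pool:
                  return list(pool.map(_count_votes, cursors, motion_ids))
      ```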