name: duckdb_access

rules:
  - Prefer read_only=True for compute-only subprocesses (e.g., SVD compute) so several processes can read the same database file concurrently; a read-write connection locks the file against other processes.
  - Prefer "with duckdb.connect(db_path, read_only=True) as conn" for scoped connections so conn.close() is automatic.
  - If a long-lived connection is created at module level, provide an explicit close() or make sure the connection is safe under Streamlit's lifecycle (Streamlit re-runs the app script on each interaction).
  - Prefer parameterizing db_path in pipelines and creating connections locally (avoid global connections that cross threads).

examples:
  - path: database.py
    excerpt: |
      ```python
      conn = duckdb.connect(self.db_path)
      ...
      conn.execute("""
          CREATE TABLE IF NOT EXISTS fused_embeddings (
              id INTEGER DEFAULT nextval('fused_embeddings_id_seq'),
              motion_id INTEGER NOT NULL,
              window_id TEXT NOT NULL,
              vector JSON NOT NULL,
              svd_dims INTEGER NOT NULL,
              text_dims INTEGER NOT NULL,
              created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
              PRIMARY KEY (id)
          )
      """)
      conn.close()
      ```
    note: explicit connect/close used when initializing schema; as excerpted, close() is not in a finally block, so it is skipped if execute() raises

  - path: pipeline/svd_pipeline.py
    excerpt: |
      ```python
      conn = duckdb.connect(db_path, read_only=True)
      try:
          rows = conn.execute(
              "SELECT motion_id, mp_name, vote FROM mp_votes WHERE date BETWEEN ? AND ?",
              (start_date, end_date),
          ).fetchall()
      finally:
          conn.close()
      ```
    note: read_only connection used for compute-heavy worker

  - path: similarity/compute.py
    excerpt: |
      ```python
      try:
          import duckdb
      except Exception:
          logger.exception("duckdb import failed; cannot load vectors")
          return 0

      with duckdb.connect(db.db_path) as conn:
          rows = conn.execute(query, params).fetchall()
      ```
    note: preferred 'with' context for automatic close

anti_patterns:
  - Bad: creating a connection without closure in a long-running process
    remediation: use "with" context or ensure conn.close() in finally block
    example: |
      ```python
      # BAD: the connection is never closed, and nothing guarantees cleanup if execute() raises
      conn = duckdb.connect(db_path)
      rows = conn.execute("SELECT ...").fetchall()
      # no "with" block and no try/finally with conn.close()
      ```
  - Bad: opening write connections from many parallel workers without coordination (DuckDB allows only one read-write process per database file)
    remediation: open read_only for compute processes and centralize writes via short-lived connections or a single writer worker.
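    example: |
      A minimal sketch of the single-writer remediation, not code from this repo.
      The items and results tables and the read_ids/write_results helpers are
      hypothetical names chosen for illustration. Compute code only reads and
      returns data, and one function holds the write connection briefly.
      ```python
      import duckdb


      def read_ids(db_path):
          # Compute side: read-only, so parallel workers never compete for the write lock.
          # "items" is a hypothetical table used only for this sketch.
          with duckdb.connect(db_path, read_only=True) as conn:
              rows = conn.execute("SELECT id FROM items ORDER BY id").fetchall()
          return [row[0] for row in rows]


      def write_results(db_path, results):
          # Writer side: the only place that opens a read-write connection,
          # held just long enough to insert one batch of (id, score) tuples.
          if not results:
              return
          with duckdb.connect(db_path) as conn:
              conn.execute(
                  "CREATE TABLE IF NOT EXISTS results (id INTEGER, score DOUBLE)"
              )
              conn.executemany("INSERT INTO results VALUES (?, ?)", results)
      ```
      Workers would hand their results back to the parent (or a queue), and only
      the parent calls write_results, so at most one write connection is open at a time.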