CODE STYLE
==========

Purpose
-------
This document records the conventions already in use in the codebase so that new contributors and AI agents can produce code that fits the repository's existing style.

General
-------
- Language: Python (3.x)
- The project uses one file per module, with descriptive snake_case filenames (e.g., api_client.py, database.py)
- Top-level module singletons are exposed when a single shared instance is desired (e.g., `db = MotionDatabase()`)
- Keep code synchronous unless you introduce async consistently across modules (none currently use async/await)

Naming
------
- Files / modules: snake_case.py (e.g., motion_scraper -> scraper.py, api_client.py)
- Classes: PascalCase (e.g., MotionDatabase, MotionSummarizer, TweedeKamerAPI)
- Functions and methods: snake_case (including private helpers with a single leading underscore)
- Constants / config fields: UPPER_SNAKE_CASE (placed in config.py and referenced via `from config import config`)

File organization
-----------------
- Keep top-level domain modules in the repository root (this repo uses a flat layout)
- Each module should have one primary responsibility (e.g., database.py for DB logic)
- Module-level singletons: create them at the module bottom and import them from other modules (pattern used widely)

Imports
-------
- Group imports in this order, with a blank line between groups:
  1. Standard library (datetime, json, typing)
  2. Third-party libraries (requests, duckdb, ibis, streamlit)
  3. Local imports (from config import config, from database import db)
- Use absolute imports (module name) rather than relative imports

Typing
------
- Add type hints to public function signatures where helpful (the project uses typing in several places)
- Use typing.Dict, typing.List, typing.Optional for simple container annotations

Error handling & logging
------------------------
- Current pattern: functions catch broad Exception, print an error message, then return a safe default (False, [], None). See database.py and api_client.py for examples.
- When updating code, prefer to:
  - Keep the existing behavior (return the safe fallback) to avoid breaking call sites
  - Consider structured logging (the logging module) rather than print, but keep the high-level error flow the same unless you are refactoring intentionally

LLM / external API calls
------------------------
- OpenAI-compatible client usage is in summarizer.py. Environment variables are read from config.py.
- Do NOT commit API keys or secrets. Use environment variables (OPENROUTER_API_KEY, etc.) and reference them by name.
- Network calls are synchronous and use requests. Keep request timeouts and error handling consistent with existing patterns (catch requests.exceptions.RequestException and return safe fallback values).

Database patterns
-----------------
- The database is DuckDB, stored at data/motions.db. The MotionDatabase class opens short-lived duckdb connections inside methods (conn = duckdb.connect(self.db_path)). This pattern is used widely.
- Queries and schema initialization happen inside MotionDatabase._init_database(). Keep DDL grouped there.
- When writing methods that modify the DB, follow the try/except + conn.close() pattern to guarantee cleanup.

Testing
-------
- Currently the project uses ad-hoc test scripts (test.py). If adding tests, follow pytest conventions:
  - Place tests in a tests/ directory
  - Use filenames test_*.py and functions test_* with assertions
  - Mock external APIs (requests, LLM client) via monkeypatch or unittest.mock

Patterns observed (use these when adding new code)
--------------------------------------------------
- Singletons: expose a module-level instance (e.g., `db = MotionDatabase()`) and import it elsewhere
- Private helpers: name with a single leading underscore (e.g., _get_voting_records)
- Config: centralize in config.py and reference via `from config import config` (don't hardcode paths)

Do's and Don'ts
---------------
Do:
- Follow existing naming: snake_case for files/functions
- Add simple type hints for clarity
- Return the same safe fallback values used in existing functions on error
- Use module-level singletons for shared services if helpful

Don't:
- Don't add async/await in a single module without broader coordination
- Don't print secret values or commit .env files
- Don't create circular imports (be careful when modules instantiate singletons at import time)

Example snippets
----------------
Conformant class and method:

    import typing

    import duckdb

    from config import config


    class ExampleService:
        def __init__(self, param: str = config.DATABASE_PATH):
            self.param = param

        def do_work(self, items: typing.List[dict]) -> bool:
            try:
                # short-lived DB/HTTP usage
                conn = duckdb.connect(config.DATABASE_PATH)
                # ... perform work
                conn.close()
                return True
            except Exception as e:
                print(f"Error in do_work: {e}")
                # close the connection only if connect() succeeded
                if 'conn' in locals():
                    conn.close()
                return False

Adding a new module
-------------------
1. Create a snake_case file (e.g., new_service.py)
2. Add a PascalCase class implementing the behavior, plus small helper functions prefixed with _
3. If you need a shared instance, create `service = NewService()` at the module bottom
4. Import via `from new_service import service` in other modules
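
The four steps above can be sketched as a single file. This is a minimal illustration, not code from the repository: `NewService` and `service` come from the steps, while `collect_titles`, `_clean_title`, `max_length`, and the `"title"` key are hypothetical names chosen for the example. It deliberately avoids DuckDB and `config` so that it runs on its own, and it keeps the print-and-return-fallback error flow described under "Error handling & logging".

```python
"""new_service.py: hypothetical module laid out per the steps above."""
from typing import Dict, List


class NewService:
    """Illustrative service; the behavior is a placeholder (assumption)."""

    def __init__(self, max_length: int = 80):
        self.max_length = max_length

    def _clean_title(self, raw: str) -> str:
        """Private helper: collapse runs of whitespace, then truncate."""
        return " ".join(raw.split())[: self.max_length]

    def collect_titles(self, motions: List[Dict]) -> List[str]:
        """Return cleaned titles, or [] on any error (safe fallback)."""
        try:
            return [self._clean_title(m["title"]) for m in motions]
        except Exception as e:
            # Same flow as existing modules: report, then return a default.
            print(f"Error in collect_titles: {e}")
            return []


# Step 3: shared instance created at the module bottom.
service = NewService()
```

Other modules would then use `from new_service import service` and call `service.collect_titles(...)`. Because the instance is created at import time, the constructor is kept free of imports from other local modules, which helps avoid the circular-import problem noted in the Don'ts.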