| date | topic | focus |
|---|---|---|
| 2026-04-04 | reliability-correctness-improvements | reliability and correctness |
Ideation: Reliability & Correctness Improvements
Codebase Context
- Python + Streamlit + DuckDB data pipeline application
- Key issues from docs/solutions/:
  - SVD labels must reflect voting patterns, not semantic content (850+ SVD component labels in code)
  - Broad exception handlers: 850+ `except Exception:` clauses across the codebase
  - Nested exception handling creates opaque error paths
  - Error handling catches broad `Exception` and prints to stdout (179 `print()` statements in error paths)
- Existing pattern: `explorer_helpers.py` consists of pure, testable, well-structured functions and is the model to follow
Grounding Evidence
- `docs/solutions/best-practices/svd-labels-voting-patterns-not-semantics.md` documents the SVD labeling convention
- Grep search found 281 `except Exception:` in `.py` files, plus bare `except:` handlers
- `database.py` line 47: bare `except:` that catches everything, including `KeyboardInterrupt`
- 179 `print()` statements in error-handling paths hide issues from logging
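Grep counts like these can include matches inside strings and comments. They also need a separate pattern to find bare `except:`.

The sketch below is a minimal AST-based counter that tells the two apart. It assumes the Python sources are UTF-8. `classify_handlers` and `scan_tree` are hypothetical helpers, not existing project code, and this is not the tool that produced the numbers above.

```python
import ast
from collections import Counter
from pathlib import Path

# Hypothetical policy: these names count as "broad" handlers.
BROAD_NAMES = {"Exception", "BaseException"}


def classify_handlers(source: str) -> Counter:
    """Count the except clauses in one module by how much they catch."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if node.type is None:
            # Bare `except:` also catches KeyboardInterrupt and SystemExit.
            counts["bare"] += 1
        elif isinstance(node.type, ast.Name) and node.type.id in BROAD_NAMES:
            counts["broad"] += 1
        else:
            counts["specific"] += 1
    return counts


def scan_tree(root: str) -> Counter:
    """Sum the counts over every .py file under root, skipping unparseable files."""
    total: Counter = Counter()
    for path in Path(root).rglob("*.py"):
        try:
            total += classify_handlers(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
    return total
```

Running `scan_tree(".")` from the repository root would give a per-category total to compare against the grep figures.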
Ranked Ideas
1. Right-Wing Party Axis Validation — Automated Assert
Description: Add runtime validation that PVV, FVD, JA21, and SGP appear on the RIGHT side of all SVD/PCA axes. Create a validate_axis_polarity() function that checks party loadings and raises AssertionError if any right-wing party appears on the left.
Rationale: This is the most impactful correctness fix — the project convention is explicitly documented in AGENTS.md yet has no automated enforcement. A single validation pass catches SVD labeling errors before they reach production.
Downsides: Requires careful handling of axis flips (sometimes flipping is the correct fix, not validation failure).
Confidence: 95%
Complexity: Low
Status: Unexplored
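The sketch below is a minimal version of this check. It assumes loadings are available as a mapping from party name to one score per SVD/PCA axis, with positive meaning right.

- `validate_axis_polarity` is the name proposed above.
- `axes_to_flip` and the data layout are hypothetical.

The function raises explicitly instead of using `assert`, so the check still runs under `python -O`. It also separates the whole-axis flip case mentioned under Downsides from a genuine labeling error.

```python
from typing import List, Mapping, Sequence

# Project convention: these parties sit on the RIGHT (positive) side of every axis.
RIGHT_WING = ("PVV", "FVD", "JA21", "SGP")


def axes_to_flip(
    loadings: Mapping[str, Sequence[float]], right_wing: Sequence[str] = RIGHT_WING
) -> List[int]:
    """Hypothetical helper: axes where EVERY right-wing party is negative (a sign flip)."""
    n_axes = len(loadings[right_wing[0]])
    return [
        axis
        for axis in range(n_axes)
        if all(loadings[p][axis] < 0 for p in right_wing)
    ]


def validate_axis_polarity(
    loadings: Mapping[str, Sequence[float]], right_wing: Sequence[str] = RIGHT_WING
) -> None:
    """Raise AssertionError if any right-wing party lands on the left of any axis."""
    missing = [p for p in right_wing if p not in loadings]
    if missing:
        raise AssertionError(f"no loadings for right-wing parties: {missing}")
    flips = set(axes_to_flip(loadings, right_wing))
    for axis in range(len(loadings[right_wing[0]])):
        on_left = [p for p in right_wing if loadings[p][axis] < 0]
        if on_left:
            # All four negative usually means the SVD sign is arbitrary, not a bug.
            hint = "flip this axis's sign" if axis in flips else "check the axis and its label"
            raise AssertionError(f"axis {axis}: {on_left} on the left; {hint}")
```

A pipeline could call `axes_to_flip` first, negate those axes, and then run `validate_axis_polarity` as the hard gate.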
2. Type-Safe Vote Normalization with Exhaustiveness Checking
Description: Replace the fragile string-based vote normalization in database.py (lines 715-744) with a typed enum and exhaustiveness checking. Add a Vote enum with the variants VOOR, TEGEN, ONTHOUDEN, and AFWEZIG. Use match/case with a case _ fallback that raises on unmapped values at runtime, and rely on a type checker (mypy or pyright) to flag unhandled variants at development time.
Rationale: The current normalization silently returns None for unknown vote values — this causes data loss that only manifests as "agreement percentage is wrong". Typed enums with exhaustiveness checking prevent silent data loss.
Downsides: Requires updating all call sites that pass vote strings.
Confidence: 90%
Complexity: Medium
Status: Unexplored
3. DuckDB Connection Leak Detector — Context Manager Audit
Description: Audit all duckdb.connect() calls for proper context manager usage or explicit .close(). Many handlers catch exceptions but forget to close connections. Add a ConnectionTracker that warns on unclosed connections in development.
Rationale: Connection leaks accumulate and eventually exhaust database connections. The codebase has 15+ places where exceptions cause early returns without connection cleanup.
Downsides: Tracking adds overhead; some leaks may already be mitigated because DuckDB closes a connection implicitly when its Python object goes out of scope.
Confidence: 85%
Complexity: Medium
Status: Unexplored
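The sketch below is a minimal development-time tracker. It accepts any zero-argument factory that returns an object with `.close()`. In the app it would wrap something like `lambda: duckdb.connect(DB_PATH)`, where `DB_PATH` is a placeholder.

`ConnectionTracker`, `managed`, `open_labels`, and `warn_unclosed` are hypothetical names. The `managed` context manager closes the connection in `finally`, which covers the early-return and exception paths described above.

```python
import traceback
import warnings
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Tuple


class ConnectionTracker:
    """Hypothetical dev-only wrapper that remembers where each connection was opened."""

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        # id(conn) -> (conn, label, stack); holding conn keeps its id unique.
        self._open: Dict[int, Tuple[Any, str, str]] = {}

    def open(self, label: str) -> Any:
        conn = self._connect()
        # Record the caller's frames (minus this one) for the leak report.
        stack = "".join(traceback.format_stack(limit=4)[:-1])
        self._open[id(conn)] = (conn, label, stack)
        return conn

    def close(self, conn: Any) -> None:
        self._open.pop(id(conn), None)
        conn.close()

    @contextmanager
    def managed(self, label: str) -> Iterator[Any]:
        conn = self.open(label)
        try:
            yield conn
        finally:
            # Runs on normal exit, early return, and exceptions alike.
            self.close(conn)

    def open_labels(self) -> List[str]:
        return [label for _, label, _ in self._open.values()]

    def warn_unclosed(self) -> None:
        # ResourceWarning is hidden by default; run with -X dev or -W default to see it.
        for _, label, stack in self._open.values():
            warnings.warn(
                f"unclosed connection {label!r} opened at:\n{stack}",
                ResourceWarning,
                stacklevel=2,
            )
```

Calling `warn_unclosed()` at shutdown (for example via `atexit`) in development builds would list each leaked connection with the code path that opened it.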
4. Replace Print-Based Debugging with Structured Logging
Description: Replace the 179 print() statements in error paths with structured logging using the existing _logger. Create a script that automates this conversion for common patterns.
Rationale: Print statements write to stdout, bypass the logging configuration, and are easily lost in production. Proper logging enables log aggregation, alerting, and debugging of production issues.
Downsides: High volume of changes; risk of losing context in some print statements.
Confidence: 80%
Complexity: Medium
Status: Unexplored
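The sketch below is a minimal conversion script. It assumes the target modules already define `_logger` as noted above; `convert_error_prints` is hypothetical.

The script only rewrites single-argument `print()` calls inside `except` blocks. `Logger.exception` treats extra positional arguments as `%`-format arguments, so multi-argument prints, starred prints, and prints with keywords such as `file=` are left for manual review. `Logger.exception` logs at ERROR level and appends the active traceback, which a plain `print` drops.

```python
import ast
import io


def convert_error_prints(source: str, logger_name: str = "_logger") -> str:
    """Rewrite single-argument print() calls inside except blocks to <logger>.exception()."""
    targets = set()
    for handler in ast.walk(ast.parse(source)):
        if not isinstance(handler, ast.ExceptHandler):
            continue
        for node in ast.walk(handler):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "print"
                and len(node.args) == 1  # print("a", e) has no safe 1:1 mapping
                and not isinstance(node.args[0], ast.Starred)
                and not node.keywords  # skip file=..., end=..., sep=...
            ):
                f = node.func
                targets.add((f.lineno, f.col_offset, f.end_col_offset))

    # newline="" keeps original line endings and splits lines the way the parser does.
    lines = io.StringIO(source, newline="").readlines()
    # Splice right-to-left so earlier offsets on the same line stay valid.
    for lineno, start, end in sorted(targets, reverse=True):
        raw = lines[lineno - 1].encode("utf-8")
        # AST column offsets count UTF-8 bytes, not characters.
        raw = raw[:start] + f"{logger_name}.exception".encode("utf-8") + raw[end:]
        lines[lineno - 1] = raw.decode("utf-8")
    return "".join(lines)
```

Reviewing the diff after each run keeps the "risk of losing context" downside visible. Anything the script skips is exactly the set that needs a human decision.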
5. SVD Component Label Verification — Pre-Deployment Assertion
Description: Create a CI/CD pre-deployment script that verifies SVD labels against actual voting data — checking that labels match the voting pattern, not semantic assumptions. Query which parties vote positive/negative per component and validate label accuracy.
Rationale: The SVD label documentation exists but there's no enforcement. This automated check prevents the documented mistake (semantic labels that don't match voting) from recurring.
Downsides: Requires understanding of the SVD pipeline and periodic re-calibration as voting data changes.
Confidence: 75%
Complexity: Medium
Status: Unexplored
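The sketch below is one way to write the pre-deployment check. It assumes each SVD component label can be expressed as a set of parties expected on the positive side and a set expected on the negative side. `ComponentLabel`, `verify_labels`, and the score layout (party mapped to one score per component) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Sequence


@dataclass(frozen=True)
class ComponentLabel:
    """Hypothetical spec: the voting pattern a human-readable SVD label claims."""

    component: int
    label: str
    positive: FrozenSet[str]  # parties that must score > 0 on this component
    negative: FrozenSet[str]  # parties that must score < 0 on this component


def verify_labels(
    scores: Mapping[str, Sequence[float]], labels: Sequence[ComponentLabel]
) -> List[str]:
    """Return one message per party whose actual score contradicts its label."""
    problems: List[str] = []
    for spec in labels:
        checks = [(p, True) for p in sorted(spec.positive)]
        checks += [(p, False) for p in sorted(spec.negative)]
        for party, want_positive in checks:
            if party not in scores:
                problems.append(f"{spec.label!r}: no scores for {party}")
                continue
            value = scores[party][spec.component]
            ok = value > 0 if want_positive else value < 0
            if not ok:
                side = "positive" if want_positive else "negative"
                problems.append(
                    f"{spec.label!r} (component {spec.component}): "
                    f"{party} scored {value:+.3f}, expected {side}"
                )
    return problems
```

A CI step could load the pipeline's current scores and label specs, print the returned messages, and call `sys.exit(1 if problems else 0)` so that a drifted label blocks the deploy.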
6. Nested Exception Handler Flattening — EAFP to LBYL Migration
Description: Replace nested try-except blocks with explicit preconditions (LBYL — Look Before You Leap). Many handlers wrap every operation in try-except because they don't trust the data. Add validation functions that check preconditions before operations.
Rationale: Nested exception handlers make the control flow impossible to reason about. Replacing with explicit validation makes code more readable and debuggable.
Downsides: Requires understanding what conditions each operation actually needs.
Confidence: 70%
Complexity: High
Status: Unexplored
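The example below illustrates the migration on a hypothetical agreement calculation, where each party's data maps a motion id to a vote. The commented "before" version is illustrative, not code from the repository.

The LBYL version names its single precondition (at least one shared motion) and checks it up front. An empty overlap therefore returns `None` deliberately, instead of being swallowed along with every other error.

```python
from typing import Mapping, Optional

# Before (hypothetical EAFP, nested): every failure collapses into None.
# try:
#     try:
#         shared = set(a) & set(b)
#         pct = 100 * sum(a[m] == b[m] for m in shared) / len(shared)
#     except ZeroDivisionError:
#         pct = None
# except Exception:
#     pct = None


def agreement_pct(a: Mapping[str, str], b: Mapping[str, str]) -> Optional[float]:
    """Percentage of shared motions on which two parties cast the same vote.

    a and b map motion id -> vote. Returns None when the parties share no
    motions, which is the only precondition this calculation has.
    """
    shared = a.keys() & b.keys()
    if not shared:  # checked up front instead of catching ZeroDivisionError
        return None
    same = sum(1 for motion in shared if a[motion] == b[motion])
    return 100.0 * same / len(shared)
```

Any other failure, such as a non-mapping argument, now propagates with its real type and traceback.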
7. Database Schema Validation — Foreign Key and Constraint Checks
Description: Add startup validation that checks the actual database schema against expected schema. Verify table existence, column types, and foreign key relationships. Fail fast with clear error messages if schema is stale.
Rationale: The current code tries to add columns with `ALTER TABLE ... IF NOT EXISTS`, which can fail silently. A schema validation pass catches migration failures immediately.
Downsides: Schema changes require updating validation code.
Confidence: 85%
Complexity: Low
Status: Unexplored
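The sketch below is a minimal startup check. It assumes DuckDB's `information_schema.columns` view and the default `main` schema. `EXPECTED_SCHEMA` (with its table and columns), `schema_problems`, and `check_schema` are hypothetical.

The sketch covers table existence and column types only. Foreign key checks could be added along the same lines.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical expected schema: table -> {column: DuckDB type name}.
EXPECTED_SCHEMA: Dict[str, Dict[str, str]] = {
    "motions": {"id": "VARCHAR", "date": "DATE", "winning_margin": "DOUBLE"},
}

COLUMNS_QUERY = """
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'main'
"""


def schema_problems(
    expected: Dict[str, Dict[str, str]], actual_rows: Iterable[Tuple[str, str, str]]
) -> List[str]:
    """Compare expected tables/columns with (table, column, type) rows."""
    actual: Dict[str, Dict[str, str]] = {}
    for table, column, dtype in actual_rows:
        actual.setdefault(table, {})[column] = dtype.upper()
    problems: List[str] = []
    for table, columns in expected.items():
        if table not in actual:
            problems.append(f"missing table {table}")
            continue
        for column, dtype in columns.items():
            found = actual[table].get(column)
            if found is None:
                problems.append(f"{table}.{column}: missing column")
            elif found != dtype.upper():
                problems.append(f"{table}.{column}: expected {dtype}, found {found}")
    return problems


def check_schema(con: Any, expected: Dict[str, Dict[str, str]] = EXPECTED_SCHEMA) -> None:
    """Fail fast at startup; `con` is assumed to be a DuckDB connection."""
    problems = schema_problems(expected, con.execute(COLUMNS_QUERY).fetchall())
    if problems:
        raise RuntimeError("database schema is stale:\n" + "\n".join(problems))
```

Keeping the comparison in a pure function follows the `explorer_helpers.py` pattern, so it can be tested without a database.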
8. Motion Data Sanitization Pipeline — Pre-Insert Validation
Description: Add a sanitization layer for incoming motion data that validates:
- `winning_margin` is between 0 and 1
- `policy_area` is non-empty
- `voting_results` keys match known parties
- Date parsing succeeds for motion dates
Rationale: The current insertion code trusts upstream data. Invalid data causes hard-to-debug issues downstream in SVD computation and similarity calculations.
Downsides: Requires defining what "valid" means for each field.
Confidence: 80%
Complexity: Medium
Status: Unexplored
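The sketch below is a pre-insert validator. It assumes motions arrive as dictionaries. The field names `winning_margin`, `policy_area`, and `voting_results` come from the list above; the `date` key and the ISO `YYYY-MM-DD` format are assumptions, and `motion_problems` is hypothetical.

The validator collects every problem instead of stopping at the first one. A rejected record can then be logged with all its defects at once.

```python
from datetime import date
from typing import AbstractSet, Any, List, Mapping


def motion_problems(motion: Mapping[str, Any], known_parties: AbstractSet[str]) -> List[str]:
    """Return every validation problem for one motion record (empty list = valid)."""
    problems: List[str] = []

    margin = motion.get("winning_margin")
    # bool is a subclass of int, so exclude it; NaN fails the range test too.
    if (
        not isinstance(margin, (int, float))
        or isinstance(margin, bool)
        or not 0 <= margin <= 1
    ):
        problems.append(f"winning_margin out of range [0, 1]: {margin!r}")

    area = motion.get("policy_area")
    if not isinstance(area, str) or not area.strip():
        problems.append("policy_area is empty")

    results = motion.get("voting_results")
    if not isinstance(results, Mapping):
        problems.append("voting_results is not a mapping")
    else:
        unknown = sorted(set(results) - set(known_parties))
        if unknown:
            problems.append(f"voting_results has unknown parties: {unknown}")

    raw_date = motion.get("date")  # hypothetical field name
    try:
        date.fromisoformat(str(raw_date))  # assumes ISO YYYY-MM-DD input
    except ValueError:
        problems.append(f"unparseable date: {raw_date!r}")

    return problems
```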
Rejection Summary
| # | Idea | Reason Rejected |
|---|---|---|
| 1 | Add unit tests for exception paths | Good idea but lower leverage than preventing errors at source; covered by existing test infrastructure |
| 2 | Refactor all 850+ exception handlers in one pass | Too high volume; needs a phased approach, captured by idea #1 |
| 3 | Add type hints to all functions | Good hygiene but doesn't directly address reliability — covered by existing typing effort |
| 4 | Implement circuit breaker for external API calls | No external API calls observed in core codebase |
Session Log
- 2026-04-04: Initial ideation — 8 generated, 8 survived