| date | topic | focus |
|---|---|---|
| 2026-04-04 | stemwijzer-improvement-ideas | general |
Ideation: Stemwijzer Improvement Ideas
Codebase Context
Project shape: Python/Streamlit Dutch voting advice tool ("Stemwijzer")
- Uses uv for package management, pytest for testing, DuckDB for data
- Key modules: analysis/, pipeline/, database.py (50KB), explorer.py (143KB)
- Notable: 3 venvs (.venv, .venv_axis, .venv_plotly) suggest dependency experimentation
- AGENTS.md exists with conventions (right-wing parties on RIGHT side, SVD labels reflect voting patterns)
Pain points identified:
- explorer.py is 143KB monolith - hard to navigate
- SVD labels must reflect voting patterns (documented as learning)
- 850+ bare exception handlers documented as anti-pattern
- No CONTRIBUTING.md for onboarding
Leverage points:
- Good test organization (tests/ with subdirs)
- Documented solutions in docs/solutions/
- explorer_helpers.py proves pure-function pattern works
Ranked Ideas
1. Right-Wing Party Axis Validation
Description: Add an automated test that asserts PVV, FVD, JA21, SGP appear on the RIGHT side (positive loading) of all SVD/PCA axes.
Rationale: This is the #1 project convention (from AGENTS.md) with zero automated enforcement. The documented SVD label bug showed how easy it is to get this wrong. A simple test prevents regression.
Downsides: Requires defining "RIGHT side" for each component - some components may have flipped poles.
Confidence: 95% Complexity: Low Status: Unexplored
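A minimal sketch of such a check, assuming the loadings can be exposed as a mapping from party name to a list of per-component scores. The helper name `misplaced_parties`, the constant `RIGHT_WING_PARTIES`, and the sample numbers are hypothetical and not taken from the codebase. The helper returns the offenders instead of asserting directly, so a failing test can report exactly which party and component broke the convention.

```python
from typing import Mapping, Sequence

# Parties that the AGENTS.md convention places on the RIGHT (positive) side.
RIGHT_WING_PARTIES = ("PVV", "FVD", "JA21", "SGP")


def misplaced_parties(
    loadings: Mapping[str, Sequence[float]],
    parties: Sequence[str] = RIGHT_WING_PARTIES,
) -> list[tuple[str, int, float]]:
    """Return (party, component_index, score) for every non-positive score."""
    offenders = []
    for party in parties:
        for index, score in enumerate(loadings[party]):
            if score <= 0:
                offenders.append((party, index, score))
    return offenders


def test_right_wing_parties_load_positive():
    # Hypothetical loadings; a real test would read them from the SVD pipeline.
    loadings = {
        "PVV": [0.8, 0.3],
        "FVD": [0.7, 0.5],
        "JA21": [0.6, 0.1],
        "SGP": [0.4, 0.2],
        "GL": [-0.7, 0.2],
    }
    assert misplaced_parties(loadings) == []
```

Because the sign of an SVD component is arbitrary, a check like this is only meaningful once the pipeline has fixed the orientation of each component.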
2. Extract Business Logic from explorer.py
Description: Break the 143KB explorer.py monolith into pure functions in a new module (e.g., analysis/explorer_core.py), keeping only UI glue in the main file.
Rationale: explorer.py is too large to navigate, review, or refactor safely. The explorer_helpers.py pattern already proves pure functions work. This enables parallel development and safer changes.
Downsides: High complexity - requires understanding all the current dependencies and careful extraction to avoid breaking the Streamlit UI.
Confidence: 90% Complexity: High Status: Unexplored
3. SVD Component Label Verification
Description: Create a pre-deployment verification script that checks SVD_THEMES labels against actual voting data, flagging components where labels don't match party score distributions.
Rationale: The documented SVD label bug showed labels can drift from reality. A verification step before deployment prevents this recurring.
Downsides: Requires clear criteria for "label matches voting data" - some components are genuinely ambiguous.
Confidence: 85% Complexity: Medium Status: Unexplored
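One possible criterion for "label matches voting data" is sketched below, under the assumption that each label can name the parties expected at its two poles. The function `label_matches_scores`, its parameters, and the idea of comparing pole averages are illustrative assumptions; the actual structure of SVD_THEMES is not described here and may differ.

```python
from statistics import mean
from typing import Mapping, Sequence


def label_matches_scores(
    scores: Mapping[str, float],
    positive_pole: Sequence[str],
    negative_pole: Sequence[str],
    min_gap: float = 0.0,
) -> bool:
    """True if the positive-pole parties score higher on average than the negative-pole parties."""
    positive_mean = mean([scores[party] for party in positive_pole])
    negative_mean = mean([scores[party] for party in negative_pole])
    return positive_mean - negative_mean > min_gap


# Hypothetical scores for one component.
component_scores = {"PVV": 0.9, "FVD": 0.7, "GL": -0.8, "SP": -0.5}
label_ok = label_matches_scores(component_scores, ["PVV", "FVD"], ["GL", "SP"])
```

Raising `min_gap` would let the script flag genuinely ambiguous components for manual review rather than passing them on a thin margin.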
4. Interactive Component-Explorer UI
Description: Add a Streamlit UI selector letting users view any pair of SVD components as a 2D scatter plot, not just the political compass (components 1-2).
Rationale: Components 3-10 are essentially black boxes. Making these explorable reveals hidden political dimensions and adds significant user value.
Downsides: Requires understanding how to project between arbitrary component pairs.
Confidence: 85% Complexity: Medium Status: Unexplored
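If party scores are already stored per component, viewing an arbitrary pair amounts to picking two columns, since each component is its own axis. The sketch below assumes a mapping from party name to a list of component scores; `component_pair_points` and the sample data are hypothetical names for illustration.

```python
from typing import Mapping, Sequence


def component_pair_points(
    scores: Mapping[str, Sequence[float]],
    x_component: int,
    y_component: int,
) -> list[tuple[str, float, float]]:
    """Return (party, x, y) points for two components, numbered from 1."""
    if x_component == y_component:
        raise ValueError("choose two different components")
    points = []
    for party, values in scores.items():
        count = len(values)
        if not (1 <= x_component <= count and 1 <= y_component <= count):
            raise ValueError(f"{party} has only {count} components")
        points.append((party, values[x_component - 1], values[y_component - 1]))
    return points


# Hypothetical scores on three components.
sample_scores = {"PVV": [0.8, 0.3, -0.1], "GL": [-0.7, 0.2, 0.4]}
points = component_pair_points(sample_scores, 1, 3)
```

The resulting list can be handed to whichever scatter-plot call the UI already uses; two selectors for `x_component` and `y_component` would be the only new inputs.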
5. Type-Safe Vote Normalization
Description: Replace string-based vote normalization (casting '1', '-1', '0' strings) with typed enums and exhaustiveness checking.
Rationale: Vote matching is core functionality, and wrong types cause silent bugs. Typed enums let a static type checker such as mypy catch these errors before the code runs.
Downsides: Requires updating all callers and ensuring backward compatibility.
Confidence: 80% Complexity: Medium Status: Unexplored
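A minimal sketch of the enum approach. The mapping of 1, -1, and 0 to "for", "against", and "abstain" is an assumption, as are the names `Vote` and `normalize_vote`; the point is that unknown values raise immediately instead of flowing through as strings.

```python
from enum import Enum


class Vote(Enum):
    # Assumed meanings; adjust to whatever the data actually encodes.
    FOR = 1
    AGAINST = -1
    ABSTAIN = 0


def normalize_vote(raw: "str | int") -> Vote:
    """Convert a raw value such as '1', '-1', or 0 into a Vote; raise on anything else."""
    try:
        return Vote(int(raw))
    except (TypeError, ValueError) as exc:
        raise ValueError(f"unrecognized vote value: {raw!r}") from exc
```

Existing callers that still pass strings keep working through `normalize_vote`, which gives a gradual migration path while new code compares `Vote` members directly.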
6. Add CONTRIBUTING.md
Description: Create top-level CONTRIBUTING.md covering setup (uv), running tests, lint/typecheck commands, and key conventions from AGENTS.md.
Rationale: AGENTS.md is internal-focused. A CONTRIBUTING.md lowers the barrier for external contributors and encodes project norms explicitly.
Downsides: Low risk - straightforward documentation.
Confidence: 75% Complexity: Low Status: Explored
7. Database Schema Validation
Description: Add startup validation that checks the actual database schema against expected schema. Verify table existence, column types, and foreign key relationships. Fail fast with clear error messages if schema is stale.
Rationale: The current code tries to add columns with ALTER TABLE ... ADD COLUMN IF NOT EXISTS, which can fail silently. A schema validation pass catches migration failures immediately.
Downsides: Schema changes require updating validation code.
Confidence: 85% Complexity: Low Status: Unexplored
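The comparison step could look roughly like this. It assumes the actual schema is read from DuckDB's `information_schema.columns` view as `(table_name, column_name, data_type)` rows; the function `schema_problems` and the example table and column names are hypothetical. Foreign key checks are left out of this sketch.

```python
from typing import Iterable, Mapping

# Query whose rows feed schema_problems(); run it with the project's own connection.
SCHEMA_QUERY = (
    "SELECT table_name, column_name, data_type FROM information_schema.columns"
)


def schema_problems(
    expected: Mapping[str, Mapping[str, str]],
    actual_rows: Iterable[tuple[str, str, str]],
) -> list[str]:
    """Compare expected {table: {column: type}} with actual rows; return readable problems."""
    actual: dict[str, dict[str, str]] = {}
    for table, column, data_type in actual_rows:
        actual.setdefault(table, {})[column] = data_type.upper()

    problems = []
    for table, columns in expected.items():
        if table not in actual:
            problems.append(f"missing table: {table}")
            continue
        for column, data_type in columns.items():
            found = actual[table].get(column)
            wanted = data_type.upper()
            if found is None:
                problems.append(f"missing column: {table}.{column}")
            elif found != wanted:
                problems.append(
                    f"type mismatch: {table}.{column} is {found}, expected {wanted}"
                )
    return problems
```

At startup the app could fetch the rows with `con.execute(SCHEMA_QUERY).fetchall()` and raise an error listing every problem if the returned list is non-empty.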
8. DuckDB Connection Leak Detector
Description: Audit all duckdb.connect() calls for proper context manager usage or explicit .close(). Many handlers catch exceptions but forget to close connections. Add a ConnectionTracker that warns on unclosed connections in development.
Rationale: Connection leaks accumulate and eventually exhaust database connections. The codebase has 15+ places where exceptions cause early returns without connection cleanup.
Downsides: Tracking adds overhead; some leaks are already handled by DuckDB's connection pooling.
Confidence: 85% Complexity: Medium Status: Unexplored
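One way a ConnectionTracker might be shaped, as a development-only sketch. It works with any object that has a `close()` method, so it does not depend on DuckDB itself; the class layout, method names, and labels are assumptions, not an existing API in the project.

```python
import warnings
from contextlib import contextmanager


class ConnectionTracker:
    """Development aid that remembers which tracked connections are still open."""

    def __init__(self):
        # id(connection) -> (label, connection); the reference keeps ids stable.
        self._open = {}

    def register(self, connection, label):
        """Record a connection that was opened outside the context manager."""
        self._open[id(connection)] = (label, connection)

    def release(self, connection):
        """Forget a connection after it has been closed."""
        self._open.pop(id(connection), None)

    @contextmanager
    def track(self, connection, label):
        """Yield the connection and always close it, even if the body raises."""
        self.register(connection, label)
        try:
            yield connection
        finally:
            connection.close()
            self.release(connection)

    def unclosed(self):
        """Return the labels of connections that were registered but never released."""
        return sorted(label for label, _ in self._open.values())

    def warn_unclosed(self):
        """Emit a ResourceWarning for each connection that is still open."""
        for label in self.unclosed():
            warnings.warn(f"unclosed connection: {label}", ResourceWarning)
```

Handlers that return early on an exception are the case `track` is meant to cover: the `finally` clause closes the connection regardless of how the block exits.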
9. Static Analysis Rule for Bare Except
Description: Add a flake8 plugin or ruff rule that flags except: and except Exception: without re-raising or logging. Document the project-specific exception hierarchy.
Rationale: Prevents the anti-pattern from re-entering. The project has 208 violations; a custom lint rule catches new ones and encodes the team's error-handling philosophy.
Downsides: Requires defining what specific exceptions ARE allowed per context.
Confidence: 70% Complexity: Low Status: Unexplored
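The core of such a rule can be prototyped with the standard `ast` module before committing to a plugin. The sketch below flags `except:` and `except Exception:` handlers whose body neither re-raises nor calls a logger-style method; the function names and the list of method names treated as "logging" are assumptions for illustration.

```python
import ast

# Method names treated as logging calls (an assumption; extend as needed).
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def _reraises_or_logs(handler: ast.ExceptHandler) -> bool:
    """True if the handler body contains a raise or a call like logger.error(...)."""
    for statement in handler.body:
        for node in ast.walk(statement):
            if isinstance(node, ast.Raise):
                return True
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOG_METHODS
            ):
                return True
    return False


def find_swallowed_exceptions(source: str) -> list[int]:
    """Return line numbers of broad handlers that silently swallow errors."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        is_broad = node.type is None or (
            isinstance(node.type, ast.Name) and node.type.id == "Exception"
        )
        if is_broad and not _reraises_or_logs(node):
            flagged.append(node.lineno)
    return sorted(flagged)
```

Ruff's built-in E722 (bare except) and BLE001 (blind except) rules cover part of this already, so a custom check may only be needed for the "without re-raising or logging" condition.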
10. SVD Component Label Verification
Description: Create a CI/CD pre-deployment script that verifies SVD labels against actual voting data, checking that labels match the voting pattern rather than semantic assumptions.
Rationale: The SVD label documentation exists but there's no enforcement. This automated check prevents the documented mistake from recurring.
Downsides: Requires understanding of the SVD pipeline and periodic re-calibration.
Confidence: 75% Complexity: Medium Status: Unexplored
Rejection Summary (Raised Bar — 2026-04-05)
| # | Idea | Reason Rejected |
|---|---|---|
| 1 | Consolidate 3 venvs into 1 | Lower priority - works currently, would need investigation |
| 2 | Modularize database.py | Secondary to explorer.py refactor; not a direct user/developer impact |
| 3 | Add Makefile/Task Aliases | Nice-to-have, lower leverage |
| 4 | Exception Handler Audit (208 handlers) | Too large to scope safely; architectural, not fixing root cause |
| 5 | Add Comprehensive Type Hints | Huge scope; hygiene, not correctness |
| 6 | Party Polarization Score | Interesting but niche |
| 7 | Scree Plot Extension | Low urgency feature |
| 8 | Typed DTOs for Database Layer | High migration effort; duckdb interop complications |
| 9 | Nested Exception Handler Flattening | Architectural refactor; too much change for uncertain value |
| 10 | Print→Logging Replacement (179 print statements) | High effort, low leverage; logging exists but is not used |
| 11 | Code Climate Metrics | Measures for its own sake; doesn't directly prevent bugs |
| 12 | CONTRIBUTING.md | Good hygiene, low urgency — can defer |
Session Log
- 2026-04-04: Initial ideation — 32 generated, 6 survived
- 2026-04-05: Raised the bar — 22 ideas reviewed, 5 survivors after stricter filtering
- Idea #1 (Right-Wing Party Axis Validation) selected for brainstorming