| date | topic | status |
|---|---|---|
| 2026-03-22 | StemAtlas — Public Deployment on sgeboers.nl | validated |
StemAtlas Deployment Design
Problem Statement
The stemwijzer project has three user-facing products ready to publish:
- A blog post explaining the political compass methodology and findings
- An interactive explorer (political compass, party trajectories, motion search)
- The stemwijzer quiz (vote on motions, see which parties match you)
These need to be deployed publicly on sgeboers.nl using the existing VPS + Gitea + Drone + Docker stack.
The Name: StemAtlas
stematlas.sgeboers.nl
Dutch wordplay: stem = vote AND voice (as in "the voice of parliament") + atlas = a comprehensive map of the world. Together: an atlas of voices — a map of how Dutch democracy sounds from the inside.
It's broader than "stemwijzer" (which implies a voting guide) — it positions the site as a data exploration and journalism tool.
Constraints
- Existing VPS running Nginx, Gitea, Drone
- Deployment pipeline: Docker build → push to registry → SSH docker-compose up -d
- sgeboers.nl is a raw HTML/CSS site (not Hugo) hosted as a repo on git.sgeboers.nl
- DuckDB file lives on the VPS — single writer (scheduler), multiple readers (Streamlit)
- No new cloud services or hosting costs
Architecture
Internet
│
├── sgeboers.nl (raw HTML/CSS site, existing repo on git.sgeboers.nl)
│ └── blog/stematlas.html ← blog post with inline charts + link to subdomain
│
└── stematlas.sgeboers.nl
└── Nginx (reverse proxy)
└── Streamlit multi-page app (port 8501)
├── Page 1: Stemwijzer Quiz (app.py)
└── Page 2: Explorer (explorer.py)
VPS filesystem:
/srv/stematlas/
├── data/motions.db ← DuckDB (shared, read-write by scheduler)
└── docker-compose.yml
Components
1. Streamlit Multi-Page App
Restructure entry point from app.py → Home.py with a pages/ directory:
Home.py ← landing page / about
pages/
1_Stemwijzer.py ← quiz (app.py content)
2_Explorer.py ← explorer.py content
Streamlit's built-in multi-page routing handles navigation. One Docker container, one port (8501).
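A minimal sketch of how pages/1_Stemwijzer.py could reuse app.py unchanged. It assumes app.py stays at the repository root, one directory above pages/, which the layout above does not state. It uses the standard-library runpy module so that any `if __name__ == "__main__"` guard inside app.py still fires; a plain `import app` would skip that block. The helper names are illustrative.

```python
import runpy
from pathlib import Path


def app_path_for(page_file):
    """Locate app.py, assumed to sit one directory above the pages/ folder."""
    return Path(page_file).resolve().parent.parent / "app.py"


def run_script_as_main(script_path):
    """Execute a script as if it had been launched directly.

    run_name="__main__" makes `if __name__ == "__main__":` blocks in the
    script run. Returns the script's resulting global namespace.
    """
    return runpy.run_path(str(script_path), run_name="__main__")


# Hypothetical body of pages/1_Stemwijzer.py:
#
#     run_script_as_main(app_path_for(__file__))
```

Whether this is enough depends on what app.py does at module level, so it does not settle the open question about the guard; it only shows that a wrapper without edits to app.py is possible.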
Why not two separate containers?
Both pages share a single DuckDB file on the VPS filesystem. The explorer opens a read-only connection; the quiz opens a read-write connection for session data, which is the existing behaviour. One container means one volume mount and no coordination overhead.
2. Docker Compose
The existing .drone.yml already calls docker-compose up -d on the VPS. We add/update docker-compose.yml:
services:
  stematlas:
    image: registry/stematlas:latest
    ports:
      - "127.0.0.1:8501:8501"            # internal only, Nginx proxies to it
    volumes:
      - /srv/stematlas/data:/app/data    # persistent DB
    restart: unless-stopped
  scheduler:
    image: registry/stematlas:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data    # same DB, write access
    restart: unless-stopped
Scheduler as a sidecar: runs in the same image but different container, keeps DB updated nightly. Streamlit container never writes to DB (except user sessions in the quiz).
3. Nginx Vhost
New server block on the VPS:
stematlas.sgeboers.nl → proxy_pass http://127.0.0.1:8501
Standard Streamlit proxy requirements: proxy_http_version 1.1, WebSocket upgrade headers for /_stcore/stream. Let's Encrypt cert via Certbot (standard pattern).
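Once the vhost is live, a quick check from any machine can confirm that the proxy reaches the app. The sketch below assumes a recent Streamlit release, where the health endpoint is /_stcore/health; the function name is illustrative and only the standard library is used.

```python
import urllib.error
import urllib.request


def streamlit_is_healthy(base_url, timeout=5.0):
    """Return True if the Streamlit health endpoint answers with HTTP 200.

    Assumes the endpoint path /_stcore/health (recent Streamlit versions).
    Connection errors, timeouts and non-2xx responses all count as unhealthy.
    """
    url = base_url.rstrip("/") + "/_stcore/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Example (performs a real request, so it is left commented out):
#     streamlit_is_healthy("https://stematlas.sgeboers.nl")
```

This only exercises plain HTTP through Nginx. The WebSocket upgrade for /_stcore/stream still needs a manual check in the browser.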
4. Drone CI Pipeline Update
Existing .drone.yml steps remain identical — build, push, SSH deploy. The only change: docker-compose.yml in the repo now references both the stematlas and scheduler services, so docker-compose up -d picks them both up.
No new Drone secrets needed if DOCKER_REGISTRY, DEPLOY_HOST etc. are already set.
5. Blog Post (Raw HTML page on sgeboers.nl)
The blog post is a new blog/stematlas.html file added to the sgeboers.nl repo on git.sgeboers.nl. The Drone pipeline for that repo deploys it like any other static file — push to git, Drone copies to webroot, Nginx serves it.
Chart embedding strategy — inline Plotly divs:
Rather than iframes, we extract just the chart <div> + <script> from generate_compass.py's output (using fig.to_html(include_plotlyjs='cdn', full_html=False)) and paste them directly into the blog post HTML. This is cleaner than iframes — no border, no scroll issues, full-width, loads with the page.
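Plotly CDN script included once in the <head>. With that in place the snippets can be generated with include_plotlyjs=False instead of 'cdn', so that each chart is just a <div id="chart-N"> + a <script> block below it and plotly.js is not requested once per chart.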
Linking to the subdomain:
The blog post is the article — it tells the story with static charts. The subdomain is the playground. The post links to stematlas.sgeboers.nl at two natural moments:
- After the political compass chart: "Explore every window interactively →"
- At the end: "Take the quiz yourself →"
This is the right split: blog post brings readers in via search/sharing, subdomain gives them something to do.
Chart generation workflow:
scripts/generate_compass.py → outputs/
├── compass_2025.html ← main compass (latest window)
├── trajectories_2019_2025.html ← party drift over time
└── compass_2024-Q4.html ← quarterly detail
Run fig.to_html(include_plotlyjs='cdn', full_html=False) to extract embeddable snippets, paste into blog/stematlas.html in the sgeboers.nl repo.
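The paste step can be scripted. The sketch below assumes the blog post holds a pair of marker comments per chart, for example `<!-- chart:compass -->` and `<!-- /chart:compass -->`. These markers and the function name are hypothetical, not part of the existing site. The function replaces whatever sits between the markers with a fresh snippet, so running it twice with the same snippet leaves the page unchanged.

```python
import re


def inject_snippet(page_html, name, snippet):
    """Replace the content between <!-- chart:NAME --> markers with snippet.

    The marker convention is an assumption for blog/stematlas.html.
    Raises ValueError if the marker pair is missing, so a renamed chart
    fails loudly instead of silently leaving stale content behind.
    """
    start = f"<!-- chart:{name} -->"
    end = f"<!-- /chart:{name} -->"
    pattern = re.compile(
        re.escape(start) + r".*?" + re.escape(end), flags=re.DOTALL
    )
    replacement = f"{start}\n{snippet}\n{end}"
    # A function replacement avoids backslash processing inside the snippet.
    new_html, count = pattern.subn(lambda _m: replacement, page_html, count=1)
    if count == 0:
        raise ValueError(f"marker pair for chart {name!r} not found")
    return new_html
```

The snippet argument would be the string returned by fig.to_html(..., full_html=False) in generate_compass.py. The updated page still has to be committed and pushed to the sgeboers.nl repo by hand.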
Blog Post Charts — What to Include
The blog post narrates three acts. Each gets a supporting chart:
Act 1: The Method
No chart needed — the SVD explanation is conceptual. Use a simple HTML table for the vote matrix illustration.
Act 2: The Political Compass
Chart: compass_latest_annual.html
- 2D scatter of all parties for the most recent full annual window (2024 or 2025)
- Axes: PC1 (left-right) × PC2 (residual, typically progressive-traditionalist)
- Points coloured and labelled by party
- Interactive: hover shows party name + coordinates
- Caption: "Each party's position computed purely from voting patterns — no labels applied by us"
Chart: trajectories_all_parties.html
- Line chart of party positions across all annual windows (2016–2025)
- One line per party, coloured consistently
- Key narrative moments annotated: BBB arrival (2022), coalition formation (2022), Rutte → Schoof (2024)
- Interactive: toggle parties on/off via legend
Act 3: Motion Similarity
Chart: compass_motions_sample.html (optional, depends on data quality)
- 2D UMAP scatter of ~500 sampled motions, coloured by policy area
- Shows clustering: climate motions cluster together, budget motions cluster together, etc.
- If UMAP results aren't clean enough to tell a clear story, skip this one
Static table: Motion counts by year
Just a plain HTML table in the blog post, since the site is raw HTML. No chart needed.
Data Flow
scheduler.py (nightly)
└── api_client → downloads new motions → DuckDB
On demand (manual or cron):
└── run_pipeline.py → SVD + embeddings + fusion + similarity cache → DuckDB
└── generate_compass.py → static HTML charts → sgeboers.nl repo (blog/stematlas.html)
Streamlit (reads only):
└── duckdb.connect(read_only=True) → all analysis queries
The DB is the source of truth. Charts are regenerated and re-copied to the sgeboers.nl repo whenever the pipeline produces new data, probably monthly.
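One point to verify for this data flow: DuckDB's documented concurrency model lets a database file be opened either by one read-write process or by several read-only processes, not both at once. The scheduler and Streamlit run in separate containers, so a read-only open may fail while the scheduler holds the file. The sketch below is a small retry wrapper for that case. The function name, the defaults and the broad retry_on tuple are assumptions; in real use retry_on should be narrowed to the lock error the driver raises.

```python
import time


def open_with_retry(connect, attempts=5, delay_s=2.0, retry_on=(Exception,),
                    sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect  : zero-argument callable that opens and returns a connection.
    retry_on : exception types treated as "database busy" (assumed default).
    sleep    : injectable for tests; defaults to time.sleep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # wait before the next try
    raise last_error


# Intended use inside a Streamlit page (requires the duckdb package):
#
#     import duckdb
#     con = open_with_retry(
#         lambda: duckdb.connect("data/motions.db", read_only=True)
#     )
```

A retry only helps if the scheduler closes its connection between nightly runs. If scheduler.py keeps one connection open for its whole lifetime, the readers would wait indefinitely, so that is worth checking before deploy.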
Error Handling Strategy
- Streamlit crash: Docker restart: unless-stopped brings it back automatically
- Scheduler crash: Same restart policy; DuckDB's WAL handles partial writes
- DB file corruption: Not handled beyond OS-level backup. Mitigate by adding a weekly cp data/motions.db data/motions.db.bak to the scheduler or as a cron job on the VPS
- Blog charts stale: Acceptable. Charts are labelled with their window date; stale by 30 days is fine for a blog post
- Streamlit + scheduler write conflict: The scheduler is the only writer of motion data. The explorer and the quiz use separate connections, and DuckDB handles concurrent reads fine. The quiz writes user_sessions rows at low frequency, so little conflict with the scheduler is expected
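If the weekly backup is done from the scheduler rather than cron, it could look like the sketch below. The function name and the .tmp suffix are illustrative. It copies to a temporary name first and then renames, so an interrupted copy never replaces the last good backup. It should run when no write is in progress, for example right after the nightly update finishes, because copying a file that is being written can produce an inconsistent copy.

```python
import os
import shutil
from pathlib import Path


def backup_db(db_path, suffix=".bak"):
    """Copy db_path to db_path + suffix without exposing a partial backup."""
    src = Path(db_path)
    final = src.with_name(src.name + suffix)
    tmp = src.with_name(src.name + suffix + ".tmp")
    shutil.copy2(src, tmp)   # copy contents and metadata to a temporary name
    os.replace(tmp, final)   # atomic rename within the same filesystem
    return final


# Example: backup_db("data/motions.db") writes data/motions.db.bak
```

This keeps a single generation, like the cp command above, and it lives on the same disk as the original, so it guards against file corruption but not against disk loss.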
Testing Strategy
- Import smoke test for explorer.py already exists (tests/test_explorer_import.py)
- Home.py and pages/ restructure needs a corresponding smoke test
- Drone build will catch import errors before deploy
- Manual verification: docker-compose up locally against a copy of data/motions.db, check all four Streamlit tabs render without error
- Blog post charts: visual review after generate_compass.py run; no automated test needed
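For the Home.py and pages/ smoke test, one lightweight option is to compile every page script without executing it. The sketch below is an assumption about how such a test could look, not the existing test in tests/. It catches syntax errors and missing files only; it does not resolve imports, so it complements an import-based test like the one for explorer.py rather than replacing it.

```python
from pathlib import Path


def find_broken_pages(root):
    """Return (path, error) pairs for page scripts that fail to compile.

    Checks Home.py and every pages/*.py under root. Scripts are compiled,
    not executed, so Streamlit does not need to be running.
    """
    root = Path(root)
    candidates = [root / "Home.py", *sorted((root / "pages").glob("*.py"))]
    broken = []
    for path in candidates:
        try:
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except (OSError, SyntaxError) as exc:
            broken.append((path, exc))
    return broken
```

In a pytest file this reduces to a single assertion that find_broken_pages(repo_root) returns an empty list, where repo_root is wherever Home.py ends up.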
Open Questions
- Multi-page restructure scope: Does the quiz (app.py) need any changes beyond being wrapped in a pages/ file, or can it be imported as-is? The if __name__ == "__main__" guard in app.py needs reviewing.
- Streamlit base path: Subdomain approach (stematlas.sgeboers.nl) means no subpath complexity. Streamlit runs at /. Clean.
- Chart update cadence: Manual (run generate_compass.py, extract snippets, paste into blog post HTML, push to sgeboers.nl repo). Fine initially, since charts are labelled with window date.
- sgeboers.nl nav structure: No blog directory exists yet. Need to add blog/ dir, a blog/stematlas.html file, and a nav link on the main site. Structure TBD after inspecting the existing HTML/CSS site.
- Nginx already running: Need to confirm Certbot/Let's Encrypt workflow matches what's already set up on the VPS for other subdomains.