---
date: 2026-03-22
topic: "StemAtlas — Public Deployment on sgeboers.nl"
status: validated
---
# StemAtlas Deployment Design
## Problem Statement
The stemwijzer project has three user-facing products ready to publish:
1. **A blog post** explaining the political compass methodology and findings
2. **An interactive explorer** (political compass, party trajectories, motion search)
3. **The stemwijzer quiz** (vote on motions, see which parties match you)
These need to be deployed publicly on sgeboers.nl using the existing VPS + Gitea + Drone + Docker stack.
---
## The Name: StemAtlas
**`stematlas.sgeboers.nl`**
Dutch wordplay: **stem** = *vote* AND *voice* (as in "the voice of parliament") + **atlas** = a comprehensive map of the world. Together: *an atlas of voices* — a map of how Dutch democracy sounds from the inside.
It's broader than "stemwijzer" (which implies a voting guide) — it positions the site as a data exploration and journalism tool.
---
## Constraints
- Existing VPS running Nginx, Gitea, Drone
- Deployment pipeline: Docker build → push to registry → SSH `docker-compose up -d`
- sgeboers.nl is a **raw HTML/CSS site** (not Hugo) hosted as a repo on git.sgeboers.nl
- DuckDB file lives on the VPS — single writer (scheduler), multiple readers (Streamlit)
- No new cloud services or hosting costs
---
## Architecture
```
Internet
├── sgeboers.nl (raw HTML/CSS site, existing repo on git.sgeboers.nl)
│   └── blog/stematlas.html ← blog post with inline charts + link to subdomain
└── stematlas.sgeboers.nl
    └── Nginx (reverse proxy)
        └── Streamlit multi-page app (port 8501)
            ├── Page 1: Stemwijzer Quiz (app.py)
            └── Page 2: Explorer (explorer.py)

VPS filesystem:
/srv/stematlas/
├── data/motions.db ← DuckDB (shared, read-write by scheduler)
└── docker-compose.yml
```
---
## Components
### 1. Streamlit Multi-Page App
Restructure the entry point from `app.py` to `Home.py`, with a `pages/` directory:
```
Home.py ← landing page / about
pages/
  1_Stemwijzer.py ← quiz (app.py content)
  2_Explorer.py ← explorer.py content
```
Streamlit's built-in multi-page routing handles navigation. One Docker container, one port (8501).
**Why not two separate containers?**
Both pages share the single DuckDB file on the VPS filesystem. The explorer opens read-only connections; the quiz opens a read-write connection to store session data, which is the existing behaviour. One container means one volume mount and no coordination overhead.
### 2. Docker Compose
The existing `.drone.yml` already calls `docker-compose up -d` on the VPS. We add/update `docker-compose.yml`:
```
services:
  stematlas:
    image: registry/stematlas:latest
    ports:
      - "127.0.0.1:8501:8501"  # internal only; Nginx proxies to it
    volumes:
      - /srv/stematlas/data:/app/data  # persistent DB
    restart: unless-stopped
  scheduler:
    image: registry/stematlas:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data  # same DB, write access
    restart: unless-stopped
```
**Scheduler as a sidecar**: runs from the same image in a separate container and keeps the DB updated nightly. The Streamlit container never writes to the DB, except for user sessions in the quiz.
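
For the nightly cadence, one possible shape of the sidecar's main loop is sketched below. This is not the project's actual `scheduler.py`: the 03:00 run time and the `run_update` callback are hypothetical names chosen for illustration.

```python
import time
from datetime import datetime, timedelta


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `hour`:00 local time."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def run_nightly(run_update, hour: int = 3) -> None:
    """Sleep until `hour`:00, call `run_update()`, and repeat forever."""
    while True:
        time.sleep(seconds_until(hour, datetime.now()))
        try:
            run_update()  # hypothetical: download new motions into DuckDB
        except Exception as exc:
            # Keep the loop alive; Docker only restarts the container on a hard crash.
            print(f"nightly update failed: {exc}")
```

Catching the exception inside the loop means one failed download does not skip every later night; the `restart: unless-stopped` policy still covers the case where the process itself dies.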
### 3. Nginx Vhost
New server block on the VPS:
```
stematlas.sgeboers.nl → proxy_pass http://127.0.0.1:8501
```
Standard Streamlit proxy requirements: `proxy_http_version 1.1`, WebSocket upgrade headers for `/_stcore/stream`. Let's Encrypt cert via Certbot (standard pattern).
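
Once the vhost is live, a quick check through the proxy could look like the sketch below. It assumes the Streamlit version in use exposes the `/_stcore/health` endpoint (recent versions do); `is_healthy` is a hypothetical helper name. It only confirms that plain HTTP requests reach the app, not that the WebSocket upgrade for `/_stcore/stream` works, so a manual browser check is still needed.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/_stcore/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/_stcore/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, TLS error, timeout, or a 4xx/5xx reply.
        return False
```

For example, `is_healthy("https://stematlas.sgeboers.nl")` could run as the last step of the SSH deploy, or by hand after a Certbot renewal.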
### 4. Drone CI Pipeline Update
Existing `.drone.yml` steps remain identical — build, push, SSH deploy. The only change: `docker-compose.yml` in the repo now references both the `stematlas` and `scheduler` services, so `docker-compose up -d` picks them both up.
No new Drone secrets needed if `DOCKER_REGISTRY`, `DEPLOY_HOST` etc. are already set.
### 5. Blog Post (Raw HTML page on sgeboers.nl)
The blog post is a new `blog/stematlas.html` file added to the sgeboers.nl repo on git.sgeboers.nl. The Drone pipeline for that repo deploys it like any other static file — push to git, Drone copies to webroot, Nginx serves it.
**Chart embedding strategy — inline Plotly divs:**
Rather than iframes, we extract just the chart `<div>` + `<script>` from `generate_compass.py`'s output (using `fig.to_html(include_plotlyjs='cdn', full_html=False)`) and paste them directly into the blog post HTML. This is cleaner than iframes — no border, no scroll issues, full-width, loads with the page.
Plotly CDN script included once in the `<head>`. Each chart is just a `<div id="chart-N">` + a `<script>` block below it.
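
A minimal sketch of the export step, assuming `fig` is a `plotly.graph_objects.Figure`; `write_snippet` and the output layout are hypothetical. With `include_plotlyjs='cdn'` every snippet carries its own Plotly `<script>` tag, so if the library is loaded once in the `<head>`, passing `include_plotlyjs=False` keeps the fragments down to the chart `<div>` and its `<script>`. The `div_id` argument sets the `chart-N` id directly.

```python
from pathlib import Path


def write_snippet(fig, div_id: str, out_dir: Path) -> Path:
    """Write one chart as an embeddable <div> + <script> fragment.

    `fig` is expected to be a plotly.graph_objects.Figure.
    include_plotlyjs=False leaves out the Plotly <script> tag, on the
    assumption that the blog page's <head> already loads it from the CDN.
    """
    html = fig.to_html(include_plotlyjs=False, full_html=False, div_id=div_id)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{div_id}.html"
    path.write_text(html, encoding="utf-8")
    return path
```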
**Linking to the subdomain:**
The blog post is the *article* — it tells the story with static charts. The subdomain is the *playground*. The post links to `stematlas.sgeboers.nl` at two natural moments:
- After the political compass chart: *"Explore every window interactively →"*
- At the end: *"Take the quiz yourself →"*
This is the right split: blog post brings readers in via search/sharing, subdomain gives them something to do.
**Chart generation workflow:**
```
scripts/generate_compass.py → outputs/
├── compass_2025.html ← main compass (latest window)
├── trajectories_2019_2025.html ← party drift over time
└── compass_2024-Q4.html ← quarterly detail
```
Run `fig.to_html(include_plotlyjs='cdn', full_html=False)` to extract the embeddable snippets, then paste them into `blog/stematlas.html` in the sgeboers.nl repo.
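
If the manual paste becomes tedious, it could be scripted. The sketch below assumes a marker convention that does not exist in the blog post yet: each chart slot is wrapped in `<!-- chart:NAME:start -->` and `<!-- chart:NAME:end -->` comments, and `splice_chart` (a hypothetical helper) replaces whatever sits between them, so re-running it with a fresh snippet is safe.

```python
import re


def splice_chart(page_html: str, name: str, snippet: str) -> str:
    """Replace whatever sits between the start/end markers for `name`."""
    start = f"<!-- chart:{name}:start -->"
    end = f"<!-- chart:{name}:end -->"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if not pattern.search(page_html):
        raise ValueError(f"markers for chart {name!r} not found")
    # A function replacement stops re.sub from interpreting backslashes in the snippet.
    return pattern.sub(lambda _m: f"{start}\n{snippet}\n{end}", page_html, count=1)
```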
---
## Blog Post Charts — What to Include
The blog post narrates three acts. Each gets a supporting chart:
### Act 1: The Method
**No chart needed** — the SVD explanation is conceptual. Use a simple HTML table for the vote matrix illustration.
### Act 2: The Political Compass
**Chart: `compass_latest_annual.html`**
- 2D scatter of all parties for the most recent full annual window (2024 or 2025)
- Axes: PC1 (left-right) × PC2 (residual, typically progressive-traditionalist)
- Points coloured and labelled by party
- Interactive: hover shows party name + coordinates
- Caption: "Each party's position computed purely from voting patterns — no labels applied by us"
**Chart: `trajectories_all_parties.html`**
- Line chart of party positions across all annual windows (2016–2025)
- One line per party, coloured consistently
- Key narrative moments annotated: BBB arrival (2022), coalition formation (2022), Rutte → Schoof (2024)
- Interactive: toggle parties on/off via legend
### Act 3: Motion Similarity
**Chart: `compass_motions_sample.html`** (optional, depends on data quality)
- 2D UMAP scatter of ~500 sampled motions, coloured by policy area
- Shows clustering: climate motions cluster together, budget motions cluster together, etc.
- If UMAP results aren't clean enough to tell a clear story, skip this one
**Static table: Motion counts by year**
Just a markdown table in the blog post — no chart needed.
---
## Data Flow
```
scheduler.py (nightly)
└── api_client → downloads new motions → DuckDB
On demand (manual or cron):
└── run_pipeline.py → SVD + embeddings + fusion + similarity cache → DuckDB
└── generate_compass.py → static HTML charts → sgeboers.nl repo (blog/stematlas.html)
Streamlit (reads only):
└── duckdb.connect(read_only=True) → all analysis queries
```
The DB is the source of truth. Charts are regenerated and re-copied into the sgeboers.nl repo whenever the pipeline produces new data, probably monthly.
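
Because the Streamlit app and the scheduler run as separate processes, opening the DuckDB file can fail while the other side holds it for writing. One way to tolerate short lock windows is a small retry wrapper, sketched here under the assumption that each side closes its connection when it is done. `connect_with_retry` is a hypothetical helper; the connect call is passed in, so the same wrapper would serve both the read-only and the read-write side.

```python
import time


def connect_with_retry(connect, attempts: int = 5, delay: float = 2.0):
    """Call `connect()` until it succeeds or `attempts` is exhausted.

    `connect` is any zero-argument callable that returns a connection,
    for example: lambda: duckdb.connect("data/motions.db", read_only=True)
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as exc:  # e.g. the file is locked by a writer process
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"could not open database after {attempts} attempts") from last_error
```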
---
## Error Handling Strategy
- **Streamlit crash**: Docker `restart: unless-stopped` brings it back automatically
- **Scheduler crash**: Same restart policy; DuckDB's WAL handles partial writes
- **DB file corruption**: Not handled beyond OS-level backup. Mitigate by adding a weekly `cp data/motions.db data/motions.db.bak` to the scheduler or as a cron job on the VPS
- **Blog charts stale**: Acceptable — charts are labelled with their window date; stale by 30 days is fine for a blog post
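- **Streamlit + scheduler write conflict**: The scheduler is the only writer of motion data; the quiz writes only `user_sessions` rows, at low frequency. Within a single process DuckDB handles concurrent reads fine. Across processes, DuckDB's documented model is either one read-write process or several read-only processes per file, so the Streamlit and scheduler containers cannot both hold the file open while one of them writes. Whether connections are short-lived enough for this to work needs verifying before launch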
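
The weekly backup mentioned above could be a timestamped copy with pruning rather than a single `.bak` file that is overwritten each time. A minimal sketch, with `backup_db`, the backup directory and the retention count all being hypothetical choices; it should run while no writer has the file open, since copying a file mid-write may capture an inconsistent state.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_db(db_path: Path, backup_dir: Path, keep: int = 4) -> Path:
    """Copy the DB file to a timestamped backup and prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_path.stem}-{stamp}{db_path.suffix}.bak"
    shutil.copy2(db_path, target)
    # Timestamped names sort chronologically, so the oldest come first.
    backups = sorted(backup_dir.glob(f"{db_path.stem}-*{db_path.suffix}.bak"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```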
---
## Testing Strategy
- Import smoke test for `explorer.py` already exists (`tests/test_explorer_import.py`)
- `Home.py` and `pages/` restructure needs a corresponding smoke test
- Drone build will catch import errors before deploy
- Manual verification: `docker-compose up` locally against a copy of `data/motions.db`, check that every Streamlit page and tab renders without error
- Blog post charts: visual review after `generate_compass.py` run — no automated test needed
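
For the `Home.py` and `pages/` restructure, a cheap first check is sketched below. It is weaker than the existing import smoke test: it only confirms that the expected files exist and parse, without importing Streamlit or touching the DB. `check_pages` is a hypothetical helper; a test could simply assert that it returns an empty list for the repo root.

```python
import ast
from pathlib import Path


def check_pages(app_root: Path) -> list[str]:
    """Return a list of problems found in Home.py and pages/*.py (empty if none)."""
    problems = []
    files = [app_root / "Home.py", *sorted((app_root / "pages").glob("*.py"))]
    for path in files:
        if not path.is_file():
            problems.append(f"missing: {path.name}")
            continue
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            problems.append(f"syntax error in {path.name}: line {exc.lineno}")
    if len(files) == 1:
        problems.append("no page files found in pages/")
    return problems
```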
---
## Open Questions
1. **Multi-page restructure scope**: Does the quiz (`app.py`) need any changes beyond being wrapped in a `pages/` file, or can it be imported as-is? The `if __name__ == "__main__"` guard in `app.py` needs reviewing.
2. **Streamlit base path**: Subdomain approach (`stematlas.sgeboers.nl`) means no subpath complexity — Streamlit runs at `/`. Clean.
3. **Chart update cadence**: Manual (run `generate_compass.py`, extract snippets, paste into blog post HTML, push to sgeboers.nl repo). Fine initially — charts are labelled with window date.
4. **sgeboers.nl nav structure**: No blog directory exists yet. Need to add `blog/` dir, a `blog/stematlas.html` file, and a nav link on the main site. Structure TBD after inspecting the existing HTML/CSS site.
5. **Nginx already running**: Need to confirm Certbot/Let's Encrypt workflow matches what's already set up on the VPS for other subdomains.