---
date: 2026-03-22
topic: "StemAtlas — Public Deployment on sgeboers.nl"
status: validated
---

# StemAtlas Deployment Design

## Problem Statement

The stemwijzer project has three user-facing products ready to publish:

1. **A blog post** explaining the political compass methodology and findings
2. **An interactive explorer** (political compass, party trajectories, motion search)
3. **The stemwijzer quiz** (vote on motions, see which parties match you)

These need to be deployed publicly on sgeboers.nl using the existing VPS + Gitea + Drone + Docker stack.

---

## The Name: StemAtlas

**`stematlas.sgeboers.nl`**

Dutch wordplay: **stem** = *vote* AND *voice* (as in "the voice of parliament") + **atlas** = a comprehensive map of the world. Together: *an atlas of voices*, a map of how Dutch democracy sounds from the inside.

It's broader than "stemwijzer" (which implies a voting guide): it positions the site as a data exploration and journalism tool.

---

## Constraints

- Existing VPS running Nginx, Gitea, Drone
- Deployment pipeline: Docker build → push to registry → SSH `docker-compose up -d`
- sgeboers.nl is a **raw HTML/CSS site** (not Hugo) hosted as a repo on git.sgeboers.nl
- DuckDB file lives on the VPS: single writer (scheduler), multiple readers (Streamlit)
- No new cloud services or hosting costs

---

## Architecture

```
Internet
│
├── sgeboers.nl (raw HTML/CSS site, existing repo on git.sgeboers.nl)
│   └── blog/stematlas.html  ← blog post with inline charts + link to subdomain
│
└── stematlas.sgeboers.nl
    └── Nginx (reverse proxy)
        └── Streamlit multi-page app (port 8501)
            ├── Page 1: Stemwijzer Quiz (app.py)
            └── Page 2: Explorer (explorer.py)

VPS filesystem:
/srv/stematlas/
├── data/motions.db  ← DuckDB (shared, read-write by scheduler)
└── docker-compose.yml
```

---

## Components

### 1. Streamlit Multi-Page App

Restructure the entry point from `app.py` → `Home.py` with a `pages/` directory:

```
Home.py                ← landing page / about
pages/
  1_Stemwijzer.py      ← quiz (app.py content)
  2_Explorer.py        ← explorer.py content
```

Streamlit's built-in multi-page routing handles navigation. One Docker container, one port (8501).

**Why not two separate containers?**
Both pages use a single shared DuckDB file on the VPS filesystem. The explorer opens a read-only connection; the quiz opens a read-write connection for session data, which is the existing behaviour. One container = one volume mount = no coordination overhead.

### 2. Docker Compose

The existing `.drone.yml` already calls `docker-compose up -d` on the VPS. We add/update `docker-compose.yml`:

```
services:
  stematlas:
    image: registry/stematlas:latest
    ports:
      - "127.0.0.1:8501:8501"  # internal only: loopback, reached via Nginx
    volumes:
      - /srv/stematlas/data:/app/data  # persistent DB
    restart: unless-stopped

  scheduler:
    image: registry/stematlas:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data  # same DB, write access
    restart: unless-stopped
```

**Scheduler as a sidecar**: runs from the same image in a separate container and keeps the DB updated nightly. The Streamlit container never writes to the DB, except for user sessions in the quiz.

### 3. Nginx Vhost

New server block on the VPS:

```
stematlas.sgeboers.nl → proxy_pass http://127.0.0.1:8501
```

Standard Streamlit proxy requirements: `proxy_http_version 1.1`, WebSocket upgrade headers for `/_stcore/stream`. Let's Encrypt cert via Certbot (standard pattern).

### 4. Drone CI Pipeline Update

Existing `.drone.yml` steps remain identical: build, push, SSH deploy. The only change: `docker-compose.yml` in the repo now references both the `stematlas` and `scheduler` services, so `docker-compose up -d` picks them both up.

No new Drone secrets needed if `DOCKER_REGISTRY`, `DEPLOY_HOST` etc. are already set.

### 5. Blog Post (Raw HTML page on sgeboers.nl)

The blog post is a new `blog/stematlas.html` file added to the sgeboers.nl repo on git.sgeboers.nl. The Drone pipeline for that repo deploys it like any other static file: push to git, Drone copies to webroot, Nginx serves it.

**Chart embedding strategy (inline Plotly divs):**

Rather than iframes, we extract just the chart `<div>` + `<script>` from `generate_compass.py`'s output (using `fig.to_html(include_plotlyjs='cdn', full_html=False)`) and paste them directly into the blog post HTML. This is cleaner than iframes: no border, no scroll issues, full-width, loads with the page.

Plotly CDN script included once in the `<head>`. Each chart is just a `<div id="chart-N">` + a `<script>` block below it. Note that `include_plotlyjs='cdn'` adds its own CDN `<script>` tag to every snippet; to load Plotly only once from the `<head>`, generate the snippets with `include_plotlyjs=False` instead.

**Linking to the subdomain:**

The blog post is the *article*: it tells the story with static charts. The subdomain is the *playground*. The post links to `stematlas.sgeboers.nl` at two natural moments:
- After the political compass chart: *"Explore every window interactively →"*
- At the end: *"Take the quiz yourself →"*

This is the right split: the blog post brings readers in via search/sharing, and the subdomain gives them something to do.

**Chart generation workflow:**

```
scripts/generate_compass.py → outputs/
├── compass_2025.html              ← main compass (latest window)
├── trajectories_2019_2025.html    ← party drift over time
└── compass_2024-Q4.html           ← quarterly detail
```

Run `fig.to_html(include_plotlyjs='cdn', full_html=False)` to extract embeddable snippets, then paste them into `blog/stematlas.html` in the sgeboers.nl repo.
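
The manual paste step could be scripted. The following is a minimal sketch, assuming `blog/stematlas.html` carries a pair of marker comments around each chart slot; the `<!-- chart:ID:start -->` / `<!-- chart:ID:end -->` convention and the `splice_chart` helper are hypothetical and not part of the existing site.

```python
import re


def splice_chart(page_html: str, chart_id: str, snippet: str) -> str:
    """Replace whatever sits between the two marker comments for chart_id.

    The marker format is an assumption made for this sketch.
    """
    start = f"<!-- chart:{chart_id}:start -->"
    end = f"<!-- chart:{chart_id}:end -->"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    replacement = f"{start}\n{snippet.strip()}\n{end}"
    # A function replacement keeps backslashes in the snippet literal.
    new_html, count = pattern.subn(lambda _match: replacement, page_html)
    if count != 1:
        raise ValueError(
            f"expected exactly one marker pair for {chart_id!r}, found {count}"
        )
    return new_html
```

Because the markers are kept in place, re-running the splice with a fresh snippet overwrites the previous chart rather than appending a second copy.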

---

## Blog Post Charts — What to Include

The blog post narrates three acts, each with its own supporting visuals:

### Act 1: The Method
**No chart needed**: the SVD explanation is conceptual. Use a simple HTML table for the vote matrix illustration.

### Act 2: The Political Compass
**Chart: `compass_latest_annual.html`**

- 2D scatter of all parties for the most recent full annual window (2024 or 2025)
- Axes: PC1 (left-right) × PC2 (residual, typically progressive-traditionalist)
- Points coloured and labelled by party
- Interactive: hover shows party name + coordinates
- Caption: "Each party's position computed purely from voting patterns, with no labels applied by us"

**Chart: `trajectories_all_parties.html`**

- Line chart of party positions across all annual windows (2016–2025)
- One line per party, coloured consistently
- Key narrative moments annotated: BBB arrival (2022), coalition formation (2022), Rutte → Schoof (2024)
- Interactive: toggle parties on/off via legend

### Act 3: Motion Similarity
**Chart: `compass_motions_sample.html`** (optional, depends on data quality)

- 2D UMAP scatter of ~500 sampled motions, coloured by policy area
- Shows clustering: climate motions cluster together, budget motions cluster together, etc.
- If UMAP results aren't clean enough to tell a clear story, skip this one

**Static table: Motion counts by year**
Just an HTML table in the blog post; no chart needed.

---

## Data Flow

```
scheduler.py (nightly)
  └── api_client → downloads new motions → DuckDB

On demand (manual or cron):
  ├── run_pipeline.py → SVD + embeddings + fusion + similarity cache → DuckDB
  └── generate_compass.py → static HTML charts → sgeboers.nl repo (blog/stematlas.html)

Streamlit (reads only):
  └── duckdb.connect(read_only=True) → all analysis queries
```

The DB is the source of truth. Charts are regenerated and re-copied into the sgeboers.nl repo (`blog/stematlas.html`) whenever the pipeline produces new data, probably monthly.
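
DuckDB lets only one process hold a database file open for writing, and other processes cannot open the file while it does. A read-only open from Streamlit may therefore fail if it coincides with the nightly scheduler run. A minimal sketch of a retry wrapper follows, assuming a short wait is acceptable; `connect_with_retry` and its parameters are hypothetical, and the exception DuckDB raises on a locked file is assumed to be `duckdb.IOException`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call connect() until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(delay_s * attempt)  # linear backoff between tries
    raise AssertionError("unreachable")


# Hypothetical usage in a Streamlit page (needs the duckdb package):
#   import duckdb
#   con = connect_with_retry(
#       lambda: duckdb.connect("data/motions.db", read_only=True),
#       retry_on=(duckdb.IOException,),
#   )
```

Passing the connect call as a callable keeps the helper independent of DuckDB, so it can be exercised in tests with a fake.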

---

## Error Handling Strategy

- **Streamlit crash**: Docker `restart: unless-stopped` brings it back automatically
- **Scheduler crash**: Same restart policy; DuckDB's WAL handles partial writes
- **DB file corruption**: Not handled beyond OS-level backup. Mitigate by adding a weekly `cp data/motions.db data/motions.db.bak` to the scheduler or as a cron job on the VPS
- **Blog charts stale**: Acceptable. Charts are labelled with their window date, and being up to 30 days stale is fine for a blog post
- **Streamlit + scheduler write conflict**: The scheduler is the only writer of motion data, and the quiz writes only `user_sessions` rows at low frequency. DuckDB handles concurrent reads fine, but across processes it allows either one read-write connection or several read-only connections on the same file, not both at once. The scheduler and Streamlit containers therefore cannot hold the file open simultaneously while either has it read-write, so connections should be short-lived and the nightly write window kept brief. This needs verifying on the VPS before launch

---

## Testing Strategy

- Import smoke test for `explorer.py` already exists (`tests/test_explorer_import.py`)
- The `Home.py` and `pages/` restructure needs a corresponding smoke test
- Drone build will catch import errors before deploy
- Manual verification: `docker-compose up` locally against a copy of `data/motions.db`, then check that `Home.py` and both pages, including every tab, render without error
- Blog post charts: visual review after each `generate_compass.py` run; no automated test needed

---

## Open Questions

1. **Multi-page restructure scope**: Does the quiz (`app.py`) need any changes beyond being wrapped in a `pages/` file, or can it be imported as-is? The `if __name__ == "__main__"` guard in `app.py` needs reviewing.
2. **Streamlit base path**: The subdomain approach (`stematlas.sgeboers.nl`) means no subpath complexity: Streamlit runs at `/`. Clean.
3. **Chart update cadence**: Manual (run `generate_compass.py`, extract snippets, paste into blog post HTML, push to sgeboers.nl repo). Fine initially, since charts are labelled with their window date.
4. **sgeboers.nl nav structure**: No blog directory exists yet. Need to add a `blog/` dir, a `blog/stematlas.html` file, and a nav link on the main site. Structure TBD after inspecting the existing HTML/CSS site.
5. **Nginx already running**: Need to confirm the Certbot/Let's Encrypt workflow matches what's already set up on the VPS for other subdomains.
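
On question 1, the concern with the guard is that code under `if __name__ == "__main__"` does not run when a module is imported. If the quiz UI is only started from that guard, a `pages/1_Stemwijzer.py` wrapper that merely imports `app` would render nothing and would need to call the entry function itself. A minimal sketch of that behaviour, using a hypothetical stand-in for `app.py`:

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for app.py: the UI is only started under the guard.
APP_SOURCE = '''
rendered = []

def main():
    rendered.append("quiz")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    app_path = Path(tmp) / "app.py"
    app_path.write_text(APP_SOURCE, encoding="utf-8")

    # Run under its module name, as an `import app` would: the guard is skipped.
    as_import = runpy.run_path(str(app_path), run_name="app")
    # Run as the entry script: the guard fires and main() is called.
    as_script = runpy.run_path(str(app_path), run_name="__main__")

print(as_import["rendered"])  # []
print(as_script["rendered"])  # ['quiz']
```

How Streamlit names page scripts at run time should still be confirmed against its documentation; the sketch only demonstrates the plain Python semantics.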