date:   2026-03-22
topic:  StemAtlas - Public Deployment on sgeboers.nl
status: validated

StemAtlas Deployment Design

Problem Statement

The stemwijzer project has three user-facing products ready to publish:

  1. A blog post explaining the political compass methodology and findings
  2. An interactive explorer (political compass, party trajectories, motion search)
  3. The stemwijzer quiz (vote on motions, see which parties match you)

These need to be deployed publicly on sgeboers.nl using the existing VPS + Gitea + Drone + Docker stack.


The Name: StemAtlas

stematlas.sgeboers.nl

Dutch wordplay: stem = vote AND voice (as in "the voice of parliament") + atlas = a comprehensive map of the world. Together: an atlas of voices — a map of how Dutch democracy sounds from the inside.

It's broader than "stemwijzer" (which implies a voting guide) — it positions the site as a data exploration and journalism tool.


Constraints

  • Existing VPS running Nginx, Gitea, Drone
  • Deployment pipeline: Docker build → push to registry → SSH docker-compose up -d
  • sgeboers.nl is a raw HTML/CSS site (not Hugo) hosted as a repo on git.sgeboers.nl
  • DuckDB file lives on the VPS — single writer (scheduler), multiple readers (Streamlit)
  • No new cloud services or hosting costs

Architecture

Internet
  │
  ├── sgeboers.nl (raw HTML/CSS site, existing repo on git.sgeboers.nl)
  │     └── blog/stematlas.html  ← blog post with inline charts + link to subdomain
  │
  └── stematlas.sgeboers.nl
        └── Nginx (reverse proxy)
              └── Streamlit multi-page app (port 8501)
                    ├── Page 1: Stemwijzer Quiz (app.py)
                    └── Page 2: Explorer (explorer.py)

VPS filesystem:
  /srv/stematlas/
    ├── data/motions.db        ← DuckDB (shared, read-write by scheduler)
    └── docker-compose.yml

Components

1. Streamlit Multi-Page App

Restructure the entry point from app.py to Home.py, with a pages/ directory:

Home.py                  ← landing page / about
pages/
  1_Stemwijzer.py        ← quiz (app.py content)
  2_Explorer.py          ← explorer.py content

Streamlit's built-in multi-page routing handles navigation. One Docker container, one port (8501).

Why not two separate containers?
A single DuckDB file lives on the VPS filesystem and both pages use it. The explorer opens a read-only connection; the quiz opens a read-write one for session data, which is the existing behaviour. One container means one volume mount and no coordination overhead between services.

2. Docker Compose

The existing .drone.yml already calls docker-compose up -d on the VPS. We add/update docker-compose.yml:

services:
  stematlas:
    image: registry/stematlas:latest
    ports:
      - "127.0.0.1:8501:8501"           # loopback only; Nginx proxies to it
    volumes:
      - /srv/stematlas/data:/app/data   # persistent DB
    restart: unless-stopped

  scheduler:
    image: registry/stematlas:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data   # same DB, write access
    restart: unless-stopped

Scheduler as a sidecar: it runs from the same image in a separate container and keeps the DB updated nightly. The Streamlit container never writes to the DB, except for user sessions in the quiz.

3. Nginx Vhost

New server block on the VPS:

stematlas.sgeboers.nl → proxy_pass http://127.0.0.1:8501

Standard Streamlit proxy requirements: proxy_http_version 1.1, WebSocket upgrade headers for /_stcore/stream. Let's Encrypt cert via Certbot (standard pattern).

4. Drone CI Pipeline Update

Existing .drone.yml steps remain identical — build, push, SSH deploy. The only change: docker-compose.yml in the repo now references both the stematlas and scheduler services, so docker-compose up -d picks them both up.

No new Drone secrets needed if DOCKER_REGISTRY, DEPLOY_HOST etc. are already set.

5. Blog Post (Raw HTML page on sgeboers.nl)

The blog post is a new blog/stematlas.html file added to the sgeboers.nl repo on git.sgeboers.nl. The Drone pipeline for that repo deploys it like any other static file — push to git, Drone copies to webroot, Nginx serves it.

Chart embedding strategy — inline Plotly divs:

Rather than iframes, we extract just the chart <div> + <script> from generate_compass.py's output (using fig.to_html(include_plotlyjs='cdn', full_html=False)) and paste them directly into the blog post HTML. This is cleaner than iframes — no border, no scroll issues, full-width, loads with the page.

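The Plotly CDN script should be loaded only once per page. With include_plotlyjs='cdn' every exported snippet carries its own CDN <script> tag, so either keep that tag in the first snippet only, or load Plotly in the <head> and export the snippets with include_plotlyjs=False. Each chart is then just a <div id="chart-N"> plus a <script> block below it.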

Linking to the subdomain:

The blog post is the article — it tells the story with static charts. The subdomain is the playground. The post links to stematlas.sgeboers.nl at two natural moments:

  • After the political compass chart: "Explore every window interactively →"
  • At the end: "Take the quiz yourself →"

This is the right split: blog post brings readers in via search/sharing, subdomain gives them something to do.

Chart generation workflow:

scripts/generate_compass.py → outputs/
  ├── compass_2025.html         ← main compass (latest window)
  ├── trajectories_2019_2025.html  ← party drift over time
  └── compass_2024-Q4.html      ← quarterly detail

Run fig.to_html(include_plotlyjs='cdn', full_html=False) to extract embeddable snippets, paste into blog/stematlas.html in the sgeboers.nl repo.
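The paste step could be scripted once the manual workflow gets tedious. A minimal sketch, assuming the blog page carries paired HTML comment markers such as <!-- chart:compass --> and <!-- /chart:compass --> (a hypothetical convention, not something in the existing site) and that each snippet string is whatever fig.to_html(..., full_html=False) returned:

```python
import re


def inject_snippet(page_html: str, name: str, snippet: str) -> str:
    """Replace whatever sits between the chart markers for `name` with `snippet`.

    The <!-- chart:NAME --> marker convention is hypothetical; adapt it to
    the real blog template.
    """
    start = f"<!-- chart:{name} -->"
    end = f"<!-- /chart:{name} -->"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    if not pattern.search(page_html):
        raise ValueError(f"markers for chart {name!r} not found")
    # A function replacement keeps backslashes inside the snippet literal.
    return pattern.sub(lambda _m: f"{start}\n{snippet}\n{end}", page_html, count=1)


page = "<h2>Compass</h2>\n<!-- chart:compass -->old<!-- /chart:compass -->\n<p>text</p>"
updated = inject_snippet(page, "compass", '<div id="chart-1"></div>')
print(updated)
```

Because the markers are kept in place, running the script again with a fresh snippet simply overwrites the previous chart.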


Blog Post Charts — What to Include

The blog post narrates three acts. Each gets a supporting chart:

Act 1: The Method

No chart needed — the SVD explanation is conceptual. Use a simple HTML table for the vote matrix illustration.

Act 2: The Political Compass

Chart: compass_latest_annual.html

  • 2D scatter of all parties for the most recent full annual window (2024 or 2025)
  • Axes: PC1 (left-right) × PC2 (residual, typically progressive-traditionalist)
  • Points coloured and labelled by party
  • Interactive: hover shows party name + coordinates
  • Caption: "Each party's position computed purely from voting patterns — no labels applied by us"
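For orientation, a toy version of how such coordinates can fall out of a vote matrix. This is a sketch only: the four parties and five motions are invented, the +1/-1 vote encoding and the column centring are assumptions, and the real pipeline in run_pipeline.py may differ.

```python
import numpy as np

# Rows are parties, columns are motions; +1 = for, -1 = against (toy data).
votes = np.array(
    [
        [1, 1, -1, -1, 1],    # Party A
        [1, 1, -1, -1, -1],   # Party B
        [-1, -1, 1, 1, 1],    # Party C
        [-1, -1, 1, 1, -1],   # Party D
    ],
    dtype=float,
)

# Centre each motion so the components describe differences between parties.
centered = votes - votes.mean(axis=0)

# Thin SVD: the left singular vectors, scaled by the singular values,
# give each party's coordinates on the principal axes.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
coords = U[:, :2] * S[:2]  # column 0 = PC1, column 1 = PC2

for party, (pc1, pc2) in zip("ABCD", coords):
    print(f"Party {party}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

In this toy matrix PC1 separates A and B from C and D, and PC2 picks up the one motion that cuts across that split. The sign of each axis is arbitrary, so the chart code has to fix an orientation before labelling an axis "left-right".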

Chart: trajectories_all_parties.html

  • Line chart of party positions across all annual windows (2016–2025)
  • One line per party, coloured consistently
  • Key narrative moments annotated: BBB arrival (2022), coalition formation (2022), Rutte → Schoof (2024)
  • Interactive: toggle parties on/off via legend

Act 3: Motion Similarity

Chart: compass_motions_sample.html (optional, depends on data quality)

  • 2D UMAP scatter of ~500 sampled motions, coloured by policy area
  • Shows clustering: climate motions cluster together, budget motions cluster together, etc.
  • If UMAP results aren't clean enough to tell a clear story, skip this one

Static table: Motion counts by year. A plain HTML table in the blog post is enough, since the page is raw HTML; no chart needed.
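The table markup can be generated rather than typed by hand. A small sketch, assuming the counts arrive as a year-to-count mapping; the numbers below are placeholders, not real motion counts:

```python
def counts_table(counts: dict[int, int]) -> str:
    """Render a year -> motion count mapping as a plain HTML table."""
    rows = "\n".join(
        f"    <tr><td>{year}</td><td>{n}</td></tr>"
        for year, n in sorted(counts.items())
    )
    return (
        "<table>\n"
        "  <thead><tr><th>Year</th><th>Motions</th></tr></thead>\n"
        f"  <tbody>\n{rows}\n  </tbody>\n"
        "</table>"
    )


# Placeholder values for illustration only.
table_html = counts_table({2024: 20, 2023: 10})
print(table_html)
```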


Data Flow

scheduler.py (nightly)
  └── api_client → downloads new motions → DuckDB

On demand (manual or cron):
  └── run_pipeline.py → SVD + embeddings + fusion + similarity cache → DuckDB
  └── generate_compass.py → static HTML charts → sgeboers.nl repo (blog/stematlas.html)

Streamlit (reads only):
  └── duckdb.connect(read_only=True) → all analysis queries
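Opening the read-only connection may fail while another process holds the file for writing, so the Streamlit side could wrap the open in a small retry. A sketch under assumptions: the helper name, attempt count and backoff are illustrative, and the connect callable is injected so the example runs without DuckDB installed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def open_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
) -> T:
    """Call connect() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s * attempt)  # linear backoff between tries
    raise ValueError("attempts must be at least 1")


# Stand-in for a database open that is locked on the first two tries.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("file is locked")
    return "connection"


result = open_with_retry(flaky_connect, delay_s=0.01)
print(result, "after", calls["n"], "attempts")
```

In the app, connect would be something like lambda: duckdb.connect("data/motions.db", read_only=True), with retry_on set to whichever exception DuckDB raises for a locked file (likely duckdb.IOException, to be confirmed against the installed version).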

The DB is the source of truth. Charts are regenerated and re-copied to the sgeboers.nl repo whenever the pipeline produces new data, probably monthly.


Error Handling Strategy

  • Streamlit crash: Docker restart: unless-stopped brings it back automatically
  • Scheduler crash: Same restart policy; DuckDB's WAL handles partial writes
  • DB file corruption: Not handled beyond OS-level backup. Mitigate by adding a weekly cp data/motions.db data/motions.db.bak to the scheduler or as a cron job on the VPS
  • Blog charts stale: Acceptable — charts are labelled with their window date; stale by 30 days is fine for a blog post
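  • Streamlit + scheduler write conflict: The scheduler is the only writer of motion data, and the quiz writes user_sessions rows at low frequency. DuckDB serves concurrent reads well, but it allows either one read-write process or several read-only processes on a file at a time, so a scheduler run and the Streamlit container can lock each other out across containers. Verify this on the VPS before launch; short-lived connections with a retry are one mitigation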
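The weekly backup could be a few lines inside the scheduler instead of a bare cp. A sketch, assuming it runs at a moment when nothing is writing to the file (copying mid-write may give an inconsistent copy); the dated file names and the keep count are additions for illustration, not part of the plan above.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def backup_db(db_path: Path, keep: int = 4) -> Path:
    """Copy db_path to a dated sibling file and prune all but the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    target = db_path.with_name(f"{db_path.name}.{date.today():%Y%m%d}.bak")
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(db_path, tmp)  # copy under a temporary name first
    tmp.replace(target)         # then rename, so a .bak file is never half-written
    # YYYYMMDD suffixes sort chronologically, so the oldest copies come first.
    backups = sorted(db_path.parent.glob(f"{db_path.name}.*.bak"))
    for old in backups[:-keep]:
        old.unlink()
    return target


# Demo against a throwaway file rather than the real data/motions.db.
with tempfile.TemporaryDirectory() as tmp_dir:
    db = Path(tmp_dir) / "motions.db"
    db.write_bytes(b"demo")
    backup = backup_db(db)
    backup_name = backup.name
    backup_matches = backup.read_bytes() == b"demo"

print(backup_name, backup_matches)
```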

Testing Strategy

  • Import smoke test for explorer.py already exists (tests/test_explorer_import.py)
  • Home.py and pages/ restructure needs a corresponding smoke test
  • Drone build will catch import errors before deploy
  • Manual verification: docker-compose up locally against a copy of data/motions.db, check all four Streamlit tabs render without error
  • Blog post charts: visual review after generate_compass.py run — no automated test needed
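A possible shape for the Home.py and pages/ smoke test. This sketch only byte-compiles the files, so it catches syntax errors and missing files but not failing imports; it is weaker than the existing import test and assumes the layout shown in the Components section.

```python
import tempfile
from pathlib import Path


def compile_errors(root: Path) -> dict[str, str]:
    """Byte-compile Home.py and every pages/*.py under root; return failures by path."""
    targets = [root / "Home.py", *sorted((root / "pages").glob("*.py"))]
    failures: dict[str, str] = {}
    for path in targets:
        try:
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except (OSError, SyntaxError) as exc:
            # OSError covers a missing Home.py; SyntaxError covers broken source.
            failures[path.relative_to(root).as_posix()] = f"{type(exc).__name__}: {exc}"
    return failures


# Demo on a throwaway tree with one deliberately broken page.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / "pages").mkdir()
    (root / "Home.py").write_text("title = 'StemAtlas'\n", encoding="utf-8")
    (root / "pages" / "1_Stemwijzer.py").write_text("x = 1\n", encoding="utf-8")
    (root / "pages" / "2_Explorer.py").write_text("def broken(:\n", encoding="utf-8")
    failures = compile_errors(root)

print(sorted(failures))
```

In the repo this would reduce to a single pytest function asserting that compile_errors(Path(".")) is empty.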

Open Questions

  1. Multi-page restructure scope: Does the quiz (app.py) need any changes beyond being wrapped in a pages/ file, or can it be imported as-is? The if __name__ == "__main__" guard in app.py needs reviewing.
  2. Streamlit base path: Subdomain approach (stematlas.sgeboers.nl) means no subpath complexity — Streamlit runs at /. Clean.
  3. Chart update cadence: Manual (run generate_compass.py, extract snippets, paste into blog post HTML, push to sgeboers.nl repo). Fine initially — charts are labelled with window date.
  4. sgeboers.nl nav structure: No blog directory exists yet. Need to add blog/ dir, a blog/stematlas.html file, and a nav link on the main site. Structure TBD after inspecting the existing HTML/CSS site.
  5. Nginx already running: Need to confirm Certbot/Let's Encrypt workflow matches what's already set up on the VPS for other subdomains.
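On question 1, the guard matters because code under if __name__ == "__main__" does not run when a module is imported. The sketch below demonstrates this with a stand-in file (demo_app.py is hypothetical; the real structure of app.py is not known here):

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

SOURCE = '''
ran = []

def main():
    ran.append("main")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    script = Path(tmp_dir) / "demo_app.py"
    script.write_text(SOURCE, encoding="utf-8")

    # Imported as a module: __name__ is "demo_app", so the guarded call is skipped.
    spec = importlib.util.spec_from_file_location("demo_app", script)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    ran_on_import = list(module.ran)

    # Executed as a script: __name__ is "__main__", so main() runs.
    ran_as_script = runpy.run_path(str(script), run_name="__main__")["ran"]

print(ran_on_import)  # []
print(ran_as_script)  # ['main']
```

So if pages/1_Stemwijzer.py imports app.py rather than containing its code, the page would need to call the quiz's entry function explicitly.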