StemAtlas Deployment — Implementation Plan

Design: thoughts/shared/designs/2026-03-22-stematlas-deployment-design.md
Date: 2026-03-22


Overview

Four batches. Batches A and B are independent and can run in parallel. Batch C requires the data pipeline to finish first. Batch D is VPS infrastructure (manual steps, done once).

Batch A: stemwijzer repo     — Streamlit multi-page + Docker
Batch B: sgeboers.nl repo    — blog/, nav, blog post HTML skeleton
Batch C: Charts              — generate + embed (after pipeline finishes)
Batch D: VPS infrastructure  — Nginx vhost + Certbot + /srv/stematlas/

Batch A — stemwijzer repo: Streamlit multi-page + Docker

A1. Check Dockerfile

Read existing Dockerfile — verify it installs all deps from pyproject.toml and sets CMD to start the app. Note current entrypoint (probably streamlit run app.py).

A2. Create Home.py

New file at project root. Streamlit landing/about page:

  • Title: "StemAtlas"
  • Brief description of the two pages (quiz + explorer)
  • Links (Streamlit sidebar nav handles the rest automatically)
  • st.page_link() cards pointing to the two pages

A3. Create pages/1_Stemwijzer.py

Thin wrapper that imports and calls app.main():

  • Import main: from app import main
  • Call main()
  • The if __name__ == "__main__": main() guard in app.py can stay: Python skips it when the file is imported, so main() runs only through the wrapper's explicit call
  • The page title shown in Streamlit nav comes from the filename: 1_Stemwijzer → "Stemwijzer"

A4. Create pages/2_Explorer.py

Same pattern:

  • Import run_app: from explorer import run_app
  • Call run_app()
  • Filename → nav label: "Explorer"
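
The filename-to-label rule used in A3 and A4 can be approximated in a few lines. This is a sketch of the convention as described above (drop the numeric sort prefix, turn underscores into spaces), not Streamlit's own implementation; nav_label is a hypothetical helper name.

```python
import re
from pathlib import Path


def nav_label(filename: str) -> str:
    """Approximate the sidebar label derived from a page filename."""
    stem = Path(filename).stem                 # drop directory and ".py"
    stem = re.sub(r"^\d+[_\-\s]*", "", stem)   # drop the numeric sort prefix
    return stem.replace("_", " ").strip()      # underscores read as spaces


print(nav_label("pages/1_Stemwijzer.py"))  # Stemwijzer
print(nav_label("pages/2_Explorer.py"))    # Explorer
```

The numeric prefix only controls ordering in the sidebar, so renumbering the files reorders the pages without changing their labels.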

A5. Update Dockerfile CMD

Change entrypoint from streamlit run app.py to streamlit run Home.py --server.port 8501 --server.address 0.0.0.0.

A6. Create docker-compose.yml

Two services in the stemwijzer repo:

version: "3.9"
services:
  stematlas:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    ports:
      - "127.0.0.1:8501:8501"
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db

  scheduler:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db

127.0.0.1:8501 — only accessible from localhost, Nginx proxies externally.
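
The loopback binding is easy to lose in a later edit (for example by shortening the mapping to "8501:8501"), so a small guard may be worth having. The sketch below checks a single port-mapping string in the HOST_IP:HOST_PORT:CONTAINER_PORT form used above; is_loopback_only is a hypothetical helper, and bracketed IPv6 host addresses are not handled.

```python
import ipaddress


def host_ip_of(mapping: str):
    """Return the host IP of 'HOST_IP:HOST_PORT:CONTAINER_PORT', or None if absent."""
    parts = mapping.split(":")
    return parts[0] if len(parts) == 3 else None


def is_loopback_only(mapping: str) -> bool:
    """True only when the mapping pins the published port to a loopback address."""
    host_ip = host_ip_of(mapping)
    # Without an explicit host IP, Docker publishes the port on all interfaces.
    return host_ip is not None and ipaddress.ip_address(host_ip).is_loopback


print(is_loopback_only("127.0.0.1:8501:8501"))  # True
print(is_loopback_only("8501:8501"))            # False
```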

A7. Smoke test for Home.py

Add tests/test_home_import.py, following the pattern of test_explorer_import.py. Verify that the Home module is importable and that its entry-point callable (run_app or equivalent) exists.
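
One possible shape for the smoke test, sketched without knowing what test_explorer_import.py looks like: parse Home.py with the ast module instead of importing it, so the check does not execute any Streamlit calls. The helper names are hypothetical, and this is an alternative to a real import, not a copy of the existing pattern.

```python
import ast
from pathlib import Path


def top_level_functions(path):
    """Names of module-level functions, found without executing the file."""
    source = Path(path).read_text(encoding="utf-8")
    tree = ast.parse(source, filename=str(path))  # raises SyntaxError if broken
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def missing_entrypoints(path, required):
    """Return the required function names that the file does not define."""
    return sorted(set(required) - top_level_functions(path))
```

In tests/test_home_import.py this could reduce to a single assertion such as assert missing_entrypoints("Home.py", ["run_app"]) == [], with run_app swapped for whatever Home.py actually defines. If Home.py is a plain script with no functions, passing an empty list still verifies that the file exists and parses.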

A8. Run tests

.venv/bin/python -m pytest -q — all existing + new smoke tests must pass.

Verification

docker build -t stematlas-local . locally to confirm image builds without errors.


Batch B — sgeboers.nl repo: blog/ + nav

This batch requires access to the sgeboers.nl repo on git.sgeboers.nl. Steps below assume the repo is cloned locally.

B1. Inspect existing site structure

Read index.html and any existing CSS files to understand:

  • Current nav structure (header? sidebar? footer?)
  • CSS class conventions for links/sections
  • Any existing page patterns to copy for the blog post

B2. Create blog/ directory

Add blog/index.html — a minimal blog listing page:

  • Title: "Blog"
  • One entry: "StemAtlas — Mapping Dutch Democracy" → blog/stematlas.html
  • Matches existing site style

B3. Add Blog link to nav

Update index.html (or whichever file contains the nav) to add a "Blog" link pointing to /blog/.

B4. Create blog/stematlas.html skeleton

Full blog post HTML based on thoughts/blog-post-political-compass.md:

  • Convert markdown to HTML (headings, paragraphs, code blocks, tables)
  • Add Plotly CDN <script> in <head>
  • Chart placeholders: <!-- CHART: compass_latest -->, <!-- CHART: trajectories --> — to be filled in Batch C
  • Add two CTAs linking to stematlas.sgeboers.nl:
    • After compass chart: "Explore every window interactively →"
    • At bottom: "Try the Stemwijzer quiz →"
  • Match existing site CSS (link the same stylesheet)

B5. Update Drone pipeline (sgeboers.nl repo)

Confirm the existing .drone.yml in sgeboers.nl picks up new files under blog/ automatically (it should, if it deploys the whole repo root). No changes needed if it's already an rsync or cp -r deploy.

Verification

Open blog/stematlas.html locally in browser — post renders correctly with placeholder chart divs, nav works.


Batch C — Charts: generate + embed (after pipeline finishes ~21:40)

Requires data/motions.db to be unlocked (pipeline complete).

C1. Run tests

.venv/bin/python -m pytest -q — confirm all pass now that DB is free.

C2. Run fusion step

.venv/bin/python -m pipeline.run_pipeline \
  --db-path data/motions.db \
  --start-date 2019-01-01 --end-date 2025-01-01 \
  --window-size quarterly \
  --skip-metadata --skip-extract --skip-svd --skip-text

Fusion only — fills fused_embeddings for new 2019–2021 and 2024 windows.

C3. Recompute similarity cache

.venv/bin/python -c "
from similarity.compute import compute_similarities
import duckdb
conn = duckdb.connect('data/motions.db', read_only=True)
windows = [r[0] for r in conn.execute(\"SELECT DISTINCT window_id FROM fused_embeddings ORDER BY 1\").fetchall()]
conn.close()
for w in windows:
    print(f'Computing {w}...')
    compute_similarities('data/motions.db', w, top_k=20)
"

C4. Generate compass HTML files

.venv/bin/python scripts/generate_compass.py \
  --db data/motions.db \
  --out outputs/blog-charts \
  --method pca --pca-residual

This produces outputs/blog-charts/compass_*.html and outputs/blog-charts/trajectories_*.html.

C5. Extract Plotly snippets

For each chart file, extract the embeddable snippet:

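# Run once per chart to get embeddable HTML.
# Assumes the file has the default Plotly to_html layout: a
# <div class="plotly-graph-div"> followed by its inline <script>.
from pathlib import Path
from bs4 import BeautifulSoup

def extract_snippet(chart_path):
    soup = BeautifulSoup(Path(chart_path).read_text(encoding="utf-8"), "html.parser")
    plot_div = soup.find("div", class_="plotly-graph-div")
    return str(plot_div) + str(plot_div.find_next_sibling("script"))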

Simpler: modify generate_compass.py to add a --partial flag that calls fig.to_html(include_plotlyjs=False, full_html=False) and writes .partial.html files alongside the full ones.
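
A minimal sketch of how the --partial flag could be wired in, assuming generate_compass.py uses argparse and holds a Plotly figure object per chart. build_parser, partial_path and write_chart are hypothetical names; the real script's structure may differ.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate compass charts")
    parser.add_argument("--out", type=Path, required=True, help="output directory")
    parser.add_argument("--partial", action="store_true",
                        help="also write embeddable .partial.html snippets")
    return parser


def partial_path(full_path: Path) -> Path:
    """compass_x.html -> compass_x.partial.html, next to the full file."""
    return full_path.with_suffix(".partial.html")


def write_chart(fig, full_path: Path, partial: bool) -> None:
    """Write the standalone page and, if requested, the div+script snippet."""
    full_path.write_text(fig.to_html(full_html=True), encoding="utf-8")
    if partial:
        # No <html> wrapper and no bundled plotly.js: the blog page loads it from the CDN.
        snippet = fig.to_html(include_plotlyjs=False, full_html=False)
        partial_path(full_path).write_text(snippet, encoding="utf-8")
```

Keeping the full files alongside the partial ones means the charts can still be opened directly for a visual check before they are embedded.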

C6. Fill chart placeholders in blog post

Replace <!-- CHART: compass_latest --> and <!-- CHART: trajectories --> in blog/stematlas.html with the extracted Plotly div+script blocks.
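
The replacement can be scripted so it is repeatable whenever the charts are regenerated. The sketch below assumes the placeholder syntax shown in B4 and a dict mapping placeholder names to snippet HTML; fill_placeholders is a hypothetical helper.

```python
import re

# Matches <!-- CHART: compass_latest --> and captures the chart name.
PLACEHOLDER = re.compile(r"<!--\s*CHART:\s*([\w-]+)\s*-->")


def fill_placeholders(page_html: str, snippets: dict) -> str:
    """Swap each chart placeholder for its snippet; fail loudly if one is missing."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if name not in snippets:
            missing.append(name)
            return match.group(0)  # leave the placeholder untouched
        return snippets[name]

    # A replacement function (not a string) keeps backslashes in the
    # Plotly JSON from being interpreted as regex escapes.
    filled = PLACEHOLDER.sub(substitute, page_html)
    if missing:
        raise KeyError(f"no snippet for placeholders: {missing}")
    return filled
```

Raising on a missing snippet prevents a half-filled post from being committed with a stray HTML comment where a chart should be.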

C7. Update motion count table in blog post

Run SQL to get authoritative counts:

SELECT strftime(date, '%Y') AS year, COUNT(*) AS motions
FROM motions
GROUP BY year ORDER BY year;

Replace placeholder numbers in blog/stematlas.html table.
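
To avoid retyping the numbers by hand, the query result can be turned into table rows directly. This sketch assumes the rows arrive as (year, motions) tuples, for example from conn.execute(...).fetchall(), and that the blog table uses plain tr/td markup; count_rows_html is a hypothetical helper.

```python
from html import escape


def count_rows_html(rows):
    """Render (year, motions) tuples as one <tr> per year."""
    lines = []
    for year, motions in rows:
        lines.append(f"<tr><td>{escape(str(year))}</td><td>{int(motions)}</td></tr>")
    return "\n".join(lines)
```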

C8. Push sgeboers.nl repo

Commit and push blog/stematlas.html + blog/index.html + nav changes to git.sgeboers.nl → Drone deploys.


Batch D — VPS infrastructure (manual, one-time)

SSH into the VPS. Steps are sequential.

D1. Create data directory

sudo mkdir -p /srv/stematlas/data
sudo chown $USER:$USER /srv/stematlas/data

D2. Copy motions.db to VPS

From local machine:

rsync -avz --progress data/motions.db user@vps:/srv/stematlas/data/motions.db

~3.6 GB transfer; takes a few minutes on a fast uplink.
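
As a rough check on the transfer time, the arithmetic is size in megabits divided by link speed. The link speeds below are illustrative assumptions, not measurements of the actual uplink, and rsync's -z compression may shorten the transfer further.

```python
def transfer_minutes(size_gb: float, link_mbit_s: float) -> float:
    """Minutes to move size_gb (decimal gigabytes) over a link_mbit_s link."""
    megabits = size_gb * 1000 * 8   # GB -> MB -> Mbit
    return megabits / link_mbit_s / 60


for mbit in (50, 100, 500):
    print(f"{mbit} Mbit/s: {transfer_minutes(3.6, mbit):.1f} min")
# 50 Mbit/s: 9.6 min
# 100 Mbit/s: 4.8 min
# 500 Mbit/s: 1.0 min
```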

D3. Add Nginx vhost

New file /etc/nginx/sites-available/stematlas:

server {
    listen 80;
    server_name stematlas.sgeboers.nl;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name stematlas.sgeboers.nl;

    # Let's Encrypt certs (Certbot fills these in)
    ssl_certificate /etc/letsencrypt/live/stematlas.sgeboers.nl/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/stematlas.sgeboers.nl/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 86400;
    }
}

Enable: sudo ln -s /etc/nginx/sites-available/stematlas /etc/nginx/sites-enabled/

D4. Get Let's Encrypt cert

sudo certbot --nginx -d stematlas.sgeboers.nl

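(Assumes Certbot is already installed and working for other subdomains on this VPS. The 443 block in D3 references certificate files that do not exist until this step succeeds, and Nginx refuses to load a config whose certificate files are missing. One way around this is to enable a plain port-80 server block first and add the redirect and the 443 block once the certificate exists.)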

D5. First deploy

The Drone pipeline for the stemwijzer repo will handle future deploys. For the first deploy, either:

  • Push a commit to trigger Drone, OR
  • Manually on VPS: cd /srv/stematlas && docker-compose pull && docker-compose up -d

D6. Verify

  • https://stematlas.sgeboers.nl → Streamlit loads, shows Home.py
  • Both pages accessible from Streamlit nav
  • docker-compose logs stematlas — no errors
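
The first check can be scripted for reuse after later deploys. This is a minimal sketch using only the standard library; is_up is a hypothetical helper, and it confirms only that the URL answers with HTTP 200, not that the WebSocket upgrade Streamlit relies on works through the proxy.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_up(url: str, timeout: float = 10.0, opener=urlopen) -> bool:
    """True if the URL answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Covers refused connections, TLS failures, timeouts and HTTP errors.
        return False


# Example call once the vhost and containers are running:
#   is_up("https://stematlas.sgeboers.nl")
```

The opener parameter exists only so the function can be exercised without network access; in normal use the default urlopen is enough.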

Dependencies Between Batches

A (stemwijzer repo)  ──► D5 (first deploy) ──► D6 (verify)
B (sgeboers.nl repo) ──► C8 (push blog)
C (charts)           ──► C8 (push blog)
D1-D4 (VPS infra)    ──► D5 (first deploy)

Pipeline finish (~21:40) ──► C1 (tests) ──► C2-C7 (charts)

Batches A and B are fully independent and can start now. Batch C waits only for the pipeline to finish, though its final push (C8) also needs Batch B. D1-D4 are VPS-side and independent of code changes; D5 additionally needs the image from Batch A.
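
The dependency diagram can also be written down as data and ordered mechanically, which helps when steps are added later. The sketch below transcribes the arrows above into a prerequisite map and uses the standard library's graphlib (Python 3.9+); the node names are shorthand taken from the diagram.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# node -> set of prerequisites, transcribed from the diagram
DEPENDS_ON = {
    "D5": {"A", "D1-D4"},
    "D6": {"D5"},
    "C8": {"B", "C2-C7"},
    "C1": {"pipeline"},
    "C2-C7": {"C1"},
}

# Nodes with no prerequisites are the ones that can start immediately.
sorter = TopologicalSorter(DEPENDS_ON)
sorter.prepare()
startable_now = sorted(sorter.get_ready())
print(startable_now)  # ['A', 'B', 'D1-D4', 'pipeline']

# One valid end-to-end order (several exist, since branches are independent).
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```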


Estimated Effort

Batch  Tasks                                     Est. Time
A      Multi-page Streamlit + docker-compose     45 min
B      Blog HTML + nav (after inspecting site)   60 min
C      Charts + embed (after pipeline)           30 min
D      VPS infra (manual SSH)                    30 min
Total                                            165 min (~2.75 hours)