StemAtlas Deployment — Implementation Plan
Design: thoughts/shared/designs/2026-03-22-stematlas-deployment-design.md
Date: 2026-03-22
Overview
Four batches. Batches A and B are independent of each other and can run in parallel. Batch C requires the pipeline to finish first. Batch D is VPS infrastructure (manual steps, done once).
Batch A: stemwijzer repo — Streamlit multi-page + Docker
Batch B: sgeboers.nl repo — blog/, nav, blog post HTML skeleton
Batch C: Charts — generate + embed (after pipeline finishes)
Batch D: VPS infrastructure — Nginx vhost + Certbot + /srv/stematlas/
Batch A — stemwijzer repo: Streamlit multi-page + Docker
A1. Check Dockerfile
Read existing Dockerfile — verify it installs all deps from pyproject.toml and sets CMD to start the app. Note current entrypoint (probably streamlit run app.py).
A2. Create Home.py
New file at project root. Streamlit landing/about page:
- Title: "StemAtlas"
- Brief description of the two pages (quiz + explorer)
- Links (Streamlit sidebar nav handles the rest automatically)
- st.page_link() cards pointing to the two pages
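A minimal sketch of what the landing page could look like. The function name run_app, the PAGES list, and the blurb texts are assumptions made for illustration; the Streamlit module is passed in as a parameter so the function can be exercised without a running Streamlit server.

```python
# Sketch of the Home.py landing page. `st` is the streamlit module, passed in
# so that the function can be called from a test without starting Streamlit.

# Hypothetical page list: paths follow steps A3 and A4, blurbs are placeholders.
PAGES = [
    ("pages/1_Stemwijzer.py", "Stemwijzer", "Take the quiz."),
    ("pages/2_Explorer.py", "Explorer", "Browse the data interactively."),
]


def run_app(st) -> None:
    """Render the title, a short description, and one link card per page."""
    st.title("StemAtlas")
    st.write("StemAtlas has two pages: a quiz and an explorer.")
    for path, label, blurb in PAGES:
        # st.page_link takes a path relative to the main script.
        st.page_link(path, label=label)
        st.caption(blurb)
```

In Home.py itself, the file would end with import streamlit as st followed by run_app(st).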
A3. Create pages/1_Stemwijzer.py
Thin wrapper that imports and calls app.main():
- Import: from app import main
- Remove the if __name__ == "__main__": main() guard from app.py (or keep it; the guard is skipped when the file is imported)
- The page title shown in Streamlit nav comes from the filename: 1_Stemwijzer → "Stemwijzer"
A4. Create pages/2_Explorer.py
Same pattern:
- Import: from explorer import run_app
- Call run_app()
- Filename → nav label: "Explorer"
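The filename-to-label rule used in A3 and A4 can be approximated as follows. This is a sketch of the convention, not Streamlit's own implementation; nav_label is a hypothetical helper name.

```python
import re
from pathlib import Path


def nav_label(filename: str) -> str:
    """Approximate the sidebar label Streamlit derives from a page filename."""
    stem = Path(filename).stem                # drop directory and ".py"
    label = re.sub(r"^\d+[\s_-]*", "", stem)  # drop the numeric ordering prefix
    return label.replace("_", " ").strip()    # underscores become spaces


print(nav_label("pages/1_Stemwijzer.py"))
print(nav_label("pages/2_Explorer.py"))
```

The numeric prefix only controls ordering in the sidebar, which is why the pages are named 1_ and 2_.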
A5. Update Dockerfile CMD
Change entrypoint from streamlit run app.py to streamlit run Home.py --server.port 8501 --server.address 0.0.0.0.
A6. Create docker-compose.yml
Two services in the stemwijzer repo:
version: "3.9"
services:
  stematlas:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    ports:
      - "127.0.0.1:8501:8501"
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db
  scheduler:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db
The 127.0.0.1:8501 binding makes the app reachable only from localhost; Nginx proxies external traffic to it.
A7. Smoke test for Home.py
Add tests/test_home_import.py, following the same pattern as test_explorer_import.py. Verify that the Home module is importable and that run_app (or an equivalent callable) exists.
A8. Run tests
.venv/bin/python -m pytest -q — all existing + new smoke tests must pass.
Verification
Run docker build -t stematlas-local . locally to confirm the image builds without errors.
Batch B — sgeboers.nl repo: blog/ + nav
This batch requires access to the sgeboers.nl repo on git.sgeboers.nl. Steps below assume the repo is cloned locally.
B1. Inspect existing site structure
Read index.html and any existing CSS files to understand:
- Current nav structure (header? sidebar? footer?)
- CSS class conventions for links/sections
- Any existing page patterns to copy for the blog post
B2. Create blog/ directory
Add blog/index.html — a minimal blog listing page:
- Title: "Blog"
- One entry: "StemAtlas — Mapping Dutch Democracy" → blog/stematlas.html
- Matches existing site style
B3. Add nav link to main site
Update index.html (or whichever file contains the nav) to add a "Blog" link pointing to /blog/.
B4. Create blog/stematlas.html skeleton
Full blog post HTML based on thoughts/blog-post-political-compass.md:
- Convert markdown to HTML (headings, paragraphs, code blocks, tables)
- Add Plotly CDN <script> in <head>
- Chart placeholders: <!-- CHART: compass_latest -->, <!-- CHART: trajectories --> (to be filled in Batch C)
- Add two CTAs linking to stematlas.sgeboers.nl:
  - After compass chart: "Explore every window interactively →"
  - At bottom: "Try the Stemwijzer quiz →"
- Match existing site CSS (link the same stylesheet)
B5. Update Drone pipeline (sgeboers.nl repo)
Confirm that the existing .drone.yml in sgeboers.nl picks up new files under blog/ automatically (it should, if it deploys the whole repo root). No changes are needed if the deploy step is already an rsync or cp -r of the repo.
Verification
Open blog/stematlas.html locally in a browser: the post renders correctly with the chart placeholders still in place, and the nav works.
Batch C — Charts: generate + embed (after pipeline finishes ~21:40)
Requires data/motions.db to be unlocked (pipeline complete).
C1. Run tests
.venv/bin/python -m pytest -q — confirm all pass now that DB is free.
C2. Run fusion-only pipeline pass
.venv/bin/python -m pipeline.run_pipeline \
--db-path data/motions.db \
--start-date 2019-01-01 --end-date 2025-01-01 \
--window-size quarterly \
--skip-metadata --skip-extract --skip-svd --skip-text
Fusion only — fills fused_embeddings for new 2019–2021 and 2024 windows.
C3. Recompute similarity cache
.venv/bin/python -c "
from similarity.compute import compute_similarities
import duckdb
conn = duckdb.connect('data/motions.db', read_only=True)
windows = [r[0] for r in conn.execute(\"SELECT DISTINCT window_id FROM fused_embeddings ORDER BY 1\").fetchall()]
conn.close()
for w in windows:
    print(f'Computing {w}...')
    compute_similarities('data/motions.db', w, top_k=20)
"
C4. Generate compass HTML files
.venv/bin/python scripts/generate_compass.py \
--db data/motions.db \
--out outputs/blog-charts \
--method pca --pca-residual
This produces outputs/blog-charts/compass_*.html and outputs/blog-charts/trajectories_*.html.
C5. Extract Plotly snippets
For each chart file, extract the embeddable snippet:
# Run once per chart to get embeddable HTML
import plotly.io as pio
# OR: just strip everything outside <div id="..."> and its <script>
# The generate_compass.py output is self-contained — use BeautifulSoup or
# manual extraction to get just the div+script block
Simpler: modify generate_compass.py to add a --partial flag that calls fig.to_html(include_plotlyjs=False, full_html=False) and writes .partial.html files alongside the full ones.
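A sketch of what the --partial branch could do, assuming generate_compass.py holds a Plotly figure object at the point where it writes each chart. The helper name write_partial is hypothetical; the to_html arguments are the ones named above.

```python
from pathlib import Path


def write_partial(fig, full_path):
    """Write an embeddable div+script snippet next to the full HTML file.

    `fig` is assumed to be a plotly Figure (anything with a compatible
    to_html method works). Returns the path of the .partial.html file.
    """
    full_path = Path(full_path)
    partial_path = full_path.with_suffix(".partial.html")
    # No <html> wrapper and no bundled plotly.js: the blog page loads
    # Plotly from the CDN <script> added in step B4.
    snippet = fig.to_html(include_plotlyjs=False, full_html=False)
    partial_path.write_text(snippet, encoding="utf-8")
    return partial_path
```

Because the snippet omits plotly.js, it only renders on a page that already loads Plotly.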
C6. Fill chart placeholders in blog post
Replace <!-- CHART: compass_latest --> and <!-- CHART: trajectories --> in blog/stematlas.html with the extracted Plotly div+script blocks.
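The replacement can be scripted rather than done by hand. A minimal sketch using only the standard library; fill_placeholders is a hypothetical helper, and the snippets mapping is assumed to be keyed by the placeholder names from B4.

```python
import re

# Matches <!-- CHART: name --> with flexible whitespace.
PLACEHOLDER = re.compile(r"<!--\s*CHART:\s*([\w-]+)\s*-->")


def fill_placeholders(html: str, snippets: dict) -> str:
    """Replace each chart placeholder with its snippet.

    Raises KeyError if a placeholder has no snippet, so a missing chart
    is noticed before the post is pushed.
    """
    def substitute(match):
        return snippets[match.group(1)]

    # A function replacement avoids backslash-escape handling in the
    # Plotly script text, which a plain replacement string would trigger.
    return PLACEHOLDER.sub(substitute, html)
```

Failing loudly on an unknown placeholder is a design choice; leaving it untouched would also be reasonable if charts are filled in incrementally.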
C7. Update motion count table in blog post
Run SQL to get authoritative counts:
SELECT strftime(date, '%Y') AS year, COUNT(*) AS motions
FROM motions
GROUP BY year ORDER BY year;
Replace placeholder numbers in blog/stematlas.html table.
C8. Push sgeboers.nl repo
Commit and push blog/stematlas.html + blog/index.html + nav changes to git.sgeboers.nl → Drone deploys.
Batch D — VPS infrastructure (manual, one-time)
SSH into the VPS. Steps are sequential.
D1. Create data directory
sudo mkdir -p /srv/stematlas/data
sudo chown $USER:$USER /srv/stematlas/data
D2. Copy motions.db to VPS
From local machine:
rsync -avz --progress data/motions.db user@vps:/srv/stematlas/data/motions.db
~3.6GB transfer — takes a few minutes.
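As a rough check on "a few minutes", the ideal transfer time can be computed from the file size and the uplink speed. The link speeds below are illustrative assumptions, not measurements of the actual connection.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and rsync compression."""
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 Mbit
    return size_megabits / link_mbit_per_s


# Hypothetical uplink speeds.
for speed in (50, 100, 500):
    minutes = transfer_seconds(3.6, speed) / 60
    print(f"{speed} Mbit/s: {minutes:.1f} min")
```

At 100 Mbit/s the 3.6 GB file needs about 288 seconds, so just under five minutes; on a slower uplink the transfer takes proportionally longer.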
D3. Add Nginx vhost
New file /etc/nginx/sites-available/stematlas:
server {
    listen 80;
    server_name stematlas.sgeboers.nl;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name stematlas.sgeboers.nl;
    # Let's Encrypt certs (Certbot fills these in)
    ssl_certificate /etc/letsencrypt/live/stematlas.sgeboers.nl/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/stematlas.sgeboers.nl/privkey.pem;
    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 86400;
    }
}
Enable: sudo ln -s /etc/nginx/sites-available/stematlas /etc/nginx/sites-enabled/
D4. Get Let's Encrypt cert
sudo certbot --nginx -d stematlas.sgeboers.nl
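(Assumes Certbot is already installed and working for other subdomains on this VPS. Note that Nginx refuses to load the 443 block while the certificate files referenced in D3 do not exist, so if nginx -t fails at this point, enable only the port-80 block first and let Certbot add the SSL configuration.)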
D5. First deploy
The Drone pipeline for the stemwijzer repo will handle future deploys. For the first deploy, either:
- Push a commit to trigger Drone, OR
- Manually on VPS:
cd /srv/stematlas && docker-compose pull && docker-compose up -d
D6. Verify
- https://stematlas.sgeboers.nl → Streamlit loads, shows Home.py
- Both pages accessible from Streamlit nav
- docker-compose logs stematlas: no errors
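The first check can also be scripted. A small sketch using the standard library; site_is_up is a hypothetical helper, and the opener parameter exists only so the function can be tried without network access.

```python
import urllib.request


def site_is_up(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are socket
        # timeouts, so connection and TLS failures all land here.
        return False
```

Calling site_is_up("https://stematlas.sgeboers.nl") after the first deploy covers the load check; it does not confirm that both pages render, which still needs a look in the browser.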
Dependencies Between Batches
A (stemwijzer repo) ──► D5 (first deploy) ──► D6 (verify)
B (sgeboers.nl repo) ──► C8 (push blog)
C (charts) ──► C8 (push blog)
D1-D4 (VPS infra) ──► D5 (first deploy)
Pipeline finish (~21:40) ──► C1 (tests) ──► C2-C7 (charts)
Batches A and B are fully independent — can start now. Batch C waits only for the pipeline to finish. Batch D is VPS-side and independent of code changes.
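The dependency arrows above can be turned into a valid execution order with a topological sort. This sketch uses the standard library's graphlib (Python 3.9+); the node labels are shorthand for the entries in the diagram.

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set (taken from the diagram above).
DEPENDS_ON = {
    "D5 first deploy": {"A stemwijzer repo", "D1-D4 VPS infra"},
    "D6 verify": {"D5 first deploy"},
    "C1 tests": {"pipeline finish"},
    "C2-C7 charts": {"C1 tests"},
    "C8 push blog": {"B sgeboers.nl repo", "C2-C7 charts"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for step in order:
    print(step)
```

Several orders are valid; the sort guarantees only that no step appears before something it depends on.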
Estimated Effort
| Batch | Tasks | Est. Time |
|---|---|---|
| A | Multi-page Streamlit + docker-compose | 45 min |
| B | Blog HTML + nav (after inspecting site) | 60 min |
| C | Charts + embed (after pipeline) | 30 min |
| D | VPS infra (manual SSH) | 30 min |
| Total | | ~2.75 hours |