# StemAtlas Deployment — Implementation Plan
**Design:** `thoughts/shared/designs/2026-03-22-stematlas-deployment-design.md`

**Date:** 2026-03-22

---
## Overview
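Four batches. Batches A and B are independent of each other and can run in parallel. Batch C's chart work needs the pipeline to finish first, and its final push (C8) also needs Batch B. Batch D is VPS infrastructure (manual steps, done once); its first deploy (D5) needs Batch A.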
```
Batch A: stemwijzer repo — Streamlit multi-page + Docker
Batch B: sgeboers.nl repo — blog/, nav, blog post HTML skeleton
Batch C: Charts — generate + embed (after pipeline finishes)
Batch D: VPS infrastructure — Nginx vhost + Certbot + /srv/stematlas/
```
---
## Batch A — stemwijzer repo: Streamlit multi-page + Docker
### A1. Check Dockerfile
Read the existing `Dockerfile` and verify that it installs all dependencies from `pyproject.toml` and sets `CMD` to start the app. Note the current entrypoint (probably `streamlit run app.py`).
### A2. Create `Home.py`
New file at project root. Streamlit landing/about page:
- Title: "StemAtlas"
- Brief description of the two pages (quiz + explorer)
- `st.page_link()` cards pointing to the two pages (the Streamlit sidebar nav also lists them automatically)
### A3. Create `pages/1_Stemwijzer.py`
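Thin wrapper that imports and calls `app.main()`:
- Import `from app import main`, then call `main()`
- Keep the `if __name__ == "__main__": main()` guard in `app.py`. When the page imports `app`, `__name__` is `"app"`, so the guard does not fire and `main()` is not called twice; the guard still lets `streamlit run app.py` work on its own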
- The page title shown in Streamlit nav comes from the filename: `1_Stemwijzer` → "Stemwijzer"
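
A small helper makes the filename-to-label rule concrete. It is a sketch that approximates Streamlit's convention for files under `pages/` (leading digits only set the order, underscores become spaces); it is not Streamlit's own implementation, which handles more cases.

```python
import re
from pathlib import Path

# Approximates Streamlit's pages/ naming convention; this is not Streamlit's code.
# Leading digits only set the sort order, the separator after them is dropped,
# and the remaining underscores become spaces.
_PAGE_FILE = re.compile(r"^(\d*)[_ -]*(.*)\.py$")


def nav_label(filename: str) -> str:
    """Sidebar label Streamlit would show for a file under pages/ (approximately)."""
    match = _PAGE_FILE.match(Path(filename).name)
    if match is None:
        raise ValueError(f"not a .py page file: {filename!r}")
    order, rest = match.groups()
    return rest.replace("_", " ").strip() or order
```

For example, `nav_label("pages/2_Explorer.py")` returns `"Explorer"`.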
### A4. Create `pages/2_Explorer.py`
Same pattern:
- Import `from explorer import run_app`
- Call `run_app()`
- Filename → nav label: "Explorer"
### A5. Update Dockerfile CMD
Change entrypoint from `streamlit run app.py` to `streamlit run Home.py --server.port 8501 --server.address 0.0.0.0`.
### A6. Create `docker-compose.yml`
Two services in the stemwijzer repo:
```yaml
version: "3.9"
services:
  stematlas:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    ports:
      - "127.0.0.1:8501:8501"
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db
  scheduler:
    image: ${DOCKER_REGISTRY}/sgeboers/stemwijzer:latest
    command: python scheduler.py
    volumes:
      - /srv/stematlas/data:/app/data
    restart: unless-stopped
    environment:
      - DB_PATH=/app/data/motions.db
```
`127.0.0.1:8501` — only accessible from localhost, Nginx proxies externally.
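
Both services point at the same database file through `DB_PATH`. A minimal sketch of how the app side could resolve that variable and connect, assuming the app uses DuckDB (as C3 does); the `data/motions.db` fallback for local runs is a hypothetical default. One constraint matters here: DuckDB allows either one process with a read-write connection or several processes with read-only connections, not both at once. If the scheduler writes to the file (as a pipeline scheduler presumably does), the app cannot open it during that write, and a long-lived connection held by the app would stop the scheduler from writing at all. Short-lived read-only connections in the app are therefore the safer pattern.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical fallback for running the app outside Docker; in both containers
# docker-compose.yml sets DB_PATH=/app/data/motions.db.
DEFAULT_DB_PATH = "data/motions.db"


def resolve_db_path(env: Mapping[str, str] | None = None) -> Path:
    """Return DB_PATH from the environment, or the local fallback if unset/empty."""
    env = os.environ if env is None else env
    return Path(env.get("DB_PATH") or DEFAULT_DB_PATH)


def open_readonly(path: Path | None = None):
    """Open a read-only DuckDB connection; callers should close it after each use."""
    import duckdb  # third-party, imported lazily so path handling has no deps

    return duckdb.connect(str(path or resolve_db_path()), read_only=True)
```

Callers would open, query and `close()` per request rather than caching one connection for the app's lifetime.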
### A7. Smoke test for `Home.py`
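Add `tests/test_home_import.py`, following the pattern of `test_explorer_import.py`. If `Home.py` exposes a callable (such as `run_app`), verify that the module imports and the callable exists. If it is a plain script with top-level Streamlit calls, as A2 describes, importing it executes those calls, so verify that it compiles instead.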
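
Importing a Streamlit script executes its top-level `st.*` calls outside a running server, so a lighter smoke test compiles each entry point instead. A standard-library sketch (the helper names are hypothetical): `compile()` parses each file without running it, so it catches missing files and syntax errors but not a failing import inside `app.py` or `explorer.py`. For a real render test, Streamlit's `streamlit.testing.v1.AppTest.from_file("Home.py")` is a heavier alternative.

```python
from __future__ import annotations

from pathlib import Path

# The three Streamlit entry points from A2-A4, relative to the project root.
ENTRY_POINTS = ("Home.py", "pages/1_Stemwijzer.py", "pages/2_Explorer.py")


def compiles(source: str, filename: str) -> bool:
    """True if source is valid Python syntax; nothing is imported or executed."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


def entry_point_problems(root: Path) -> list[str]:
    """List missing or syntactically broken entry points under root."""
    problems = []
    for rel in ENTRY_POINTS:
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif not compiles(path.read_text(encoding="utf-8"), str(path)):
            problems.append(f"syntax error: {rel}")
    return problems


# In tests/test_home_import.py (tests/ sits one level below the project root):
# def test_entry_points_compile():
#     assert entry_point_problems(Path(__file__).resolve().parents[1]) == []
```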
### A8. Run tests
`.venv/bin/python -m pytest -q` — all existing + new smoke tests must pass.
### Verification
Run `docker build -t stematlas-local .` locally to confirm the image builds without errors.

---
## Batch B — sgeboers.nl repo: blog/ + nav
> This batch requires access to the sgeboers.nl repo on git.sgeboers.nl.
> Steps below assume the repo is cloned locally.
### B1. Inspect existing site structure
Read `index.html` and any existing CSS files to understand:
- Current nav structure (header? sidebar? footer?)
- CSS class conventions for links/sections
- Any existing page patterns to copy for the blog post
### B2. Create `blog/` directory
Add `blog/index.html` — a minimal blog listing page:
- Title: "Blog"
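- One entry: "StemAtlas — Mapping Dutch Democracy" → `stematlas.html` (the listing itself lives in `blog/`, so a relative `blog/stematlas.html` href would resolve to `/blog/blog/stematlas.html`)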
- Matches existing site style
### B3. Add nav link to main site
Update `index.html` (or whichever file contains the nav) to add a "Blog" link pointing to `/blog/`.
### B4. Create `blog/stematlas.html` skeleton
Full blog post HTML based on `thoughts/blog-post-political-compass.md`:
- Convert markdown to HTML (headings, paragraphs, code blocks, tables)
- Add Plotly CDN `<script>` in `<head>`
- **Chart placeholders**: `<!-- CHART: compass_latest -->`, `<!-- CHART: trajectories -->` — to be filled in Batch C
- Add two CTAs linking to `stematlas.sgeboers.nl`:
  - After the compass chart: *"Explore every window interactively →"*
  - At the bottom: *"Try the Stemwijzer quiz →"*
- Match existing site CSS (link the same stylesheet)
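
Once the markdown is converted (Python-Markdown's `markdown.markdown(text, extensions=["tables", "fenced_code"])` is one option), the body needs a page shell with the shared stylesheet and Plotly loaded in `<head>`. A minimal sketch; the stylesheet href and the pinned Plotly.js version are placeholders to replace with the site's real `<link>` (found in B1) and the Plotly.js version that matches the installed plotly.py.

```python
from __future__ import annotations

import html

# Placeholders (assumptions): use the real stylesheet href from index.html (B1)
# and pin the plotly.js version that matches the plotly.py generating the charts.
STYLESHEET_HREF = "/style.css"
PLOTLY_SRC = "https://cdn.plot.ly/plotly-2.35.2.min.js"


def wrap_post(title: str, body_html: str,
              stylesheet_href: str = STYLESHEET_HREF,
              plotly_src: str = PLOTLY_SRC) -> str:
    """Wrap converted post HTML in a full page with Plotly loaded once in <head>.

    body_html is inserted verbatim, so <!-- CHART: ... --> placeholders survive
    for Batch C, and chart snippets can be generated without bundling plotly.js.
    """
    return "\n".join([
        "<!DOCTYPE html>",
        '<html lang="en">',
        "<head>",
        '  <meta charset="utf-8">',
        '  <meta name="viewport" content="width=device-width, initial-scale=1">',
        f"  <title>{html.escape(title)}</title>",
        f'  <link rel="stylesheet" href="{html.escape(stylesheet_href)}">',
        f'  <script src="{html.escape(plotly_src)}" charset="utf-8"></script>',
        "</head>",
        "<body>",
        body_html,
        "</body>",
        "</html>",
        "",
    ])
```

Loading Plotly once in the head is what lets every embedded chart snippet skip its own copy of the library.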
### B5. Check the Drone pipeline (sgeboers.nl repo)
Confirm that the existing `.drone.yml` in sgeboers.nl picks up new files under `blog/` automatically. It should if it deploys the whole repo root (for example an `rsync` or `cp -r` of the root), in which case no changes are needed.
### Verification
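Open `blog/stematlas.html` locally in a browser: the post renders correctly, the nav works, and the `<!-- CHART: ... -->` placeholders are present in the source (as HTML comments they render as nothing until Batch C fills them).
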
---
## Batch C — Charts: generate + embed (after pipeline finishes ~21:40)
> Requires `data/motions.db` to be unlocked (pipeline complete).
### C1. Run tests
`.venv/bin/python -m pytest -q` — confirm all pass now that DB is free.
### C2. Run the fusion step
```bash
.venv/bin/python -m pipeline.run_pipeline \
  --db-path data/motions.db \
  --start-date 2019-01-01 --end-date 2025-01-01 \
  --window-size quarterly \
  --skip-metadata --skip-extract --skip-svd --skip-text
```
Fusion only — fills `fused_embeddings` for new 2019–2021 and 2024 windows.
### C3. Recompute similarity cache
```bash
.venv/bin/python -c "
from similarity.compute import compute_similarities
import duckdb
conn = duckdb.connect('data/motions.db', read_only=True)
windows = [r[0] for r in conn.execute(\"SELECT DISTINCT window_id FROM fused_embeddings ORDER BY 1\").fetchall()]
# Close the read-only handle before compute_similarities opens the file itself.
conn.close()
for w in windows:
    print(f'Computing {w}...')
    compute_similarities('data/motions.db', w, top_k=20)
"
```
### C4. Generate compass HTML files
```bash
.venv/bin/python scripts/generate_compass.py \
  --db data/motions.db \
  --out outputs/blog-charts \
  --method pca --pca-residual
```
This produces `outputs/blog-charts/compass_*.html` and `outputs/blog-charts/trajectories_*.html`.
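### C5. Extract Plotly snippets
The full-page files from C4 are self-contained, so cutting out just the chart `<div>` and its `<script>` by hand (or with BeautifulSoup) is fiddly. Simpler: add a `--partial` flag to `generate_compass.py` that also writes each figure as a `.partial.html` file alongside the full one via `fig.to_html(include_plotlyjs=False, full_html=False)`, for example through a small helper like this (the helper name is hypothetical):
```python
from pathlib import Path

import plotly.graph_objects as go


def write_partial(fig: go.Figure, full_html_path: Path) -> Path:
    """Write the embeddable div+script next to the full page, without plotly.js."""
    # e.g. compass_x.html -> compass_x.partial.html
    out = full_html_path.with_suffix(".partial.html")
    out.write_text(fig.to_html(include_plotlyjs=False, full_html=False), encoding="utf-8")
    return out
```
The snippets omit Plotly.js on purpose: the blog post loads it once from the CDN (B4).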
### C6. Fill chart placeholders in blog post
Replace `<!-- CHART: compass_latest -->` and `<!-- CHART: trajectories -->` in `blog/stematlas.html` with the extracted Plotly div+script blocks.
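
The substitution can be scripted so that a missing snippet fails loudly instead of leaving an empty spot in the post. A sketch, assuming the caller builds the name-to-snippet mapping itself, for example by reading the matching `.partial.html` files (their exact names depend on `generate_compass.py`):

```python
from __future__ import annotations

import re

# Matches the B4 placeholders, e.g. <!-- CHART: compass_latest -->
PLACEHOLDER_RE = re.compile(r"<!--\s*CHART:\s*([\w-]+)\s*-->")


def fill_chart_placeholders(page: str, snippets: dict[str, str]) -> str:
    """Replace every CHART placeholder with its snippet, or raise if one is missing."""
    missing = set(PLACEHOLDER_RE.findall(page)) - set(snippets)
    if missing:
        raise KeyError(f"no snippet for placeholder(s): {sorted(missing)}")
    # A function replacement is inserted literally, so backslashes in
    # Plotly's inline JSON are not treated as regex escapes.
    return PLACEHOLDER_RE.sub(lambda m: snippets[m.group(1)], page)
```

Using a function rather than a replacement string matters here: Plotly's generated JSON can contain backslashes, which `re.sub` would otherwise try to interpret.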
### C7. Update motion count table in blog post
Run SQL to get authoritative counts:
```sql
SELECT strftime(date, '%Y') AS year, COUNT(*) AS motions
FROM motions
GROUP BY year ORDER BY year;
```
Replace placeholder numbers in `blog/stematlas.html` table.
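
A sketch that runs the query on a read-only connection and renders rows for the post's table; the two-column `<tr>` markup is an assumption about how the table in `blog/stematlas.html` is written. Note that DuckDB's `strftime` takes the date first and the format second, the reverse of SQLite's argument order.

```python
from __future__ import annotations

import html

# Same query as above (DuckDB argument order: strftime(date, format)).
COUNT_SQL = """
SELECT strftime(date, '%Y') AS year, COUNT(*) AS motions
FROM motions
GROUP BY year ORDER BY year
"""


def fetch_counts(db_path: str = "data/motions.db") -> list[tuple[str, int]]:
    """Run the count query on a read-only connection."""
    import duckdb  # third-party; imported lazily so table_rows() has no deps

    conn = duckdb.connect(db_path, read_only=True)
    try:
        return [(str(year), int(n)) for year, n in conn.execute(COUNT_SQL).fetchall()]
    finally:
        conn.close()


def table_rows(counts: list[tuple[str, int]]) -> str:
    """Render one <tr> per year; assumes a plain two-column year/count table."""
    return "\n".join(
        f"<tr><td>{html.escape(year)}</td><td>{n:,}</td></tr>" for year, n in counts
    )
```

`print(table_rows(fetch_counts()))` then gives rows to paste over the placeholder numbers.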
### C8. Push sgeboers.nl repo
Commit and push `blog/stematlas.html`, `blog/index.html` and the nav changes to git.sgeboers.nl; Drone then deploys the site.

---
## Batch D — VPS infrastructure (manual, one-time)
> SSH into the VPS. Steps are sequential.
### D1. Create data directory
```bash
sudo mkdir -p /srv/stematlas/data
sudo chown $USER:$USER /srv/stematlas/data
```
### D2. Copy `motions.db` to VPS
From local machine:
```bash
rsync -avz --progress data/motions.db user@vps:/srv/stematlas/data/motions.db
```
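This is a ~3.6 GB transfer, so expect several minutes depending on upload bandwidth. Run it after C2-C3 have finished, so the copy includes their results and the file is not being written during the transfer.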
### D3. Add Nginx vhost
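New file `/etc/nginx/sites-available/stematlas` (target configuration). `nginx -t` rejects `ssl_certificate` paths that do not exist yet, so on the first pass enable only a port 80 server block that proxies to Streamlit (the `location /` block below in place of the `return 301`). `certbot --nginx` in D4 then adds the HTTPS listener and certificate paths, plus the HTTP redirect when run with `--redirect`.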
```nginx
server {
    listen 80;
    server_name stematlas.sgeboers.nl;
    return 301 https://$host$request_uri;
}
server {
    listen 443 ssl;
    server_name stematlas.sgeboers.nl;
    # Let's Encrypt certs, created by Certbot in D4 (nginx -t fails while they are missing)
    ssl_certificate /etc/letsencrypt/live/stematlas.sgeboers.nl/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/stematlas.sgeboers.nl/privkey.pem;
    location / {
        proxy_pass http://127.0.0.1:8501;
        # Streamlit talks to the browser over a websocket
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Keep idle websocket sessions open (seconds)
        proxy_read_timeout 86400;
    }
}
```
Enable and reload: `sudo ln -s /etc/nginx/sites-available/stematlas /etc/nginx/sites-enabled/`, then `sudo nginx -t && sudo systemctl reload nginx`.
### D4. Get Let's Encrypt cert
```bash
sudo certbot --nginx -d stematlas.sgeboers.nl
```
(Assumes Certbot is already installed and working for other subdomains on this VPS.)
### D5. First deploy
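The Drone pipeline for the stemwijzer repo will handle future deploys. For the first deploy, either:
- Push a commit to trigger Drone, OR
- Manually on VPS: `cd /srv/stematlas && docker-compose pull && docker-compose up -d`

The manual route assumes `/srv/stematlas` holds a copy of the repo's `docker-compose.yml` and that `DOCKER_REGISTRY` is set, for example in `/srv/stematlas/.env`, which Compose reads for variable substitution.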
### D6. Verify
- `https://stematlas.sgeboers.nl` → Streamlit loads, shows Home.py
- Both pages accessible from Streamlit nav
- `docker-compose logs stematlas` — no errors
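
The first check can be scripted. A sketch that polls Streamlit's health endpoint; it assumes a recent Streamlit release, which answers `ok` on `/_stcore/health` (older releases used `/healthz`). Pointing it at `http://127.0.0.1:8501/_stcore/health` on the VPS tests the container alone; the default public URL also exercises Nginx and the TLS certificate, since `urlopen` verifies certificates by default.

```python
from __future__ import annotations

import urllib.request

# Assumption: a recent Streamlit release, which answers "ok" on this path
# (older releases used /healthz).
PUBLIC_HEALTH_URL = "https://stematlas.sgeboers.nl/_stcore/health"


def is_healthy(url: str = PUBLIC_HEALTH_URL, timeout: float = 5.0) -> bool:
    """True if the endpoint answers HTTP 200 with the body "ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"ok"
    except OSError:  # URLError/HTTPError, refused connections, timeouts, TLS errors
        return False
```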
---
## Dependencies Between Batches
```
A (stemwijzer repo) ──► D5 (first deploy) ──► D6 (verify)
B (sgeboers.nl repo) ──► C8 (push blog)
C (charts) ──► C8 (push blog)
D1-D4 (VPS infra) ──► D5 (first deploy)
Pipeline finish (~21:40) ──► C1 (tests) ──► C2-C7 (charts)
```
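Batches A and B are independent of each other and can start now.
Batch C's chart steps (C1-C7) wait only for the pipeline to finish; C8 also needs Batch B.
Batch D's setup steps (D1, D3, D4) are VPS-side and independent of code changes; D2 needs the final `motions.db`, and D5 needs Batch A.
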
---
## Estimated Effort
| Batch | Tasks | Est. Time |
|-------|-------|-----------|
| A | Multi-page Streamlit + docker-compose | 45 min |
| B | Blog HTML + nav (after inspecting site) | 60 min |
| C | Charts + embed (after pipeline) | 30 min |
| D | VPS infra (manual SSH) | 30 min |
| **Total** | | **~2.75 hours** |