# Robot U Site Agent Guide
## Purpose
This repository contains the Robot U community site.
It is a thin application layer over Forgejo:
- Forgejo is the source of truth for authentication, public content repos, and issue-backed discussions.
- This app provides the web UI, course/lesson browsing, markdown rendering, and ICS calendar ingestion.
- The current live Forgejo instance is `https://aksal.cloud`.
## Stack
- Backend: FastAPI
- Frontend: Preact + TypeScript + Vite
- Python tooling: `uv`, `ruff`
- Frontend tooling: `bun`, Biome
## Important Files
- `app.py`: FastAPI app and SPA/static serving
- `live_prototype.py`: live payload assembly for courses, lessons, discussions, and events
- `prototype_cache.py`: server-side cache for the public Forgejo content payload
- `update_events.py`: in-process SSE broker for content update notifications
- `forgejo_client.py`: Forgejo API client
- `calendar_feeds.py`: ICS/webcal feed loading and parsing
- `settings.py`: env-driven runtime settings
- `frontend/src/App.tsx`: client routes and page composition
- `frontend/src/MarkdownContent.tsx`: safe markdown renderer used in lessons and discussions
- `scripts/start.sh`: main startup command for local runs
## Repo Layout Notes
- The root repository is the site application.
- `examples/quadrature-encoder-course/` is a separate nested git repo used as sample content. It is intentionally ignored by the root repo and should stay that way.
## First-Time Setup
### Python
```bash
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
```
### Frontend
```bash
cd frontend
~/.bun/bin/bun install
```
## Environment
Runtime configuration is loaded by `scripts/start.sh` from the shell environment, then `.env`, then `.env.local`.
Recommended local flow:
```bash
cp .env.example .env
```
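The layering can be sketched like this (an assumed equivalent of what `scripts/start.sh` does; the script itself is authoritative, including which source wins on conflicts):

```shell
# Assumed precedence: later files override earlier values.
set -a                               # export everything sourced below
if [ -f .env ]; then . ./.env; fi
if [ -f .env.local ]; then . ./.env.local; fi
set +a
```

With this ordering, a variable set in `.env.local` shadows the same variable in `.env`, which is why machine-specific overrides belong in `.env.local`.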
Useful variables:
- `FORGEJO_BASE_URL=https://aksal.cloud`
- `APP_BASE_URL=http://kacper-dev-pod:8800`
- `AUTH_SECRET_KEY=...`
- `AUTH_COOKIE_SECURE=false`
- `CORS_ALLOW_ORIGINS=http://kacper-dev-pod:8800`
- `FORGEJO_OAUTH_CLIENT_ID=...`
- `FORGEJO_OAUTH_CLIENT_SECRET=...`
- `FORGEJO_OAUTH_SCOPES=openid profile`
- `FORGEJO_TOKEN=...`
- `FORGEJO_GENERAL_DISCUSSION_REPO=Robot-U/general_forum`
- `FORGEJO_WEBHOOK_SECRET=...`
- `FORGEJO_CACHE_TTL_SECONDS=60.0`
- `CALENDAR_FEED_URLS=webcal://...`
- `HOST=0.0.0.0`
- `PORT=8800`
Notes:
- Browser sign-in uses Forgejo OAuth/OIDC. `APP_BASE_URL` must match the URL opened in the browser, `CORS_ALLOW_ORIGINS` should include that origin, and the Forgejo OAuth app must include `/api/auth/forgejo/callback` under that base URL.
- Browser OAuth requests only identity scopes. The backend stores the resulting Forgejo token in an encrypted `HttpOnly` cookie and uses it for writes only after enforcing public-repository checks.
- `FORGEJO_TOKEN` is optional and should be treated as a read-only local fallback for the public content cache. Browser sessions and API token calls may write issues/comments only after verifying the target repo is public.
- `/api/prototype` uses a server-side cache for public Forgejo content. `FORGEJO_CACHE_TTL_SECONDS=0` disables it; successful discussion replies invalidate it.
- General discussion creation requires `FORGEJO_GENERAL_DISCUSSION_REPO`. Linked discussions are created in the content repo and include canonical app URLs in the Forgejo issue body.
- Forgejo webhooks should POST to `/api/forgejo/webhook`; when `FORGEJO_WEBHOOK_SECRET` is set, the backend validates Forgejo/Gitea-style HMAC headers.
- API clients can query with `Authorization: token ...` or `Authorization: Bearer ...`.
- `CALENDAR_FEED_URLS` is optional and accepts comma-separated `webcal://` or `https://` ICS feeds.
- Do not commit `.env`, `.env.local`, or `.env.proxmox`.
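For manual webhook testing, a Forgejo/Gitea-style HMAC-SHA256 signature can be computed with `openssl`. The payload and secret below are illustrative placeholders, and the exact header name the backend checks should be verified against `app.py`:

```shell
# Hypothetical payload/secret; the real secret is FORGEJO_WEBHOOK_SECRET.
payload='{"action":"opened"}'
secret='example-webhook-secret'
sig=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
echo "$sig"    # 64 hex characters, sent as the signature header

# Then POST it, e.g.:
#   curl -fsS -X POST "$APP_BASE_URL/api/forgejo/webhook" \
#     -H "Content-Type: application/json" \
#     -H "X-Gitea-Signature: $sig" \
#     -d "$payload"
```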
## Main Start Command
Use this for the normal local app flow:
```bash
./scripts/start.sh
```
What it does:
1. Loads `.env` and `.env.local` if present.
2. Builds the frontend with `bun`.
3. Starts FastAPI with `uvicorn`.
Override host/port when needed:
```bash
HOST=0.0.0.0 PORT=8800 ./scripts/start.sh
```
## Deployment Commands
Bootstrap Forgejo Actions SSH clone credentials:
```bash
export FORGEJO_API_TOKEN=...
./scripts/bootstrap_ci_clone_key.py
```
Bootstrap or rotate the Forgejo Actions LXC deploy credentials:
```bash
export FORGEJO_API_TOKEN=...
./scripts/bootstrap_lxc_deploy_key.py
```
Validate production environment before starting:
```bash
./scripts/check_deploy_config.py
```
Container deployment:
```bash
docker compose up --build -d
curl -fsS http://127.0.0.1:8800/health
```
Non-container production start after building `frontend/dist`:
```bash
HOST=0.0.0.0 PORT=8000 ./scripts/run_prod.sh
```
## Current Proxmox Deployment
Current app host:
- Proxmox node: `proxmox`
- LXC VMID: `108`
- LXC hostname: `robotu-app`
- LXC IP: `192.168.1.220/24`
- LXC gateway: `192.168.1.2`
- LXC DNS: `192.168.1.2`
- SSH target: `root@192.168.1.220`
- App directory on LXC: `/opt/robot-u-site`
- Public runtime URL: `https://discourse.onl`
- Internal app URL: `http://192.168.1.220:8800`
- Compose service: `robot-u-site`
- Container port mapping: host `8800` to container `8000`
- Reverse proxy: LXC `102` routes `discourse.onl` to `192.168.1.220:8800`
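The port mapping above implies a compose service shaped roughly like this (a sketch only; the repository's real `docker-compose.yml` is authoritative):

```yaml
services:
  robot-u-site:
    build: .
    ports:
      - "8800:8000"   # host 8800 -> container 8000
    env_file:
      - .env          # assumed; check the real compose file
```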
The local `.env.proxmox` file contains Proxmox credentials and LXC settings. It is ignored by git and must not be printed, committed, or copied into the app container.
The deployed app uses `/opt/robot-u-site/.env` on the LXC. That file contains Forgejo OAuth settings, `AUTH_SECRET_KEY`, optional `FORGEJO_TOKEN` for the server-side public content cache, calendar feeds, and the deployed `APP_BASE_URL`. Treat it as secret material and do not print values.
The current deployed OAuth redirect URI is:
```text
https://discourse.onl/api/auth/forgejo/callback
```
Forgejo OAuth sign-in from the public URL requires that exact callback URL to be allowed in the Forgejo OAuth app.
Important deployment notes:
- The LXC was initially created with gateway/DNS `192.168.1.1`, but this network uses `192.168.1.2`. If package installs hang or outbound network fails, check `ip route` and `/etc/resolv.conf` first.
- Proxmox persistent LXC config was updated so `net0` uses `gw=192.168.1.2`, and nameserver is `192.168.1.2`.
- Docker inside the unprivileged LXC requires Proxmox features `nesting=1,keyctl=1`; those are set on the current container.
- Ubuntu package installs were made reliable by adding `/etc/apt/apt.conf.d/99force-ipv4` with `Acquire::ForceIPv4 "true";`.
- The current LXC has `512MiB` memory and `512MiB` swap. It runs the app, but large builds or future services may need more memory.
- `FORGEJO_TOKEN` is needed server-side if anonymous Forgejo API discovery returns no content. Without that token, `/api/prototype` can return zero courses/posts/discussions even though the app is healthy.
Useful checks:
```bash
ssh root@192.168.1.220 'cd /opt/robot-u-site && docker compose ps'
curl -fsS http://192.168.1.220:8800/health
curl -fsS https://discourse.onl/health
curl -fsS https://discourse.onl/api/prototype
```
Manual redeploy to the current LXC:
```bash
ssh root@192.168.1.220 'mkdir -p /opt/robot-u-site'
rsync -az --delete \
  --exclude='.git/' \
  --exclude='.venv/' \
  --exclude='__pycache__/' \
  --exclude='.pytest_cache/' \
  --exclude='.ruff_cache/' \
  --exclude='.env' \
  --exclude='.env.*' \
  --exclude='frontend/node_modules/' \
  --exclude='frontend/dist/' \
  --exclude='frontend/.vite/' \
  --exclude='examples/quadrature-encoder-course/' \
  ./ root@192.168.1.220:/opt/robot-u-site/
ssh root@192.168.1.220 'cd /opt/robot-u-site && ./scripts/check_deploy_config.py && docker compose up --build -d'
curl -fsS http://192.168.1.220:8800/health
```
Do not overwrite `/opt/robot-u-site/.env` during rsync. Update it deliberately when runtime config changes.
Current production env notes:
- `/opt/robot-u-site/.env` should use `APP_BASE_URL=https://discourse.onl`.
- `AUTH_COOKIE_SECURE=true` is required for the public HTTPS site.
- `CORS_ALLOW_ORIGINS=https://discourse.onl` is the current public origin.
- A pre-domain backup exists on the app LXC at `/opt/robot-u-site/.env.backup.20260415T101957Z`.
CI state:
- `.forgejo/workflows/ci.yml` runs on `docker`.
- The `check` job manually installs `CI_REPO_SSH_KEY`, clones `git@aksal.cloud:Robot-U/robot-u-site.git`, installs `uv` and Bun, then runs Python and frontend checks.
- The `deploy` job runs after `check` on `push` events, installs `DEPLOY_SSH_KEY`, clones the repo, rsyncs it to `root@192.168.1.220:/opt/robot-u-site/`, rebuilds Docker Compose, and checks `/health`.
- The repo has a read-only deploy key and matching Forgejo Actions secret for CI clone.
- The app LXC has a CI deploy public key in `root`'s `authorized_keys`, and the matching private key is stored in the Forgejo Actions secret `DEPLOY_SSH_KEY`.
- `scripts/bootstrap_lxc_deploy_key.py` recreates or rotates the LXC deploy key. It uses `FORGEJO_API_TOKEN`, appends the generated public key to the LXC user's `authorized_keys`, verifies SSH, and stores the generated private key in `DEPLOY_SSH_KEY`.
- The deploy rsync excludes `.env` and `.env.*`, so production runtime secrets and backups on `/opt/robot-u-site` are preserved.
## Reverse Proxy LXC 102
The reverse proxy host is Proxmox LXC `102`:
- LXC hostname: `reverse-proxy`
- LXC IP: `192.168.1.203/24`
- Gateway: `192.168.1.2`
- Main jobs: nginx reverse proxy, LiteLLM proxy, and custom Porkbun DDNS script
- nginx service: `nginx.service`
- LiteLLM service: `litellm.service`
- Porkbun service: `porkbun-ddns.service`
- Robot U public site: `discourse.onl`
- Robot U nginx config: `/etc/nginx/sites-available/discourse.onl`
- Robot U certificate: `/etc/letsencrypt/live/discourse.onl/`
- Robot U upstream: `http://192.168.1.220:8800`
Do not bundle unrelated maintenance. If asked to update LiteLLM, do not change nginx or Porkbun DNS config unless explicitly requested. As of the last LiteLLM update, `porkbun-ddns.service` was in a failed state and was intentionally left untouched.
The `discourse.onl` nginx site was created on April 15, 2026 following the existing `aksal.cloud` pattern:
```bash
nginx -t && systemctl reload nginx
certbot --nginx -d discourse.onl --redirect --non-interactive
```
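Following the `aksal.cloud` pattern, the proxy block is presumably shaped like this (a sketch; the deployed `/etc/nginx/sites-available/discourse.onl` is authoritative, and certbot adds the TLS directives):

```nginx
server {
    server_name discourse.onl;

    location / {
        proxy_pass http://192.168.1.220:8800;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```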
Certbot issued a Let's Encrypt certificate expiring on July 14, 2026. Validate the route with:
```bash
curl -fsS https://discourse.onl/health
curl -fsS -o /tmp/discourse-home.html -w '%{http_code} %{content_type}\n' https://discourse.onl/
```
`curl -I https://discourse.onl/` returns `405` because the FastAPI app does not handle `HEAD`; use GET-based checks instead.
The `discourse.onl` Porkbun DDNS copy is intentionally separate from the existing `aksal.*` setup:
- Script directory: `/opt/porkbun-ddns-discourse-onl`
- Service user/group: `porkbun-discourse:porkbun-discourse`
- Service: `porkbun-ddns-discourse-onl.service`
- Timer: `porkbun-ddns-discourse-onl.timer`
- Managed records: `A discourse.onl` and `A *.discourse.onl`
- Current managed IP as of setup: `64.30.74.112`
The `discourse.onl` copy of `updateDNS.sh` was patched locally to make Porkbun curl calls use `--fail` and stronger retries, preventing transient 503 HTML bodies from being concatenated with JSON. A PR with the same fix was opened against the upstream Porkbun DDNS repo: `https://aksal.cloud/Amargius_Commons/porkbun_ddns_script/pulls/1`.
Direct SSH to `root@192.168.1.203`, `litellm@192.168.1.203`, or `root@192.168.1.200` may not work from this workspace. If SSH fails, use the Proxmox API credentials in the ignored `.env.proxmox` file to open a Proxmox node terminal and run `pct exec 102 -- ...`.
Proxmox API terminal access pattern:
1. Read `.env.proxmox`; never print credentials.
2. `POST /api2/json/access/ticket` with the Proxmox username/password.
3. `POST /api2/json/nodes/proxmox/termproxy` using the returned ticket and CSRF token.
4. Connect to `wss://<proxmox-host>:8006/api2/json/nodes/proxmox/vncwebsocket?port=<port>&vncticket=<ticket>`.
5. Send binary login payload `root@pam:<term-ticket>\n`; expect `OK`.
6. Send shell commands through the xterm websocket protocol: command payloads are framed as `0:<byte-length>:<command>`, followed by `0:1:\n`.
7. Prefer adding a unique sentinel to each command so the runner can detect completion instead of treating websocket read timeouts as command failure.
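Step 6's framing can be sketched in shell (ASCII commands only, so byte length equals character length; multibyte input would need a real byte count):

```shell
# Frame a command for the xterm websocket protocol: "0:<byte-length>:<command>".
frame_cmd() {
  printf '0:%s:%s' "${#1}" "$1"
}

frame_cmd 'echo hello'; printf '\n'   # prints: 0:10:echo hello
frame_cmd $'\n'                       # the follow-up newline frame: "0:1:" + newline
```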
Useful discovery commands from the Proxmox node shell:
```bash
pct status 102
pct config 102
pct exec 102 -- bash -lc 'hostname; systemctl list-units --type=service --all --no-pager | grep -Ei "lite|llm|nginx|porkbun|dns"'
pct exec 102 -- bash -lc 'systemctl status litellm --no-pager; systemctl cat litellm --no-pager'
```
LiteLLM current layout:
- Service unit: `/etc/systemd/system/litellm.service`
- Service user/group: `litellm:litellm`
- Working directory: `/opt/litellm/`
- Virtualenv: `/opt/litellm/venv`
- Config file: `/opt/litellm/config.yaml`
- Service command: `/opt/litellm/venv/bin/litellm --config /opt/litellm/config.yaml --port 4000`
- Local liveliness check: `http://127.0.0.1:4000/health/liveliness`
- Local readiness check: `http://127.0.0.1:4000/health/readiness`
LiteLLM update checklist:
1. Inspect current state and versions.
```bash
pct exec 102 -- bash -lc '/opt/litellm/venv/bin/python -m pip show litellm; curl -fsS -m 5 http://127.0.0.1:4000/health/liveliness'
```
2. Back up config and installed package set.
```bash
pct exec 102 -- bash -lc 'set -euo pipefail; stamp=$(date -u +%Y%m%dT%H%M%SZ); mkdir -p /opt/litellm/backups; cp -a /opt/litellm/config.yaml /opt/litellm/backups/config.yaml.$stamp; /opt/litellm/venv/bin/python -m pip freeze > /opt/litellm/backups/pip-freeze.$stamp.txt; chown -R litellm:litellm /opt/litellm/backups'
```
3. Stop LiteLLM before upgrading. Container `102` has only `512MiB` RAM and tends to use swap; stopping the proxy keeps pip from competing with the running process.
```bash
pct exec 102 -- bash -lc 'systemctl stop litellm; systemctl is-active litellm || true'
```
4. Upgrade pip and LiteLLM as the `litellm` user.
```bash
pct exec 102 -- bash -lc 'set -euo pipefail; runuser -u litellm -- /opt/litellm/venv/bin/python -m pip install --upgrade pip; runuser -u litellm -- /opt/litellm/venv/bin/python -m pip install --upgrade "litellm[proxy]"'
```
5. Restart and verify.
```bash
pct exec 102 -- bash -lc 'set -euo pipefail; systemctl start litellm; sleep 8; systemctl is-active litellm; /opt/litellm/venv/bin/python -m pip show litellm | sed -n "1,8p"; curl -fsS -m 10 http://127.0.0.1:4000/health/liveliness; echo; curl -fsS -m 10 http://127.0.0.1:4000/health/readiness; echo; /opt/litellm/venv/bin/python -m pip check; systemctl show litellm -p ActiveState -p SubState -p NRestarts -p MainPID -p ExecMainStatus --no-pager'
```
After the April 15, 2026 update, LiteLLM was upgraded from `1.81.15` to `1.83.7`, `/health/liveliness` returned `"I'm alive!"`, `/health/readiness` reported `db=connected`, and `pip check` reported no broken requirements. Startup logs may briefly print `Unable to connect to DB. DATABASE_URL found in environment, but prisma package not found.`; treat readiness and the Prisma process/import check as the source of truth before deciding it is an actual failure.
## Development Commands
### Backend only
```bash
.venv/bin/python -m uvicorn app:app --reload
```
### Frontend only
```bash
cd frontend
~/.bun/bin/bun run dev
```
### Frontend production build
```bash
cd frontend
~/.bun/bin/bun run build
```
## Quality Checks
Run both before pushing:
```bash
./scripts/check_python_quality.sh
./scripts/check_frontend_quality.sh
```
## Product/Data Model Background
- Public, non-fork Forgejo repos are scanned for content.
- A repo with `/lessons/` is treated as a course repo.
- A repo with `/blogs/` is treated as a post repo.
- Lessons are discovered from `lessons/<chapter>/<lesson>/`.
- Each lesson folder is expected to contain one markdown file plus optional assets.
- Frontmatter is used when present for `title` and `summary`.
- Discussions are loaded from Forgejo issues and comments.
- Issue bodies are scanned for canonical post/lesson URLs and Forgejo file URLs to connect discussions back to content.
- Calendar events are loaded from ICS feeds, not managed in-app.
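For illustration, a minimal content repo following these rules might look like this (all names hypothetical):

```text
lessons/
  01-encoders/
    01-what-is-quadrature/
      lesson.md        # one markdown file per lesson folder
      waveform.png     # optional asset
blogs/
  hello-world.md       # presence of /blogs/ marks a post repo
```

`lesson.md` may begin with a frontmatter block supplying `title` and `summary`; without one, the app falls back to whatever defaults the loader derives.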
## UI Expectations
- The UI should not expose Forgejo as a user-facing implementation detail unless necessary for debugging.
- Course cards should open course pages.
- Lesson rows should open lesson pages.
- Discussion pages should focus on one thread at a time.
- Markdown should render as readable content, not raw source.
## Push Workflow
The site source repo currently lives at:
- `git@aksal.cloud:Robot-U/robot-u-site.git`
Typical push flow:
```bash
git status
git add ...
git commit -m "..."
git push origin main
```