From 133b557512793aad5fed2c8b4b685e48faea4970 Mon Sep 17 00:00:00 2001
From: kacper
Date: Tue, 3 Mar 2026 10:57:40 -0500
Subject: [PATCH] first commit

---
 README.md | 167 ------------------------------------------------------
 1 file changed, 167 deletions(-)
 delete mode 100644 README.md

diff --git a/README.md b/README.md
deleted file mode 100644
index 96480f8..0000000
--- a/README.md
+++ /dev/null
@@ -1,167 +0,0 @@
-# Nanobot SuperTonic Wisper Web
-
-Standalone Python web project that:
-- uses a local `supertonic_gateway` orchestration layer,
-- uses a local `wisper` event bus,
-- spawns `nanobot agent` in a pseudo-TTY (TUI behavior),
-- streams TUI output to a browser chat page over WebSocket,
-- supports WebRTC voice input/output with host-side STT/TTS processing.
-
-This project is separate from the `nanobot` repository and only talks to Nanobot as an external command.
-
-## 1) Setup
-
-```bash
-cd /home/kacper/nanobot-supertonic-wisper-web
-python3 -m venv .venv
-source .venv/bin/activate
-pip install -r requirements.txt
-```
-
-## 2) Point to your Nanobot command
-
-Default behavior:
-- if `~/nanobot/.venv/bin/python` exists, the app uses:
-  - `NANOBOT_COMMAND="~/nanobot/.venv/bin/python -m nanobot agent --no-markdown"`
-  - `NANOBOT_WORKDIR="~/nanobot"`
-- else if `~/nanobot/venv/bin/python` exists, the app uses:
-  - `NANOBOT_COMMAND="~/nanobot/venv/bin/python -m nanobot agent --no-markdown"`
-  - `NANOBOT_WORKDIR="~/nanobot"`
-- otherwise it falls back to `nanobot agent --no-markdown` from PATH.
-
-Optional override (for any custom location):
-
-```bash
-export NANOBOT_COMMAND="/home/kacper/nanobot/venv/bin/python -m nanobot agent --no-markdown"
-export NANOBOT_WORKDIR="/home/kacper/nanobot"
-```
-
-Optional TUI output filtering (reduces spinner/thinking/tool-stream flood in web console):
-
-```bash
-export NANOBOT_SUPPRESS_NOISY_UI='1'
-export NANOBOT_OUTPUT_DEDUP_WINDOW_S='1.5'
-```
-
-## 3) Run web app
-
-```bash
-uvicorn app:app --reload --host 0.0.0.0 --port 8080
-```
-
-Open: `http://localhost:8080`
-
-Or use the helper script (recommended for voice on iOS Safari):
-
-```bash
-./start.sh
-```
-
-`start.sh` enables HTTPS by default (`ENABLE_HTTPS=1`), auto-generates a local self-signed cert at `.certs/local-cert.pem` and key at `.certs/local-key.pem`, and serves `https://localhost:8000`.
-For iPhone access by LAN IP, open `https://:8000` and trust the certificate on the device.
-Set `ENABLE_HTTPS=0` to run plain HTTP.
-
-## How it works
-
-- Click **Spawn Nanobot TUI** to start the agent process in a PTY.
-- Type messages in the input and press Enter, or click **Connect Voice Channel** and hold **Push-to-Talk** while speaking.
-- The browser receives streamed PTY output and displays it live.
-- When **Host Voice Output** is enabled, Nanobot output is synthesized on the host and streamed back over WebRTC audio.
-- For isolated RTC/TTS debugging, connect voice and click **Play Voice Test Script** to synthesize a sample line directly over the same WebRTC output path.
-
-## Voice features
-
-- Browser voice transport uses `RTCPeerConnection` + microphone capture (`getUserMedia`).
-- Voice input is explicit push-to-talk (hold button to capture, release to transcribe) instead of host-side silence segmentation.
-- Optional test mode can echo each released push-to-talk segment back to the user over WebRTC output.
-- Host receives raw audio and performs speech-to-text using:
-  - `faster-whisper` directly by default (`HOST_STT_PROVIDER=faster-whisper`), or
-  - `HOST_STT_COMMAND` (if `HOST_STT_PROVIDER=command`).
-- Host performs text-to-speech using:
-  - `supertonic` Python library by default (`HOST_TTS_PROVIDER=supertonic`), or
-  - `HOST_TTS_COMMAND` (if `HOST_TTS_PROVIDER=command`), or
-  - `espeak` (if available in PATH).
-- Voice test mode sends a dedicated `voice-test-script` command over WebSocket and plays host TTS on the active WebRTC audio track (no Nanobot output required).
-- If STT/TTS is not configured, text chat still works and system messages explain what is missing.
-
-### Optional host voice configuration
-
-If you use `./start.sh`, you can put these in `.env.voice` and they will be loaded automatically.
-
-Default direct STT (faster-whisper):
-
-```bash
-export HOST_STT_PROVIDER='faster-whisper'
-export HOST_STT_MODEL='base.en'
-export HOST_STT_DEVICE='auto'
-export HOST_STT_COMPUTE_TYPE='int8'
-export HOST_STT_LANGUAGE='en'
-export HOST_STT_BEAM_SIZE='2'
-export HOST_STT_BEST_OF='2'
-export HOST_STT_VAD_FILTER='0'
-export HOST_STT_TEMPERATURE='0.0'
-export HOST_STT_LOG_PROB_THRESHOLD='-1.0'
-export HOST_STT_NO_SPEECH_THRESHOLD='0.6'
-export HOST_STT_COMPRESSION_RATIO_THRESHOLD='2.4'
-export HOST_STT_INITIAL_PROMPT='Transcribe brief spoken English precisely. Prefer common words over sound effects.'
-export HOST_STT_MIN_PTT_MS='220'
-export HOST_STT_MAX_PTT_MS='12000'
-export HOST_STT_PTT_PLAYBACK_TEST='0'
-export HOST_STT_SEGMENT_QUEUE_SIZE='2'
-export HOST_STT_BACKLOG_NOTICE_INTERVAL_S='6.0'
-export HOST_STT_SUPPRESS_DURING_TTS='1'
-export HOST_STT_SUPPRESS_MS_AFTER_TTS='300'
-```
-
-Legacy compatibility: `HOST_STT_MIN_SEGMENT_MS` / `HOST_STT_MAX_SEGMENT_MS` are still read as fallbacks.
-
-Note: first run may download the selected Whisper model weights.
-
-Use command-based STT instead:
-
-```bash
-export HOST_STT_PROVIDER='command'
-export HOST_STT_COMMAND='whisper_cli --input {input_wav}'
-```
-
-Command contract:
-- `{input_wav}` is replaced with a temporary WAV file path.
-- Command must print transcript text to stdout.
-
-Set TTS (optional; overrides `espeak` fallback):
-
-```bash
-export HOST_TTS_PROVIDER='supertonic'
-export SUPERTONIC_MODEL='supertonic-2'
-export SUPERTONIC_VOICE_STYLE='M1'
-export SUPERTONIC_LANG='en'
-export SUPERTONIC_INTRA_OP_THREADS='1'
-export SUPERTONIC_INTER_OP_THREADS='1'
-export HOST_TTS_FLUSH_DELAY_S='0.45'
-export HOST_TTS_SENTENCE_FLUSH_DELAY_S='0.15'
-export HOST_TTS_MIN_CHARS='10'
-export HOST_TTS_MAX_WAIT_MS='1800'
-export HOST_TTS_MAX_CHUNK_CHARS='140'
-export HOST_RTC_OUTBOUND_LEAD_IN_MS='120'
-export HOST_RTC_OUTBOUND_IDLE_S='0.6'
-```
-
-Use command-based TTS instead:
-
-```bash
-export HOST_TTS_PROVIDER='command'
-export HOST_TTS_COMMAND='my_tts --text {text} --out {output_wav}'
-```
-
-Command contract:
-- `{text}` is replaced with quoted text.
-- `{output_wav}` is replaced with a temporary WAV output path.
-- If `{output_wav}` is omitted, command stdout must be WAV bytes.
-
-## Files
-
-- `app.py`: FastAPI app and WebSocket endpoint.
-- `voice_rtc.py`: WebRTC signaling/session handling and host-side STT/TTS audio pipeline.
-- `supertonic_gateway.py`: process orchestration and PTY bridge.
-- `wisper.py`: event/message bus used by WebSocket streaming.
-- `static/index.html`: simple chat UI.
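
Note on the TTS command contract in the removed README: the rules for `{text}` and `{output_wav}` could be honored roughly as in the sketch below. This is not code from the repository. The helper names `render_tts_command` and `synthesize` are hypothetical, and it assumes that "quoted text" means POSIX shell-style quoting via `shlex.quote`.

```python
import re
import shlex
import subprocess
import tempfile
from pathlib import Path


def render_tts_command(template: str, text: str, output_wav: str) -> str:
    """Fill {text} and {output_wav} in a HOST_TTS_COMMAND-style template.

    Hypothetical helper: values are shell-quoted and substituted in a single
    pass, so a literal "{output_wav}" inside the spoken text is left alone.
    """
    values = {"text": shlex.quote(text), "output_wav": shlex.quote(output_wav)}
    return re.sub(r"\{(text|output_wav)\}", lambda m: values[m.group(1)], template)


def synthesize(template: str, text: str) -> bytes:
    """Run the rendered command and return WAV bytes (assumes a POSIX host)."""
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = Path(tmp) / "tts.wav"
        command = render_tts_command(template, text, str(wav_path))
        result = subprocess.run(shlex.split(command), capture_output=True, check=True)
        if "{output_wav}" in template:
            # The command was given a file path, so read the WAV it wrote.
            return wav_path.read_bytes()
        # No {output_wav} placeholder: the contract says stdout carries WAV bytes.
        return result.stdout
```

Splitting the rendered string with `shlex.split` and running it without a shell keeps the quoted text as a single argument, so spoken text containing spaces or shell metacharacters is not reinterpreted.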