
# Nanobot SuperTonic Wisper Web

Standalone Python web project that:
- uses a local `supertonic_gateway` orchestration layer,
- uses a local `wisper` event bus,
- spawns `nanobot agent` in a pseudo-TTY (TUI behavior),
- streams TUI output to a browser chat page over WebSocket,
- supports WebRTC voice input/output with host-side STT/TTS processing.

This project is separate from the `nanobot` repository and only talks to Nanobot as an external command.

## 1) Setup

```bash
cd /home/kacper/nanobot-supertonic-wisper-web
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## 2) Point to your Nanobot command

Default behavior:
- if `~/nanobot/.venv/bin/python` exists, the app uses:
  - `NANOBOT_COMMAND="~/nanobot/.venv/bin/python -m nanobot agent --no-markdown"`
  - `NANOBOT_WORKDIR="~/nanobot"`
- else if `~/nanobot/venv/bin/python` exists, the app uses:
  - `NANOBOT_COMMAND="~/nanobot/venv/bin/python -m nanobot agent --no-markdown"`
  - `NANOBOT_WORKDIR="~/nanobot"`
- otherwise it falls back to `nanobot agent --no-markdown` from PATH.

Optional override (for any custom location):

```bash
export NANOBOT_COMMAND="/home/kacper/nanobot/venv/bin/python -m nanobot agent --no-markdown"
export NANOBOT_WORKDIR="/home/kacper/nanobot"
```
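
The lookup order above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the app's actual code: the function name `resolve_nanobot_command` and the injectable `exists` check are hypothetical, and returning no working directory for the PATH fallback is an assumption.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional, Tuple


def resolve_nanobot_command(
    env: Mapping[str, str],
    home: Path,
    exists: Callable[[Path], bool] = Path.exists,
) -> Tuple[str, Optional[str]]:
    """Return (command, workdir) using the lookup order described above."""
    # An explicit override wins over every default.
    if env.get("NANOBOT_COMMAND"):
        return env["NANOBOT_COMMAND"], env.get("NANOBOT_WORKDIR")
    repo = home / "nanobot"
    # Prefer ~/nanobot/.venv, then ~/nanobot/venv.
    for venv_dir in (".venv", "venv"):
        python = repo / venv_dir / "bin" / "python"
        if exists(python):
            return f"{python} -m nanobot agent --no-markdown", str(repo)
    # No known virtualenv: expect `nanobot` to be on PATH (assumed: no workdir).
    return "nanobot agent --no-markdown", None
```

Calling `resolve_nanobot_command(os.environ, Path.home())` would mirror the defaults on a real machine; passing a custom `exists` callable makes the order easy to check without touching the filesystem.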

Optional TUI output filtering (reduces spinner/thinking/tool-stream flood in web console):

```bash
export NANOBOT_SUPPRESS_NOISY_UI='1'
export NANOBOT_OUTPUT_DEDUP_WINDOW_S='1.5'
```
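
To give a feel for what a dedup window does, here is a minimal sketch. It assumes `NANOBOT_OUTPUT_DEDUP_WINDOW_S` means "drop a line if the same line was already shown within that many seconds"; the real filter may work differently, and the `OutputDeduper` class is hypothetical.

```python
import time
from typing import Callable, Dict


class OutputDeduper:
    """Drop lines that repeat within a sliding time window (illustrative only)."""

    def __init__(self, window_s: float = 1.5, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self._clock = clock
        self._last_emitted: Dict[str, float] = {}

    def should_emit(self, line: str) -> bool:
        now = self._clock()
        # Forget lines whose last emission is older than the window.
        self._last_emitted = {
            text: stamp
            for text, stamp in self._last_emitted.items()
            if now - stamp < self.window_s
        }
        key = line.strip()
        if key in self._last_emitted:
            return False  # Same line seen recently: suppress it.
        self._last_emitted[key] = now
        return True
```

A spinner frame that repeats every 100 ms would be shown once per window under this scheme, while distinct lines pass through untouched.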

## 3) Run web app

```bash
uvicorn app:app --reload --host 0.0.0.0 --port 8080
```

Open: `http://localhost:8080`

Or use the helper script (recommended for voice on iOS Safari):

```bash
./start.sh
```

`start.sh` enables HTTPS by default (`ENABLE_HTTPS=1`), auto-generates a local self-signed cert at `.certs/local-cert.pem` and key at `.certs/local-key.pem`, and serves `https://localhost:8000`.
For iPhone access by LAN IP, open `https://<your-lan-ip>:8000` and trust the certificate on the device.
Set `ENABLE_HTTPS=0` to run plain HTTP.

## How it works

- Click **Spawn Nanobot TUI** to start the agent process in a PTY.
- Type messages in the input and press Enter, or click **Connect Voice Channel** and hold **Push-to-Talk** while speaking.
- The browser receives streamed PTY output and displays it live.
- When **Host Voice Output** is enabled, Nanobot output is synthesized on the host and streamed back over WebRTC audio.
- For isolated RTC/TTS debugging, connect voice and click **Play Voice Test Script** to synthesize a sample line directly over the same WebRTC output path.
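
The PTY step can be illustrated with the standard library alone. The sketch below is a simplified, blocking stand-in for what `supertonic_gateway.py` presumably does asynchronously; the helper name `run_in_pty` is hypothetical, and it collects all output instead of streaming it to a WebSocket.

```python
import os
import pty
import subprocess
from typing import Sequence


def run_in_pty(argv: Sequence[str]) -> str:
    """Run a command attached to a pseudo-terminal and return what it printed."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        list(argv), stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # The parent only needs the master side.
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child closes the terminal.
            if not data:
                break  # Other platforms signal end of output with an empty read.
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks).decode(errors="replace")
```

Because the child sees a terminal instead of a pipe, programs such as `nanobot agent` keep their interactive (TUI) behavior; a streaming server would forward each chunk as it arrives instead of joining them at the end.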

## Voice features

- Browser voice transport uses `RTCPeerConnection` + microphone capture (`getUserMedia`).
- Voice input is explicit push-to-talk (hold button to capture, release to transcribe) instead of host-side silence segmentation.
- Optional test mode can echo each released push-to-talk segment back to the user over WebRTC output.
- Host receives raw audio and performs speech-to-text using:
  - `faster-whisper` directly by default (`HOST_STT_PROVIDER=faster-whisper`), or
  - `HOST_STT_COMMAND` (if `HOST_STT_PROVIDER=command`).
- Host performs text-to-speech using:
  - `supertonic` Python library by default (`HOST_TTS_PROVIDER=supertonic`), or
  - `HOST_TTS_COMMAND` (if `HOST_TTS_PROVIDER=command`), or
  - `espeak` (if available in PATH).
- Voice test mode sends a dedicated `voice-test-script` command over WebSocket and plays host TTS on the active WebRTC audio track (no Nanobot output required).
- If STT/TTS is not configured, text chat still works and system messages explain what is missing.

### Optional host voice configuration

If you use `./start.sh`, you can put these in `.env.voice` and they will be loaded automatically.

Default direct STT (faster-whisper):

```bash
export HOST_STT_PROVIDER='faster-whisper'
export HOST_STT_MODEL='base.en'
export HOST_STT_DEVICE='auto'
export HOST_STT_COMPUTE_TYPE='int8'
export HOST_STT_LANGUAGE='en'
export HOST_STT_BEAM_SIZE='2'
export HOST_STT_BEST_OF='2'
export HOST_STT_VAD_FILTER='0'
export HOST_STT_TEMPERATURE='0.0'
export HOST_STT_LOG_PROB_THRESHOLD='-1.0'
export HOST_STT_NO_SPEECH_THRESHOLD='0.6'
export HOST_STT_COMPRESSION_RATIO_THRESHOLD='2.4'
export HOST_STT_INITIAL_PROMPT='Transcribe brief spoken English precisely. Prefer common words over sound effects.'
export HOST_STT_MIN_PTT_MS='220'
export HOST_STT_MAX_PTT_MS='12000'
export HOST_STT_PTT_PLAYBACK_TEST='0'
export HOST_STT_SEGMENT_QUEUE_SIZE='2'
export HOST_STT_BACKLOG_NOTICE_INTERVAL_S='6.0'
export HOST_STT_SUPPRESS_DURING_TTS='1'
export HOST_STT_SUPPRESS_MS_AFTER_TTS='300'
```

Legacy compatibility: `HOST_STT_MIN_SEGMENT_MS` / `HOST_STT_MAX_SEGMENT_MS` are still read as fallbacks.
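
A fallback of this kind might look like the following sketch. The helper `read_int_setting` is hypothetical, and the precedence (new name first, legacy name second, built-in default last) is an assumption based on the word "fallbacks".

```python
from typing import Mapping


def read_int_setting(env: Mapping[str, str], name: str, legacy_name: str, default: int) -> int:
    """Return the first valid integer among name, legacy_name, then default."""
    for key in (name, legacy_name):
        raw = env.get(key, "").strip()
        if not raw:
            continue
        try:
            return int(raw)
        except ValueError:
            continue  # Ignore malformed values and keep looking.
    return default
```

For example, `read_int_setting(os.environ, "HOST_STT_MIN_PTT_MS", "HOST_STT_MIN_SEGMENT_MS", 220)` would honor an old `.env.voice` file that only sets the legacy name.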

Note: the first run may download the selected Whisper model weights.

Use command-based STT instead:

```bash
export HOST_STT_PROVIDER='command'
export HOST_STT_COMMAND='whisper_cli --input {input_wav}'
```

Command contract:
- `{input_wav}` is replaced with a temporary WAV file path.
- Command must print transcript text to stdout.
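
The contract above can be exercised with a small wrapper. This is a sketch under assumptions: the function name `run_stt_command` and the timeout are hypothetical, and the app may build and run the command differently (for instance through a shell).

```python
import shlex
import subprocess


def run_stt_command(template: str, wav_path: str, timeout_s: float = 60.0) -> str:
    """Fill in {input_wav}, run the command, and return its stdout as the transcript."""
    # Tokenize first, then substitute, so paths with spaces stay one argument.
    argv = [part.replace("{input_wav}", wav_path) for part in shlex.split(template)]
    result = subprocess.run(
        argv, capture_output=True, text=True, check=True, timeout=timeout_s
    )
    return result.stdout.strip()
```

Any tool that accepts a WAV path and prints plain text fits this shape; a non-zero exit status raises `subprocess.CalledProcessError` in this sketch.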

Configure TTS (optional; takes precedence over the `espeak` fallback):

```bash
export HOST_TTS_PROVIDER='supertonic'
export SUPERTONIC_MODEL='supertonic-2'
export SUPERTONIC_VOICE_STYLE='M1'
export SUPERTONIC_LANG='en'
export SUPERTONIC_INTRA_OP_THREADS='1'
export SUPERTONIC_INTER_OP_THREADS='1'
export HOST_TTS_FLUSH_DELAY_S='0.45'
export HOST_TTS_SENTENCE_FLUSH_DELAY_S='0.15'
export HOST_TTS_MIN_CHARS='10'
export HOST_TTS_MAX_WAIT_MS='1800'
export HOST_TTS_MAX_CHUNK_CHARS='140'
export HOST_RTC_OUTBOUND_LEAD_IN_MS='120'
export HOST_RTC_OUTBOUND_IDLE_S='0.6'
```

Use command-based TTS instead:

```bash
export HOST_TTS_PROVIDER='command'
export HOST_TTS_COMMAND='my_tts --text {text} --out {output_wav}'
```

Command contract:
- `{text}` is replaced with quoted text.
- `{output_wav}` is replaced with a temporary WAV output path.
- If `{output_wav}` is omitted, command stdout must be WAV bytes.
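
Both output modes of the contract can be handled by one helper, sketched below. The name `run_tts_command` is hypothetical, and instead of inserting quoted text into a shell string, this version substitutes placeholders after tokenizing, which may differ from how the app quotes `{text}`.

```python
import os
import shlex
import subprocess
import tempfile


def run_tts_command(template: str, text: str, timeout_s: float = 60.0) -> bytes:
    """Run a TTS command template and return the WAV bytes it produced."""
    wants_file = "{output_wav}" in template
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # Substitute the path before the text so user text is never re-expanded.
        argv = [
            part.replace("{output_wav}", wav_path).replace("{text}", text)
            for part in shlex.split(template)
        ]
        result = subprocess.run(argv, capture_output=True, check=True, timeout=timeout_s)
        if wants_file:
            with open(wav_path, "rb") as handle:
                return handle.read()
        # No {output_wav} placeholder: the command wrote WAV bytes to stdout.
        return result.stdout
    finally:
        os.unlink(wav_path)
```

With the example template `my_tts --text {text} --out {output_wav}`, the bytes come from the temporary file; dropping `--out {output_wav}` switches the helper to reading stdout.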

## Files

- `app.py`: FastAPI app and WebSocket endpoint.
- `voice_rtc.py`: WebRTC signaling/session handling and host-side STT/TTS audio pipeline.
- `supertonic_gateway.py`: process orchestration and PTY bridge.
- `wisper.py`: event/message bus used by WebSocket streaming.
- `static/index.html`: simple chat UI.