kacper 8534b15c20 prototype		2026-02-28 22:12:04 -05:00
static	prototype	2026-02-28 22:12:04 -05:00
app.py	prototype	2026-02-28 22:12:04 -05:00
README.md	prototype	2026-02-28 22:12:04 -05:00
requirements.txt	prototype	2026-02-28 22:12:04 -05:00
start.sh	prototype	2026-02-28 22:12:04 -05:00
supertonic_gateway.py	prototype	2026-02-28 22:12:04 -05:00
voice_rtc.py	prototype	2026-02-28 22:12:04 -05:00
wisper.py	prototype	2026-02-28 22:12:04 -05:00

README.md

Nanobot SuperTonic Wisper Web

Standalone Python web project that:

uses a local supertonic_gateway orchestration layer,
uses a local wisper event bus,
spawns nanobot agent in a pseudo-TTY (TUI behavior),
streams TUI output to a browser chat page over WebSocket,
supports WebRTC voice input/output with host-side STT/TTS processing.

This project is separate from the nanobot repository and only talks to Nanobot as an external command.

1) Setup

cd /home/kacper/nanobot-supertonic-wisper-web
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) Point to your Nanobot command

Default behavior:

If ~/nanobot/.venv/bin/python exists, the app uses:
- NANOBOT_COMMAND="~/nanobot/.venv/bin/python -m nanobot agent --no-markdown"
- NANOBOT_WORKDIR="~/nanobot"
Otherwise, if ~/nanobot/venv/bin/python exists, the app uses:
- NANOBOT_COMMAND="~/nanobot/venv/bin/python -m nanobot agent --no-markdown"
- NANOBOT_WORKDIR="~/nanobot"
If neither interpreter exists, it falls back to running nanobot agent --no-markdown from PATH.
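
The lookup order above can be sketched in Python roughly as follows. This is an illustration only, not the code in app.py: the function name resolve_nanobot_defaults is hypothetical, paths are expanded rather than written with ~, and letting a preset NANOBOT_COMMAND win is an assumption based on the override described in this section.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple

NANOBOT_ARGS = "-m nanobot agent --no-markdown"


def resolve_nanobot_defaults(
    home: Path, env: Mapping[str, str]
) -> Tuple[str, Optional[str]]:
    """Return (command, workdir) using the documented lookup order."""
    # Assumption: an explicit NANOBOT_COMMAND takes precedence over detection.
    if env.get("NANOBOT_COMMAND"):
        return env["NANOBOT_COMMAND"], env.get("NANOBOT_WORKDIR")

    repo = home / "nanobot"
    # Try ~/nanobot/.venv first, then ~/nanobot/venv.
    for venv_name in (".venv", "venv"):
        python = repo / venv_name / "bin" / "python"
        if python.exists():
            return f"{python} {NANOBOT_ARGS}", str(repo)

    # Neither virtualenv exists: rely on a nanobot executable found on PATH.
    return "nanobot agent --no-markdown", None
```

Calling resolve_nanobot_defaults(Path.home(), os.environ) would then give the command and working directory to use for the spawned agent.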

Optional override (for any custom location):

export NANOBOT_COMMAND="/home/kacper/nanobot/venv/bin/python -m nanobot agent --no-markdown"
export NANOBOT_WORKDIR="/home/kacper/nanobot"

Optional TUI output filtering (reduces spinner/thinking/tool-stream flood in web console):

export NANOBOT_SUPPRESS_NOISY_UI='1'
export NANOBOT_OUTPUT_DEDUP_WINDOW_S='1.5'
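
To give a feel for what a deduplication window does, here is a minimal sketch that drops a line if the same text was already emitted within the last N seconds. The class name OutputDeduper and the exact matching rule (whitespace-stripped equality) are assumptions for illustration; the project's actual filter may work differently.

```python
import time
from typing import Callable, Dict


class OutputDeduper:
    """Suppress lines that repeat within a sliding time window (hypothetical helper)."""

    def __init__(
        self, window_s: float = 1.5, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.window_s = window_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def should_emit(self, line: str) -> bool:
        now = self._clock()
        key = line.strip()
        # Forget entries older than the window so the table stays small.
        self._last_seen = {
            k: t for k, t in self._last_seen.items() if now - t < self.window_s
        }
        if key in self._last_seen:
            return False  # same text was emitted recently, e.g. a spinner frame
        self._last_seen[key] = now
        return True
```

With window_s=1.5, a spinner line repeated every 100 ms would reach the browser at most once per 1.5 seconds.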

3) Run web app

uvicorn app:app --reload --host 0.0.0.0 --port 8080

Open: http://localhost:8080

Or use the helper script (recommended for voice on iOS Safari):

./start.sh

start.sh enables HTTPS by default (ENABLE_HTTPS=1), auto-generates a local self-signed cert at .certs/local-cert.pem and key at .certs/local-key.pem, and serves https://localhost:8000. For iPhone access by LAN IP, open https://<your-lan-ip>:8000 and trust the certificate on the device. Set ENABLE_HTTPS=0 to run plain HTTP.

How it works

Click Spawn Nanobot TUI to start the agent process in a PTY.
Type messages in the input and press Enter, or click Connect Voice Channel and hold Push-to-Talk while speaking.
The browser receives streamed PTY output and displays it live.
When Host Voice Output is enabled, Nanobot output is synthesized on the host and streamed back over WebRTC audio.
For isolated RTC/TTS debugging, connect voice and click Play Voice Test Script to synthesize a sample line directly over the same WebRTC output path.
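
The PTY step can be illustrated with the standard library alone. The sketch below starts a child process attached to a pseudo-terminal and collects what it prints; it is not the code in supertonic_gateway.py, and the function name run_in_pty is hypothetical. A real bridge would forward each chunk to the WebSocket as it arrives instead of buffering until exit.

```python
import os
import pty
import subprocess
import sys
from typing import List


def run_in_pty(argv: List[str]) -> str:
    """Run argv with its stdio attached to a PTY and return everything it wrote."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the parent only needs the master side

    chunks: List[bytes] = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once the child has closed the PTY
            if not data:
                break  # other platforms signal end-of-file with an empty read
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(run_in_pty([sys.executable, "-c", "print('hello from a PTY')"]))
```

Because the child sees a terminal rather than a pipe, programs with TUI behavior keep their interactive output mode, which is why the browser receives spinner and redraw sequences unless filtering is enabled.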

Voice features

Browser voice transport uses RTCPeerConnection + microphone capture (getUserMedia).
Voice input is explicit push-to-talk (hold button to capture, release to transcribe) instead of host-side silence segmentation.
Optional test mode can echo each released push-to-talk segment back to the user over WebRTC output.
Host receives raw audio and performs speech-to-text using:
- faster-whisper directly by default (HOST_STT_PROVIDER=faster-whisper), or
- HOST_STT_COMMAND (if HOST_STT_PROVIDER=command).
Host performs text-to-speech using:
- supertonic Python library by default (HOST_TTS_PROVIDER=supertonic), or
- HOST_TTS_COMMAND (if HOST_TTS_PROVIDER=command), or
- espeak (if available in PATH).
Voice test mode sends a dedicated voice-test-script command over WebSocket and plays host TTS on the active WebRTC audio track (no Nanobot output required).
If STT/TTS is not configured, text chat still works and system messages explain what is missing.

Optional host voice configuration

If you use ./start.sh, you can put these in .env.voice and they will be loaded automatically.

Default direct STT (faster-whisper):

export HOST_STT_PROVIDER='faster-whisper'
export HOST_STT_MODEL='base.en'
export HOST_STT_DEVICE='auto'
export HOST_STT_COMPUTE_TYPE='int8'
export HOST_STT_LANGUAGE='en'
export HOST_STT_BEAM_SIZE='2'
export HOST_STT_BEST_OF='2'
export HOST_STT_VAD_FILTER='0'
export HOST_STT_TEMPERATURE='0.0'
export HOST_STT_LOG_PROB_THRESHOLD='-1.0'
export HOST_STT_NO_SPEECH_THRESHOLD='0.6'
export HOST_STT_COMPRESSION_RATIO_THRESHOLD='2.4'
export HOST_STT_INITIAL_PROMPT='Transcribe brief spoken English precisely. Prefer common words over sound effects.'
export HOST_STT_MIN_PTT_MS='220'
export HOST_STT_MAX_PTT_MS='12000'
export HOST_STT_PTT_PLAYBACK_TEST='0'
export HOST_STT_SEGMENT_QUEUE_SIZE='2'
export HOST_STT_BACKLOG_NOTICE_INTERVAL_S='6.0'
export HOST_STT_SUPPRESS_DURING_TTS='1'
export HOST_STT_SUPPRESS_MS_AFTER_TTS='300'

Legacy compatibility: HOST_STT_MIN_SEGMENT_MS / HOST_STT_MAX_SEGMENT_MS are still read as fallbacks.
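
A fallback of this kind can be read as "use the new variable if set, else the legacy one, else the default". The sketch below shows that pattern together with one plausible use of the two limits. The helper names are hypothetical, and the gating rule (discard segments shorter than the minimum, cap longer ones at the maximum) is an assumption, not confirmed behavior of voice_rtc.py.

```python
from typing import Mapping, Optional


def read_ms(env: Mapping[str, str], name: str, legacy_name: str, default: int) -> int:
    """Read a millisecond setting, preferring name over legacy_name."""
    for key in (name, legacy_name):
        raw = env.get(key, "").strip()
        if raw:
            try:
                return int(float(raw))
            except ValueError:
                continue  # ignore malformed values and keep looking
    return default


def clamp_ptt_segment(duration_ms: int, min_ms: int, max_ms: int) -> Optional[int]:
    """Return None for a too-short press, otherwise the duration capped at max_ms.

    Assumed policy for illustration only.
    """
    if duration_ms < min_ms:
        return None
    return min(duration_ms, max_ms)
```

For example, with only HOST_STT_MIN_SEGMENT_MS=300 exported, read_ms(os.environ, "HOST_STT_MIN_PTT_MS", "HOST_STT_MIN_SEGMENT_MS", 220) yields 300.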

Note: the first run may download the selected Whisper model weights.
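
Most of the HOST_STT_* settings correspond to arguments that faster-whisper itself accepts. The sketch below shows one way the environment could be mapped onto WhisperModel and its transcribe method. The function names are hypothetical and the mapping is an assumption about how the variables are consumed; only the faster-whisper argument names are taken from that library.

```python
from typing import Any, Dict, Mapping


def build_transcribe_kwargs(env: Mapping[str, str]) -> Dict[str, Any]:
    """Translate HOST_STT_* variables into keyword arguments for transcribe()."""
    return {
        "language": env.get("HOST_STT_LANGUAGE", "en"),
        "beam_size": int(env.get("HOST_STT_BEAM_SIZE", "2")),
        "best_of": int(env.get("HOST_STT_BEST_OF", "2")),
        "vad_filter": env.get("HOST_STT_VAD_FILTER", "0") == "1",
        "temperature": float(env.get("HOST_STT_TEMPERATURE", "0.0")),
        "log_prob_threshold": float(env.get("HOST_STT_LOG_PROB_THRESHOLD", "-1.0")),
        "no_speech_threshold": float(env.get("HOST_STT_NO_SPEECH_THRESHOLD", "0.6")),
        "compression_ratio_threshold": float(
            env.get("HOST_STT_COMPRESSION_RATIO_THRESHOLD", "2.4")
        ),
        "initial_prompt": env.get("HOST_STT_INITIAL_PROMPT") or None,
    }


def transcribe_wav(path: str, env: Mapping[str, str]) -> str:
    """Transcribe one WAV file. Requires the faster-whisper package."""
    from faster_whisper import WhisperModel  # imported lazily: heavy dependency

    # A long-running server would load the model once and reuse it.
    model = WhisperModel(
        env.get("HOST_STT_MODEL", "base.en"),
        device=env.get("HOST_STT_DEVICE", "auto"),
        compute_type=env.get("HOST_STT_COMPUTE_TYPE", "int8"),
    )
    segments, _info = model.transcribe(path, **build_transcribe_kwargs(env))
    return " ".join(segment.text.strip() for segment in segments)
```

Creating the WhisperModel is the step that may trigger the weight download on first use.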

Use command-based STT instead:

export HOST_STT_PROVIDER='command'
export HOST_STT_COMMAND='whisper_cli --input {input_wav}'

Command contract:

{input_wav} is replaced with a temporary WAV file path.
Command must print transcript text to stdout.
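
A host-side runner that honors this contract might look like the sketch below. The function names and the 60-second timeout are hypothetical; the sketch only illustrates placeholder substitution and reading the transcript from stdout.

```python
import shlex
import subprocess
from typing import List


def build_stt_argv(template: str, wav_path: str) -> List[str]:
    """Expand {input_wav} in an STT command template into an argument list."""
    # Split first, then substitute, so a path containing spaces stays one argument.
    return [part.replace("{input_wav}", wav_path) for part in shlex.split(template)]


def run_stt_command(template: str, wav_path: str, timeout_s: float = 60.0) -> str:
    """Run the STT command and return the transcript it printed to stdout."""
    result = subprocess.run(
        build_stt_argv(template, wav_path),
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
        timeout=timeout_s,
    )
    return result.stdout.strip()
```

With the example template, build_stt_argv("whisper_cli --input {input_wav}", "/tmp/seg.wav") produces ["whisper_cli", "--input", "/tmp/seg.wav"].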

Configure TTS (optional; overrides the espeak fallback):

export HOST_TTS_PROVIDER='supertonic'
export SUPERTONIC_MODEL='supertonic-2'
export SUPERTONIC_VOICE_STYLE='M1'
export SUPERTONIC_LANG='en'
export SUPERTONIC_INTRA_OP_THREADS='1'
export SUPERTONIC_INTER_OP_THREADS='1'
export HOST_TTS_FLUSH_DELAY_S='0.45'
export HOST_TTS_SENTENCE_FLUSH_DELAY_S='0.15'
export HOST_TTS_MIN_CHARS='10'
export HOST_TTS_MAX_WAIT_MS='1800'
export HOST_TTS_MAX_CHUNK_CHARS='140'
export HOST_RTC_OUTBOUND_LEAD_IN_MS='120'
export HOST_RTC_OUTBOUND_IDLE_S='0.6'
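
As an illustration of what a chunk-size limit such as HOST_TTS_MAX_CHUNK_CHARS could mean in practice, the sketch below splits text at sentence boundaries and wraps any sentence that is still too long. This is an assumed interpretation with a hypothetical function name; the timing settings (flush delays, maximum wait) and HOST_TTS_MIN_CHARS are not modeled here.

```python
import re
import textwrap
from typing import List


def chunk_for_tts(text: str, max_chars: int = 140) -> List[str]:
    """Split text into pieces of at most max_chars, preferring sentence ends."""
    chunks: List[str] = []
    # Split after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            # textwrap.wrap breaks at word boundaries and never exceeds width.
            chunks.extend(textwrap.wrap(sentence, width=max_chars))
    return chunks
```

Shorter chunks let synthesis start sooner, at the cost of more frequent calls into the TTS engine.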

Use command-based TTS instead:

export HOST_TTS_PROVIDER='command'
export HOST_TTS_COMMAND='my_tts --text {text} --out {output_wav}'

Command contract:

{text} is replaced with the quoted text to synthesize.
{output_wav} is replaced with a temporary WAV output path.
If {output_wav} is omitted, command stdout must be WAV bytes.
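
The two output modes in this contract can be sketched as a small template expander. The function name is hypothetical, and using shlex.quote for the "quoted text" is an assumption about how quoting is done.

```python
import shlex
from typing import Tuple


def build_tts_command(template: str, text: str, output_wav: str) -> Tuple[str, bool]:
    """Expand a TTS command template.

    Returns (command, writes_file). When writes_file is False the caller
    should read WAV bytes from the command's stdout instead of output_wav.
    """
    writes_file = "{output_wav}" in template
    # Substitute the path first so text containing "{output_wav}" is left alone.
    command = template.replace("{output_wav}", shlex.quote(output_wav))
    command = command.replace("{text}", shlex.quote(text))
    return command, writes_file
```

For the example template, build_tts_command("my_tts --text {text} --out {output_wav}", "hello world", "/tmp/out.wav") returns the command my_tts --text 'hello world' --out /tmp/out.wav with writes_file set to True.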

Files

app.py: FastAPI app and WebSocket endpoint.
voice_rtc.py: WebRTC signaling/session handling and host-side STT/TTS audio pipeline.
supertonic_gateway.py: process orchestration and PTY bridge.
wisper.py: event/message bus used by WebSocket streaming.
static/index.html: simple chat UI.
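
For readers unfamiliar with the event-bus pattern that wisper.py provides, a minimal asyncio version is sketched below. It is not the project's implementation: the class name, method names, and event shape are all hypothetical.

```python
import asyncio
from typing import Any, Dict, Set


class EventBus:
    """Fan out each published event to every subscriber's queue."""

    def __init__(self) -> None:
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    async def publish(self, event: Dict[str, Any]) -> None:
        # Copy the set so subscribers can come and go while we iterate.
        for queue in list(self._subscribers):
            await queue.put(event)


async def _demo() -> None:
    bus = EventBus()
    inbox = bus.subscribe()  # e.g. one queue per connected WebSocket
    await bus.publish({"role": "nanobot", "text": "hello"})
    print(await inbox.get())


if __name__ == "__main__":
    asyncio.run(_demo())
```

In a layout like this one, the PTY reader would publish output events and each WebSocket handler would drain its own queue, so slow clients do not block the agent process.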