
Nanobot SuperTonic Wisper Web

Standalone Python web project that:

  • uses a local supertonic_gateway orchestration layer,
  • uses a local wisper event bus,
  • spawns nanobot agent in a pseudo-terminal (PTY), so it behaves as it would in an interactive terminal (TUI behavior),
  • streams TUI output to a browser chat page over WebSocket,
  • supports WebRTC voice input/output with host-side STT/TTS processing.

This project is separate from the nanobot repository and only talks to Nanobot as an external command.

1) Setup

cd /home/kacper/nanobot-supertonic-wisper-web
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2) Point to your Nanobot command

Default behavior:

  • if ~/nanobot/.venv/bin/python exists, the app uses:
    • NANOBOT_COMMAND="~/nanobot/.venv/bin/python -m nanobot agent --no-markdown"
    • NANOBOT_WORKDIR="~/nanobot"
  • else if ~/nanobot/venv/bin/python exists, the app uses:
    • NANOBOT_COMMAND="~/nanobot/venv/bin/python -m nanobot agent --no-markdown"
    • NANOBOT_WORKDIR="~/nanobot"
  • otherwise it falls back to nanobot agent --no-markdown from PATH.
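
The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the code in app.py: the function name resolve_nanobot_command and its env/home parameters are hypothetical, and the real implementation may expand paths or validate the command differently.

```python
import os
import shlex
from pathlib import Path


def resolve_nanobot_command(env=None, home=None):
    """Return (argv, workdir) following the documented lookup order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An explicit NANOBOT_COMMAND override always wins.
    override = env.get("NANOBOT_COMMAND")
    if override:
        workdir = env.get("NANOBOT_WORKDIR")
        return shlex.split(override), (os.path.expanduser(workdir) if workdir else None)

    # Try the two conventional virtualenv locations, in the documented order.
    repo = home / "nanobot"
    for venv_name in (".venv", "venv"):
        python = repo / venv_name / "bin" / "python"
        if python.exists():
            argv = [str(python), "-m", "nanobot", "agent", "--no-markdown"]
            return argv, str(repo)

    # Fall back to whatever `nanobot` resolves to on PATH; no workdir is implied.
    return ["nanobot", "agent", "--no-markdown"], None
```

Returning an argv list rather than a single string avoids running the command through a shell, which is an assumption about how the process is started.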

Optional override (for any custom location):

export NANOBOT_COMMAND="/home/kacper/nanobot/venv/bin/python -m nanobot agent --no-markdown"
export NANOBOT_WORKDIR="/home/kacper/nanobot"

Optional TUI output filtering (reduces spinner/thinking/tool-stream flood in web console):

export NANOBOT_SUPPRESS_NOISY_UI='1'
export NANOBOT_OUTPUT_DEDUP_WINDOW_S='1.5'
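
One plausible reading of a dedup window is "drop a line if the same text was already emitted within the last N seconds". The sketch below illustrates that idea only; the class name OutputDeduper is hypothetical and the actual filtering logic in this project may differ (for example, in how it matches spinner frames).

```python
import time


class OutputDeduper:
    """Drop a line if identical text was emitted within the last `window_s` seconds."""

    def __init__(self, window_s=1.5, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._last_emitted = {}  # stripped line text -> time it was last emitted

    def should_emit(self, line):
        now = self.clock()
        # Forget entries older than the window so the dict stays small.
        self._last_emitted = {
            text: ts
            for text, ts in self._last_emitted.items()
            if now - ts < self.window_s
        }
        key = line.strip()
        if key in self._last_emitted:
            return False  # same text seen recently: suppress it
        self._last_emitted[key] = now
        return True
```

In this sketch the window is measured from the last emitted copy, so a line that keeps repeating is shown at most once per window rather than being suppressed forever.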

3) Run web app

uvicorn app:app --reload --host 0.0.0.0 --port 8080

Open: http://localhost:8080

Or use the helper script (recommended for voice on iOS Safari, because browsers only allow microphone capture on secure origins):

./start.sh

start.sh enables HTTPS by default (ENABLE_HTTPS=1). It auto-generates a local self-signed certificate at .certs/local-cert.pem with its key at .certs/local-key.pem, and serves https://localhost:8000 (port 8000, not the 8080 used in the manual uvicorn command above). For iPhone access by LAN IP, open https://<your-lan-ip>:8000 and trust the certificate on the device. Set ENABLE_HTTPS=0 to run plain HTTP.
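
For reference, a rough Python equivalent of the HTTPS toggle might look like the sketch below. start.sh is a shell script, so this is not its actual content: the helper name uvicorn_kwargs is hypothetical, the 0.0.0.0 bind address is assumed from the LAN access note, and certificate generation is left out. The ssl_certfile and ssl_keyfile keys are real uvicorn.run parameters.

```python
import os


def uvicorn_kwargs(env=None):
    """Build keyword arguments for uvicorn.run based on ENABLE_HTTPS."""
    env = os.environ if env is None else env
    # Assumption: bind on all interfaces so a phone on the LAN can connect.
    kwargs = {"host": "0.0.0.0", "port": 8000}
    if env.get("ENABLE_HTTPS", "1") != "0":
        # Paths taken from the README; the files must already exist.
        kwargs["ssl_certfile"] = ".certs/local-cert.pem"
        kwargs["ssl_keyfile"] = ".certs/local-key.pem"
    return kwargs


# Usage (requires uvicorn to be installed):
#   import uvicorn
#   uvicorn.run("app:app", **uvicorn_kwargs())
```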

How it works

  • Click Spawn Nanobot TUI to start the agent process in a PTY.
  • Type messages in the input and press Enter, or click Connect Voice Channel and hold Push-to-Talk while speaking.
  • The browser receives streamed PTY output and displays it live.
  • When Host Voice Output is enabled, Nanobot output is synthesized on the host and streamed back over WebRTC audio.
  • For isolated RTC/TTS debugging, connect voice and click Play Voice Test Script to synthesize a sample line directly over the same WebRTC output path.
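
The first and third steps (spawn in a PTY, stream its output) can be sketched with the standard library alone. This is a simplified, blocking illustration for Unix-like systems; the function name run_in_pty is hypothetical, and supertonic_gateway.py presumably reads asynchronously and also forwards input to the process.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run argv attached to a pseudo-terminal and yield its output as text chunks."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the child keeps its own copy of the slave end
    try:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                break  # Linux raises EIO once the child side is closed
            if not chunk:
                break
            yield chunk.decode("utf-8", errors="replace")
    finally:
        os.close(master_fd)
        proc.wait()
```

Each yielded chunk is what a WebSocket handler would forward to the browser. Because the child sees a terminal, line endings arrive as "\r\n" and programs emit their interactive (TUI) output rather than plain piped text.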

Voice features

  • Browser voice transport uses RTCPeerConnection + microphone capture (getUserMedia).
  • Voice input is explicit push-to-talk (hold button to capture, release to transcribe) instead of host-side silence segmentation.
  • Optional test mode can echo each released push-to-talk segment back to the user over WebRTC output.
  • Host receives raw audio and performs speech-to-text using:
    • faster-whisper directly by default (HOST_STT_PROVIDER=faster-whisper), or
    • HOST_STT_COMMAND (if HOST_STT_PROVIDER=command).
  • Host performs text-to-speech using:
    • supertonic Python library by default (HOST_TTS_PROVIDER=supertonic), or
    • HOST_TTS_COMMAND (if HOST_TTS_PROVIDER=command), or
    • espeak as a fallback (if available in PATH).
  • Voice test mode sends a dedicated voice-test-script command over WebSocket and plays host TTS on the active WebRTC audio track (no Nanobot output required).
  • If STT/TTS is not configured, text chat still works and system messages explain what is missing.

Optional host voice configuration

If you use ./start.sh, you can put these in .env.voice and they will be loaded automatically.

Default direct STT (faster-whisper):

export HOST_STT_PROVIDER='faster-whisper'
export HOST_STT_MODEL='base.en'
export HOST_STT_DEVICE='auto'
export HOST_STT_COMPUTE_TYPE='int8'
export HOST_STT_LANGUAGE='en'
export HOST_STT_BEAM_SIZE='2'
export HOST_STT_BEST_OF='2'
export HOST_STT_VAD_FILTER='0'
export HOST_STT_TEMPERATURE='0.0'
export HOST_STT_LOG_PROB_THRESHOLD='-1.0'
export HOST_STT_NO_SPEECH_THRESHOLD='0.6'
export HOST_STT_COMPRESSION_RATIO_THRESHOLD='2.4'
export HOST_STT_INITIAL_PROMPT='Transcribe brief spoken English precisely. Prefer common words over sound effects.'
export HOST_STT_MIN_PTT_MS='220'
export HOST_STT_MAX_PTT_MS='12000'
export HOST_STT_PTT_PLAYBACK_TEST='0'
export HOST_STT_SEGMENT_QUEUE_SIZE='2'
export HOST_STT_BACKLOG_NOTICE_INTERVAL_S='6.0'
export HOST_STT_SUPPRESS_DURING_TTS='1'
export HOST_STT_SUPPRESS_MS_AFTER_TTS='300'

Legacy compatibility: HOST_STT_MIN_SEGMENT_MS / HOST_STT_MAX_SEGMENT_MS are still read as fallbacks.
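
A fallback lookup of this kind could be written as below. This is a sketch only: the helper names env_int and ptt_limits_ms are hypothetical, and the defaults simply mirror the example values listed above.

```python
import os


def env_int(names, default, env=None):
    """Return the first parseable integer among the given variable names."""
    env = os.environ if env is None else env
    for name in names:
        raw = env.get(name, "").strip()
        if not raw:
            continue
        try:
            return int(raw)
        except ValueError:
            continue  # ignore malformed values and try the next name
    return default


def ptt_limits_ms(env=None):
    """Read the push-to-talk limits, preferring the new names over the legacy ones."""
    min_ms = env_int(("HOST_STT_MIN_PTT_MS", "HOST_STT_MIN_SEGMENT_MS"), 220, env)
    max_ms = env_int(("HOST_STT_MAX_PTT_MS", "HOST_STT_MAX_SEGMENT_MS"), 12000, env)
    return min_ms, max_ms
```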

Note: first run may download the selected Whisper model weights.
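
To show how the decoding variables might map onto faster-whisper, here is a hedged sketch. The helper names whisper_settings and transcribe_wav are hypothetical and the defaults mirror the example values above; how voice_rtc.py actually wires these settings is not shown in this README. The keyword names themselves (beam_size, best_of, vad_filter, and so on) are real parameters of WhisperModel and WhisperModel.transcribe.

```python
import os


def whisper_settings(env=None):
    """Translate HOST_STT_* variables into faster-whisper keyword arguments."""
    env = os.environ if env is None else env
    get = env.get
    model_kwargs = {
        "model_size_or_path": get("HOST_STT_MODEL", "base.en"),
        "device": get("HOST_STT_DEVICE", "auto"),
        "compute_type": get("HOST_STT_COMPUTE_TYPE", "int8"),
    }
    transcribe_kwargs = {
        "language": get("HOST_STT_LANGUAGE", "en") or None,
        "beam_size": int(get("HOST_STT_BEAM_SIZE", "2")),
        "best_of": int(get("HOST_STT_BEST_OF", "2")),
        "vad_filter": get("HOST_STT_VAD_FILTER", "0") == "1",
        "temperature": float(get("HOST_STT_TEMPERATURE", "0.0")),
        "log_prob_threshold": float(get("HOST_STT_LOG_PROB_THRESHOLD", "-1.0")),
        "no_speech_threshold": float(get("HOST_STT_NO_SPEECH_THRESHOLD", "0.6")),
        "compression_ratio_threshold": float(
            get("HOST_STT_COMPRESSION_RATIO_THRESHOLD", "2.4")
        ),
        "initial_prompt": get("HOST_STT_INITIAL_PROMPT") or None,
    }
    return model_kwargs, transcribe_kwargs


def transcribe_wav(path, env=None):
    """Transcribe one WAV file; loads the model on every call for simplicity."""
    from faster_whisper import WhisperModel  # optional dependency, imported lazily

    model_kwargs, transcribe_kwargs = whisper_settings(env)
    model = WhisperModel(**model_kwargs)
    segments, _info = model.transcribe(path, **transcribe_kwargs)
    return " ".join(segment.text.strip() for segment in segments)
```

A long-running server would normally load the model once and reuse it, since model construction is the expensive step.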

Use command-based STT instead:

export HOST_STT_PROVIDER='command'
export HOST_STT_COMMAND='whisper_cli --input {input_wav}'

Command contract:

  • {input_wav} is replaced with a temporary WAV file path.
  • Command must print transcript text to stdout.
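
A minimal sketch of honoring this contract from Python is shown below. The function name run_stt_command and the 60-second timeout are assumptions for illustration; the project's own invocation may handle errors and timeouts differently.

```python
import shlex
import subprocess


def run_stt_command(template, wav_path, timeout_s=60):
    """Fill {input_wav} into the template and return stdout as the transcript."""
    # Split first, then substitute, so a path containing spaces stays one argument.
    argv = [arg.replace("{input_wav}", wav_path) for arg in shlex.split(template)]
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return result.stdout.strip()
```

For example, run_stt_command("whisper_cli --input {input_wav}", "/tmp/segment.wav") would run whisper_cli with two arguments and treat whatever it prints as the transcript.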

Configure TTS (optional; overrides the espeak fallback):

export HOST_TTS_PROVIDER='supertonic'
export SUPERTONIC_MODEL='supertonic-2'
export SUPERTONIC_VOICE_STYLE='M1'
export SUPERTONIC_LANG='en'
export SUPERTONIC_INTRA_OP_THREADS='1'
export SUPERTONIC_INTER_OP_THREADS='1'
export HOST_TTS_FLUSH_DELAY_S='0.45'
export HOST_TTS_SENTENCE_FLUSH_DELAY_S='0.15'
export HOST_TTS_MIN_CHARS='10'
export HOST_TTS_MAX_WAIT_MS='1800'
export HOST_TTS_MAX_CHUNK_CHARS='140'
export HOST_RTC_OUTBOUND_LEAD_IN_MS='120'
export HOST_RTC_OUTBOUND_IDLE_S='0.6'
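
As an illustration of what a chunk-size limit such as HOST_TTS_MAX_CHUNK_CHARS could mean in practice, the sketch below splits text at sentence boundaries and keeps every chunk within a character budget. It is an assumption about the intent of that one setting, not the project's algorithm; the function name split_for_tts is hypothetical, and the timing settings (flush delays, max wait) are not modeled.

```python
import re


def split_for_tts(text, max_chars=140):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the budget.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space to break on: cut mid-word
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Pack whole sentences together while they still fit.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Smaller chunks let synthesis start sooner, while larger ones tend to sound more natural, which is the trade-off a setting like this exposes.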

Use command-based TTS instead:

export HOST_TTS_PROVIDER='command'
export HOST_TTS_COMMAND='my_tts --text {text} --out {output_wav}'

Command contract:

  • {text} is replaced with quoted text.
  • {output_wav} is replaced with a temporary WAV output path.
  • If {output_wav} is omitted, command stdout must be WAV bytes.
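
The contract above can be exercised with a small wrapper like the following sketch. The function name run_tts_command and the timeout are hypothetical, and shell-style quoting via shlex.quote is an assumption about what "quoted text" means here.

```python
import os
import shlex
import subprocess
import tempfile


def run_tts_command(template, text, timeout_s=60):
    """Run a templated TTS command and return the WAV bytes it produced."""
    wants_file = "{output_wav}" in template
    out_path = None
    if wants_file:
        fd, out_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
    try:
        command = template
        # Substitute the path first so placeholder-like text is never re-expanded.
        if wants_file:
            command = command.replace("{output_wav}", shlex.quote(out_path))
        command = command.replace("{text}", shlex.quote(text))
        # The quoted line is split back into argv and run without a shell.
        result = subprocess.run(
            shlex.split(command), capture_output=True, timeout=timeout_s, check=True
        )
        if wants_file:
            with open(out_path, "rb") as handle:
                return handle.read()
        return result.stdout  # contract: stdout carries the WAV bytes
    finally:
        if out_path and os.path.exists(out_path):
            os.remove(out_path)
```

Quoting the text before splitting keeps punctuation such as semicolons or quotes inside a single argument instead of letting it alter the command line.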

Files

  • app.py: FastAPI app and WebSocket endpoint.
  • voice_rtc.py: WebRTC signaling/session handling and host-side STT/TTS audio pipeline.
  • supertonic_gateway.py: process orchestration and PTY bridge.
  • wisper.py: event/message bus used by WebSocket streaming.
  • static/index.html: simple chat UI.
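
For readers unfamiliar with the event-bus pattern mentioned for wisper.py, a minimal asyncio publish/subscribe bus could look like the sketch below. It is a generic illustration under assumed names (EventBus, subscribe, publish), not the interface of wisper.py.

```python
import asyncio


class EventBus:
    """Fan out published events to every subscribed asyncio queue."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self, maxsize=100):
        """Register a new consumer and return the queue it should read from."""
        queue = asyncio.Queue(maxsize=maxsize)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, event):
        """Deliver the event to all subscribers without blocking the producer."""
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                pass  # drop the event for a slow consumer instead of stalling
```

In a design like this, each WebSocket connection would hold one queue, read events from it in a loop, and unsubscribe when the socket closes.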