prototype

This commit is contained in:
kacper 2026-02-28 22:12:04 -05:00
commit 8534b15c20
8 changed files with 3048 additions and 0 deletions

README.md Normal file (167 additions)

@@ -0,0 +1,167 @@
# Nanobot SuperTonic Wisper Web
Standalone Python web project that:
- uses a local `supertonic_gateway` orchestration layer,
- uses a local `wisper` event bus,
- spawns `nanobot agent` in a pseudo-TTY (TUI behavior),
- streams TUI output to a browser chat page over WebSocket,
- supports WebRTC voice input/output with host-side STT/TTS processing.
This project is separate from the `nanobot` repository and talks to Nanobot only as an external command.
## 1) Setup
```bash
cd /home/kacper/nanobot-supertonic-wisper-web
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## 2) Point to your Nanobot command
Default behavior:
- if `~/nanobot/.venv/bin/python` exists, the app uses:
- `NANOBOT_COMMAND="~/nanobot/.venv/bin/python -m nanobot agent --no-markdown"`
- `NANOBOT_WORKDIR="~/nanobot"`
- else if `~/nanobot/venv/bin/python` exists, the app uses:
- `NANOBOT_COMMAND="~/nanobot/venv/bin/python -m nanobot agent --no-markdown"`
- `NANOBOT_WORKDIR="~/nanobot"`
- otherwise it falls back to `nanobot agent --no-markdown` from PATH.
Optional override (for any custom location):
```bash
export NANOBOT_COMMAND="/home/kacper/nanobot/venv/bin/python -m nanobot agent --no-markdown"
export NANOBOT_WORKDIR="/home/kacper/nanobot"
```
Optional TUI output filtering (reduces the spinner/thinking/tool-stream flood in the web console):
```bash
export NANOBOT_SUPPRESS_NOISY_UI='1'
export NANOBOT_OUTPUT_DEDUP_WINDOW_S='1.5'
```
## 3) Run web app
```bash
uvicorn app:app --reload --host 0.0.0.0 --port 8080
```
Open: `http://localhost:8080`
Or use the helper script (recommended for voice on iOS Safari):
```bash
./start.sh
```
`start.sh` enables HTTPS by default (`ENABLE_HTTPS=1`), auto-generates a local self-signed cert at `.certs/local-cert.pem` and key at `.certs/local-key.pem`, and serves `https://localhost:8000`.
To access from an iPhone over the LAN, open `https://<your-lan-ip>:8000` and trust the certificate on the device.
Set `ENABLE_HTTPS=0` to run plain HTTP.
## How it works
- Click **Spawn Nanobot TUI** to start the agent process in a PTY.
- Type messages in the input and press Enter, or click **Connect Voice Channel** and hold **Push-to-Talk** while speaking.
- The browser receives streamed PTY output and displays it live.
- When **Host Voice Output** is enabled, Nanobot output is synthesized on the host and streamed back over WebRTC audio.
- For isolated RTC/TTS debugging, connect voice and click **Play Voice Test Script** to synthesize a sample line directly over the same WebRTC output path.
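All of the browser/host control traffic above is JSON over the single `/ws/chat` WebSocket. The message types the server dispatches on (see `app.py`) can be built like this (a sketch; the `rtc-offer` and `rtc-ice-candidate` payloads additionally carry SDP and candidate fields):

```python
import json

# Control messages accepted by the /ws/chat endpoint (types from app.py).
spawn_msg = json.dumps({"type": "spawn"})
stop_msg = json.dumps({"type": "stop"})
ptt_down = json.dumps({"type": "voice-ptt", "pressed": True})
ptt_up = json.dumps({"type": "voice-ptt", "pressed": False})

# Server events arrive as JSON objects shaped like:
# {"role": "system", "text": "...", "timestamp": "..."}
```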
## Voice features
- Browser voice transport uses `RTCPeerConnection` + microphone capture (`getUserMedia`).
- Voice input is explicit push-to-talk (hold button to capture, release to transcribe) instead of host-side silence segmentation.
- Optional test mode can echo each released push-to-talk segment back to the user over WebRTC output.
- Host receives raw audio and performs speech-to-text using:
- `faster-whisper` directly by default (`HOST_STT_PROVIDER=faster-whisper`), or
- `HOST_STT_COMMAND` (if `HOST_STT_PROVIDER=command`).
- Host performs text-to-speech using:
- `supertonic` Python library by default (`HOST_TTS_PROVIDER=supertonic`), or
- `HOST_TTS_COMMAND` (if `HOST_TTS_PROVIDER=command`), or
- `espeak` (if available in PATH).
- Voice test mode sends a dedicated `voice-test-script` command over WebSocket and plays host TTS on the active WebRTC audio track (no Nanobot output required).
- If STT/TTS is not configured, text chat still works and system messages explain what is missing.
### Optional host voice configuration
If you use `./start.sh`, you can put these in `.env.voice` and they will be loaded automatically.
Default direct STT (faster-whisper):
```bash
export HOST_STT_PROVIDER='faster-whisper'
export HOST_STT_MODEL='base.en'
export HOST_STT_DEVICE='auto'
export HOST_STT_COMPUTE_TYPE='int8'
export HOST_STT_LANGUAGE='en'
export HOST_STT_BEAM_SIZE='2'
export HOST_STT_BEST_OF='2'
export HOST_STT_VAD_FILTER='0'
export HOST_STT_TEMPERATURE='0.0'
export HOST_STT_LOG_PROB_THRESHOLD='-1.0'
export HOST_STT_NO_SPEECH_THRESHOLD='0.6'
export HOST_STT_COMPRESSION_RATIO_THRESHOLD='2.4'
export HOST_STT_INITIAL_PROMPT='Transcribe brief spoken English precisely. Prefer common words over sound effects.'
export HOST_STT_MIN_PTT_MS='220'
export HOST_STT_MAX_PTT_MS='12000'
export HOST_STT_PTT_PLAYBACK_TEST='0'
export HOST_STT_SEGMENT_QUEUE_SIZE='2'
export HOST_STT_BACKLOG_NOTICE_INTERVAL_S='6.0'
export HOST_STT_SUPPRESS_DURING_TTS='1'
export HOST_STT_SUPPRESS_MS_AFTER_TTS='300'
```
Legacy compatibility: `HOST_STT_MIN_SEGMENT_MS` / `HOST_STT_MAX_SEGMENT_MS` are still read as fallbacks.
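The fallback behavior is simply "new name, else legacy name, else default". A hypothetical helper showing the shape of that lookup (not the project's actual code):

```python
import os


def read_ms(primary: str, legacy: str, default: int) -> int:
    """Read an integer-millisecond setting, preferring the new env var
    name, then the legacy name, then the built-in default."""
    raw = os.getenv(primary) or os.getenv(legacy)
    return int(raw) if raw else default


# e.g. read_ms("HOST_STT_MIN_PTT_MS", "HOST_STT_MIN_SEGMENT_MS", 220)
```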
Note: first run may download the selected Whisper model weights.
Use command-based STT instead:
```bash
export HOST_STT_PROVIDER='command'
export HOST_STT_COMMAND='whisper_cli --input {input_wav}'
```
Command contract:
- `{input_wav}` is replaced with a temporary WAV file path.
- Command must print transcript text to stdout.
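From the host side, this contract amounts to substituting `{input_wav}` into the split command template and capturing stdout. A sketch of that invocation (hypothetical helper; the actual pipeline is in `voice_rtc.py`):

```python
import shlex
import subprocess


def run_stt_command(template: str, wav_path: str) -> str:
    """Expand {input_wav} in the command template and return stdout
    (the transcript), stripped of surrounding whitespace."""
    argv = [part.replace("{input_wav}", wav_path) for part in shlex.split(template)]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Splitting the template first and substituting per-argument avoids shell quoting issues with paths containing spaces.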
Set TTS (optional; overrides `espeak` fallback):
```bash
export HOST_TTS_PROVIDER='supertonic'
export SUPERTONIC_MODEL='supertonic-2'
export SUPERTONIC_VOICE_STYLE='M1'
export SUPERTONIC_LANG='en'
export SUPERTONIC_INTRA_OP_THREADS='1'
export SUPERTONIC_INTER_OP_THREADS='1'
export HOST_TTS_FLUSH_DELAY_S='0.45'
export HOST_TTS_SENTENCE_FLUSH_DELAY_S='0.15'
export HOST_TTS_MIN_CHARS='10'
export HOST_TTS_MAX_WAIT_MS='1800'
export HOST_TTS_MAX_CHUNK_CHARS='140'
export HOST_RTC_OUTBOUND_LEAD_IN_MS='120'
export HOST_RTC_OUTBOUND_IDLE_S='0.6'
```
Use command-based TTS instead:
```bash
export HOST_TTS_PROVIDER='command'
export HOST_TTS_COMMAND='my_tts --text {text} --out {output_wav}'
```
Command contract:
- `{text}` is replaced with the shell-quoted input text.
- `{output_wav}` is replaced with a temporary WAV output path.
- If `{output_wav}` is omitted, command stdout must be WAV bytes.
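A toy command that satisfies the stdout variant of this contract, emitting a short sine tone instead of real speech (purely illustrative; any program that writes valid WAV bytes to stdout works):

```python
import io
import math
import struct
import sys
import wave


def tone_wav_bytes(freq_hz: float = 440.0, seconds: float = 0.3, rate: int = 16000) -> bytes:
    """Render a short mono 16-bit sine tone as in-memory WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        n = int(rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / rate)))
            for i in range(n)
        )
        w.writeframes(frames)
    return buf.getvalue()


if __name__ == "__main__":
    # Contract: when {output_wav} is omitted, stdout must be WAV bytes.
    sys.stdout.buffer.write(tone_wav_bytes())
```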
## Files
- `app.py`: FastAPI app and WebSocket endpoint.
- `voice_rtc.py`: WebRTC signaling/session handling and host-side STT/TTS audio pipeline.
- `supertonic_gateway.py`: process orchestration and PTY bridge.
- `wisper.py`: event/message bus used by WebSocket streaming.
- `static/index.html`: simple chat UI.

app.py Normal file (103 additions)

@@ -0,0 +1,103 @@
import asyncio
import contextlib
import json
from pathlib import Path
from typing import Any, Awaitable, Callable
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import FileResponse, JSONResponse
from supertonic_gateway import SuperTonicGateway
from voice_rtc import WebRTCVoiceSession
BASE_DIR = Path(__file__).resolve().parent
INDEX_PATH = BASE_DIR / "static" / "index.html"
app = FastAPI(title="Nanobot SuperTonic Wisper Web")
gateway = SuperTonicGateway()
@app.get("/health")
async def health() -> JSONResponse:
return JSONResponse({"status": "ok"})
@app.get("/")
async def index() -> FileResponse:
return FileResponse(INDEX_PATH)
@app.websocket("/ws/chat")
async def websocket_chat(websocket: WebSocket) -> None:
await websocket.accept()
send_lock = asyncio.Lock()
async def safe_send_json(payload: dict[str, Any]) -> None:
async with send_lock:
await websocket.send_json(payload)
queue = await gateway.subscribe()
voice_session = WebRTCVoiceSession(gateway=gateway, send_json=safe_send_json)
sender = asyncio.create_task(_sender_loop(safe_send_json, queue, voice_session))
try:
while True:
raw_message = await websocket.receive_text()
try:
message = json.loads(raw_message)
except json.JSONDecodeError:
await safe_send_json(
{"role": "system", "text": "Invalid JSON message.", "timestamp": ""}
)
continue
msg_type = str(message.get("type", "")).strip()
if msg_type == "spawn":
await gateway.spawn_tui()
elif msg_type == "stop":
await gateway.stop_tui()
elif msg_type == "rtc-offer":
await voice_session.handle_offer(message)
elif msg_type == "rtc-ice-candidate":
await voice_session.handle_ice_candidate(message)
elif msg_type == "voice-ptt":
voice_session.set_push_to_talk_pressed(
bool(message.get("pressed", False))
)
else:
await safe_send_json(
{
"role": "system",
"text": (
"Unknown message type. Use spawn, stop, rtc-offer, "
"rtc-ice-candidate, or voice-ptt."
),
"timestamp": "",
}
)
except WebSocketDisconnect:
pass
finally:
sender.cancel()
with contextlib.suppress(asyncio.CancelledError):
await sender
await voice_session.close()
await gateway.unsubscribe(queue)
@app.on_event("shutdown")
async def on_shutdown() -> None:
await gateway.shutdown()
async def _sender_loop(
send_json: Callable[[dict[str, Any]], Awaitable[None]],
queue: asyncio.Queue,
voice_session: WebRTCVoiceSession,
) -> None:
while True:
event = await queue.get()
if event.role == "nanobot-tts":
await voice_session.queue_output_text(event.text)
continue
await send_json(event.to_dict())

requirements.txt Normal file (5 additions)

@@ -0,0 +1,5 @@
fastapi>=0.116.0,<1.0.0
uvicorn[standard]>=0.35.0,<1.0.0
aiortc>=1.8.0,<2.0.0
supertonic>=1.1.2,<2.0.0
faster-whisper>=1.1.0,<2.0.0

start.sh Executable file (126 additions)

@@ -0,0 +1,126 @@
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"
if [[ ! -d ".venv" ]]; then
python3 -m venv .venv
fi
source .venv/bin/activate
pip install -r requirements.txt >/dev/null
# Optional local voice settings. Example file: .env.voice
if [[ -f ".env.voice" ]]; then
set -a
# shellcheck disable=SC1091
source ".env.voice"
set +a
fi
# Nanobot command defaults (prefer Nanobot's own virtualenv interpreter).
: "${NANOBOT_WORKDIR:=${HOME}/nanobot}"
if [[ -z "${NANOBOT_COMMAND:-}" ]]; then
if [[ -x "${NANOBOT_WORKDIR}/.venv/bin/python" ]]; then
NANOBOT_COMMAND="${NANOBOT_WORKDIR}/.venv/bin/python -m nanobot agent --no-markdown"
elif [[ -x "${NANOBOT_WORKDIR}/venv/bin/python" ]]; then
NANOBOT_COMMAND="${NANOBOT_WORKDIR}/venv/bin/python -m nanobot agent --no-markdown"
fi
fi
export NANOBOT_WORKDIR NANOBOT_COMMAND
: "${NANOBOT_SUPPRESS_NOISY_UI:=1}"
: "${NANOBOT_OUTPUT_DEDUP_WINDOW_S:=1.5}"
export NANOBOT_SUPPRESS_NOISY_UI NANOBOT_OUTPUT_DEDUP_WINDOW_S
# Host voice pipeline env vars (safe defaults).
: "${HOST_STT_PROVIDER:=faster-whisper}"
: "${HOST_STT_COMMAND:=}"
: "${HOST_STT_MODEL:=base.en}"
: "${HOST_STT_DEVICE:=auto}"
: "${HOST_STT_COMPUTE_TYPE:=int8}"
: "${HOST_STT_LANGUAGE:=en}"
: "${HOST_STT_BEAM_SIZE:=2}"
: "${HOST_STT_BEST_OF:=2}"
: "${HOST_STT_VAD_FILTER:=0}"
: "${HOST_STT_TEMPERATURE:=0.0}"
: "${HOST_STT_LOG_PROB_THRESHOLD:=-1.0}"
: "${HOST_STT_NO_SPEECH_THRESHOLD:=0.6}"
: "${HOST_STT_COMPRESSION_RATIO_THRESHOLD:=2.4}"
: "${HOST_STT_INITIAL_PROMPT:=Transcribe brief spoken English precisely. Prefer common words over sound effects.}"
: "${HOST_TTS_PROVIDER:=supertonic}"
: "${HOST_TTS_COMMAND:=}"
: "${SUPERTONIC_MODEL:=supertonic-2}"
: "${SUPERTONIC_VOICE_STYLE:=M1}"
: "${SUPERTONIC_LANG:=en}"
: "${SUPERTONIC_TOTAL_STEPS:=5}"
: "${SUPERTONIC_SPEED:=1.05}"
: "${SUPERTONIC_INTRA_OP_THREADS:=1}"
: "${SUPERTONIC_INTER_OP_THREADS:=1}"
: "${SUPERTONIC_AUTO_DOWNLOAD:=1}"
: "${HOST_STT_MIN_PTT_MS:=220}"
: "${HOST_STT_MAX_PTT_MS:=12000}"
: "${HOST_STT_SEGMENT_QUEUE_SIZE:=2}"
: "${HOST_STT_BACKLOG_NOTICE_INTERVAL_S:=6.0}"
: "${HOST_STT_SUPPRESS_DURING_TTS:=1}"
: "${HOST_STT_SUPPRESS_MS_AFTER_TTS:=300}"
: "${HOST_RTC_OUTBOUND_LEAD_IN_MS:=120}"
: "${HOST_RTC_OUTBOUND_IDLE_S:=0.6}"
: "${HOST_TTS_FLUSH_DELAY_S:=0.45}"
: "${HOST_TTS_SENTENCE_FLUSH_DELAY_S:=0.15}"
: "${HOST_TTS_MIN_CHARS:=10}"
: "${HOST_TTS_MAX_WAIT_MS:=1800}"
: "${HOST_TTS_MAX_CHUNK_CHARS:=140}"
export HOST_STT_PROVIDER HOST_STT_COMMAND HOST_STT_MODEL HOST_STT_DEVICE
export HOST_STT_COMPUTE_TYPE HOST_STT_LANGUAGE HOST_STT_BEAM_SIZE HOST_STT_BEST_OF HOST_STT_VAD_FILTER
export HOST_STT_TEMPERATURE HOST_STT_LOG_PROB_THRESHOLD HOST_STT_NO_SPEECH_THRESHOLD
export HOST_STT_COMPRESSION_RATIO_THRESHOLD
export HOST_STT_INITIAL_PROMPT
export HOST_TTS_PROVIDER HOST_TTS_COMMAND
export SUPERTONIC_MODEL SUPERTONIC_VOICE_STYLE SUPERTONIC_LANG
export SUPERTONIC_TOTAL_STEPS SUPERTONIC_SPEED
export SUPERTONIC_INTRA_OP_THREADS SUPERTONIC_INTER_OP_THREADS SUPERTONIC_AUTO_DOWNLOAD
export HOST_STT_MIN_PTT_MS HOST_STT_MAX_PTT_MS HOST_STT_SEGMENT_QUEUE_SIZE
export HOST_STT_BACKLOG_NOTICE_INTERVAL_S
export HOST_STT_SUPPRESS_DURING_TTS HOST_STT_SUPPRESS_MS_AFTER_TTS
export HOST_RTC_OUTBOUND_LEAD_IN_MS HOST_RTC_OUTBOUND_IDLE_S
export HOST_TTS_FLUSH_DELAY_S HOST_TTS_SENTENCE_FLUSH_DELAY_S
export HOST_TTS_MIN_CHARS HOST_TTS_MAX_WAIT_MS HOST_TTS_MAX_CHUNK_CHARS
: "${UVICORN_HOST:=0.0.0.0}"
: "${UVICORN_PORT:=8000}"
: "${ENABLE_HTTPS:=1}"
: "${SSL_DAYS:=365}"
: "${SSL_CERT_FILE:=.certs/local-cert.pem}"
: "${SSL_KEY_FILE:=.certs/local-key.pem}"
if [[ "$ENABLE_HTTPS" == "1" ]]; then
mkdir -p "$(dirname "$SSL_CERT_FILE")"
mkdir -p "$(dirname "$SSL_KEY_FILE")"
if [[ ! -f "$SSL_CERT_FILE" || ! -f "$SSL_KEY_FILE" ]]; then
LOCAL_IP="$(hostname -I 2>/dev/null | awk '{print $1}')"
SAN_ENTRIES="DNS:localhost,IP:127.0.0.1"
if [[ -n "${LOCAL_IP:-}" ]]; then
SAN_ENTRIES="${SAN_ENTRIES},IP:${LOCAL_IP}"
fi
echo "Generating local TLS certificate at '$SSL_CERT_FILE' (SAN: ${SAN_ENTRIES})"
openssl req -x509 -newkey rsa:2048 -sha256 -nodes -days "$SSL_DAYS" \
-keyout "$SSL_KEY_FILE" \
-out "$SSL_CERT_FILE" \
-subj "/CN=localhost" \
-addext "subjectAltName=${SAN_ENTRIES}" \
-addext "keyUsage=digitalSignature,keyEncipherment" \
-addext "extendedKeyUsage=serverAuth"
fi
echo "Starting HTTPS server on https://localhost:${UVICORN_PORT}"
exec uvicorn app:app --host "$UVICORN_HOST" --port "$UVICORN_PORT" \
--ssl-certfile "$SSL_CERT_FILE" \
--ssl-keyfile "$SSL_KEY_FILE"
fi
echo "Starting HTTP server on http://localhost:${UVICORN_PORT}"
exec uvicorn app:app --host "$UVICORN_HOST" --port "$UVICORN_PORT"

static/index.html Normal file (566 additions)

@@ -0,0 +1,566 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Nanobot Chat (SuperTonic + Wisper)</title>
<style>
:root {
--bg: #f6f8fa;
--panel: #ffffff;
--text: #1f2937;
--muted: #6b7280;
--accent: #0d9488;
--border: #d1d5db;
}
* {
box-sizing: border-box;
}
body {
margin: 0;
font-family: "SF Mono", ui-monospace, Menlo, Consolas, monospace;
background: linear-gradient(180deg, #eef6ff 0%, var(--bg) 100%);
color: var(--text);
}
.wrap {
max-width: 980px;
margin: 24px auto;
padding: 0 16px;
}
h1 {
margin: 0 0 12px;
font-size: 1.2rem;
}
.panel {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 10px;
padding: 12px;
}
.controls {
display: flex;
gap: 8px;
margin-bottom: 12px;
}
button {
border: 1px solid var(--border);
background: white;
border-radius: 8px;
padding: 8px 12px;
cursor: pointer;
}
button:disabled {
opacity: 0.6;
cursor: not-allowed;
}
button.primary {
background: var(--accent);
color: white;
border-color: var(--accent);
}
button.ptt-active {
background: #be123c;
color: white;
border-color: #be123c;
}
.log {
border: 1px solid var(--border);
border-radius: 8px;
min-height: 420px;
max-height: 420px;
overflow: auto;
padding: 10px;
background: #0b1020;
color: #d6e2ff;
white-space: pre-wrap;
}
.line {
margin-bottom: 8px;
}
.line.user {
color: #9be5ff;
}
.line.system {
color: #ffd28f;
}
.line.wisper {
color: #c4f0be;
}
.voice {
display: flex;
gap: 8px;
align-items: center;
margin-top: 8px;
}
.voice-status {
color: var(--muted);
font-size: 12px;
}
.hint {
margin-top: 10px;
color: var(--muted);
font-size: 12px;
}
@media (max-width: 700px) {
.controls,
.voice {
flex-direction: column;
align-items: stretch;
}
}
</style>
</head>
<body>
<div class="wrap">
<h1>Nanobot Web Chat (SuperTonic + Wisper)</h1>
<div class="panel">
<div class="controls">
<button id="spawnBtn" class="primary">Spawn Nanobot TUI</button>
<button id="stopBtn">Stop TUI</button>
</div>
<div id="log" class="log"></div>
<div class="voice">
<button id="recordBtn">Connect Voice Channel</button>
<button id="pttBtn" disabled>Hold to Talk</button>
<span id="voiceStatus" class="voice-status"></span>
</div>
<audio id="remoteAudio" autoplay playsinline hidden></audio>
<div class="hint">
Voice input and output run over a host WebRTC audio channel. Hold Push-to-Talk to send microphone audio for host STT.
</div>
</div>
</div>
<script>
const logEl = document.getElementById("log");
const spawnBtn = document.getElementById("spawnBtn");
const stopBtn = document.getElementById("stopBtn");
const recordBtn = document.getElementById("recordBtn");
const pttBtn = document.getElementById("pttBtn");
const voiceStatus = document.getElementById("voiceStatus");
const remoteAudio = document.getElementById("remoteAudio");
const wsProto = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${wsProto}://${location.host}/ws/chat`);
let peerConnection = null;
let micStream = null;
let remoteStream = null;
let voiceConnected = false;
let disconnectedTimer = null;
let reconnectTimer = null;
let reconnectAttempts = 0;
let voiceDesired = false;
let connectingVoice = false;
let pttPressed = false;
let rtcAnswerApplied = false;
let pendingRemoteCandidates = [];
const MAX_RECONNECT_ATTEMPTS = 2;
const appendLine = (role, text, timestamp) => {
const line = document.createElement("div");
line.className = `line ${role || "system"}`;
const time = timestamp ? new Date(timestamp).toLocaleTimeString() : "";
line.textContent = `[${time}] ${role}: ${text}`;
logEl.appendChild(line);
logEl.scrollTop = logEl.scrollHeight;
};
const sendJson = (payload) => {
if (ws.readyState !== WebSocket.OPEN) {
appendLine("system", "Socket not ready.", new Date().toISOString());
return;
}
ws.send(JSON.stringify(payload));
};
const setVoiceState = (connected) => {
voiceConnected = connected;
recordBtn.textContent = connected ? "Disconnect Voice Channel" : "Connect Voice Channel";
pttBtn.disabled = !connected;
if (!connected) {
pttBtn.textContent = "Hold to Talk";
pttBtn.classList.remove("ptt-active");
}
};
const setMicCaptureEnabled = (enabled) => {
if (!micStream) return;
micStream.getAudioTracks().forEach((track) => {
track.enabled = enabled;
});
};
const setPushToTalkState = (pressed, notifyServer = true) => {
pttPressed = pressed;
pttBtn.textContent = pressed ? "Release to Send" : "Hold to Talk";
pttBtn.classList.toggle("ptt-active", pressed);
setMicCaptureEnabled(pressed);
if (notifyServer && ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify({ type: "voice-ptt", pressed }));
}
};
const beginPushToTalk = (event) => {
if (event) event.preventDefault();
if (!voiceConnected || !peerConnection || !micStream) {
voiceStatus.textContent = "Connect voice channel first.";
return;
}
if (pttPressed) return;
setPushToTalkState(true);
voiceStatus.textContent = "Listening while button is held...";
};
const endPushToTalk = (event) => {
if (event) event.preventDefault();
if (!pttPressed) return;
setPushToTalkState(false);
if (voiceConnected) {
voiceStatus.textContent = "Voice channel connected. Hold Push-to-Talk to speak.";
}
};
const clearReconnectTimer = () => {
if (reconnectTimer) {
clearTimeout(reconnectTimer);
reconnectTimer = null;
}
};
const scheduleReconnect = (reason, delayMs = 1200) => {
if (!voiceDesired) return;
if (voiceConnected || connectingVoice) return;
if (reconnectTimer) return;
if (reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
voiceStatus.textContent = "Voice reconnect attempts exhausted.";
return;
}
reconnectAttempts += 1;
voiceStatus.textContent = `${reason} Retrying (${reconnectAttempts}/${MAX_RECONNECT_ATTEMPTS})...`;
reconnectTimer = setTimeout(async () => {
reconnectTimer = null;
await connectVoiceChannel();
}, delayMs);
};
const stopVoiceChannel = async (statusText = "", clearDesired = false) => {
if (clearDesired) {
voiceDesired = false;
reconnectAttempts = 0;
clearReconnectTimer();
}
if (disconnectedTimer) {
clearTimeout(disconnectedTimer);
disconnectedTimer = null;
}
pendingRemoteCandidates = [];
rtcAnswerApplied = false;
setPushToTalkState(false);
if (peerConnection) {
peerConnection.ontrack = null;
peerConnection.onicecandidate = null;
peerConnection.onconnectionstatechange = null;
peerConnection.close();
peerConnection = null;
}
if (micStream) {
micStream.getTracks().forEach((track) => track.stop());
micStream = null;
}
if (remoteStream) {
remoteStream.getTracks().forEach((track) => track.stop());
remoteStream = null;
}
remoteAudio.srcObject = null;
setVoiceState(false);
if (statusText) {
voiceStatus.textContent = statusText;
}
};
const applyRtcAnswer = async (message) => {
if (!peerConnection) return;
const rawSdp = (message.sdp || "").toString();
if (!rawSdp.trim()) return;
const sdp = `${rawSdp
.replace(/\r\n/g, "\n")
.replace(/\r/g, "\n")
.split("\n")
.map((line) => line.trimEnd())
.join("\r\n")
.trim()}\r\n`;
try {
await peerConnection.setRemoteDescription({
type: message.rtcType || "answer",
sdp,
});
rtcAnswerApplied = true;
const queued = pendingRemoteCandidates;
pendingRemoteCandidates = [];
for (const candidate of queued) {
try {
await peerConnection.addIceCandidate(candidate);
} catch (candidateErr) {
appendLine("system", `Queued ICE apply error: ${candidateErr}`, new Date().toISOString());
}
}
reconnectAttempts = 0;
voiceStatus.textContent = "Voice channel negotiated.";
} catch (err) {
await stopVoiceChannel("Failed to apply WebRTC answer.");
scheduleReconnect("Failed to apply answer.");
const preview = sdp
.split(/\r\n/)
.slice(0, 6)
.join(" | ");
appendLine(
"system",
`RTC answer error: ${err}. SDP preview: ${preview}`,
new Date().toISOString()
);
}
};
const applyRtcIceCandidate = async (message) => {
if (!peerConnection) return;
if (message.candidate == null) {
if (!rtcAnswerApplied || !peerConnection.remoteDescription) {
pendingRemoteCandidates.push(null);
return;
}
try {
await peerConnection.addIceCandidate(null);
} catch (err) {
appendLine("system", `RTC ICE end error: ${err}`, new Date().toISOString());
}
return;
}
try {
if (!rtcAnswerApplied || !peerConnection.remoteDescription) {
pendingRemoteCandidates.push(message.candidate);
return;
}
await peerConnection.addIceCandidate(message.candidate);
} catch (err) {
appendLine("system", `RTC ICE error: ${err}`, new Date().toISOString());
}
};
const connectVoiceChannel = async () => {
if (voiceConnected || peerConnection || connectingVoice) return;
if (!window.RTCPeerConnection) {
voiceStatus.textContent = "WebRTC unavailable in this browser.";
return;
}
if (!navigator.mediaDevices?.getUserMedia) {
voiceStatus.textContent = "Microphone capture is unavailable.";
return;
}
if (ws.readyState !== WebSocket.OPEN) {
voiceStatus.textContent = "Socket not ready yet.";
return;
}
connectingVoice = true;
try {
clearReconnectTimer();
rtcAnswerApplied = false;
pendingRemoteCandidates = [];
try {
micStream = await navigator.mediaDevices.getUserMedia({
audio: {
channelCount: 1,
sampleRate: 48000,
sampleSize: 16,
latency: 0,
echoCancellation: true,
noiseSuppression: true,
autoGainControl: false,
},
video: false,
});
} catch (_constraintErr) {
micStream = await navigator.mediaDevices.getUserMedia({
audio: true,
video: false,
});
voiceStatus.textContent = "Using browser default microphone settings.";
}
setMicCaptureEnabled(false);
peerConnection = new RTCPeerConnection({
iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});
remoteStream = new MediaStream();
remoteAudio.srcObject = remoteStream;
peerConnection.ontrack = (event) => {
if (event.track.kind !== "audio") return;
remoteStream.addTrack(event.track);
remoteAudio.play().catch(() => {
voiceStatus.textContent = "Tap the page once to allow voice playback.";
});
};
peerConnection.onicecandidate = (event) => {
if (!event.candidate) {
sendJson({ type: "rtc-ice-candidate", candidate: null });
return;
}
sendJson({
type: "rtc-ice-candidate",
candidate: event.candidate.toJSON(),
});
};
peerConnection.onconnectionstatechange = () => {
const state = peerConnection?.connectionState || "new";
if (state === "connected") {
if (disconnectedTimer) {
clearTimeout(disconnectedTimer);
disconnectedTimer = null;
}
clearReconnectTimer();
reconnectAttempts = 0;
voiceStatus.textContent = "Voice channel connected. Hold Push-to-Talk to speak.";
return;
}
if (state === "failed" || state === "closed") {
stopVoiceChannel(`Voice channel ${state}.`);
scheduleReconnect(`Voice channel ${state}.`);
return;
}
if (state === "disconnected") {
if (disconnectedTimer) clearTimeout(disconnectedTimer);
voiceStatus.textContent = "Voice channel disconnected. Waiting to recover...";
disconnectedTimer = setTimeout(() => {
if (peerConnection?.connectionState === "disconnected") {
stopVoiceChannel("Voice channel disconnected.");
scheduleReconnect("Voice channel disconnected.");
}
}, 8000);
return;
}
voiceStatus.textContent = `Voice channel ${state}...`;
};
micStream.getAudioTracks().forEach((track) => {
peerConnection.addTrack(track, micStream);
});
setVoiceState(true);
voiceStatus.textContent = "Connecting voice channel...";
setPushToTalkState(false);
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
sendJson({
type: "rtc-offer",
sdp: offer.sdp,
rtcType: offer.type,
});
} catch (err) {
await stopVoiceChannel("Voice channel setup failed.");
scheduleReconnect("Voice setup failed.");
appendLine("system", `Voice setup error: ${err}`, new Date().toISOString());
} finally {
connectingVoice = false;
}
};
ws.onopen = () => {
appendLine("system", "WebSocket connected.", new Date().toISOString());
};
ws.onclose = async () => {
appendLine("system", "WebSocket disconnected.", new Date().toISOString());
await stopVoiceChannel("Voice channel disconnected.", true);
};
ws.onerror = () => appendLine("system", "WebSocket error.", new Date().toISOString());
ws.onmessage = async (event) => {
try {
const msg = JSON.parse(event.data);
if (msg.type === "rtc-answer") {
await applyRtcAnswer(msg);
return;
}
if (msg.type === "rtc-ice-candidate") {
await applyRtcIceCandidate(msg);
return;
}
if (msg.type === "rtc-state") {
const state = (msg.state || "").toString();
if (state) {
if (state === "connected") {
voiceStatus.textContent = "Voice channel connected. Hold Push-to-Talk to speak.";
} else {
voiceStatus.textContent = `Voice channel ${state}.`;
}
}
return;
}
if (msg.type === "rtc-error") {
const text = (msg.message || "Unknown WebRTC error.").toString();
voiceStatus.textContent = `Voice error: ${text}`;
appendLine("system", `Voice error: ${text}`, new Date().toISOString());
await stopVoiceChannel("Voice channel error.");
scheduleReconnect("Voice channel error.");
return;
}
appendLine(msg.role || "system", msg.text || "", msg.timestamp || "");
} catch (_err) {
appendLine("system", event.data, new Date().toISOString());
}
};
spawnBtn.onclick = () => sendJson({ type: "spawn" });
stopBtn.onclick = () => sendJson({ type: "stop" });
pttBtn.onpointerdown = (event) => {
if (event.button !== 0) return;
if (pttBtn.setPointerCapture) {
pttBtn.setPointerCapture(event.pointerId);
}
beginPushToTalk(event);
};
pttBtn.onpointerup = (event) => endPushToTalk(event);
pttBtn.onpointercancel = (event) => endPushToTalk(event);
pttBtn.onlostpointercapture = (event) => endPushToTalk(event);
pttBtn.addEventListener("keydown", (event) => {
const isSpace = event.code === "Space" || event.key === " ";
if (!isSpace || event.repeat) return;
beginPushToTalk(event);
});
pttBtn.addEventListener("keyup", (event) => {
const isSpace = event.code === "Space" || event.key === " ";
if (!isSpace) return;
endPushToTalk(event);
});
recordBtn.onclick = async () => {
if (voiceConnected || peerConnection || connectingVoice) {
await stopVoiceChannel("Voice channel disconnected.", true);
return;
}
voiceDesired = true;
reconnectAttempts = 0;
await connectVoiceChannel();
};
document.body.addEventListener("click", () => {
if (remoteAudio.srcObject && remoteAudio.paused) {
remoteAudio.play().catch(() => {});
}
});
setVoiceState(false);
</script>
</body>
</html>

supertonic_gateway.py Normal file (388 additions)

@@ -0,0 +1,388 @@
import asyncio
import contextlib
import os
import pty
import re
import shlex
import signal
import subprocess
import time
from collections import deque
from pathlib import Path
from wisper import WisperBus, WisperEvent
ANSI_ESCAPE_RE = re.compile(r"\x1B\[[0-?]*[ -/]*[@-~]")
CONTROL_CHAR_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")
BRAILLE_SPINNER_RE = re.compile(r"[\u2800-\u28ff]")
SPINNER_ONLY_RE = re.compile(r"^[\s|/\\\-]+$")
BOX_DRAWING_ONLY_RE = re.compile(r"^[\s\u2500-\u257f]+$")
THINKING_LINE_RE = re.compile(r"^(?:agent|nanobot)\s+is\s+thinking\b", re.IGNORECASE)
TOOL_STREAM_LINE_RE = re.compile(
r"^(?:tool(?:\s+call|\s+output)?|calling\s+tool|running\s+tool|executing\s+tool)\b",
re.IGNORECASE,
)
EMOJI_RE = re.compile(
"[" # Common emoji and pictograph blocks.
"\U0001F1E6-\U0001F1FF"
"\U0001F300-\U0001F5FF"
"\U0001F600-\U0001F64F"
"\U0001F680-\U0001F6FF"
"\U0001F700-\U0001F77F"
"\U0001F780-\U0001F7FF"
"\U0001F800-\U0001F8FF"
"\U0001F900-\U0001F9FF"
"\U0001FA00-\U0001FAFF"
"\u2600-\u26FF"
"\u2700-\u27BF"
"\uFE0F"
"\u200D"
"]"
)
def _clean_output(text: str) -> str:
cleaned = ANSI_ESCAPE_RE.sub("", text)
cleaned = BRAILLE_SPINNER_RE.sub(" ", cleaned)
cleaned = CONTROL_CHAR_RE.sub("", cleaned)
return cleaned.replace("\r", "\n")
def _resolve_nanobot_command_and_workdir() -> tuple[str, Path]:
command_override = os.getenv("NANOBOT_COMMAND")
workdir_override = os.getenv("NANOBOT_WORKDIR")
if workdir_override:
default_workdir = Path(workdir_override).expanduser()
else:
default_workdir = Path.home()
if command_override:
return command_override, default_workdir
nanobot_dir = Path.home() / "nanobot"
nanobot_python_candidates = [
nanobot_dir / ".venv" / "bin" / "python",
nanobot_dir / "venv" / "bin" / "python",
]
for nanobot_venv_python in nanobot_python_candidates:
if nanobot_venv_python.exists():
if not workdir_override:
default_workdir = nanobot_dir
return f"{nanobot_venv_python} -m nanobot agent --no-markdown", default_workdir
return "nanobot agent --no-markdown", default_workdir
def _infer_venv_root(command_parts: list[str], workdir: Path) -> Path | None:
if not command_parts:
return None
binary = Path(command_parts[0]).expanduser()
if binary.is_absolute() and binary.name.startswith("python") and binary.parent.name == "bin":
return binary.parent.parent
for candidate in (workdir / ".venv", workdir / "venv"):
if (candidate / "bin" / "python").exists():
return candidate
return None
def _build_process_env(command_parts: list[str], workdir: Path) -> tuple[dict[str, str], Path | None]:
env = os.environ.copy()
env.pop("PYTHONHOME", None)
venv_root = _infer_venv_root(command_parts, workdir)
if not venv_root:
return env, None
venv_bin = str(venv_root / "bin")
path_entries = [entry for entry in env.get("PATH", "").split(os.pathsep) if entry]
path_entries = [entry for entry in path_entries if entry != venv_bin]
path_entries.insert(0, venv_bin)
env["PATH"] = os.pathsep.join(path_entries)
env["VIRTUAL_ENV"] = str(venv_root)
return env, venv_root
class NanobotTUIProcess:
def __init__(self, bus: WisperBus, command: str, workdir: Path) -> None:
self._bus = bus
self._command = command
self._workdir = workdir
self._process: subprocess.Popen[bytes] | None = None
self._master_fd: int | None = None
self._read_task: asyncio.Task[None] | None = None
self._pending_output = ""
self._suppress_noisy_ui = os.getenv("NANOBOT_SUPPRESS_NOISY_UI", "1").strip() not in {
"0",
"false",
"False",
"no",
"off",
}
self._dedup_window_s = max(0.2, float(os.getenv("NANOBOT_OUTPUT_DEDUP_WINDOW_S", "1.5")))
self._recent_lines: deque[tuple[str, float]] = deque()
self._last_tts_line = ""
@property
def running(self) -> bool:
return self._process is not None and self._process.poll() is None
async def start(self) -> None:
if self.running:
await self._bus.publish(WisperEvent(role="system", text="Nanobot TUI is already running."))
return
command_parts = [
os.path.expandvars(os.path.expanduser(part)) for part in shlex.split(self._command)
]
if not command_parts:
await self._bus.publish(WisperEvent(role="system", text="NANOBOT_COMMAND is empty."))
return
if not self._workdir.exists():
await self._bus.publish(
WisperEvent(
role="system",
text=f"NANOBOT_WORKDIR does not exist: {self._workdir}",
)
)
return
master_fd, slave_fd = pty.openpty()
child_env, child_venv_root = _build_process_env(command_parts=command_parts, workdir=self._workdir)
try:
self._process = subprocess.Popen(
command_parts,
stdin=slave_fd,
stdout=slave_fd,
stderr=slave_fd,
cwd=str(self._workdir),
start_new_session=True,
env=child_env,
)
except FileNotFoundError as exc:
os.close(master_fd)
os.close(slave_fd)
await self._bus.publish(
WisperEvent(
role="system",
text=(
"Could not start Nanobot process "
f"(command='{command_parts[0]}', workdir='{self._workdir}'): {exc}. "
"Check NANOBOT_COMMAND and NANOBOT_WORKDIR."
),
)
)
return
except Exception as exc:
os.close(master_fd)
os.close(slave_fd)
await self._bus.publish(
WisperEvent(role="system", text=f"Failed to spawn TUI process: {exc}")
)
return
os.close(slave_fd)
os.set_blocking(master_fd, False)
self._master_fd = master_fd
self._read_task = asyncio.create_task(self._read_output(), name="nanobot-tui-reader")
await self._bus.publish(
WisperEvent(
role="system",
text=f"Spawned Nanobot TUI with command: {' '.join(command_parts)}",
)
)
if child_venv_root:
await self._bus.publish(
WisperEvent(
role="system",
text=f"Nanobot runtime venv: {child_venv_root}",
)
)
async def send(self, text: str) -> None:
if not self.running or self._master_fd is None:
await self._bus.publish(
WisperEvent(role="system", text="Nanobot TUI is not running. Click spawn first.")
)
return
message = text.rstrip("\n") + "\n"
try:
os.write(self._master_fd, message.encode())
except OSError as exc:
await self._bus.publish(WisperEvent(role="system", text=f"Failed to write to TUI: {exc}"))
async def stop(self) -> None:
if self._read_task:
self._read_task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await self._read_task
self._read_task = None
if self.running and self._process:
try:
os.killpg(self._process.pid, signal.SIGTERM)
except ProcessLookupError:
pass
except Exception:
self._process.terminate()
            try:
                self._process.wait(timeout=3)
            except Exception:
                self._process.kill()
                with contextlib.suppress(Exception):
                    self._process.wait(timeout=1)
if self._master_fd is not None:
try:
os.close(self._master_fd)
except OSError:
pass
self._master_fd = None
self._process = None
self._pending_output = ""
self._recent_lines.clear()
self._last_tts_line = ""
await self._bus.publish(WisperEvent(role="system", text="Stopped Nanobot TUI."))
async def _read_output(self) -> None:
if self._master_fd is None:
return
while self.running:
try:
chunk = os.read(self._master_fd, 4096)
except BlockingIOError:
await asyncio.sleep(0.05)
continue
except OSError:
break
if not chunk:
await asyncio.sleep(0.05)
continue
text = _clean_output(chunk.decode(errors="ignore"))
if not text.strip():
continue
displayable, tts_publishable = self._consume_output_chunk(text)
if displayable:
await self._bus.publish(WisperEvent(role="nanobot", text=displayable))
if tts_publishable:
await self._bus.publish(WisperEvent(role="nanobot-tts", text=tts_publishable))
trailing_display, trailing_tts = self._consume_output_chunk("\n")
if trailing_display:
await self._bus.publish(WisperEvent(role="nanobot", text=trailing_display))
if trailing_tts:
await self._bus.publish(WisperEvent(role="nanobot-tts", text=trailing_tts))
if self._process is not None:
exit_code = self._process.poll()
await self._bus.publish(
WisperEvent(role="system", text=f"Nanobot TUI exited (code={exit_code}).")
)
def _consume_output_chunk(self, text: str) -> tuple[str, str]:
self._pending_output += text
lines = self._pending_output.split("\n")
self._pending_output = lines.pop()
if len(self._pending_output) > 1024:
lines.append(self._pending_output)
self._pending_output = ""
kept_lines: list[str] = []
tts_lines: list[str] = []
for line in lines:
normalized = self._normalize_line(line)
if not normalized:
continue
if self._suppress_noisy_ui and self._is_noisy_ui_line(normalized):
continue
if normalized != self._last_tts_line:
tts_lines.append(normalized)
self._last_tts_line = normalized
if self._is_recent_duplicate(normalized):
continue
kept_lines.append(normalized)
return "\n".join(kept_lines).strip(), "\n".join(tts_lines).strip()
def _normalize_line(self, line: str) -> str:
without_emoji = EMOJI_RE.sub(" ", line)
return re.sub(r"\s+", " ", without_emoji).strip()
def _is_noisy_ui_line(self, line: str) -> bool:
if SPINNER_ONLY_RE.fullmatch(line):
return True
if BOX_DRAWING_ONLY_RE.fullmatch(line):
return True
candidate = re.sub(r"^[^\w]+", "", line)
if THINKING_LINE_RE.match(candidate):
return True
if TOOL_STREAM_LINE_RE.match(candidate):
return True
return False
def _is_recent_duplicate(self, line: str) -> bool:
now = time.monotonic()
normalized = line.lower()
while self._recent_lines and (now - self._recent_lines[0][1]) > self._dedup_window_s:
self._recent_lines.popleft()
for previous, _timestamp in self._recent_lines:
if previous == normalized:
return True
self._recent_lines.append((normalized, now))
return False
class SuperTonicGateway:
def __init__(self) -> None:
self.bus = WisperBus()
self._lock = asyncio.Lock()
self._tui: NanobotTUIProcess | None = None
async def subscribe(self) -> asyncio.Queue[WisperEvent]:
return await self.bus.subscribe()
async def unsubscribe(self, queue: asyncio.Queue[WisperEvent]) -> None:
await self.bus.unsubscribe(queue)
async def spawn_tui(self) -> None:
async with self._lock:
if self._tui and self._tui.running:
await self.bus.publish(WisperEvent(role="system", text="Nanobot TUI is already running."))
return
command, workdir = _resolve_nanobot_command_and_workdir()
self._tui = NanobotTUIProcess(bus=self.bus, command=command, workdir=workdir)
await self._tui.start()
async def send_user_message(self, text: str) -> None:
message = text.strip()
if not message:
return
await self.bus.publish(WisperEvent(role="user", text=message))
async with self._lock:
if not self._tui:
await self.bus.publish(
WisperEvent(role="system", text="Nanobot TUI is not running. Click spawn first.")
)
return
await self._tui.send(message)
async def stop_tui(self) -> None:
async with self._lock:
if self._tui:
await self._tui.stop()
async def shutdown(self) -> None:
await self.stop_tui()

1656
voice_rtc.py Normal file

File diff suppressed because it is too large

37
wisper.py Normal file

@@ -0,0 +1,37 @@
import asyncio
from dataclasses import dataclass, field
from datetime import datetime, timezone
@dataclass(slots=True)
class WisperEvent:
role: str
text: str
timestamp: str = field(
default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds")
)
def to_dict(self) -> dict[str, str]:
return {"role": self.role, "text": self.text, "timestamp": self.timestamp}
class WisperBus:
def __init__(self) -> None:
self._subscribers: set[asyncio.Queue[WisperEvent]] = set()
self._lock = asyncio.Lock()
async def subscribe(self) -> asyncio.Queue[WisperEvent]:
queue: asyncio.Queue[WisperEvent] = asyncio.Queue()
async with self._lock:
self._subscribers.add(queue)
return queue
async def unsubscribe(self, queue: asyncio.Queue[WisperEvent]) -> None:
async with self._lock:
self._subscribers.discard(queue)
async def publish(self, event: WisperEvent) -> None:
async with self._lock:
subscribers = list(self._subscribers)
for queue in subscribers:
queue.put_nowait(event)