Stage 2 of voice integration — centralizes TTS on the tablet so the
Python satellite no longer needs ElevenLabs credentials or mpv.
- app/api/voice/tts — POST {text, agent}, proxies to ElevenLabs
streaming endpoint (flash_v2_5 model by default) and returns audio/mpeg.
Per-agent voice id is read from the COSMO_TTS_VOICE / LUSYA_TTS_VOICE env vars.
- VoiceOverlay — on response/error events fetches TTS and plays via
HTMLAudioElement; on wake event stops playback (barge-in). Dismiss
timer extended by text length so long responses do not cut off.
- Autoplay caveat: the browser may block the first playback until the user
taps anywhere on the page (FKB: enable Force Autoplay to bypass this).
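
A rough sketch of how the tts route could build its upstream request,
assuming the ElevenLabs streaming endpoint at
/v1/text-to-speech/{voice_id}/stream with an xi-api-key header. The names
buildTtsRequest, voiceIdFor and ELEVENLABS_API_KEY are hypothetical and not
taken from this change; only the two voice env vars are.

```typescript
// Assumed env shape: the two voice variables are named in this commit,
// ELEVENLABS_API_KEY is a hypothetical name for the credential.
type TtsEnv = {
  COSMO_TTS_VOICE?: string;
  LUSYA_TTS_VOICE?: string;
  ELEVENLABS_API_KEY?: string;
};
type Agent = "cosmo" | "lusya";

// Full ElevenLabs model id for the flash_v2_5 default.
const DEFAULT_MODEL = "eleven_flash_v2_5";

// Pick the per-agent voice id from the environment.
function voiceIdFor(agent: Agent, env: TtsEnv): string | undefined {
  return agent === "cosmo" ? env.COSMO_TTS_VOICE : env.LUSYA_TTS_VOICE;
}

// Build the URL and fetch options for the upstream streaming call.
function buildTtsRequest(text: string, agent: Agent, env: TtsEnv, modelId: string = DEFAULT_MODEL) {
  const voiceId = voiceIdFor(agent, env);
  const apiKey = env.ELEVENLABS_API_KEY;
  if (!voiceId || !apiKey) {
    throw new Error("TTS is not configured for agent " + agent);
  }
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: modelId }),
    },
  };
}
```

The route handler would then call fetch(url, init) and pass the upstream
body through with Content-Type: audio/mpeg, so the credential never leaves
the tablet.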

Adds the tablet side of the voice assistant integration. An external Python
script (openWakeWord + Groq STT + OpenClaw) will POST state transitions
to /api/voice/event with a bearer token, and the tablet shows a
fullscreen overlay with a Siri-style animated blob, the current agent,
and the recognized or response text.
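
For illustration, the request the script sends might be shaped like the
sketch below. The payload fields (kind, agent, text) and the helper name
buildVoiceEventRequest are assumptions; this change only states that the
endpoint takes a POST with a bearer token and validates the event kind.

```typescript
// Hypothetical payload: wake, response and error are the kinds named in
// this commit; the agent and text fields are assumed.
interface VoiceEventPayload {
  kind: "wake" | "response" | "error";
  agent?: string;
  text?: string;
}

// Build the URL and fetch options for one state transition.
function buildVoiceEventRequest(baseUrl: string, token: string, payload: VoiceEventPayload) {
  return {
    // Drop trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/voice/event`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`, // must match VOICE_API_KEY on the tablet
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}
```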
- lib/voice-bus.ts — in-process EventEmitter singleton, preserved
across hot reloads via globalThis
- app/api/voice/event — POST, bearer-auth via VOICE_API_KEY env,
validates event kind, broadcasts on voiceBus
- app/api/voice/stream — GET, SSE endpoint, per-connection listener
with 15s keep-alive ping and abort-signal cleanup
- components/VoiceOverlay — full-screen overlay, 3-layer pulsing
Siri blob, per-agent palette (cosmo indigo/violet, lusya pink/rose),
auto-dismiss timeouts (wake=20s safety, response=6s, error=4s),
auto-reconnect on SSE drop
- middleware bypasses /api/voice/event so the script does not need
a user auth cookie
- VoiceOverlay mounted in HomePageInner outside tab routing so it
appears on every view
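
A minimal sketch of the bus and SSE framing described above, assuming
Node's built-in EventEmitter. The names getVoiceBus, __voiceBus, sseFrame
and the VoiceEvent shape are hypothetical and may differ from what
lib/voice-bus.ts and the stream route actually use.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event shape; only the three kinds are named in this commit.
type VoiceEvent = {
  kind: "wake" | "response" | "error";
  agent?: string;
  text?: string;
};

// Cache the emitter on globalThis so a dev-server hot reload reuses the
// same instance instead of creating a second bus with no listeners.
const globalForVoice = globalThis as typeof globalThis & { __voiceBus?: EventEmitter };

function getVoiceBus(): EventEmitter {
  if (!globalForVoice.__voiceBus) {
    const bus = new EventEmitter();
    bus.setMaxListeners(0); // one listener per open SSE connection
    globalForVoice.__voiceBus = bus;
  }
  return globalForVoice.__voiceBus;
}

// Serialize one event as a Server-Sent Events frame.
function sseFrame(event: VoiceEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Keep-alive comment; SSE clients ignore lines that start with ":".
const SSE_PING = ": ping\n\n";
const KEEP_ALIVE_MS = 15_000;
```

Each SSE connection would subscribe to the bus, write sseFrame(event) for
every broadcast, send SSE_PING every KEEP_ALIVE_MS, and remove its
listener and timer when the request's abort signal fires.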