Chrome on Android requires a user gesture for <audio>.play(). The wake
word triggers TTS on its own, without a tap, so play() was silently
rejected.
On a tap on the microphone button we now play a 1 ms silent WAV →
the browser marks the page as allowed to autoplay for the current
session. After that, Cosmo/Lusya TTS responses play without being blocked.
VoiceOverlay now logs the reason if play() is still rejected.
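
A minimal sketch of how such a silent clip could be generated in code.
The format (8 kHz, mono, 16-bit PCM) and the name buildSilentWav are
assumptions for illustration; the commit does not say how the WAV is
produced or stored.

```typescript
// Builds a minimal PCM WAV file that contains only silence.
// Assumed format (not from the commit): mono, 16-bit, 8 kHz by default.
function buildSilentWav(durationMs: number, sampleRate = 8000): Uint8Array {
  const numSamples = Math.max(1, Math.round((sampleRate * durationMs) / 1000));
  const dataSize = numSamples * 2; // 2 bytes per 16-bit mono sample
  const buf = new ArrayBuffer(44 + dataSize); // 44-byte header + samples
  const view = new DataView(buf);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);
  // Sample bytes are left at zero, which is digital silence.
  return new Uint8Array(buf);
}

// Browser-side usage (sketch, inside the mic button's tap handler):
//   const blob = new Blob([buildSilentWav(1)], { type: "audio/wav" });
//   new Audio(URL.createObjectURL(blob))
//     .play()
//     .catch((err) => console.warn("audio unlock rejected:", err));
```

The play() call has to run synchronously inside the tap handler so that
the browser still attributes it to the user gesture.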
The overlay now has a close button (×) in the top-right corner. A tap
emits voice-cancel → VoiceController aborts the active VAD capture and
the overlay closes. The wake word, if it was active, keeps listening
in the background.
Step 2 of the migration: remove the dependency on the Python agent for
the basic voice scenario. A tap on the round microphone button in the
bottom-right corner → MicVAD (Silero v5) captures speech → auto-stop on
silence → /api/voice/stt → /api/voice/chat → response via SSE and TTS
as before.
- components/VoiceController.tsx — push-to-talk UI + MicVAD orchestration
- VoiceOverlay now listens for the window CustomEvent('voice-local'), so
the orb blinks before the round-trip to the server (wake/listening
show instantly).
- public/vad/ — silero v5/legacy onnx + ort wasm + audio worklet,
served via baseAssetPath: '/vad/' (no dependency on an external CDN,
which matters when the tablet is offline or the CDN is blocked in Russia).
What remains of home-voice-assistant: only the wake word. After Step 3
(onnxruntime-web + porting the openwakeword .onnx) the Python agent
goes away entirely.
Two fixes:
1) Overlay was hiding mid-TTS because dismiss timer used
text.length * 80ms — ElevenLabs speaks slower, so the audio got
cut off. Now scheduleDismiss is only called from playTTS's
onEnded callback (plus 4s lingering after audio finishes).
2) After response, the Python script silently re-entered record()
for follow-ups but the overlay disappeared, so the user had to
re-wake every turn. Added a new 'listening' event — Python
emits it just before each followup record(), tablet shows the
orb pulsing at medium intensity with 'жду' status and the last
response text preserved below.
Safety: any state now arms a 60s auto-close in case Python dies
and never emits idle.
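
The timing rules above (60s safety close on any state, 4s linger after
audio ends) can be sketched as a small re-armable timer. This is an
illustration only: OverlayDismisser and TimerApi are hypothetical names,
and the timer functions are injected so the logic can be exercised
without a browser.

```typescript
// Minimal timer abstraction so the logic does not depend on the runtime.
interface TimerApi {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const SAFETY_CLOSE_MS = 60000; // any state arms this, in case "idle" never arrives
const LINGER_AFTER_AUDIO_MS = 4000; // overlay stays visible after TTS finishes

class OverlayDismisser {
  private handle: unknown = null;
  private readonly onClose: () => void;
  private readonly timers: TimerApi;

  constructor(onClose: () => void, timers: TimerApi) {
    this.onClose = onClose;
    this.timers = timers;
  }

  // Replaces any pending close with a new one, so only one timer is live.
  private arm(ms: number): void {
    if (this.handle !== null) this.timers.clear(this.handle);
    this.handle = this.timers.set(() => {
      this.handle = null;
      this.onClose();
    }, ms);
  }

  // Call on every incoming state (wake, listening, response, error).
  onStateChange(): void {
    this.arm(SAFETY_CLOSE_MS);
  }

  // Call only from the audio element's ended callback, never from text length.
  onAudioEnded(): void {
    this.arm(LINGER_AFTER_AUDIO_MS);
  }
}
```

In the browser the injected TimerApi would simply wrap window.setTimeout
and window.clearTimeout.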
Adds the infrastructure for Claude tool use + visual timer.
Tablet API surface (all bearer-authed with VOICE_API_KEY, middleware bypassed):
- /api/voice/tools/weather — current + short forecast via Open-Meteo
- /api/voice/tools/transport — tram arrivals by direction / route filter
- /api/voice/tools/events — Google Calendar today/week
- /api/voice/tools/notes — notes + shopping lists
- /api/voice/timer — start (with seconds+label), cancel; GET list (cookie ok)
Active timers persisted at /data/tablet-timers.json
UI:
- VoiceOverlay stripped to minimal Siri look: no agent emoji/name, just the
pulsing orb (3-layer radial gradient, independent breath animations),
subtle status label on wake only, transcription/response text centered.
Agents distinguished by orb color (Cosmo indigo/violet, Lusya pink).
- TimerWidget: bottom-right chip stack with countdown, progress bar, turns
amber in last 10s. On expiry, fires fullscreen alarm overlay with beep
(WebAudio osc) + Остановить button.
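
The TimerWidget states described above (countdown, progress, amber in
the last 10s, expiry) could be derived by a pure function along these
lines. A sketch under assumptions: the TabletTimer fields and viewTimer
are hypothetical, the stored shape in /data/tablet-timers.json is not
shown in this commit.

```typescript
// Hypothetical shape of a stored timer; field names are assumptions.
interface TabletTimer {
  id: string;
  label: string;
  startedAtMs: number; // epoch milliseconds when the timer was started
  seconds: number; // total duration requested
}

interface TimerView {
  remainingSec: number; // whole seconds left, rounded up for display
  progress: number; // 0 at start, 1 when finished
  amber: boolean; // true during the last 10 seconds
  expired: boolean; // true once the alarm overlay should fire
}

const AMBER_WINDOW_MS = 10000;

function viewTimer(t: TabletTimer, nowMs: number): TimerView {
  const totalMs = t.seconds * 1000;
  const remainingMs = Math.max(0, t.startedAtMs + totalMs - nowMs);
  const rawProgress = totalMs > 0 ? 1 - remainingMs / totalMs : 1;
  return {
    remainingSec: Math.ceil(remainingMs / 1000),
    progress: Math.min(1, Math.max(0, rawProgress)), // clamp against clock skew
    amber: remainingMs > 0 && remainingMs <= AMBER_WINDOW_MS,
    expired: remainingMs === 0,
  };
}
```

Keeping this computation pure means the widget only needs a periodic
tick that passes the current time in.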
Other:
- lib/timers.ts — persistent timer store in /data
- lib/voice-tools.ts — shared bearer-auth helper
- middleware — bypass list now covers /api/voice/tools/* and /api/voice/timer
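
For illustration, a shared bearer-auth check of the kind lib/voice-tools.ts
provides might look like the sketch below. The function names are
hypothetical and the real helper's signature is not shown here; the
comparison is a best-effort constant-time loop, not a guarantee.

```typescript
// Compares two strings without returning early on the first mismatch.
// Best effort only: JavaScript engines give no hard timing guarantees.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// authHeader: value of the Authorization header, if any.
// expectedKey: value of the VOICE_API_KEY environment variable, if set.
function isAuthorized(
  authHeader: string | null | undefined,
  expectedKey: string | undefined,
): boolean {
  if (!expectedKey) return false; // fail closed when the key is not configured
  if (!authHeader) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authHeader.trim());
  if (!match) return false;
  return constantTimeEqual(match[1], expectedKey);
}

// Usage inside a route handler (sketch):
//   if (!isAuthorized(req.headers.get("authorization"), process.env.VOICE_API_KEY)) {
//     return new Response("Unauthorized", { status: 401 });
//   }
```

Failing closed when VOICE_API_KEY is unset matters here because these
routes bypass the cookie-based middleware.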
Stage 2 of voice integration — centralizes TTS on the tablet so the
Python satellite no longer needs ElevenLabs credentials or mpv.
- app/api/voice/tts — POST {text, agent}, proxies to ElevenLabs
streaming endpoint with flash_v2_5 default, returns audio/mpeg.
Per-agent voice id via COSMO_TTS_VOICE / LUSYA_TTS_VOICE env.
- VoiceOverlay — on response/error events fetches TTS and plays via
HTMLAudioElement; on wake event stops playback (barge-in). Dismiss
timer extended by text length so long responses do not cut off.
- Autoplay caveat: browser may block first playback until user taps
anywhere on the page (FKB: enable Force Autoplay to bypass).
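
A sketch of how the proxy request to ElevenLabs could be assembled.
Assumptions are labeled in the comments: ELEVENLABS_API_KEY is a
hypothetical variable name, and buildTtsRequest is not the route's actual
code. The endpoint path, the xi-api-key header and the eleven_flash_v2_5
model id follow the public ElevenLabs API.

```typescript
type Agent = "cosmo" | "lusya";

interface TtsEnv {
  ELEVENLABS_API_KEY?: string; // assumed name, not stated in the commit
  COSMO_TTS_VOICE?: string;
  LUSYA_TTS_VOICE?: string;
}

interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTtsRequest(text: string, agent: Agent, env: TtsEnv): TtsRequest {
  // Per-agent voice id comes from the environment.
  const voiceId = agent === "lusya" ? env.LUSYA_TTS_VOICE : env.COSMO_TTS_VOICE;
  if (!voiceId) throw new Error(`No TTS voice configured for agent "${agent}"`);
  if (!env.ELEVENLABS_API_KEY) throw new Error("ElevenLabs API key is not set");

  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": env.ELEVENLABS_API_KEY,
        "Content-Type": "application/json",
        Accept: "audio/mpeg",
      },
      body: JSON.stringify({ text, model_id: "eleven_flash_v2_5" }),
    },
  };
}

// Usage inside the route handler (sketch):
//   const { url, init } = buildTtsRequest(text, agent, process.env);
//   const upstream = await fetch(url, init);
//   return new Response(upstream.body, { headers: { "Content-Type": "audio/mpeg" } });
```

Passing the upstream body straight through keeps the response streaming,
so playback can start before the whole clip is synthesized.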
Adds the tablet side of the voice assistant integration. An external
Python script (openWakeWord + Groq STT + OpenClaw) will POST state
transitions to /api/voice/event with a bearer token, and the tablet
shows a fullscreen overlay with a Siri-style animated blob, the current
agent, and the recognized text or response text.
- lib/voice-bus.ts — in-process EventEmitter singleton, preserved
across hot reloads via globalThis
- app/api/voice/event — POST, bearer-auth via VOICE_API_KEY env,
validates event kind, broadcasts on voiceBus
- app/api/voice/stream — GET, SSE endpoint, per-connection listener
with 15s keep-alive ping and abort-signal cleanup
- components/VoiceOverlay — full-screen overlay, 3-layer pulsing
Siri blob, per-agent palette (cosmo indigo/violet, lusya pink/rose),
auto-dismiss timeouts (wake=20s safety, response=6s, error=4s),
auto-reconnect on SSE drop
- middleware bypasses /api/voice/event so the script does not need
a user auth cookie
- VoiceOverlay mounted in HomePageInner outside tab routing so it
appears on every view
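
The bus and SSE pieces listed above can be sketched as follows. This is
a dependency-free stand-in, not the actual lib/voice-bus.ts: it uses a
small listener set instead of Node's EventEmitter, the __voiceBus key is
hypothetical, and the list of event kinds is assembled from names that
appear in these notes rather than from the real validator.

```typescript
type VoiceEventKind = "wake" | "listening" | "response" | "error" | "idle";

interface VoiceEvent {
  kind: VoiceEventKind;
  agent?: string;
  text?: string;
}

type VoiceListener = (e: VoiceEvent) => void;

class VoiceBus {
  private listeners = new Set<VoiceListener>();

  // Returns an unsubscribe function, handy for abort-signal cleanup.
  subscribe(fn: VoiceListener): () => void {
    this.listeners.add(fn);
    return () => {
      this.listeners.delete(fn);
    };
  }

  emit(e: VoiceEvent): void {
    // Copy first so listeners may unsubscribe while being notified.
    for (const fn of Array.from(this.listeners)) fn(e);
  }
}

// Stored on globalThis so a dev-mode hot reload reuses the same instance.
function getVoiceBus(): VoiceBus {
  const g = globalThis as unknown as { __voiceBus?: VoiceBus };
  if (!g.__voiceBus) g.__voiceBus = new VoiceBus();
  return g.__voiceBus;
}

const KNOWN_KINDS: readonly string[] = ["wake", "listening", "response", "error", "idle"];

function isVoiceEventKind(x: unknown): x is VoiceEventKind {
  return typeof x === "string" && KNOWN_KINDS.indexOf(x) !== -1;
}

// One SSE message: a "data:" line followed by a blank line.
function sseData(e: VoiceEvent): string {
  return `data: ${JSON.stringify(e)}\n\n`;
}

// Lines starting with ":" are SSE comments; clients ignore them,
// which makes them suitable as the 15s keep-alive ping.
const SSE_PING = ": ping\n\n";
```

The stream route would subscribe once per connection, write sseData(e)
for each event, send SSE_PING on an interval, and call the returned
unsubscribe function when the request's abort signal fires.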