Switch wake word from Porcupine to openwakeword + training pipeline

- Add training/ pipeline (step_1..step_5) and own-samples flow
- record_wav.py with single-shot and long-record modes, RMS-based silence filter
- remove_silent.py to drop silent samples and renumber
- modes.py: openwakeword inference with reset() and quiet predictions; commented Lusya block for later
- stt.py: drop local faster-whisper fallback, Groq-only
- config.py: remove unused STT_PROVIDER/WHISPER_*
- llm.py: replace __import__("os") hack with proper import
- tts.py: remove debug traceback in play_error_sound
- requirements.txt: add openwakeword/sounddevice/scipy, drop faster-whisper
- deploy/setup.sh: validate ELEVENLABS_API_KEY and WAKE_WORD_COSMO presence
- README.md, CLAUDE.md, project_roadmap memory updated to reflect new architecture
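Both record_wav.py and remove_silent.py are described as filtering by RMS; only remove_silent.py's code appears in this diff. The sketch below factors that test into a reusable function, assuming 16-bit PCM input as remove_silent.py does. The names wav_rms, is_silent and make_wav are hypothetical, not part of the commit. Treating an empty file as silent is a deliberate difference: in the script, np.mean on an empty array returns NaN, the comparison with the threshold is False, and the file is kept.

```python
import io
import wave

import numpy as np


def wav_rms(src) -> float:
    """RMS amplitude of a 16-bit PCM WAV, on the raw int16 scale.

    `src` is a path string or a binary file-like object (wave.open accepts both).
    """
    with wave.open(src, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    if data.size == 0:
        # Assumption: an empty recording counts as silent (RMS 0) instead of
        # producing NaN from np.mean on an empty array.
        return 0.0
    # Square in float64: squaring int16 values directly would overflow.
    return float(np.sqrt(np.mean(data.astype(np.float64) ** 2)))


def is_silent(src, threshold: float) -> bool:
    """True if the clip's RMS is below `threshold` (the script uses 250 / 200)."""
    return wav_rms(src) < threshold


def make_wav(samples, rate: int = 16000) -> io.BytesIO:
    """Hypothetical helper: build an in-memory mono 16-bit WAV for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(np.asarray(samples, dtype=np.int16).tobytes())
    buf.seek(0)
    return buf
```

For example, a square wave alternating between +1000 and -1000 has an RMS of exactly 1000, so is_silent(make_wav(np.tile([1000, -1000], 800)), 250) is False. The RMS is on the raw int16 scale, which is the scale the script's thresholds (250 for positive, 200 for negative clips) are expressed in.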
2026-04-13 15:40:44 +03:00
parent 0a89bf5105
commit 780f6f0084
13 changed files with 378 additions and 140 deletions
--- a/remove_silent.py
+++ b/remove_silent.py
@@ -0,0 +1,20 @@
+import wave
+from pathlib import Path
+import numpy as np
+
+for sub, t in [('positive', 250), ('negative', 200)]:
+    d = Path(f'training/own_samples/cosmo/{sub}')
+    removed = 0
+    for f in sorted(d.glob('*.wav')):
+        with wave.open(str(f)) as w:
+            data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
+        if np.sqrt(np.mean(data.astype(np.float64)**2)) < t:
+            f.unlink(); removed += 1
+
+    files = sorted(d.glob('*.wav'))
+    for i, f in enumerate(files, 1):
+        f.rename(d / f'_tmp_{i:03d}.wav')
+    for i, f in enumerate(sorted(d.glob('_tmp_*.wav')), 1):
+        f.rename(d / f'{i:03d}.wav')
+
+    print(f'{sub}: removed {removed}, renumbered → 001..{len(files):03d}.wav')
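The two rename passes above exist so that no final name such as 002.wav can still be held by a file that has not been moved yet. The sketch below shows the same idea as a standalone function, assuming a flat directory of .wav files; renumber_wavs is a hypothetical name. It swaps the `_tmp_` prefix for a fresh temporary subdirectory: on POSIX, Path.rename silently replaces an existing target, so the prefix approach could overwrite a leftover `_tmp_*.wav` from an interrupted run, while a newly created empty directory cannot contain anything to clobber.

```python
import tempfile
from pathlib import Path


def renumber_wavs(d: Path) -> int:
    """Rename every *.wav in d to 001.wav, 002.wav, ... keeping sorted order.

    Returns the number of files renumbered.
    """
    files = sorted(d.glob("*.wav"))
    # Fresh, empty staging directory inside d: same filesystem, so rename()
    # is a cheap move, and no name inside it can already be taken.
    staging = Path(tempfile.mkdtemp(dir=str(d)))
    staged = []
    for i, f in enumerate(files, 1):
        # Path.rename returns the new path (Python 3.8+).
        staged.append(f.rename(staging / f"{i:03d}.wav"))
    # Every original .wav has left d, so the final names are all free.
    for f in staged:
        f.rename(d / f.name)
    staging.rmdir()
    return len(staged)
```

Calling renumber_wavs(d) after the removal loop yields the same 001..NNN numbering as the script's two glob passes, in the same order.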