home-voice-assistant/remove_silent.py
Daniil Klimov 780f6f0084 Switch wake word from Porcupine to openwakeword + training pipeline
- Add training/ pipeline (step_1..step_5) and own-samples flow
- record_wav.py with single-shot and long-record modes, RMS-based silence filter
- remove_silent.py to drop silent samples and renumber
- modes.py: openwakeword inference with reset() and quiet predictions; commented Lusya block for later
- stt.py: drop local faster-whisper fallback, Groq-only
- config.py: remove unused STT_PROVIDER/WHISPER_*
- llm.py: replace __import__("os") hack with proper import
- tts.py: remove debug traceback in play_error_sound
- requirements.txt: add openwakeword/sounddevice/scipy, drop faster-whisper
- deploy/setup.sh: validate ELEVENLABS_API_KEY and WAKE_WORD_COSMO presence
- README.md, CLAUDE.md, project_roadmap memory updated to reflect new architecture
2026-04-13 15:40:44 +03:00

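import wave
from pathlib import Path

import numpy as np

# (subfolder, RMS threshold): samples quieter than the threshold count as silent.
for sub, t in [('positive', 250), ('negative', 200)]:
    d = Path(f'training/own_samples/cosmo/{sub}')
    removed = 0
    for f in sorted(d.glob('*.wav')):
        # Assumes 16-bit PCM audio; other sample widths would be misread.
        with wave.open(str(f)) as w:
            data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        # Treat a file with no frames as silent (its mean would otherwise be NaN).
        if data.size == 0 or np.sqrt(np.mean(data.astype(np.float64) ** 2)) < t:
            f.unlink()
            removed += 1
    # Renumber in two passes so a new name never collides with an existing file.
    files = sorted(d.glob('*.wav'))
    for i, f in enumerate(files, 1):
        f.rename(d / f'_tmp_{i:03d}.wav')
    for i, f in enumerate(sorted(d.glob('_tmp_*.wav')), 1):
        f.rename(d / f'{i:03d}.wav')
    print(f'{sub}: removed {removed}, renumbered → 001..{len(files):03d}.wav')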
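
The RMS check and the two-pass renumbering used in the script can be tried on synthetic data before running it on real samples. The sketch below is only an illustration under stated assumptions: `write_wav`, `wav_rms`, `renumber`, and `demo` are hypothetical helpers that are not part of the repository, and the audio is assumed to be mono 16-bit PCM at 16 kHz.

```python
import tempfile
import wave
from pathlib import Path

import numpy as np


def write_wav(path: Path, samples: np.ndarray, rate: int = 16000) -> None:
    """Write mono 16-bit PCM samples to a WAV file (hypothetical test helper)."""
    with wave.open(str(path), 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples.astype(np.int16).tobytes())


def wav_rms(path: Path) -> float:
    """Return the RMS amplitude of a 16-bit PCM WAV file, or 0.0 if it has no frames."""
    with wave.open(str(path), 'rb') as w:
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    if data.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(data.astype(np.float64) ** 2)))


def renumber(directory: Path) -> list[str]:
    """Rename *.wav to 001.wav, 002.wav, ... via temporary names, in two passes."""
    for i, f in enumerate(sorted(directory.glob('*.wav')), 1):
        f.rename(directory / f'_tmp_{i:03d}.wav')
    for i, f in enumerate(sorted(directory.glob('_tmp_*.wav')), 1):
        f.rename(directory / f'{i:03d}.wav')
    return sorted(p.name for p in directory.glob('*.wav'))


def demo() -> tuple[float, float, list[str]]:
    """Create one silent and one tonal sample, drop the silent one, renumber."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        t = np.arange(16000) / 16000
        write_wav(d / '001.wav', np.zeros(16000))                     # silence
        write_wav(d / '003.wav', 1000 * np.sin(2 * np.pi * 440 * t))  # 440 Hz tone
        quiet, loud = wav_rms(d / '001.wav'), wav_rms(d / '003.wav')
        (d / '001.wav').unlink()  # the silent sample falls below any positive threshold
        return quiet, loud, renumber(d)
```

Calling `demo()` returns the RMS of the silent file, the RMS of the tone, and the file names left after renumbering. A sine of amplitude 1000 has an RMS near 707, well above the 250 and 200 thresholds in the script, while pure silence measures 0. Going through `_tmp_` names first means no file is ever renamed onto a name that another pending file still holds.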