Mac M1 optimizations, fix train pipeline, add Hey Cosmo wake word model

- Fix install_mac.sh: use venv + Python 3.12 (3.14 incompatible with ML libs)
- Fix run_mac.sh: activate venv, add CPU thread optimization env vars
- Fix agent.py: remove f-string from SYSTEM_PROMPT template (NameError on import)
- Add missing deps: sounddevice, pydub, imageio-ffmpeg, omegaconf
- Optimize for M1: torch.inference_mode, set_num_threads, OMP/MKL tuning
- Switch to qwen2.5:3b for faster LLM responses on Mac
- Switch Whisper to medium model with auto compute (small+int8 had poor Russian)
- Add initial_prompt for better Russian transcription
- Add open_app tool for native macOS app launching
- Fix TTS: sanitize Latin text to Cyrillic for Silero compatibility
- Fix wake word echo: add cooldown after TTS, reset model state, raise threshold
- Make "Слушаю" TTS synchronous to avoid mic interference
- Fix train Dockerfile: remove tensorflow/onnx2tf (only ONNX needed), fix deps
- Fix train.sh: use wget for dataset download, add --shm-size=2g
- Add trained hey_cosmo.onnx wake word model
- Add TODO section to CLAUDE.md (ChatterBox TTS, Ollama Modelfile ideas)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:19:53 +03:00
parent 6010816f1d
commit 110d9cde29
15 changed files with 183 additions and 94 deletions
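
The TTS sanitization mentioned in the commit message can be tried outside the assistant. The block below is a minimal standalone sketch that mirrors the regex steps of `_sanitize_text` from the `cosmo/tts.py` hunk; the free function name `sanitize_text` and the sample inputs are illustrative assumptions, not part of the commit.

```python
import re


def sanitize_text(text: str) -> str:
    """Standalone sketch of the TTS text cleanup (hypothetical free function)."""
    # Transliterate a few common app names into Cyrillic
    text = re.sub(r'[Ss]afari', 'Сафари', text)
    text = re.sub(r'[Cc]hrome', 'Хром', text)
    text = re.sub(r'[Tt]elegram', 'Телеграм', text)
    text = re.sub(r'[Ww]eb[Ss]torm', 'ВебШторм', text)
    text = re.sub(r'[Vv][Ss]\s?[Cc]ode', 'ВиЭс Код', text)
    # Drop any remaining runs of Latin letters
    text = re.sub(r'[A-Za-z]+', '', text)
    # Collapse whitespace left behind by the removals
    text = re.sub(r'\s+', ' ', text).strip()
    # Fall back to a fixed phrase when nothing is left to speak
    return text if text else "Готово"


if __name__ == "__main__":
    # Illustrative inputs only
    for sample in ["Открываю Safari", "Opening Chrome", "OK"]:
        print(repr(sanitize_text(sample)))
```

Note that only Latin letters are stripped: punctuation and digits that surrounded a removed word stay in the string, so an all-English sentence with punctuation does not trigger the "Готово" fallback.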
--- a/cosmo/tts.py
+++ b/cosmo/tts.py
@@ -32,6 +32,10 @@ class TTS:
            self.enabled = False
            return

+        # Optimize CPU inference on Apple Silicon
+        num_threads = config.get("performance", {}).get("num_threads", 4)
+        torch.set_num_threads(num_threads)
+
        self._load_model()

    def _load_model(self):
@@ -52,16 +56,33 @@ class TTS:
            logger.warning("TTS отключён")
            self.enabled = False

+    @staticmethod
+    def _sanitize_text(text: str) -> str:
+        """Replace Latin text with readable Russian for TTS."""
+        import re
+        # Transliterate common English words that Silero cannot read
+        text = re.sub(r'[Ss]afari', 'Сафари', text)
+        text = re.sub(r'[Cc]hrome', 'Хром', text)
+        text = re.sub(r'[Tt]elegram', 'Телеграм', text)
+        text = re.sub(r'[Ww]eb[Ss]torm', 'ВебШторм', text)
+        text = re.sub(r'[Vv][Ss]\s?[Cc]ode', 'ВиЭс Код', text)
+        # Remove any remaining Latin words so that Silero does not hang
+        text = re.sub(r'[A-Za-z]+', '', text)
+        # Collapse extra whitespace
+        text = re.sub(r'\s+', ' ', text).strip()
+        return text if text else "Готово"
+
    def say(self, text: str):
        """Speak the text synchronously."""
        if not self.enabled or self._model is None:
            logger.info(f"[TTS]: {text}")
            return

+        text = self._sanitize_text(text)
        logger.debug(f"TTS: '{text}'")
        with self._lock:
            try:
-                with torch.no_grad():
+                with torch.inference_mode():
                    audio = self._model.apply_tts(
                        text=text,
                        speaker=self.speaker,