Section 01
Hazelnut-Vox: Introduction to the Fully Local STT-LLM-TTS Voice Conversation Agent
Hazelnut-Vox is a fully locally-run interactive voice agent that implements a complete STT-LLM-TTS pipeline, integrating Whisper speech recognition, Ollama large language model, and Coqui TTS speech synthesis. It supports real-time audio spectrum analysis and Polish language interaction. Key advantages include privacy protection via local operation, offline availability with low latency, CUDA acceleration for improved performance, and application value in both educational and privacy-sensitive scenarios.