Section 01
Introduction: MEDS — A Multimodal Solution to Bridge the Emotional Gap in Voice Interaction
MEDS is a multimodal emotion detection system. It combines speech-to-text (Whisper) with acoustic feature extraction (Librosa) and feeds both into the Oumi small language model, allowing it to flag discrepancies between what users say and how they sound. This addresses the 'emotional gap' problem, in which AI voice assistants fail to perceive a speaker's real emotional state. With a privacy-first design and low latency, MEDS brings emotional understanding to voice interaction.
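To make the text-versus-voice mismatch idea concrete, here is a minimal, hypothetical sketch of the fusion logic. It is not the MEDS implementation: the feature set (frame RMS energy and zero-crossing rate standing in for Librosa's richer features), the toy `acoustic_arousal` score, the `detect_gap` rule, and the `text_sentiment` input (which the real system would derive from the Whisper transcript) are all illustrative assumptions.

```python
import math

def frame_features(samples, sr, frame_ms=25):
    """Per-frame RMS energy and zero-crossing rate -- simple
    stand-ins for the acoustic features Librosa would extract."""
    hop = int(sr * frame_ms / 1000)
    feats = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / len(frame)
        feats.append((rms, zcr))
    return feats

def acoustic_arousal(feats):
    """Toy arousal score: mean energy scaled up by mean ZCR."""
    if not feats:
        return 0.0
    mean_rms = sum(f[0] for f in feats) / len(feats)
    mean_zcr = sum(f[1] for f in feats) / len(feats)
    return mean_rms * (1.0 + mean_zcr)

def detect_gap(text_sentiment, arousal, threshold=0.5):
    """Hypothetical mismatch rule: the words read neutral-to-positive
    (sentiment >= 0) while the voice carries high arousal."""
    return text_sentiment >= 0.0 and arousal > threshold

# Synthetic one-second signal standing in for an agitated voice.
sr = 16000
samples = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
feats = frame_features(samples, sr)
arousal = acoustic_arousal(feats)
print(detect_gap(text_sentiment=0.8, arousal=arousal))  # positive words, agitated voice -> True
```

In a real pipeline the transcript sentiment would come from the language model's reading of the Whisper output, and the arousal score from learned acoustic features, but the gap-detection step reduces to this same comparison of the two channels.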