Section 01
NVMOS: The First Dedicated Model for Non-Verbal Vocalization Quality Assessment
Core Points:
- Non-verbal vocalizations (NVs), such as laughter and sighs, are critical for natural-sounding TTS, but their quality assessment has long been overlooked.
- The research team built the first NV quality dataset (NV-MOS) and found that general-purpose multimodal models such as Gemini cannot reliably evaluate NV quality.
- The team proposed the NVMOS model, which includes a module that focuses on local NV events and achieves human-machine consistency at or above expert level.
- The work fills an important gap in speech synthesis quality assessment.
Source Info:
- Original paper: NVMOS: Non-Verbal Vocalization Quality Assessment in Speech (arXiv, 2026-06-14)
- Link: http://arxiv.org/abs/2606.15888v1