Section 01
Introduction: Generative LLMs Unlock a New Paradigm for Semantic ASR Evaluation
Automatic Speech Recognition (ASR) systems are traditionally evaluated with Word Error Rate (WER), which counts word-level edits against a reference transcript and is therefore insensitive to whether the meaning is preserved. This paper explores using generative Large Language Models (LLMs) to evaluate ASR output at the semantic level. On the hypothesis selection task, the LLM-based approach reaches 92-94% agreement with human judgments, compared with 63% for WER, which points to a direction for ASR evaluation beyond traditional metrics.
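To make WER's insensitivity to semantics concrete, the following is a minimal sketch of a word-level WER computation (edit distance divided by reference length). The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper. The two hypotheses each differ from the reference by one word, yet only one of them reverses the meaning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences chosen for illustration only.
reference = "i can not attend the meeting"
hyp_meaning_flipped = "i can attend the meeting"      # drops "not"
hyp_meaning_kept = "i can not attend a meeting"       # "the" -> "a"

wer_flipped = word_error_rate(reference, hyp_meaning_flipped)
wer_kept = word_error_rate(reference, hyp_meaning_kept)
print(f"meaning flipped: {wer_flipped:.3f}")
print(f"meaning kept:    {wer_kept:.3f}")
```

Both hypotheses contain exactly one word error against a six-word reference, so WER scores them identically even though a human reader would clearly prefer the second. This is the kind of case a semantic-level evaluator is meant to separate.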