Section 01
AuRA: Internalize Audio Understanding Capabilities into LoRA, Enabling Large Language Models to Truly Understand Speech
Source Information
- Original Author Team: Authors of arXiv:2606.11033v1
- Source Platform: arXiv
- Publication Date: June 9, 2026
- Original Link: http://arxiv.org/abs/2606.11033v1
Core Insights
AuRA uses knowledge distillation to transfer the audio understanding capabilities of ASR encoders into Large Language Models (LLMs) adapted with LoRA, enabling end-to-end speech understanding. The authors report that the method significantly improves multimodal performance while keeping inference efficient. Its further advantages include parameter efficiency and the reuse of existing pre-trained assets.
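The summary names two ingredients, LoRA adaptation and knowledge distillation, without describing how AuRA combines them. As a hedged illustration of those two generic building blocks (not AuRA's actual method), here is a minimal pure-Python sketch of a LoRA layer (frozen weight W plus a scaled low-rank update B·A) and a standard temperature-scaled logit distillation loss. The function names `lora_forward` and `kd_loss`, the `alpha / r` scaling convention, and the choice of output-distribution KL as the distillation target are all assumptions. The summary does not say which signals AuRA distills from the ASR encoder or how its loss is weighted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Generic LoRA layer (hypothetical helper): y = W x + (alpha / r) * B (A x).

    W is d_out x d_in and stays frozen; A (r x d_in) and B (d_out x r)
    are the only trainable parameters, so the update has rank <= r.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def softmax(z: Vector, temperature: float = 1.0) -> Vector:
    scaled = [v / temperature for v in z]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: Vector, student_logits: Vector,
            temperature: float = 2.0) -> float:
    """Hinton-style distillation loss: KL(teacher || student) * T^2.

    ASSUMPTION: distilling output distributions is only one option; the
    teacher signal AuRA actually uses is not specified in the summary.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (identity here)
    A = [[0.5, 0.5]]              # rank r = 1
    B = [[0.0], [0.0]]            # B is commonly zero-initialized
    x = [2.0, 4.0]
    # With B = 0 the adapted layer reproduces the frozen layer exactly.
    print(lora_forward(W, A, B, x, alpha=8.0, r=1))  # → [2.0, 4.0]
    # Identical teacher and student logits give zero distillation loss.
    print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0
```

Zero-initializing B means training starts from the unmodified pretrained model, and only the small A and B matrices receive gradients. That is the source of the parameter efficiency the summary mentions, though the paper's exact configuration may differ.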