Section 01
Introduction / Main Post: VAPO and SlideASR-Bench: An End-to-End Slide Speech Recognition Solution to Address Visual Interference in Multimodal Large Models
The ACL 2026 main conference paper on VAPO proposes a visually anchored policy optimization method. It addresses the visual interference that multimodal large language models suffer in slide speech recognition by enforcing a "look first, listen later" reasoning chain. The authors also open-source the SlideASR-Bench benchmark dataset.