Section 01
AudioMCQ: A New Milestone in Advancing Post-Training for Large Audio Language Models
AudioMCQ is a large-scale multiple-choice question dataset built for post-training Large Audio Language Models (LALMs). It contains 571,000 samples. Its two core contributions are a dual-chain thinking annotation mechanism and an audio contribution filtering framework, which together target a known failure mode: models answering from text priors instead of from the audio itself. The dataset underpinned a first-place result in the DCASE 2025 Challenge, and it fills a gap in audio contribution-aware datasets for audio language models.
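To make the idea of audio contribution filtering concrete, here is a minimal sketch of one way such a filter could work. It is not the AudioMCQ implementation. It assumes that each question has already been answered by a few models that saw only the text (question and choices, no audio), and it labels a sample as "weak" audio contribution when enough of those text-only guesses are correct. The names `MCQSample`, `text_only_predictions`, `audio_contribution`, and the `min_correct` threshold are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MCQSample:
    question: str
    choices: list[str]
    answer: str
    # Hypothetical field: answers guessed by models that never heard the audio.
    text_only_predictions: list[str]


def audio_contribution(sample: MCQSample, min_correct: int = 2) -> str:
    """Return "weak" if text alone often suffices, otherwise "strong".

    Assumption: a sample depends weakly on audio when at least
    `min_correct` text-only predictions match the ground-truth answer.
    """
    correct = sum(pred == sample.answer for pred in sample.text_only_predictions)
    return "weak" if correct >= min_correct else "strong"


def split_by_audio_contribution(
    samples: list[MCQSample], min_correct: int = 2
) -> dict[str, list[MCQSample]]:
    """Partition samples into weak and strong audio-contribution subsets."""
    subsets: dict[str, list[MCQSample]] = {"weak": [], "strong": []}
    for sample in samples:
        subsets[audio_contribution(sample, min_correct)].append(sample)
    return subsets
```

Under these assumptions, a question such as "Which animal typically barks?" would likely land in the weak subset, because the text gives the answer away, while "Which instrument enters second in this clip?" would land in the strong subset. Separating the two lets training and evaluation focus on samples where the model has to listen.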