Section 01
[Introduction] AIR: An Adaptive Interleaved Reasoning and Code Collaboration Framework for Multimodal Large Language Models
The AIR (Adaptive Interleaved Reasoning) framework integrates code execution with multimodal understanding through an adaptive interleaved reasoning mechanism. It targets the information integration problem that Multimodal Large Language Models (MLLMs) face in complex reasoning tasks and aims to significantly improve their problem-solving ability. Its core idea is to break away from a strictly linear reasoning process: the model dynamically switches between modalities and uses code as an intermediate representation. By combining adaptive decision-making with code collaboration, AIR also offers a practical example of neuro-symbolic fusion.
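As a rough illustration of what interleaving reasoning with code execution could look like, the sketch below alternates between free-text steps and executed code steps under the control of a decision function. Everything in it is an assumption made for illustration: the names `interleaved_reasoning`, `toy_policy`, `run_code`, and `Trace`, the three step kinds, and the text description standing in for an image are hypothetical and are not taken from AIR itself. In a real system the policy would be the MLLM, and code would run in a proper sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical step kinds; the source does not name AIR's actual action set.
TEXT, CODE, DONE = "text", "code", "done"


@dataclass
class Trace:
    """Interleaved record of reasoning steps and code results."""
    steps: List[Tuple[str, str]] = field(default_factory=list)


def run_code(snippet: str) -> str:
    """Evaluate one arithmetic expression with builtins disabled.

    This is only a stand-in for a real sandboxed executor.
    """
    return str(eval(snippet, {"__builtins__": {}}, {}))


def interleaved_reasoning(
    policy: Callable[[Trace], Tuple[str, str]], max_steps: int = 8
) -> Trace:
    """Let the policy pick each next step instead of following a fixed pipeline."""
    trace = Trace()
    for _ in range(max_steps):
        kind, content = policy(trace)
        if kind == DONE:
            trace.steps.append((DONE, content))
            break
        if kind == CODE:
            # Code acts as the intermediate representation: record it,
            # execute it, and feed the result back into the trace.
            trace.steps.append((CODE, content))
            trace.steps.append(("result", run_code(content)))
        else:
            trace.steps.append((TEXT, content))
    return trace


def toy_policy(trace: Trace) -> Tuple[str, str]:
    """Stand-in for the model's adaptive decision: describe, compute, answer."""
    if not trace.steps:
        # A text description substitutes for actual visual input here.
        return TEXT, "The image shows a 3 by 4 grid of tiles; count them."
    last_kind, last_content = trace.steps[-1]
    if last_kind == TEXT:
        return CODE, "3 * 4"
    return DONE, f"There are {last_content} tiles."


if __name__ == "__main__":
    for kind, content in interleaved_reasoning(toy_policy).steps:
        print(f"{kind}: {content}")
```

The point of the design is that the order of steps is not fixed in advance: the policy inspects the trace so far and decides whether the next step should be prose, code, or a final answer, which is one simple reading of "adaptive" and "interleaved" in this context.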