Section 01
SteerMoE: A New Paradigm for Efficient Audio-Language Model Alignment with Frozen Backbones
SteerMoE bridges a frozen audio encoder and a frozen language decoder with a lightweight Mixture-of-Experts (MoE) alignment module of only 1.8M parameters. By keeping both backbones completely frozen, this paradigm avoids the catastrophic forgetting, high training cost, and deployment risk of traditional full-parameter fine-tuning, preserves the language model's original reasoning ability, and delivers strong performance with very high training efficiency.
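For intuition, the sketch below shows one way such a frozen-backbone setup could be wired in PyTorch: only a small top-k MoE adapter sitting between the audio encoder and the language decoder is trainable, and it maps audio frames into the decoder's embedding space. This is a minimal illustration under assumed names and sizes (MoEAligner, d_audio, d_llm, num_experts, d_hidden, top_k), not the released SteerMoE implementation.

```python
# Illustrative sketch of a MoE alignment module between a frozen audio
# encoder and a frozen language decoder. All names and dimensions here
# are assumptions for illustration, not the actual SteerMoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEAligner(nn.Module):
    def __init__(self, d_audio=1024, d_llm=2048, num_experts=4, d_hidden=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each audio frame and decides which experts to use.
        self.router = nn.Linear(d_audio, num_experts)
        # Each expert is a small bottleneck MLP, keeping the adapter lightweight.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_audio, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_llm))
            for _ in range(num_experts)
        )

    def forward(self, audio_feats):  # audio_feats: (batch, frames, d_audio)
        logits = self.router(audio_feats)                   # (B, T, E)
        weights = F.softmax(logits, dim=-1)
        topv, topi = weights.topk(self.top_k, dim=-1)       # sparse top-k routing
        topv = topv / topv.sum(dim=-1, keepdim=True)        # renormalize selected weights
        # Run all experts, then keep only the top-k outputs per frame.
        expert_outs = torch.stack([e(audio_feats) for e in self.experts], dim=-2)  # (B, T, E, d_llm)
        idx = topi.unsqueeze(-1).expand(*topi.shape, expert_outs.size(-1))
        gathered = torch.gather(expert_outs, -2, idx)       # (B, T, k, d_llm)
        return (gathered * topv.unsqueeze(-1)).sum(dim=-2)  # tokens in the LLM embedding space


# Only the aligner is trained; both backbones stay frozen, e.g.:
# for p in audio_encoder.parameters(): p.requires_grad = False
# for p in llm.parameters():           p.requires_grad = False
```

Because the gradient only flows through the small adapter, training touches a tiny fraction of the total parameters, which is what keeps the approach cheap and leaves the language model's behavior intact.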