Section 01
[Introduction] SparseUnifiedModel: Sparsity and Efficient Inference in Unified Multimodal Models
This article examines sparsity and efficient inference in unified multimodal models. Using training-free pruning methods, it analyzes how sensitive each model component is to compression and finds a clear asymmetry: in generation tasks, the understanding components can be heavily compressed with little loss in performance, whereas the generation components are highly sensitive to compression. Building on this finding, it proposes an adaptive scheme based on a Mixture of Experts (MoE) architecture that matches full-model performance while activating only about half of the parameters, offering a new path toward efficient deployment of unified multimodal models.
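The component-wise sensitivity analysis described above can be sketched as a simple sweep: prune one component at a time at several sparsity levels and record a proxy score. This is a minimal illustration, not the article's actual procedure; magnitude pruning is assumed as the training-free criterion, the component names (`understanding`, `generation`) are placeholders, and the retained-L2-energy proxy stands in for real task metrics.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Training-free pruning: zero out the smallest-magnitude fraction of weights."""
    if sparsity <= 0.0:
        return weights.copy()
    k = int(weights.size * sparsity)
    # Threshold = k-th smallest absolute value; everything at or below it is pruned.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def sensitivity_sweep(components: dict, sparsities=(0.25, 0.5, 0.75)) -> dict:
    """Prune each component independently at several sparsity levels and
    report retained L2 energy (1.0 = unpruned) as a crude sensitivity proxy."""
    report = {}
    for name, w in components.items():
        total = np.linalg.norm(w)
        report[name] = {
            s: float(np.linalg.norm(magnitude_prune(w, s)) / total)
            for s in sparsities
        }
    return report

# Placeholder weight matrices standing in for the two kinds of components.
rng = np.random.default_rng(0)
components = {
    "understanding": rng.normal(size=(256, 256)),
    "generation": rng.normal(size=(256, 256)),
}
print(sensitivity_sweep(components))
```

In a real study, the proxy score would be replaced by end-task metrics on understanding and generation benchmarks, which is what reveals the asymmetry the article reports.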