Section 01
Introduction / Original Post: PUMA: A Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval
PUMA, proposed by Harbin Institute of Technology (Shenzhen), targets the efficiency problem of multimodal large language models (MLLMs) in unified multimodal retrieval. It combines layer-pruned self-distillation with a modality-adaptive contrastive learning loss, substantially reducing the parameter count while maintaining retrieval performance.
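To make the two ingredients named above more concrete, here is a minimal sketch in plain Python. It is not PUMA's implementation. The helper names `prune_layers` and `info_nce`, the choice to keep the first (shallow) layers, and the temperature value are all assumptions made for illustration. The loss shown is the generic InfoNCE contrastive loss that retrieval models commonly use, without the modality-adaptive weighting, which this summary does not describe.

```python
import math


def prune_layers(layers, keep):
    """Return the first `keep` layers of a layer stack.

    Assumption: the shallow layers are the ones retained; the summary
    above does not say which layers are kept.
    """
    if not 0 < keep <= len(layers):
        raise ValueError("keep must be between 1 and len(layers)")
    return layers[:keep]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(queries, candidates, temperature=0.05):
    """Mean InfoNCE loss where candidates[i] is the positive for queries[i].

    All other candidates in the batch act as negatives.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, c) / temperature for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(queries)


# Hypothetical 12-layer stack, represented here by layer indices only.
student_layers = prune_layers(list(range(12)), keep=4)

# Toy embeddings: each query matches the candidate at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = info_nce(queries, aligned)
swapped_loss = info_nce(queries, swapped)
```

In this toy setup the loss is near zero when every query is closest to its own candidate and grows large when the pairs are swapped, which is the signal a contrastive objective uses to pull matching query and candidate embeddings together.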