Section 01
MSAO: A New Paradigm for Edge-Cloud Collaborative Optimization of Multimodal Large Model Inference
Deploying Multimodal Large Language Models (MLLMs) on edge devices is costly: they consume a lot of resources and incur long inference latency. MSAO addresses this with an adaptive offloading framework built on modality sparsity awareness. It uses the MAS metric to quantify how necessary each input modality is, and it pairs this with speculative execution to coordinate the edge and the cloud dynamically. The reported results are a 30% reduction in latency and a 1.5 to 2.3 times increase in throughput.
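The following minimal Python sketch shows what sparsity-aware offloading means in general. It is not MSAO's algorithm. The per-modality `necessity` score only stands in for MAS, whose exact definition is not given here. The cost model (`edge_cost_ms`, `upload_ms`, `cloud_speedup`, `rtt_ms`), the threshold, and the `plan_offload` function are all hypothetical. Speculative execution is left out. The policy has two steps: it drops modalities whose necessity falls below a threshold, then runs the remaining work wherever the estimated latency is lower.

```python
from dataclasses import dataclass


@dataclass
class Modality:
    name: str
    necessity: float     # hypothetical score in [0, 1]; a stand-in for MAS, not its real definition
    edge_cost_ms: float  # hypothetical time to process this modality on the edge device
    upload_ms: float     # hypothetical time to transmit this modality's input to the cloud


def plan_offload(modalities, threshold=0.3, cloud_speedup=4.0, rtt_ms=40.0):
    """Toy sparsity-aware offloading policy (illustrative only, not MSAO itself).

    Returns (kept modality names, chosen target, edge latency estimate, cloud latency estimate).
    """
    # Step 1 (sparsity): skip modalities judged unnecessary for this request.
    kept = [m for m in modalities if m.necessity >= threshold]

    # Step 2 (cost model): estimate latency for running the kept work on each side.
    edge_ms = sum(m.edge_cost_ms for m in kept)
    # The cloud pays a round trip plus upload time but computes faster (assumed fixed speedup).
    cloud_ms = rtt_ms + sum(m.upload_ms + m.edge_cost_ms / cloud_speedup for m in kept)

    # Step 3 (placement): pick the cheaper side; ties stay on the edge.
    target = "edge" if edge_ms <= cloud_ms else "cloud"
    return [m.name for m in kept], target, edge_ms, cloud_ms


if __name__ == "__main__":
    # Hypothetical request: text and image matter, the audio stream barely does.
    request = [
        Modality("text", necessity=0.9, edge_cost_ms=5.0, upload_ms=1.0),
        Modality("image", necessity=0.8, edge_cost_ms=120.0, upload_ms=30.0),
        Modality("audio", necessity=0.1, edge_cost_ms=80.0, upload_ms=20.0),
    ]
    print(plan_offload(request))
```

With these made-up numbers, the audio stream is skipped. The remaining text and image work is estimated at 125 ms on the edge and about 102 ms in the cloud, so the policy offloads it. A real system would replace the fixed speedup and round-trip constants with measured or predicted values.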