Section 01
[Introduction] Multimodal Large Language Models Reshape the Paradigm of Image and Video Segmentation Technology
Drawing on the Awesome-MLLM-Segmentation repository, this article surveys over 30 cutting-edge studies published at top conferences and in leading journals between 2023 and 2025, covering core directions such as referring expression segmentation, open-vocabulary semantic segmentation, video segmentation, and reasoning segmentation. It shows how Multimodal Large Language Models (MLLMs) are reshaping pixel-level understanding of images and videos, and also covers applications in vertical domains such as remote sensing, along with an outlook on emerging technical trends.