Section 01
GAP-MLLM Project Introduction: Activating 3D Spatial Perception in Multimodal Large Language Models
GAP-MLLM proposes a geometry-aligned pre-training method that aims to strengthen the 3D spatial perception and understanding of multimodal large language models (MLLMs), bridging the gap between 2D vision and 3D geometry.
Original Author/Maintainer: ZestfulJX
Source Platform: GitHub
Original Title: GAP-MLLM
Original Link: https://github.com/ZestfulJX/GAP-MLLM
Source Publication/Update Time: 2026-05-28T06:42:55Z