Section 01
[Introduction] InteractVLM: A New Paradigm for 3D Interaction Reasoning with 2D Vision Models
InteractVLM is a research project accepted at CVPR 2025. Its core idea is to leverage existing 2D foundation vision-language models (VLMs) for 3D interaction reasoning, without relying on expensive 3D sensors or complex multi-view reconstruction. This approach opens up new possibilities for fields such as robotic manipulation and augmented reality. Its key innovation is a design that unlocks the 3D prior knowledge latent in 2D models, thereby reducing both data and deployment costs.