Section 01
[Introduction] C2RoPE: A New Method to Enhance Spatial Reasoning Capability of 3D Multimodal Models
This article introduces C2RoPE (Causal Continuous Rotary Position Encoding), a technique that targets the difficulty 3D multimodal models have in modeling spatial position relationships. By improving the position encoding mechanism, C2RoPE strengthens the model's spatial understanding and suggests a new direction for applying vision-language models to 3D scenes. Its causal continuous design simulates how humans allocate attention and dynamically adjusts encoding weights. In the reported experiments, accuracy on spatial relationship understanding in 3D visual question answering tasks improves by more than 15%.
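For readers unfamiliar with the baseline, the sketch below shows standard rotary position encoding (RoPE), the mechanism that C2RoPE builds on. It is not C2RoPE itself: the causal continuous design and the dynamic weighting described above are not reproduced here. The function name `rope_rotate`, the NumPy implementation, and the `base` value of 10000 are assumptions chosen for illustration.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply standard RoPE to x of shape (seq_len, dim); dim must be even.

    Illustrative baseline only, not the C2RoPE method.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "RoPE rotates channels in pairs"
    # One rotation frequency per pair of channels.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape (dim/2,)
    # Rotation angle for every (position, channel pair).
    angles = np.outer(positions, inv_freq)             # shape (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    # 2D rotation of each (even, odd) channel pair by its angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Usage: rotate 4 token vectors of width 8 at positions 0..3.
tokens = np.random.default_rng(0).normal(size=(4, 8))
encoded = rope_rotate(tokens, np.arange(4))
```

Because each channel pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on the difference between their positions. This relative-position property is what position encoding variants of this family adjust when they change how positions are assigned to visual tokens.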