Section 01
[Introduction] GeoWeaver: Pre-Reasoning Geometric Grounding Enhances Spatial Reasoning Capabilities of MLLMs
Multimodal large language models (MLLMs) have made significant progress in visual understanding, yet they still perform poorly on spatial reasoning tasks. GeoWeaver proposes a pre-reasoning geometric grounding framework that targets the root cause of this gap, the MLLM's insufficient geometric understanding, by adaptively assigning the most relevant geometric abstractions to each visual token. The core idea is to treat geometric information as a foundation of the visual representation, built in before reasoning begins, rather than as an auxiliary signal fused in at a late stage.
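The per-token assignment can be illustrated with a small sketch. It is not GeoWeaver's actual architecture. It assumes that each visual token comes with several candidate geometric feature vectors, one per abstraction (for example depth- or surface-derived features), already aligned to that token's location. It also assumes that "adaptive assignment" can be modeled as a scaled dot-product softmax over those candidates. The function names `ground_tokens`, `softmax`, and `dot` are hypothetical. The grounded tokens are produced before any reasoning step, which mirrors the "premise, not late fusion" idea.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ground_tokens(visual_tokens: List[Vector],
                  geo_features: List[List[Vector]]) -> List[Vector]:
    """Hypothetical pre-reasoning grounding step.

    geo_features[i][k] is the (assumed) feature of geometric abstraction k
    at the location of visual token i. Each token scores every candidate,
    softmax turns the scores into per-token weights, and the weighted mix is
    added to the token before the tokens are passed on for reasoning.
    """
    grounded = []
    for tok, candidates in zip(visual_tokens, geo_features):
        scale = 1.0 / math.sqrt(len(tok))  # scaled dot-product relevance
        weights = softmax([dot(tok, c) * scale for c in candidates])
        mixed = [sum(w * c[d] for w, c in zip(weights, candidates))
                 for d in range(len(tok))]
        # Residual add: the token keeps its appearance content and gains the
        # geometric abstraction it found most relevant.
        grounded.append([t + m for t, m in zip(tok, mixed)])
    return grounded


# Two tokens, two candidate abstractions each: each token should lean
# toward the abstraction aligned with its own direction.
tokens = [[1.0, 0.0], [0.0, 1.0]]
geo = [[[10.0, 0.0], [0.0, 10.0]],
       [[10.0, 0.0], [0.0, 10.0]]]
print(ground_tokens(tokens, geo))
```

In this toy input the first token receives almost all of its geometric signal from the first abstraction and the second token from the second one. That per-token selectivity is the behavior the text describes, shown here under the simplifying assumptions above.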