Section 01
Faithful GRPO: A New Method for Improving the Trustworthiness of Visual Spatial Reasoning in Multimodal Models
This article introduces Faithful GRPO (FGRPO), a constrained policy optimization method built on GRPO (Group Relative Policy Optimization) that targets the trustworthiness of visual spatial reasoning in multimodal models. Current multimodal reasoning models often produce chains of thought that are logically inconsistent with their final answers, and their reasoning frequently fails to draw faithfully on the visual evidence. FGRPO treats logical consistency and visual grounding as explicit constraints and enforces them through Lagrangian dual ascent. This reduces the reasoning inconsistency rate from 24.5% to 1.7% while also improving answer accuracy.
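The summary does not give FGRPO's exact objective. The sketch below is a minimal illustration, under stated assumptions, of the generic pattern the description implies. Each rollout's reward in a GRPO group is penalized by Lagrange multipliers for the two constraints, advantages are normalized within the group, and each multiplier is then updated by projected dual ascent. The constraint names, the 0/1 violation signals, the limits in `limits`, the step size `lr`, and the helpers `DualAscent` and `group_relative_advantages` are all hypothetical. How FGRPO actually detects inconsistency or missing grounding is not specified here.

```python
import statistics
from dataclasses import dataclass, field


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


@dataclass
class DualAscent:
    """Lagrange multipliers for constraints of the form E[violation] <= limit.

    Hypothetical helper: constraint names, limits and 0/1 violation signals
    are illustrative assumptions, not FGRPO's published definitions.
    """
    limits: dict                                  # constraint name -> max allowed violation rate
    lr: float = 0.1                               # dual step size
    lambdas: dict = field(default_factory=dict)   # constraint name -> multiplier

    def __post_init__(self):
        # Every multiplier starts at zero unless supplied explicitly.
        for name in self.limits:
            self.lambdas.setdefault(name, 0.0)

    def shaped_rewards(self, task_rewards, violations):
        # Primal side: reward minus lambda-weighted violations. The constant
        # lambda * limit term of the Lagrangian does not affect the policy gradient.
        return [
            r - sum(self.lambdas[n] * violations[n][i] for n in self.limits)
            for i, r in enumerate(task_rewards)
        ]

    def update(self, violations):
        # Dual side: projected ascent, lambda <- max(0, lambda + lr * (rate - limit)).
        for name, limit in self.limits.items():
            rate = statistics.fmean(violations[name])
            self.lambdas[name] = max(0.0, self.lambdas[name] + self.lr * (rate - limit))


# Usage: one group of four rollouts for the same prompt (made-up numbers).
dual = DualAscent(limits={"consistency": 0.05, "grounding": 0.1}, lr=0.5)
task_rewards = [1.0, 0.0, 1.0, 1.0]      # 1 = correct final answer
violations = {
    "consistency": [0, 0, 1, 0],         # 1 = chain of thought contradicts the answer
    "grounding": [0, 1, 1, 0],           # 1 = reasoning not backed by visual evidence
}
# Dual step first only so the penalty is visible in this toy example; in a
# training loop it would typically follow each policy update.
dual.update(violations)
shaped = dual.shaped_rewards(task_rewards, violations)
advantages = group_relative_advantages(shaped)
```

Projecting each multiplier onto [0, ∞) keeps the penalty from turning into a bonus: while a constraint's observed violation rate exceeds its limit the multiplier grows, and once the rate falls below the limit it shrinks back toward zero.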