Section 01
【Introduction】CORA: A New Method to Resolve Thinking-Answer Discrepancy in Multimodal RLVR
Basic Information about the CORA Paper
- Original Authors/Maintainers: The paper's author team (arXiv:2606.14691v1)
- Source Platform: arXiv
- Original Title: CORA: Analyzing and bridging thinking-answer gap in Multimodal RLVR via Consistency-Oriented Reasoning Alignment
- Original Link: http://arxiv.org/abs/2606.14691v1
- Release Time: 2026-06-12
Core Insights
This paper proposes CORA (Consistency-Oriented Reasoning Alignment), a method for reducing the discrepancy between the thinking process and the final answer of large vision-language models (LVLMs) trained with multimodal reinforcement learning with verifiable rewards (RLVR). CORA introduces a consistency reward model together with a hybrid reward advantage separation (HRAS) technique. Its goal is to make the model's reasoning more credible and to improve its effectiveness in practical applications.
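The summary above does not give CORA's formulas. What follows is therefore only a minimal sketch of one way a verifiable accuracy reward and a consistency score could be turned into separate advantages. It assumes a GRPO-style setup in which several responses are sampled per prompt and each reward is normalized within that group. The consistency scores stand in for the output of the paper's consistency reward model. The names `group_normalize`, `mixed_advantage`, `separated_advantage`, and the weight `w` are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def mixed_advantage(acc, con, w=0.5):
    """Baseline (assumed): combine the rewards first, then normalize once."""
    return group_normalize([a + w * c for a, c in zip(acc, con)])


def separated_advantage(acc, con, w=0.5):
    """Hypothetical HRAS-style variant: normalize each reward stream
    within the group on its own, then take a weighted sum of the advantages."""
    adv_acc = group_normalize(acc)  # verifiable answer-correctness reward
    adv_con = group_normalize(con)  # thinking/answer consistency score (assumed in [0, 1])
    return [a + w * c for a, c in zip(adv_acc, adv_con)]


if __name__ == "__main__":
    # One group of four sampled responses to the same prompt (illustrative numbers).
    acc = [1.0, 1.0, 0.0, 0.0]  # first two final answers are correct
    con = [0.9, 0.2, 0.8, 0.1]  # hypothetical consistency-model scores
    print("mixed:    ", mixed_advantage(acc, con))
    print("separated:", separated_advantage(acc, con))
```

Under this reading of "advantage separation", each reward stream is standardized independently before weighting, so neither signal can dominate the other merely because the two have different scales or variances. `mixed_advantage` is included only as a single-normalization baseline for comparison. The paper's actual HRAS formulation may differ.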