Section 01
R-C2: Breaking the Bottleneck of Multimodal Reasoning with Cross-Modal Cycle-Consistent Reinforcement Learning
Researchers from Rutgers University and collaborating institutions propose R-C2, a framework that turns cross-modal inconsistencies in multimodal models into self-supervised learning signals. By using cycle-consistency constraints as the training signal for reinforcement learning, R-C2 improves reasoning without manual annotation and gains up to 7.6 percentage points across multiple benchmarks. The approach offers a new way to address the "modality gap" in multimodal reasoning.
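The toy sketch below illustrates how cross-modal disagreement can become a reward without labels. It is an illustration of a generic cycle-consistency reward, not R-C2's published algorithm. The functions `cycle_reward`, `batch_rewards`, `text_backward`, and `image_forward`, the dictionary-backed "modalities", and the binary exact-match reward are all hypothetical stand-ins. The cycle maps an answer backward to a question through one modality, answers that question forward through the other, and pays out only when the cycle returns the original answer.

```python
from typing import Callable, List

# Hypothetical stand-ins for model calls. A real system would query a
# multimodal model in each direction; here each "modality" is a lookup table.
TEXT_SIDE = {"4": "How many wheels does the car have?", "red": "What color is the car?"}
IMAGE_SIDE = {"How many wheels does the car have?": "4", "What color is the car?": "blue"}


def text_backward(answer: str) -> str:
    """Hypothetical backward step: answer -> question, via the text modality."""
    return TEXT_SIDE.get(answer, "")


def image_forward(question: str) -> str:
    """Hypothetical forward step: question -> answer, via the image modality."""
    return IMAGE_SIDE.get(question, "")


def cycle_reward(answer: str,
                 backward: Callable[[str], str],
                 forward: Callable[[str], str]) -> float:
    """Return 1.0 if answer -> question -> answer reproduces the original answer, else 0.0."""
    question = backward(answer)
    reconstructed = forward(question)
    return 1.0 if reconstructed.strip().lower() == answer.strip().lower() else 0.0


def batch_rewards(answers: List[str],
                  backward: Callable[[str], str],
                  forward: Callable[[str], str]) -> List[float]:
    """Score a batch of candidate answers; in an RL loop these scalars would stand in for human labels."""
    return [cycle_reward(a, backward, forward) for a in answers]


# The two modalities agree on the wheel count but disagree on the color,
# so only the first cycle closes.
print(batch_rewards(["4", "red"], text_backward, image_forward))  # [1.0, 0.0]
```

A binary exact-match reward keeps the sketch minimal; a practical system would likely need a softer answer-matching rule and a policy-gradient update that consumes these rewards, neither of which is shown here.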