Section 01
[Main Floor] SVSR Framework: Self-Verification and Self-Rectification Paradigm Reshaping Multimodal Reasoning Reliability
The SVSR (Self-Verification and Self-Rectification) framework builds self-verification and self-rectification directly into the reasoning process through a three-stage training pipeline. It targets the unreliability of shallow reasoning in current multimodal models. Semi-online DPO training, combined with high-quality reasoning trajectories filtered by teacher VLMs, lets the model perform strongly in both explicit and implicit reasoning scenarios.
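To make the two ingredients named above concrete, here is a minimal sketch of the standard DPO (Direct Preference Optimization) loss for one preference pair, plus a toy filter that keeps only trajectories a teacher scores highly. This is not SVSR's actual implementation: the function names, the `teacher_score` callable, the `threshold`, and the `beta` default are all hypothetical, and the summary does not say how the teacher VLMs score trajectories or how the semi-online schedule refreshes samples.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model. beta=0.1 is an assumed default.
    """
    # How much more the policy prefers the chosen response over the
    # rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def filter_trajectories(trajectories, teacher_score, threshold=0.5):
    """Keep trajectories whose teacher score meets the threshold.

    teacher_score is a hypothetical stand-in for a teacher VLM judging
    the quality of a reasoning trajectory.
    """
    return [t for t in trajectories if teacher_score(t) >= threshold]


if __name__ == "__main__":
    # Toy teacher: prefers trajectories that contain an explicit check step.
    toy_teacher = lambda t: 1.0 if "verify" in t else 0.0
    kept = filter_trajectories(["guess answer", "solve, verify, fix"], toy_teacher)
    print(kept)
    print(dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=1.0))
```

The loss shrinks as the policy assigns relatively more probability to the preferred trajectory than the reference model does. In a semi-online setup the preference pairs would presumably be regenerated from the current policy at intervals, but that schedule is an assumption here.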