Section 01
Key Highlights of OpenVLThinkerV2
This article introduces OpenVLThinkerV2, a general-purpose multimodal reasoning model. Its core innovations are a reinforcement learning objective based on Gaussian GRPO (G²RPO) and an accompanying task-level shaping mechanism. Together they address two problems: keeping gradient contributions fair across tasks, and balancing perception against reasoning. Across 18 benchmarks, the model outperforms both leading open-source and closed-source models.
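This overview names G²RPO but does not spell out its objective. As a point of reference, the sketch below first shows the group-relative advantage used in standard GRPO, where each sampled response's reward is z-scored against the other responses to the same prompt. The second function is purely hypothetical. It illustrates one way, under our own assumption, that rewards could be mapped onto standard-normal quantiles by rank so that every task's advantages share the same scale regardless of its raw reward distribution. It is not presented as the OpenVLThinkerV2 method, and both function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: z-score each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


def gaussianized_advantages(rewards):
    """HYPOTHETICAL illustration, not the published G²RPO objective.

    Replaces each reward by the standard-normal quantile of its rank, so the
    advantages depend only on the ordering of rewards within the group.
    Tied rewards share the average of their sorted positions.
    """
    n = len(rewards)
    order = sorted(range(n), key=lambda i: rewards[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied rewards starting at sorted position i.
        j = i
        while j + 1 < n and rewards[order[j + 1]] == rewards[order[i]]:
            j += 1
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, sigma 1
    # (rank + 0.5) / n keeps probabilities strictly inside (0, 1).
    return [std_normal.inv_cdf((rk + 0.5) / n) for rk in ranks]


if __name__ == "__main__":
    # A group with one heavy outlier reward.
    group = [0.0, 0.1, 0.2, 10.0]
    print("z-scored:    ", [round(a, 3) for a in grpo_advantages(group)])
    print("gaussianized:", [round(a, 3) for a in gaussianized_advantages(group)])
```

With the outlier group above, z-scoring squeezes the three small rewards into nearly identical negative advantages, while the rank-based mapping keeps them evenly spread and bounded by normal quantiles. That contrast is only meant to show why a distribution-level normalization could matter when tasks have very different reward shapes.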