Section 01
Introduction to the LVRPO Framework: A New GRPO-Based Language-Visual Alignment Method
This article introduces LVRPO (Language-Visual Reinforcement-based Preference Optimization), a reinforcement learning framework for aligning language and visual modalities through preference optimization. Its core innovation is to optimize multimodal model behavior directly with Group Relative Policy Optimization (GRPO), without auxiliary encoders or hand-designed cross-modal objectives. LVRPO outperforms strong unified pre-training baselines on both multimodal understanding and generation tasks.
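To make the GRPO ingredient concrete, the sketch below shows the group-relative advantage computation that GRPO is generally built around: several responses are sampled for one prompt, and each response's reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed. This is a minimal illustration of generic GRPO, not LVRPO's actual implementation. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar reward per response sampled for the same
    prompt. The population standard deviation and `eps` are assumptions
    of this sketch, chosen to keep the division well defined.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled from a single prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

# Responses above the group mean get a positive advantage, those below it
# a negative one; the policy update then favors the former.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Because the baseline is the group's own mean, the advantages within a group sum to approximately zero, and a group whose responses all receive the same reward contributes no learning signal.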