Section 01
[Main Post/Introduction] RL Post-Training Boosts Multimodal Reasoning: Is Visual Information Not the Key?
Recent research reports a surprising finding: even without real visual information, reinforcement learning (RL) post-training can still significantly improve the reasoning ability of multimodal large language models (MLLMs). Using a "hallucination induction" mechanism, the study found that pure hallucination training even outperforms standard training on some tasks. This challenges the conventional understanding of how MLLM training works: the performance gains from RL post-training may stem more from optimizing reasoning strategies than from understanding visual information.