Section 01
R3: Guide to Research on Optimization Dilemmas Between Understanding and Generation Tasks in Multimodal Models
R3 is the code implementation of a paper accepted at ICLR 2026 that studies the optimization dilemma between understanding and generation tasks in multimodal models. The study attributes the dilemma to three causes: inherent conflicts between the task objectives, competition between the two tasks for attention mechanisms, and differences in training data distribution. It proposes three remedies: task-aware routing mechanisms, gradient coordination techniques, and progressive training strategies. According to the reported experiments, these strategies effectively balance the two capabilities. The code has been open-sourced, giving practitioners in industry a concrete reference to build on.
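To make the idea of gradient coordination more concrete, here is a minimal sketch of one well-known approach, projecting away the conflicting component of each task's gradient (in the style of PCGrad "gradient surgery"). The summary does not say which technique R3 actually uses, so this is an illustrative assumption rather than the paper's method; the function names `project_conflict` and `coordinate_gradients` and the toy gradient vectors are hypothetical.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_conflict(g: List[float], other: List[float]) -> List[float]:
    """Return g with its component along `other` removed, but only when
    the two gradients conflict (negative dot product).

    Assumption: this PCGrad-style projection stands in for "gradient
    coordination"; it is not taken from the R3 code.
    """
    d = dot(g, other)
    if d >= 0:
        return list(g)  # no conflict, keep the gradient unchanged
    norm_sq = dot(other, other)
    if norm_sq == 0.0:
        return list(g)  # nothing to project onto
    scale = d / norm_sq
    return [x - scale * y for x, y in zip(g, other)]


def coordinate_gradients(
    g_understand: List[float], g_generate: List[float]
) -> List[float]:
    """Combine the understanding and generation gradients after removing
    their mutually conflicting components."""
    pu = project_conflict(g_understand, g_generate)
    pg = project_conflict(g_generate, g_understand)
    return [a + b for a, b in zip(pu, pg)]


if __name__ == "__main__":
    # Toy gradients for a shared parameter vector (hypothetical values).
    g_u = [1.0, 0.0]   # gradient of the understanding loss
    g_g = [-1.0, 1.0]  # gradient of the generation loss
    print(coordinate_gradients(g_u, g_g))
```

With these toy vectors the two gradients point in partly opposite directions, so each is projected onto the plane orthogonal to the other before summing, and the combined update is `[0.5, 1.5]`. When the gradients do not conflict, the function reduces to a plain sum, which is why this kind of scheme only intervenes on steps where the two tasks actually pull against each other.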