Section 01
Introduction: Visual Gradient Steering Breaks the Optimization Bottleneck in Multimodal Distillation
Core Insight: In vision-language model distillation, the gradients arising from language priors and from visual localization are almost orthogonal. Building on this observation, the authors propose Visual Gradient Steering (VGS), a method that dynamically adjusts the optimization direction and, as they report, significantly improves the visual reasoning ability of small models.
Basic Information:
- Authors: Hee Suk Yoon, Eunseop Yoon, Jaehyun Jang, SooHwan Eom, Ji Woo Hong, Mark Hasegawa-Johnson, Qi Dai, Chong Luo, Chang D. Yoo
- Source: arXiv (accepted as ICML 2026 Spotlight)
- Original paper link: http://arxiv.org/abs/2606.00564v1
- Publication date: May 30, 2026
- Open-source code: https://github.com/hee-suk-yoon/Decomposed_OPD
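
To make the phrase "almost orthogonal" in the core insight concrete, here is a minimal sketch of how the angle between two gradient vectors can be measured with cosine similarity. It is not the VGS method and is not taken from the paper's code: the function name `cosine_similarity` and the toy vectors `g_language` and `g_visual` are hypothetical, chosen only for illustration. A cosine near 0 means the two gradients point in nearly independent directions, so a step along one does little to advance the other.

```python
import math


def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    # By convention here, treat a zero gradient as having no direction.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, made-up gradients (assumption: real ones would be flattened
# parameter gradients of two separate loss terms).
g_language = [1.0, 0.0, 0.5]
g_visual = [0.0, 2.0, 0.0]

cos = cosine_similarity(g_language, g_visual)
print(f"cosine similarity: {cos:.3f}")
```

In practice the same quantity would be computed over the model's flattened parameter gradients; values close to 0 correspond to the near-orthogonality described above, while values close to 1 or -1 would indicate aligned or conflicting objectives.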