Section 01
OmniThoughtVis: A Large-Scale Distillation Framework to Address Deployment Challenges of Multimodal Reasoning Models
OmniThoughtVis is an extensible data curation and knowledge distillation pipeline. It targets a deployment gap: large models reason well but are hard to deploy, while small models are easy to deploy but lack the high-quality multimodal chain-of-thought data they need to learn to reason. The framework combines structured chain-of-thought generation, difficulty-aware selection, and label diversity sampling to construct a high-quality multimodal reasoning dataset of 1.8 million samples. This dataset is used to transfer the reasoning capabilities of large models to small models with 2B-8B parameters, offering a feasible path to the practical deployment of multimodal reasoning models.
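To make the two selection steps named above more concrete, here is a minimal sketch of how difficulty-aware selection and label diversity sampling could be chained. It is an illustration under stated assumptions, not the framework's actual implementation: the `Sample` fields, the difficulty score in [0, 1], the `low`/`high` thresholds, and the round-robin strategy are all hypothetical and do not come from the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    label: str         # hypothetical topic/task tag attached to the sample
    difficulty: float  # hypothetical score in [0, 1]; higher means harder


def select_by_difficulty(samples, low=0.2, high=0.9):
    """Keep samples whose difficulty lies in [low, high].

    Assumption: very easy samples teach little and very hard ones are
    likely noisy, so both tails are dropped. The thresholds are made up.
    """
    return [s for s in samples if low <= s.difficulty <= high]


def sample_diverse(samples, budget):
    """Pick up to `budget` samples, cycling over labels in round-robin order
    so that no single label dominates the selection."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[s.label].append(s)
    # Within each label, prefer harder samples first (an assumption).
    queues = [
        sorted(by_label[label], key=lambda s: s.difficulty, reverse=True)
        for label in sorted(by_label)
    ]
    picked = []
    depth = 0
    while len(picked) < budget:
        progressed = False
        for queue in queues:
            if depth < len(queue) and len(picked) < budget:
                picked.append(queue[depth])
                progressed = True
        if not progressed:  # every label queue is exhausted
            break
        depth += 1
    return picked


# Toy pool standing in for generated chain-of-thought samples.
pool = [
    Sample("a1", "math", 0.95),
    Sample("a2", "math", 0.80),
    Sample("a3", "math", 0.60),
    Sample("a4", "math", 0.50),
    Sample("b1", "chart", 0.70),
    Sample("b2", "chart", 0.10),
    Sample("c1", "ocr", 0.30),
]

kept = select_by_difficulty(pool)
picked = sample_diverse(kept, budget=4)
print([s.sample_id for s in picked])
```

In this toy pool the "math" label has the most candidates, yet the round-robin pass gives every label a turn before any label contributes a second sample. A real pipeline would more likely derive the difficulty score from model behavior (for example, how often a small model answers correctly) rather than from a stored field.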