Section 01
CAST Framework: A Novel Topology Fusion Approach for Core Selection in Multimodal Datasets (Introduction)
To address the challenge of data selection in large-scale multimodal model training, researchers propose CAST (Collapse-Aware multi-Scale Topology fusion), a framework for core-set selection. CAST combines three components: per-modality topologies, multi-scale distribution matching, and a soft relationship coverage mechanism. Together, these select a high-information core set while keeping its distribution equivalent to that of the full dataset, which addresses the single-modality bias and distribution shift seen in existing methods. Experiments on the Flickr30K and MS-COCO datasets show that CAST significantly outperforms existing baselines, with advantages in both performance and efficiency.
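To make the idea of coverage-driven core-set selection concrete, the following is a minimal sketch of a generic greedy selector. It is not CAST's algorithm: it uses a standard facility-location objective as a stand-in for "soft coverage", and the function name `greedy_soft_coverage`, the `similarity` matrix, and the toy embeddings are all assumptions made for illustration. In a multimodal setting, `similarity` would presumably be derived from some fused image-text representation, which this sketch does not attempt to model.

```python
import numpy as np


def greedy_soft_coverage(similarity: np.ndarray, budget: int) -> list[int]:
    """Greedily pick up to `budget` samples that maximize
    sum_i max_{j in S} similarity[i, j] (facility-location coverage).

    `similarity` is a hypothetical non-negative (n, n) matrix where entry
    [i, j] says how well candidate j represents sample i.
    """
    n = similarity.shape[0]
    covered = np.zeros(n)  # best similarity of each sample to the selected set
    selected: list[int] = []
    for _ in range(min(budget, n)):
        # Marginal gain of each candidate j: total coverage improvement.
        gains = np.maximum(similarity, covered[:, None]).sum(axis=0) - covered.sum()
        if selected:
            gains[selected] = -np.inf  # never re-pick a selected sample
        best = int(np.argmax(gains))
        selected.append(best)
        covered = np.maximum(covered, similarity[:, best])
    return selected


# Toy data (assumed): two tight clusters of unit-normalized embeddings.
points = np.array(
    [[1.0, 0.0], [0.99, 0.1], [0.98, -0.1],
     [0.0, 1.0], [0.1, 0.99], [-0.1, 0.98]]
)
points /= np.linalg.norm(points, axis=1, keepdims=True)
# Cosine similarity, clipped at zero so coverage values stay non-negative.
similarity = np.clip(points @ points.T, 0.0, None)

core_set = greedy_soft_coverage(similarity, budget=2)
print(core_set)
```

With a budget of two, the selector picks one sample from each cluster, since a second pick from an already covered cluster adds almost no coverage. A greedy rule is used here because the facility-location objective is monotone submodular, so greedy selection carries a well-known approximation guarantee; whether CAST uses a similar objective is not stated in this introduction.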