Section 01
FAM Project Introduction: The Critical Role of Fine-Grained Alignment in Multimodal Embedding Learning
The FAM (Fine-grained Alignment Matters) project, developed by a research team at Tongji University, examines how fine-grained alignment mechanisms affect multimodal embedding learning in large vision-language models. Built on the VLM2Vec framework, it aims to improve cross-modal representation quality through two components: MAC (Multimodal Alignment Component) and VEIN (Visual Embedding Integration Network). The project provides a complete PyTorch implementation with the core code open-sourced, giving researchers and developers a reproducible and scalable platform for multimodal learning.