Section 01
【Introduction】SIMMER: A Breakthrough New Method for Cross-Modal Retrieval Between Food Images and Recipes
This paper proposes the SIMMER framework, which replaces the traditional dual-encoder architecture with a single multimodal encoder and raises image-to-recipe retrieval R@1 on the Recipe1M dataset from 81.8% to 87.5%. The method targets two weaknesses of traditional cross-modal retrieval, the semantic gap between modalities and the reliance on task-specific design, and offers a new paradigm for retrieval between food images and recipe texts.
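For readers unfamiliar with the R@1 metric quoted above, the following is a minimal sketch of how Recall@K is commonly computed for retrieval. It is not taken from the paper: the function name `recall_at_k`, the toy similarity matrix, and the assumption that query i's correct recipe is candidate i are all illustrative, and the paper's exact evaluation protocol is not described in this section.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose true match ranks within the top k.

    Assumption (illustrative): similarity[i][j] scores image query i against
    recipe candidate j, and the correct recipe for query i is candidate i.
    """
    if not similarity:
        raise ValueError("similarity matrix must not be empty")
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy example: 3 image queries scored against 3 recipes (made-up numbers).
toy = [
    [0.9, 0.1, 0.3],  # true match ranked first
    [0.2, 0.4, 0.8],  # true match ranked second
    [0.1, 0.2, 0.7],  # true match ranked first
]
r1 = recall_at_k(toy, k=1)
r2 = recall_at_k(toy, k=2)
print(f"R@1 = {r1:.3f}, R@2 = {r2:.3f}")
```

Under this definition, an R@1 of 87.5% means that for 87.5% of query images the correct recipe is the single highest-scoring candidate.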