Zing Forum

FashionMV: Multi-View Product-Level Image Retrieval Redefines E-Commerce Visual Search

FashionMV builds the first large-scale multi-view fashion dataset and proposes the ProCIR framework, which lifts composed image retrieval (CIR) from the image level to the product level. With only 0.8B parameters, the model outperforms general-purpose embedding models 10 times its size, highlighting the central role of dialogue alignment in visual understanding.

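Composed image retrieval, multi-view learning, e-commerce visual search, multimodal large models, product-level retrieval, FashionMV, contrastive learning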
Published 2026-04-12 01:26 | Recent activity 2026-04-14 09:50 | Estimated read 1 min

Section 01

Introduction / Main Post: FashionMV: Multi-View Product-Level Image Retrieval Redefines E-Commerce Visual Search

FashionMV builds the first large-scale multi-view fashion dataset and proposes the ProCIR framework, which lifts composed image retrieval (CIR) from the image level to the product level. With only 0.8B parameters, the model outperforms general-purpose embedding models 10 times its size, highlighting the central role of dialogue alignment in visual understanding.
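
To make the image-level versus product-level distinction concrete, here is a minimal sketch of product-level ranking. It is not ProCIR's actual method, which the post does not describe. It assumes each product is stored as a list of per-view embedding vectors and that a product's score is the best cosine similarity over its views. The names `product_score`, `rank_products`, and the toy catalog are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def product_score(query, views):
    """Score a product by its best-matching view (assumed max-over-views aggregation)."""
    return max(cosine(query, view) for view in views)


def rank_products(query, catalog):
    """Return (product_id, score) pairs, best match first.

    The ranked unit is the product, not the individual image, so a product
    appears once no matter how many views it has.
    """
    scored = [(pid, product_score(query, views)) for pid, views in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy catalog: two products, each with two view embeddings (made-up numbers).
catalog = {
    "dress_A": [[1.0, 0.0], [0.8, 0.6]],
    "dress_B": [[0.0, 1.0], [0.6, 0.8]],
}
query = [0.6, 0.8]  # stand-in for a fused reference-image + text query embedding

ranking = rank_products(query, catalog)
for pid, score in ranking:
    print(pid, round(score, 3))
```

Max-over-views is only one possible aggregation; averaging view embeddings or learning a fused product embedding are alternatives, and the post does not say which, if any, ProCIR uses.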