Section 01
[Introduction] MEMLENS: A New Benchmark for Evaluating Multimodal Long-Context Dialogue Memory of VLMs
MEMLENS is a new benchmark designed to evaluate how well vision-language models (VLMs) retain information across long-context multimodal dialogues, addressing a gap in existing evaluations of this capability. It provides a structured evaluation framework that helps developers and researchers understand a model's memory characteristics, with the broader aim of moving VLMs from tools toward intelligent partners.
Keywords: Vision-Language Models, Multimodal Memory, Long Context, Benchmark, MEMLENS