Section 01
【Main Floor】Persistent Visual Memory: Solving the Visual Signal Dilution Problem of Large Vision-Language Models
Large Vision-Language Models (LVLMs) perform strongly on multimodal tasks, but they suffer from the 'visual signal dilution' problem: attention to visual tokens decays as the generated text grows longer. The research team proposes the Persistent Visual Memory (PVM) module, which establishes a distance-independent visual retrieval path. This improves LVLM performance on complex visual reasoning tasks without significantly increasing the parameter count, and it offers useful insights for optimizing multimodal model architectures.
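To make the dilution effect concrete, here is a minimal toy sketch in Python. It is not the research team's analysis and it does not implement PVM. It assumes a single softmax attention step in which every visual token shares one score and every text token shares another, then measures how much attention mass remains on the visual tokens as the text grows. The function name, the parameters, and the token counts are all hypothetical, chosen only for illustration.

```python
import math


def visual_attention_share(n_visual, n_text, visual_logit=0.0, text_logit=0.0):
    """Fraction of softmax attention mass that lands on visual tokens.

    Toy assumption: all visual tokens share one logit and all text
    tokens share another, so the softmax reduces to a ratio of sums.
    """
    visual_mass = n_visual * math.exp(visual_logit)
    text_mass = n_text * math.exp(text_logit)
    return visual_mass / (visual_mass + text_mass)


# Hypothetical setup: 256 visual tokens, with the text context growing.
text_lengths = (0, 256, 1024, 4096)
shares = [visual_attention_share(256, n) for n in text_lengths]

for n, share in zip(text_lengths, shares):
    print(f"text tokens = {n:5d}  visual attention share = {share:.3f}")
```

Under these assumptions the visual share falls from 1.0 with no text to 0.5 at 256 text tokens and 0.2 at 1024, simply because more text tokens compete in the same softmax. A retrieval path that does not depend on distance, as the post describes for PVM, would be one way to avoid relying on this shrinking share.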