Section 01
Introduction: Core Insights of the "Mental Imagery" Study on Multimodal Models
Title: "Mental Imagery" of Multimodal Models: Does AI Really "Imagine" in Its Mind?
Core Insights Summary: Studies have found that, when solving spatial puzzles, large multimodal models form internal visual representations similar to human mental imagery. Integrating visual tokens into the chain of thought raises reasoning accuracy from 83% to 89%, a gain of 6 percentage points. The finding speaks to the philosophical question of whether AI has human-like inner experiences, and it offers a new perspective on improving model reasoning and understanding AI cognition.