Multimodal Large Language Models (MLLMs) face a fundamental dilemma in practical applications: the trade-off between factual accuracy and creative expression. Existing models struggle to balance these two extremes: an overly conservative model generates accurate but dull responses, while an overly permissive model tends to hallucinate, producing factually inconsistent output.
At the heart of this dilemma is the inability to flexibly regulate the model's internal associative reasoning mechanism. Associative reasoning refers to the model's ability to automatically activate related concepts and knowledge in response to an input stimulus; it is the source of creativity but also the root cause of hallucinations. Precisely controlling this association strength across different task scenarios has long been a core challenge in multimodal AI.
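The text has not yet defined how such control might be implemented. Purely as an intuition pump, one familiar decode-time knob with the same qualitative behavior is temperature scaling of the output distribution: a low value concentrates probability on the strongest (most conservative) candidates, while a high value lets weaker associations through. The sketch below is a toy illustration, not the mechanism discussed here; the `association_strength` parameter and the example logits are hypothetical.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, association_strength: float = 1.0) -> int:
    """Toy decode-time knob: rescale logits before sampling.

    `association_strength` is a hypothetical stand-in for the kind of
    control described in the text; it behaves like a sampling temperature.
    Low values concentrate probability on the highest-scoring tokens
    (conservative, factual); high values flatten the distribution,
    admitting weaker associations (creative, but hallucination-prone).
    """
    scaled = logits / max(association_strength, 1e-6)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Illustrative scores for 4 candidate next tokens.
logits = np.array([3.2, 1.1, 0.4, -0.5])
conservative = sample_next_token(logits, association_strength=0.3)  # e.g. factual QA
creative = sample_next_token(logits, association_strength=1.5)      # e.g. open-ended captioning
```

A single global knob like this, however, only trades one failure mode for the other across the whole output, which is precisely the inflexibility the dilemma above describes.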