Section 01
[Introduction] Perceptual Judgment Bias in Multimodal Large Model Evaluation and Its Solutions
Key Takeaways
This study examines the perceptual judgment bias that Multimodal Large Language Models (MLLMs) exhibit when acting as automatic evaluators:
- Problem: MLLM evaluators are easily misled by fluent text and overlook whether a response is faithful to the visual content, which makes their evaluations inconsistent and unverifiable;
- Solution: The study proposes a method for constructing the Perceptual Perturbation Judgment Dataset (PPJ Dataset) and combines it with a training framework that uses GRPO reinforcement learning and batch ranking objectives;
- Effect: The approach significantly improves the evaluator's perceptual fidelity, ranking consistency, and alignment with human evaluations.
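
To make the GRPO component mentioned above more concrete, here is a minimal sketch of the group-relative advantage that GRPO is generally built on: each sampled output's reward is normalized against the mean and standard deviation of its own sampling group. The reward values, the function name `group_relative_advantages`, the `eps` constant, and the choice of population standard deviation are illustrative assumptions, not details taken from the study, and the batch ranking objectives are not modeled here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation; eps avoids division by zero
    when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four judgments sampled for the same image-text pair:
# 1.0 = the judgment matched the reference, 0.0 = it did not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Judgments that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged; no separate value network is needed for this normalization.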