Section 01
OmniVCHall: A Comprehensive Benchmark for Diagnosing Compositional Hallucinations in Video Multimodal LLMs
This ICML 2026 paper introduces OmniVCHall, presented as the first systematic benchmark for diagnosing compositional hallucinations in video multimodal large language models (VLLMs). It also proposes TriCD, a decoding framework that substantially improves model robustness without any fine-tuning. The work centers on two questions: how well VLLMs reason when an answer depends on combining multiple pieces of visual evidence, and how to mitigate hallucinations in complex video scenarios.