Section 01
[Overview] HumbleBench: An Evaluation Benchmark for Cognitive Humility of Multimodal Large Language Models
HumbleBench is an evaluation benchmark for the cognitive humility of multimodal large language models (MLLMs). It addresses a gap left by traditional benchmarks, which overlook a model's self-awareness and its ability to honestly express uncertainty, and it underscores how essential this ability is for building reliable and safe AI systems.