Section 01
Introduction: Core Value of the HalLing Benchmark
The HalLing (Hallucination in Linguistic Reasoning) benchmark takes a linguistic perspective, systematically evaluating how prone large models are to hallucination in linguistic reasoning across six phenomena: ambiguous sentences, anaphora resolution, center embedding, garden-path sentences, quantifier scope, and first-order logic extension. Unlike traditional evaluations that focus on factual errors, it asks whether the model actually understands the semantic structure of the input text, and in doing so exposes deep shortcomings in the language comprehension of current large models.