Section 01
Abbott-Costello-Benchmark: Evaluating LLMs' Cultural Understanding Using Classic Comedy Dialogues
This article introduces the Abbott-Costello-Benchmark, an open-source benchmark that draws its source material from dialogues by the classic comedy duo Abbott and Costello. It evaluates large language models (LLMs) on personality analysis, character distinction, cultural context understanding, and related capabilities. In doing so, it fills a gap left by traditional benchmarks, which largely ignore comprehension of cultural and social context.