Section 01
ChineseStressBench: A Chinese Evaluation Benchmark for High-Pressure Complex Tasks in Real-World Work Scenarios (Introduction)
ChineseStressBench is a Chinese-language evaluation benchmark for large language models (LLMs), designed around real-world work scenarios. Its core question is whether a model produces "problematic outcomes" (such as misleading outputs, omissions of key information, or logical confusion) when handling complex tasks under high pressure. The project aims to address a gap in existing evaluations, which focus only on the upper limit of model capability and ignore reliability in real scenarios. Through task designs that closely mimic actual work, it seeks to move LLMs from "usable" to "user-friendly".