Section 01
[Introduction] TeleCom-Bench: Capability Boundaries and Execution Gap of LLMs in Telecom Industrial Applications
ZTE's AI Cloud Team released the TeleCom-Bench benchmark in May 2026. It contains 22,678 samples and systematically evaluates LLM capabilities in telecom knowledge understanding and end-to-end workflow applications. The benchmark reveals an "execution gap": models reach about 90% accuracy on language-interface tasks (e.g., intent recognition) but drop sharply to about 30% on procedural execution tasks (e.g., solution generation). These results provide a key reference for developing LLMs for telecom industrial applications.