Section 01
Introduction: Ekka—An Effective Solution for Automated Diagnosis of Silent Errors in LLM Inference
Ekka is an automated diagnosis system that identifies the root causes of silent errors in large language model (LLM) inference by systematically aligning and comparing the intermediate execution states of a target framework against those of a reference framework. Its core idea is to recast silent-error diagnosis as a differential debugging problem: the point where the two executions diverge localizes the fault. Ekka achieves 80% pass@1 diagnostic accuracy on a real-world benchmark and has uncovered 4 previously unknown silent errors, providing key support for LLM inference optimization.
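The differential-debugging idea can be illustrated with a minimal sketch. This is not Ekka's actual implementation or API: the `Trace` type, the `first_divergence` function, the state names, and the tolerances are all hypothetical, and the sketch assumes that intermediate states have already been captured and given matching names in both frameworks.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical trace format: (state name, flattened values) in execution order.
Trace = List[Tuple[str, Sequence[float]]]


def first_divergence(target: Trace, reference: Trace,
                     rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> Optional[str]:
    """Return the earliest reference state whose aligned target state differs."""
    target_by_name: Dict[str, Sequence[float]] = dict(target)
    for name, ref_values in reference:
        tgt_values = target_by_name.get(name)
        if tgt_values is None:
            continue  # no aligned counterpart; this sketch simply skips it
        if len(tgt_values) != len(ref_values):
            return name  # a shape mismatch counts as a divergence
        for t, r in zip(tgt_values, ref_values):
            if not math.isclose(t, r, rel_tol=rel_tol, abs_tol=abs_tol):
                return name
    return None  # all aligned states agree within tolerance


# Made-up values: the target silently drifts starting at the attention output.
reference = [("embed", [0.10, 0.20]), ("attn_out", [0.50, -0.30]), ("logits", [1.20, 0.70])]
target = [("embed", [0.10, 0.20]), ("attn_out", [0.50, -0.10]), ("logits", [1.05, 0.91])]

print(first_divergence(target, reference))  # attn_out
```

Walking the states in the reference's execution order means the first mismatch reported is the earliest one, so later mismatches that merely propagate from it (here, `logits`) are not mistaken for the root cause.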