Section 01
infer-check: Catching Correctness Defects in LLM Inference Engines Missed by Benchmarks [Introduction]
infer-check is a tool for detecting correctness defects in large language model (LLM) inference engines. In LLM deployments, performance optimizations inside inference engines, such as quantization and pruning, often introduce subtle correctness defects. Traditional benchmarks routinely miss these defects because they focus on aggregate final-output metrics rather than the engine's intermediate behavior. infer-check aims to fill this gap by surfacing the hidden errors that benchmarks fail to catch, helping developers improve the reliability of inference engines.
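To make the gap concrete, here is a minimal sketch (not infer-check's actual method, and all names here are hypothetical) of why final-output metrics can hide defects: two engines may pick the same greedy token on a prompt, so an accuracy-style benchmark sees no difference, even though the optimized engine's per-token log-probabilities have drifted.

```python
def max_logprob_divergence(ref_logprobs, test_logprobs):
    """Largest per-token log-probability gap between a reference
    engine and an optimized engine on the same prompt."""
    return max(abs(r - t) for r, t in zip(ref_logprobs, test_logprobs))

# Hypothetical per-token logprobs for a 3-token completion.
# The argmax token matches at every step, so a final-answer
# benchmark reports identical outputs for both engines.
ref_logprobs  = [-0.10, -0.25, -0.05]
test_logprobs = [-0.10, -0.90, -0.05]  # quantized engine drifts on token 2

print(round(max_logprob_divergence(ref_logprobs, test_logprobs), 2))  # 0.65
```

A divergence check like this flags the drift on the second token immediately, whereas a benchmark that only scores the decoded text would report the two engines as equivalent.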