Section 01
[Introduction] THInfer: Core Highlights of the Large Model Inference Acceleration Solution on Domestic Supercomputers
THInfer is a large-model inference acceleration solution built to address the memory-bandwidth bottleneck of the domestic MT-3000 heterogeneous many-core processor. By combining operator optimization, graph fusion, and a Prefill-Buffer-Decode (P-B-D) pipeline, it delivers 67% to 84% higher throughput than the A800 GPU on 7B models, making far fuller use of domestic supercomputing hardware.
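To see why memory bandwidth tends to be the limiting factor in this kind of inference, a rough back-of-the-envelope bound helps. During decoding, each generation step has to stream essentially all model weights from memory once, so the step rate cannot exceed bandwidth divided by weight size. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, the bandwidth figure is a placeholder rather than a specification of the MT-3000 or the A800, and KV-cache and activation traffic are ignored.

```python
def decode_tokens_per_second_bound(num_params, bytes_per_param,
                                   bandwidth_bytes_per_s, batch_size=1):
    """Upper bound on decode throughput when weight streaming dominates.

    Assumes every decode step reads all weights from memory exactly once
    and ignores KV-cache and activation traffic, so real throughput is lower.
    """
    weight_bytes = num_params * bytes_per_param
    # Each step reads the full weight set, so this is the maximum step rate.
    steps_per_second = bandwidth_bytes_per_s / weight_bytes
    # One step yields one token per sequence in the batch.
    return steps_per_second * batch_size


# Placeholder numbers for illustration only (not a hardware specification):
# a 7B-parameter model in 16-bit precision on a device with 1.4 TB/s.
params_7b = 7_000_000_000
fp16_bytes = 2
assumed_bandwidth = 1_400_000_000_000  # bytes per second, hypothetical

single = decode_tokens_per_second_bound(params_7b, fp16_bytes, assumed_bandwidth)
batched = decode_tokens_per_second_bound(params_7b, fp16_bytes,
                                         assumed_bandwidth, batch_size=8)
print(single, batched)  # 100.0 800.0
```

Under these assumptions, batching raises the bound because one pass over the weights serves every sequence in the batch. This is one reason optimizations that cut memory traffic per generated token matter on bandwidth-limited hardware.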