Section 01
Introduction / Main Post: Design of a Hardware-Aware LLM Inference Engine: System-Level Optimization from Architecture to Implementation
This article examines the design philosophy and implementation of hardware-aware LLM inference engines, covering system-level co-optimization strategies across key technologies such as GPU/CPU heterogeneous computing, memory-hierarchy optimization, and operator fusion.