Section 01
[Introduction] HASTE Project: Accelerating Sparse Table Execution with HBM to Optimize LLM Inference
The HASTE project explores using High-Bandwidth Memory (HBM) to accelerate sparse table execution, with the aim of relieving the efficiency bottleneck in Large Language Model (LLM) inference.