Section 01
[Introduction] Fusing Dequantization and GEMM: Practical CUDA Kernel Optimization for LLM Inference
This section introduces the fused-dequant-gemm project. Its core idea is to fuse the dequantization of INT8-quantized weights with the GEMM itself in a single CUDA kernel, so the dequantized weights do not have to be materialized in DRAM as a separate step. This targets the memory bandwidth bottleneck in LLM inference, reducing DRAM traffic and improving performance. The project was open-sourced by zhangtina0103 and released on GitHub on June 6, 2026.
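To make the idea of fusion concrete, here is a minimal CPU-side reference sketch in pure Python. It is not the project's kernel. The function names (`quantize_rows`, `gemm_unfused`, `gemm_fused`), the symmetric per-row INT8 scheme, and the `W (M x K) @ X (K x N)` layout are all assumptions chosen for illustration. It contrasts a two-step path that builds a full dequantized weight matrix with a fused path that dequantizes each weight right where it is consumed.

```python
def quantize_rows(W):
    """Assumed scheme: symmetric per-row INT8, so W[i][k] ~= q[i][k] * scales[i]."""
    q_rows, scales = [], []
    for row in W:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric INT8 range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def gemm_unfused(q, scales, X):
    """Two-step path: dequantize the whole matrix first, then multiply."""
    # Step 1: full floating-point copy of the weights. On a GPU this
    # intermediate would be written to and re-read from DRAM.
    W_deq = [[qv * s for qv in row] for row, s in zip(q, scales)]
    # Step 2: plain GEMM on the dequantized copy.
    K, N = len(X), len(X[0])
    return [[sum(W_deq[i][k] * X[k][j] for k in range(K)) for j in range(N)]
            for i in range(len(W_deq))]


def gemm_fused(q, scales, X):
    """Fused path: dequantize each weight inside the multiply-accumulate loop."""
    K, N = len(X), len(X[0])
    Y = []
    for q_row, s in zip(q, scales):
        out_row = []
        for j in range(N):
            acc = 0.0
            for k in range(K):
                # Only the INT8 value and its scale are read; no
                # dequantized matrix is ever stored.
                acc += (q_row[k] * s) * X[k][j]
            out_row.append(acc)
        Y.append(out_row)
    return Y


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]     # M=2, K=3
    X = [[1.0, 2.0], [0.5, -1.0], [4.0, 0.0]]     # K=3, N=2
    q, scales = quantize_rows(W)
    print(gemm_unfused(q, scales, X))
    print(gemm_fused(q, scales, X))
```

Both paths compute the same product up to floating-point rounding. The difference that matters on a GPU is that the fused path reads one byte per weight plus a per-row scale, while the unfused path also writes and re-reads a full-precision copy of the weight matrix.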