Section 01
[Introduction] WaveTune: An Innovative Framework Redefining the Efficiency Boundary of GPU Kernel Auto-Tuning
WaveTune addresses the trade-off between tuning cost and kernel performance in GPU kernel auto-tuning through a wave-aware bilinear performance model and a lightweight dual-table retrieval mechanism. At its core is a modeling approach that builds GPU hardware knowledge into the cost estimate, delivering up to 1.83x kernel speedup and a 1.33x reduction in end-to-end time-to-first-token (TTFT) across five GPU architectures, while cutting decision overhead by five orders of magnitude relative to exhaustive search. This offers a new path toward more efficient LLM inference.
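To make the two ingredients above concrete, the sketch below shows one plausible shape they could take: a cost model that is bilinear in the wave count (how many scheduling rounds a tiled kernel needs on a GPU's SMs) and the per-tile work, plus a two-level lookup that serves exact shapes from a cache and falls back to a bucketed table. All names (`waves`, `bilinear_cost`, `DualTable`) and the constants are illustrative assumptions, not WaveTune's actual API or model.

```python
"""Illustrative sketch of wave-aware, table-driven kernel config selection.
Every identifier and constant here is hypothetical, not from WaveTune."""
import math

NUM_SMS = 108  # assumed SM count (e.g. an A100-class GPU)


def waves(m: int, n: int, tile_m: int, tile_n: int, num_sms: int = NUM_SMS) -> int:
    """Number of scheduling waves for an (m, n) output tiled as tile_m x tile_n."""
    blocks = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    return math.ceil(blocks / num_sms)


def bilinear_cost(m, n, k, tile_m, tile_n, alpha=1.0, beta=0.001):
    """Latency estimate bilinear in the wave count and the per-tile work k:
    cost = waves * (alpha + beta * k), with alpha/beta fitted offline."""
    return waves(m, n, tile_m, tile_n) * (alpha + beta * k)


def pick_config(m, n, k, candidates):
    """Rank candidate tile shapes analytically instead of benchmarking each."""
    return min(candidates, key=lambda t: bilinear_cost(m, n, k, *t))


class DualTable:
    """Two-level retrieval: an exact-shape table consulted first, backed by a
    coarser bucketed table so unseen shapes still resolve in O(1)."""

    def __init__(self, candidates):
        self.exact = {}      # (m, n, k) -> tile config, filled on demand
        self.bucketed = {}   # power-of-two bucket -> tile config
        self.candidates = candidates

    def _bucket(self, m, n, k):
        # Round each dim up to a power of two to keep the fallback table small.
        return tuple(1 << max(0, x - 1).bit_length() for x in (m, n, k))

    def lookup(self, m, n, k):
        key = (m, n, k)
        if key in self.exact:
            return self.exact[key]
        b = self._bucket(m, n, k)
        if b not in self.bucketed:
            self.bucketed[b] = pick_config(*b, self.candidates)
        cfg = self.bucketed[b]
        self.exact[key] = cfg  # promote so repeat queries skip the model
        return cfg
```

A lookup such as `DualTable([(64, 64), (128, 128), (256, 128)]).lookup(4096, 4096, 4096)` would then cost a couple of dictionary probes at inference time, which is how a retrieval scheme like this could plausibly shave orders of magnitude off per-kernel decision overhead compared with exhaustively timing every candidate.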