Section 01
AgentKernelArena: A New Benchmark Framework for Evaluating GPU Kernel Optimization Capabilities of AI Agents
This article introduces AgentKernelArena, a benchmark framework for evaluating how well AI coding agents optimize GPU kernels. It addresses two limitations of existing benchmarks: they evaluate only single LLM calls, and they do not test generalization. The framework covers 196 tasks across three scenarios (HIP-to-HIP optimization, Triton-to-Triton optimization, and PyTorch-to-HIP translation) and includes generalization tests that reveal the performance differences among mainstream agents and their limitations.