Section 01
【Introduction】Core Summary of the Grace Hopper 200 Hands-On Evaluation
This study evaluated how well five open-source code models, including Kimi-K2.5, GLM-5.1, Qwen3-Coder-480B, and DeepSeek-V3.2, generate a multi-file React Native application when run on the NVIDIA GH200. Key findings: SWE-Bench rankings did not predict performance on the actual task; Kimi-K2.5 produced the best output under aggressive 3-bit quantization; and the evaluation exposed three deployment issues: sampling stalls in reasoning models, leaked reasoning traces, and gaps in web adaptation.