Section 01
East vs. West Code Capability Showdown: How Prompt Changes Affect the Generation Quality of Large Models
A study from Chitkara University in India evaluated the code generation performance of six mainstream large language models (LLMs), focusing on how changes in prompt format affect output quality. The models span Eastern and Western vendors: the Western entries are Claude 3.7 Sonnet, Gemini 2.0 Flash, and GPT-4o; the Eastern entries are GLM-4-Plus, MiniMax-M2, and Kimi K2 Instruct. The study scored outputs on a four-dimensional evaluation framework: functional accuracy, syntactic correctness, optimization quality, and response efficiency. Claude 3.7 Sonnet led with an average score of 91.3%, followed closely by Kimi K2 Instruct at 88.6%, and the models differed significantly in how robust they were to prompt-format changes.
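To make the four-dimensional framework concrete, the sketch below shows one way such per-dimension scores might be aggregated into a single average. This is an illustration under assumptions, not the study's actual methodology: the dimension key names, the unweighted mean, and the example scores are all hypothetical.

```python
# Hypothetical aggregation over the four evaluation dimensions named in the
# article. Key names, the unweighted mean, and the sample scores are
# illustrative assumptions, not taken from the study.

DIMENSIONS = (
    "functional_accuracy",
    "syntactic_correctness",
    "optimization_quality",
    "response_efficiency",
)

def average_score(scores: dict) -> float:
    """Unweighted mean over the four dimensions (scores as percentages)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Made-up per-dimension scores for one model, for demonstration only:
example = {
    "functional_accuracy": 90.0,
    "syntactic_correctness": 94.0,
    "optimization_quality": 85.0,
    "response_efficiency": 89.0,
}
print(average_score(example))  # 89.5
```

A real evaluation would likely weight the dimensions differently (functional accuracy usually dominates), but an unweighted mean keeps the example simple.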