Section 01
[Introduction] Benchmark Study on Quantitative Reasoning Ability of Large Language Models in Indoor Air Engineering
A research team from VinUniversity (Vietnam), the University of Illinois (USA), and other institutions systematically evaluated the quantitative reasoning ability of Large Language Models (LLMs) in Indoor Air Quality (IAQ) engineering. The study tested mainstream models including GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro on a purpose-built dataset of 480 professional questions, and compared the effects of general prompts (NSD) with domain-specific prompts (IAQ). The results revealed significant performance differences across models and showed that injecting domain knowledge improves reasoning ability, offering a key reference for applying AI in environmental engineering.
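The two-condition protocol described above can be sketched as a small evaluation harness: each question is posed once under a general prompt and once under a domain-specific prompt, and per-condition accuracy is compared. This is a minimal illustration, not the study's actual code; the names (`Question`, `ask_model`, the prompt templates, the tolerance-based grader) are all assumptions.

```python
# Hypothetical sketch of the evaluation protocol: pose each question under a
# general ("NSD") and a domain-specific ("IAQ") prompt, then compare accuracy.
# All identifiers and templates are illustrative, not taken from the paper.
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    answer: float       # expected numeric answer
    tolerance: float    # relative tolerance used for grading

NSD_TEMPLATE = "Answer the following question:\n{q}"
IAQ_TEMPLATE = (
    "You are an indoor-air-quality engineer. Using standard IAQ "
    "mass-balance relations, answer:\n{q}"
)

def grade(predicted: float, q: Question) -> bool:
    """Mark a numeric answer correct if it falls within a relative tolerance."""
    return abs(predicted - q.answer) <= q.tolerance * abs(q.answer)

def evaluate(questions, ask_model):
    """Return accuracy under each prompt condition.

    ask_model(prompt: str) -> float stands in for a call to an LLM.
    """
    results = {}
    for name, template in [("NSD", NSD_TEMPLATE), ("IAQ", IAQ_TEMPLATE)]:
        correct = sum(
            grade(ask_model(template.format(q=q.text)), q) for q in questions
        )
        results[name] = correct / len(questions)
    return results
```

In this framing, a gap between `results["IAQ"]` and `results["NSD"]` for the same model quantifies how much the domain-specific prompt helps, which mirrors the comparison the study reports.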