Section 01
[Introduction] Evaluation Awareness: Do Large Language Models Change Their Behavior When Tested?
This controlled experiment on large language models (LLMs) examines 'evaluation awareness': whether models change their behavior when they know they are being evaluated. The phenomenon poses significant challenges for AI safety and for model evaluation methods, and the study's core question is: do LLMs exhibit systematic behavioral changes when they are being tested?