Section 01
Introduction: Core Insights from an Empirical Study on the Limited Impact of VEA on Language Model Behavior
This study uses on-policy and off-policy experiments to systematically assess how much "Verbalized Evaluation Awareness (VEA)" in a language model's chain of thought actually affects the model's behavior. It finds that VEA has a very limited effect on model outputs, which challenges the existing view that a high VEA rate indicates strategic behavior or alignment tampering.