Section 01
Introduction / Main Post: MimirBench: Evaluating Language Models' Strategic Reasoning Ability Under Uncertainty
A reproducible evaluation framework for testing how well agents update beliefs, estimate expected values, and comply with constraints in uncertain environments. It supports synthetic evaluation, leaderboards for real models, and mechanistic interpretability research.
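To make the three abilities named above concrete, here is a minimal sketch of what a single task of this kind might ask an agent to do: update a belief with Bayes' rule, compute expected values, and pick the best action that stays within a constraint. This is not MimirBench's actual API or task format. The function names, the coin scenario, the payoffs, and the budget are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical illustration only: none of these names come from MimirBench.

def update_belief(prior, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}

def expected_value(belief, payoffs):
    """Expected payoff of one action under the current belief."""
    return sum(belief[h] * payoffs[h] for h in belief)

def best_allowed_action(belief, payoff_table, cost, budget):
    """Return the highest-EV action whose cost fits the budget, or None."""
    allowed = [a for a in payoff_table if cost[a] <= budget]
    if not allowed:
        return None
    return max(allowed, key=lambda a: expected_value(belief, payoff_table[a]))

# Assumed scenario: a coin is either fair or biased toward heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
# Probability of the observation "heads" under each hypothesis.
heads_likelihood = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}
posterior = update_belief(prior, heads_likelihood)

# Payoff of each action under each hypothesis, plus a cost per action.
payoff_table = {
    "pass": {"fair": 1, "biased": 1},
    "bet_heads": {"fair": 0, "biased": 10},
    "big_bet": {"fair": -5, "biased": 30},
}
cost = {"pass": 0, "bet_heads": 2, "big_bet": 5}

# The constraint (budget of 3) rules out "big_bet" even though its EV is highest.
choice = best_allowed_action(posterior, payoff_table, cost, budget=3)
print(posterior, choice)
```

In this toy setup the unconstrained optimum and the constrained optimum differ, which is the kind of gap a constraint-compliance check would need to detect.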