Section 01
AtManRL: Core Guide to Training Honest Reasoning Models with Differentiable Attention Saliency
This article introduces AtManRL, a method that targets the "dishonesty" problem in Chain-of-Thought (CoT) reasoning of Large Language Models (LLMs): the stated reasoning chain may have little bearing on how the answer is actually generated. The method identifies the key tokens in a reasoning chain via differentiable attention masks, combines a saliency reward with the outcome reward, and jointly optimizes the correctness and interpretability of reasoning under the GRPO framework, offering a new path toward building trustworthy AI.
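To make the reward design concrete, here is a minimal sketch of how a saliency reward might be blended with an outcome reward and fed through GRPO's group-relative normalization. The weighting factor `lam`, the function names, and the toy numbers are illustrative assumptions, not the paper's actual formulation:

```python
import torch

def combined_reward(outcome: torch.Tensor, saliency: torch.Tensor,
                    lam: float = 0.5) -> torch.Tensor:
    """Blend outcome correctness (0/1 per rollout) with a saliency score
    in [0, 1] measuring how much the answer relies on the tokens selected
    by the differentiable attention mask. `lam` is a hypothetical weight."""
    return outcome + lam * saliency

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantage: normalize rewards within a group of G rollouts
    for the same prompt (subtract the group mean, divide by the group std)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy group of G = 4 rollouts for one prompt.
outcome = torch.tensor([1.0, 0.0, 1.0, 0.0])   # was the final answer correct?
saliency = torch.tensor([0.8, 0.6, 0.2, 0.1])  # was the reasoning actually used?
adv = grpo_advantages(combined_reward(outcome, saliency))
print(adv)
```

Under this kind of normalization, rollouts that are both correct and whose answers demonstrably depend on the reasoning tokens receive the largest advantages, which is the mechanism that would push the policy toward "honest" chains.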