Section 01
DelTA Method Guide: Improving Token-Level Credit Assignment Efficiency in RLVR
DelTA (Discriminative Token Credit Assignment) is a training method for Reinforcement Learning with Verifiable Rewards (RLVR). It targets a weakness of traditional RLVR: rewards are assigned at the response level and averaged over all tokens, which dilutes the signal carried by key tokens. DelTA's discriminative token credit assignment mechanism amplifies the gradient direction of discriminative tokens and suppresses shared high-frequency patterns. On mathematical reasoning benchmarks, Qwen3-8B-Base improves by an average of 3.26 percentage points over the strongest baseline of the same scale, and Qwen3-14B-Base improves by 2.62 percentage points.
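To make the dilution problem concrete, the toy sketch below contrasts response-level credit, where every token in a response receives the same mean-centered reward, with a reweighted variant that scales each token by how differently it appears in correct and incorrect responses. The weighting rule and the function names (`response_level_credit`, `discriminative_weights`, `reweighted_credit`) are assumptions made for illustration only. They are not DelTA's actual formulation, which this overview does not specify.

```python
from collections import Counter


def response_level_credit(responses, rewards):
    """Broadcast each response's mean-centered reward to all of its tokens."""
    baseline = sum(rewards) / len(rewards)
    return [[r - baseline] * len(tokens) for tokens, r in zip(responses, rewards)]


def discriminative_weights(responses, rewards):
    """Hypothetical weight: |P(token | correct) - P(token | incorrect)|.

    This is an illustrative stand-in, not the DelTA weighting rule.
    """
    pos = [set(t) for t, r in zip(responses, rewards) if r > 0]
    neg = [set(t) for t, r in zip(responses, rewards) if r <= 0]
    pos_count = Counter(tok for s in pos for tok in s)
    neg_count = Counter(tok for s in neg for tok in s)
    vocab = set(pos_count) | set(neg_count)
    return {
        tok: abs(pos_count[tok] / max(len(pos), 1) - neg_count[tok] / max(len(neg), 1))
        for tok in vocab
    }


def reweighted_credit(responses, rewards):
    """Scale the broadcast credit by each token's discriminative weight."""
    base = response_level_credit(responses, rewards)
    weights = discriminative_weights(responses, rewards)
    return [
        [credit * weights[tok] for credit, tok in zip(credits, tokens)]
        for credits, tokens in zip(base, responses)
    ]


# Two sampled responses that differ only in the final answer token.
responses = [["so", "x", "=", "4"], ["so", "x", "=", "5"]]
rewards = [1.0, 0.0]  # verifiable reward: only the first answer is correct

uniform = response_level_credit(responses, rewards)
focused = reweighted_credit(responses, rewards)
```

In this toy case the uniform scheme gives the shared prefix `so x =` the same credit as the answer token, while the reweighted scheme leaves nonzero credit only on `4` and `5`, the tokens that distinguish the correct response from the incorrect one.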