Section 01
【Introduction】New Benchmark for Evaluating Personalized Reward Models Released; SOTA Models Achieve Only 75.94% Accuracy
This article introduces Personalized RewardBench, the first benchmark for evaluating how well reward models capture individual user preferences. Its results show that even current state-of-the-art (SOTA) reward models fall short on personalization, reaching only 75.94% accuracy. Scores on the benchmark also correlate more strongly with downstream performance, such as Best-of-N sampling and PPO optimization, which makes it both a practical evaluation tool and a source of new research directions for personalization in AI alignment.
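To make the two quantities mentioned above concrete, here is a minimal sketch of how accuracy on a preference benchmark and Best-of-N selection are commonly computed. It is not the benchmark's actual code: the `(profile, prompt, chosen, rejected)` example layout, the `RewardFn` signature, the `toy_reward` word-overlap scorer, and the sample data are all hypothetical, and the benchmark's exact accuracy definition may differ.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a reward model scores a response for a given
# user profile and prompt. Real reward models are neural networks.
RewardFn = Callable[[str, str, str], float]
Example = Tuple[str, str, str, str]  # (profile, prompt, chosen, rejected)


def pairwise_accuracy(reward: RewardFn, examples: Sequence[Example]) -> float:
    """Fraction of examples where the user-preferred response scores strictly higher."""
    correct = sum(
        reward(profile, prompt, chosen) > reward(profile, prompt, rejected)
        for profile, prompt, chosen, rejected in examples
    )
    return correct / len(examples)


def best_of_n(reward: RewardFn, profile: str, prompt: str,
              candidates: Sequence[str]) -> str:
    """Best-of-N sampling: return the candidate the reward model scores highest."""
    return max(candidates, key=lambda c: reward(profile, prompt, c))


def toy_reward(profile: str, prompt: str, response: str) -> float:
    """Toy stand-in (assumption): count words shared by profile and response."""
    shared = set(profile.lower().split()) & set(response.lower().split())
    return float(len(shared))


# Made-up examples for illustration only.
EXAMPLES = [
    ("prefers concise answers", "Explain PPO.",
     "A concise summary of PPO.", "A long, rambling essay about PPO."),
    ("likes detailed code examples", "Show sorting.",
     "Here are detailed code examples for sorting.", "Sorting is easy."),
    ("vegetarian cook", "Suggest dinner.",
     "Try a lentil curry.", "Grilled steak with butter."),
]

if __name__ == "__main__":
    print(f"accuracy: {pairwise_accuracy(toy_reward, EXAMPLES):.2%}")
    print(best_of_n(toy_reward, "prefers concise answers", "Explain PPO.",
                    ["A long essay.", "A concise summary."]))
```

In this sketch a tie counts as an error, so a reward model that cannot separate two responses gets no credit. A reward model that ranks pairs more accurately for a given user should also pick better candidates in `best_of_n`, which is the kind of link to downstream performance the benchmark is reported to capture.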