Section 01
[Introduction] DGAO Framework: Addressing the Order Sensitivity of Large Language Models with Reinforcement Learning
Researchers at The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen) and Baidu Research have jointly proposed DGAO (Dual Group Advantage Optimization), a framework that for the first time introduces reinforcement learning into research on order fairness in large language models (LLMs). DGAO significantly reduces order sensitivity while also improving model accuracy, offering a new approach to the order bias problem in LLMs.