Section 01
LongTraceRL: Long-Context Reasoning Learning Based on Search Agent Trajectories and Scoring Rewards (Introduction)
Abstract: LongTraceRL addresses two challenges in long-context reasoning: handling distracting information and providing process supervision. It constructs hierarchical distractor documents and uses entity-level scoring rewards, achieving strong performance across multiple benchmarks.
Keywords: long-context reasoning, reinforcement learning, process supervision, knowledge graph, search agent, reward design, multi-hop reasoning, RLVR
Core Insights: LongTraceRL targets failure modes such as dispersed attention and omitted information in long-context reasoning. It uses search agent trajectories to construct hierarchical distractors and designs entity-level scoring rewards that provide fine-grained process supervision, improving the model's reasoning ability in complex scenarios.
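To make the idea of an entity-level scoring reward concrete, the following is a minimal sketch and not the method as specified by the authors. It assumes each training question comes with a list of gold entities (for example, the intermediate hops along a search agent trajectory) and a gold answer. The function names, the string-matching rule, and the `process_weight` mixing parameter are all hypothetical choices made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so matching is lenient."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def entity_coverage(reasoning: str, gold_entities: list[str]) -> float:
    """Fraction of gold entities mentioned (as whole phrases) in the reasoning."""
    if not gold_entities:
        return 0.0
    # Pad with spaces so that phrase matches respect word boundaries.
    padded = f" {normalize(reasoning)} "
    hits = 0
    for entity in gold_entities:
        key = normalize(entity)
        if key and f" {key} " in padded:
            hits += 1
    return hits / len(gold_entities)


def scoring_reward(
    reasoning: str,
    answer: str,
    gold_entities: list[str],
    gold_answer: str,
    process_weight: float = 0.5,  # assumed mixing weight, not from the source
) -> float:
    """Blend an outcome reward (exact answer match) with an entity-level
    process reward (coverage of gold entities in the reasoning)."""
    outcome = 1.0 if normalize(answer) == normalize(gold_answer) else 0.0
    process = entity_coverage(reasoning, gold_entities)
    return (1.0 - process_weight) * outcome + process_weight * process


if __name__ == "__main__":
    entities = ["Inception", "Christopher Nolan", "London"]
    full = "Inception was directed by Christopher Nolan, who was born in London."
    partial = "The director was born in London."
    print(scoring_reward(full, "London", entities, "London"))
    print(scoring_reward(partial, "London", entities, "London"))
```

Under these assumptions, a response that reaches the right answer while skipping intermediate entities earns less than one that names every hop, which is the kind of fine-grained signal a purely outcome-based reward cannot provide.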