OpenSeeker-v2: A Cutting-Edge Search Agent Trained Only with SFT

OpenSeeker-v2 achieves SOTA on multiple search benchmarks using only 10.6k samples and SFT training through a high-quality data synthesis strategy, challenging the complex CPT+SFT+RL training paradigm commonly used in the industry.

Tags: Search Agent · Large Language Model · Supervised Fine-Tuning · Data Synthesis · Agent · SFT · BrowseComp
Published 2026-05-06 01:55 · Recent activity 2026-05-07 09:37 · Estimated read: 5 min

Section 01

OpenSeeker-v2: Introduction to the Cutting-Edge Search Agent Trained Only with SFT

OpenSeeker-v2 achieves state-of-the-art (SOTA) performance on multiple search benchmarks using only 10.6k samples and supervised fine-tuning (SFT), enabled by a high-quality data-synthesis strategy. This challenges the complex CPT+SFT+RL training paradigm commonly used in industry. This article analyzes the work from three angles: background, methods, and experimental results.


Section 02

Background and Challenges: Rethinking the Industry's Complex Training Paradigm

Deep search capability is a core competency of cutting-edge large language model agents, but the field has long been dominated by industrial labs. Their training pipelines involve multi-stage steps such as pre-training, continued pre-training (CPT), SFT, and reinforcement learning (RL), which is costly and raises barriers for academic teams. The research team asked: is such a complex pipeline actually necessary to build a cutting-edge search agent? Their hypothesis: given high-quality trajectory data, simple SFT alone can achieve excellent results.
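To make "SFT on trajectories" concrete, below is a minimal sketch of how such training is typically set up with a HuggingFace-style causal LM: the full trajectory (question, thoughts, tool calls, observations, answer) is concatenated, and the loss is masked so only model-generated tokens are supervised. The base model, chat format, and field names are illustrative assumptions, not details published in the paper.

```python
# Minimal sketch of SFT on search-agent trajectories (assumed setup,
# not the paper's training code). Loss is computed only on tokens the
# model is expected to generate; user input and tool observations are
# masked out with -100, which cross-entropy ignores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def build_example(trajectory):
    """Concatenate one trajectory; supervise only assistant spans."""
    input_ids, labels = [], []
    for turn in trajectory:  # {"role": "user"|"assistant"|"tool", "text": ...}
        ids = tokenizer(turn["text"], add_special_tokens=False).input_ids
        input_ids.extend(ids)
        if turn["role"] == "assistant":       # thoughts, tool calls, final answer
            labels.extend(ids)
        else:                                 # question and tool observations
            labels.extend([-100] * len(ids))  # ignored by the loss
    return torch.tensor([input_ids]), torch.tensor([labels])

trajectory = [
    {"role": "user", "text": "Q: In which year did ...?\n"},
    {"role": "assistant", "text": "Thought: search first. Action: search(...)\n"},
    {"role": "tool", "text": "Observation: ...\n"},
    {"role": "assistant", "text": "Answer: 1998\n"},
]
input_ids, labels = build_example(trajectory)
loss = model(input_ids=input_ids, labels=labels).loss  # plain next-token SFT loss
loss.backward()
```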


Section 03

Core Methods: Three Key Improvements in Data Synthesis

The success of OpenSeeker-v2 comes from its optimized data-synthesis strategy, which rests on three elements (a filtering sketch follows the list):

  1. Expanded knowledge-graph scale: broader knowledge-graph coverage provides a richer exploration space and improves generalization;
  2. Expanded toolset: more callable tools (including specialized retrieval interfaces) to handle complex queries;
  3. Strict low-step filtering: keeping only trajectories that complete complex tasks in few steps, ensuring the efficiency of the training data.
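Here is a minimal sketch of how the third element, strict low-step filtering, might be implemented, assuming each synthesized trajectory records its tool-call steps and a correctness flag verified against the knowledge graph. The data structure, step budget, and tie-breaking rule are hypothetical illustrations, not the paper's exact criteria.

```python
# Hypothetical low-step filter: keep correct trajectories that solve the
# task within a step budget, preferring the shortest one per question.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    question: str
    steps: list = field(default_factory=list)  # tool calls + observations
    answer: str = ""
    correct: bool = False                      # verified against KG ground truth

def filter_low_step(trajectories, max_steps=8):
    """Screen for efficient, correct trajectories."""
    kept = [t for t in trajectories
            if t.correct and 0 < len(t.steps) <= max_steps]
    # If several correct trajectories exist for one question, keep the shortest.
    best = {}
    for t in kept:
        if t.question not in best or len(t.steps) < len(best[t.question].steps):
            best[t.question] = t
    return list(best.values())
```

The intuition behind such a filter: shorter successful trajectories contain less aimless exploration, so each supervised token carries more signal per training sample.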

Section 04

Experimental Results: Challenging Industry SOTA with Only SFT Training

OpenSeeker-v2 was trained on 10.6k samples and outperformed Tongyi DeepResearch (trained with the full CPT+SFT+RL pipeline) on four mainstream benchmarks:

| Benchmark            | OpenSeeker-v2 | Tongyi DeepResearch |
|----------------------|---------------|---------------------|
| BrowseComp           | 46.0%         | 43.4%               |
| BrowseComp-ZH        | 58.1%         | 46.7%               |
| Humanity's Last Exam | 34.6%         | 32.9%               |
| xbench               | 78.0%         | 75.0%               |

These results suggest that data quality can matter more than training-pipeline complexity.

Section 05

Technical Significance: Breaking Monopolies and Re-examining Training Processes

The significance of OpenSeeker-v2 is threefold:

  1. Breaking the industrial monopoly: the first academically developed search agent to reach SOTA at this scale;
  2. Re-examining training philosophy: a simple training method (SFT) combined with high-quality data can surpass complex pipelines, offering a viable path for resource-constrained teams;
  3. Emphasizing data engineering: data quality is the key driver of model performance, and the three synthesis strategies offer a reusable methodology.

Section 06

Limitations and Future Directions: Room for Further Exploration

Limitations: the work is built on a 30B model and the ReAct paradigm, with no exploration of larger models or other agent architectures, and the optimal data-synthesis configuration must be tuned per task.

Future directions: combining SFT with lightweight RL; applying the data strategies to other agent tasks; reducing dependence on large-scale knowledge graphs.
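For readers unfamiliar with the ReAct paradigm mentioned above, here is a minimal sketch of its thought-action-observation loop. The `llm` and `tools` callables, the text markers, and the stopping rule are stand-in assumptions; the paper's actual prompts and tool schemas are not reproduced here.

```python
# Minimal ReAct-style loop (illustrative, not the paper's implementation):
# the agent alternates Thought -> Action -> Observation until it emits
# an "Answer:" line or exhausts its step budget.
import re

def react_loop(question, llm, tools, max_steps=10):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # model continues the transcript
        transcript += step
        answer = re.search(r"Answer:\s*(.+)", step)
        if answer:                                   # terminal step reached
            return answer.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\((.*)\)", step)
        if action:                                   # e.g. Action: search("query")
            name, arg = action.group(1), action.group(2).strip('"')
            result = tools[name](arg)                # invoke the named tool
            transcript += f"Observation: {result}\n"
    return None  # step budget exhausted without an answer
```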


Section 07

Conclusion: Victory of Simple Methods + High-Quality Data

OpenSeeker-v2 shows that simple methods combined with high-quality data can beat complex engineering stacks, offering the industry a more cost-effective technical path. With the model weights open-sourced, we look forward to the follow-up research this will enable.