Section 01
DeNovoSWE Dataset: A Key Breakthrough in Long-Horizon Full Code Repository Generation
DeNovoSWE is a long-horizon software engineering dataset for full code repository generation. It contains 4,818 high-quality instances and is constructed automatically by a sandboxed agent workflow that uses divide-and-conquer and critique-repair strategies. With this dataset, the performance of the Qwen3-30B-A3B model on the BeyondSWE-Doc2Repo benchmark rose from 5.8% to 47.2%. Source: arXiv paper "DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch" (Link: http://arxiv.org/abs/2606.10728v1, published on 2026-06-09).
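To put the reported benchmark change in perspective, the short sketch below works out the absolute and relative gain from the two figures quoted above. The variable names are illustrative only, and the calculation assumes both figures are percentages on the same benchmark scale.

```python
# Scores quoted in the summary above (BeyondSWE-Doc2Repo, Qwen3-30B-A3B).
# Assumption: both values are percentages measured on the same scale.
baseline_pct = 5.8
with_dataset_pct = 47.2

# Absolute gain is measured in percentage points.
absolute_gain = with_dataset_pct - baseline_pct

# Relative factor shows how many times higher the new score is.
relative_factor = with_dataset_pct / baseline_pct

print(f"Absolute gain: {absolute_gain:.1f} percentage points")
print(f"Relative factor: {relative_factor:.1f}x")
```

Under these assumptions, the gain is about 41.4 percentage points, or roughly an eightfold increase over the baseline score.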