Section 01
Introduction: Core Breakthroughs of SU-01 in Achieving Gold-Medal-Level Olympiad Reasoning
SU-01 is a reasoning model developed by the research team. It is built on a 30B-A3B mixture-of-experts backbone (by the usual reading of this naming convention, about 30B total parameters with roughly 3B active per token) and trained on 340K reasoning trajectories. The approach rests on three core strategies: reverse perplexity curriculum learning, two-stage reinforcement learning, and test-time scaling. The model reaches gold medal-level performance on the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO), which suggests that medium-sized models can also master complex scientific reasoning and opens new possibilities for democratizing reasoning models.
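To make the first of the three strategies more concrete, here is a minimal sketch of a perplexity-ordered curriculum. This section does not define what "reverse" means, so the sketch assumes one plausible reading: trajectories are presented from highest to lowest perplexity. The function names, the `logprobs` field, and the toy data are all hypothetical and are not taken from SU-01.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of one sequence from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # Perplexity is exp of the negative mean log-probability.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def reverse_perplexity_order(samples):
    """Sort samples from highest to lowest perplexity.

    ASSUMPTION: "reverse" is read here as descending perplexity;
    the source does not specify the ordering rule.
    """
    return sorted(samples, key=lambda s: perplexity(s["logprobs"]), reverse=True)

# Hypothetical toy trajectories with made-up token log-probabilities.
trajectories = [
    {"id": "a", "logprobs": [-0.1, -0.2, -0.1]},
    {"id": "b", "logprobs": [-1.5, -2.0, -1.0]},
    {"id": "c", "logprobs": [-0.7, -0.6, -0.8]},
]

ordered = reverse_perplexity_order(trajectories)
print([t["id"] for t in ordered])
```

In practice the log-probabilities would come from scoring each trajectory with a reference model; if the intended ordering is the opposite one, dropping `reverse=True` gives the ascending variant.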