Zing Forum

AutoTTS: Enabling AI to Automatically Discover Optimal Test-Time Scaling Strategies

AutoTTS constructs a controllable search environment in which agents automatically discover test-time compute-allocation strategies. The discovered reasoning strategies outperform manually designed ones, at a total discovery cost of only $39.9 and 160 minutes, and generalize across benchmarks and model scales.

Tags: Test-Time Scaling (TTS), AutoTTS, Reasoning Strategies, Agent Discovery, LLM Optimization
Published 2026-05-09 01:59 · Recent activity 2026-05-11 10:52 · Estimated read: 6 min

Section 01

AutoTTS: How AI Automatically Discovers Optimal Test-Time Scaling Strategies

AutoTTS constructs a controllable search environment that lets agents automatically discover test-time compute-allocation strategies. The discovered reasoning strategies outperform manually designed ones, at a total discovery cost of only $39.9 and 160 minutes, and generalize across benchmarks and model scales. The framework marks a shift in LLM reasoning optimization from experience-driven to data-driven approaches and offers a new route to inference-cost optimization.

Section 02

Background: The Dilemma of Manually Designed Test-Time Scaling

Test-Time Scaling (TTS) is a key technique for improving the reasoning ability of large language models: it trades additional compute at inference time for higher accuracy. Mainstream TTS strategies, however, are manually designed, which brings limitations: human understanding of optimal strategies is incomplete, hand-tuning for each task and model is costly, and the lack of systematic search makes it hard to guarantee optimality.
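To make the compute-for-accuracy trade concrete, one of the simplest manually designed TTS strategies is best-of-N sampling with majority voting (often called self-consistency). The sketch below is a minimal illustration, not AutoTTS's method; `sample_fn` is a hypothetical stand-in for an LLM call that returns a final answer string.

```python
from collections import Counter

def self_consistency(sample_fn, prompt, n=8):
    """Majority vote over n samples: a classic hand-designed
    test-time scaling strategy (more samples means more compute
    and, typically, higher accuracy). sample_fn is a hypothetical
    stand-in for an LLM call returning a final answer string."""
    answers = [sample_fn(prompt) for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n  # winning answer and its agreement rate

# Deterministic toy sampler for illustration (not a real model):
samples = iter(["42", "42", "7", "42"])
ans, agreement = self_consistency(lambda p: next(samples), "What is 6*7?", n=4)
print(ans, agreement)  # → 42 0.75
```

Raising `n` buys accuracy with extra inference calls; choosing `n` per task is exactly the kind of manual tuning AutoTTS aims to automate.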

Section 03

Methodology: AutoTTS's Automatic Strategy Discovery Mechanism

At the core of the AutoTTS framework, the researcher's role shifts from designing strategies to designing the environment in which strategies are discovered, an environment that must compress the control space and provide low-cost feedback. Concretely, AutoTTS formalizes the width-depth TTS problem as controller synthesis: a controller decides operations such as branching to explore alternatives or continuing along the current path. Evaluation avoids repeated LLM calls to keep costs down. The framework also introduces beta parameterization (mapping the high-dimensional discrete strategy space to a low-dimensional continuous one) and fine-grained execution-trace feedback (complete trajectory diagnostics that accelerate iteration).
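The source describes beta parameterization only at the level of mapping a high-dimensional discrete space to a low-dimensional continuous one. One plausible reading is that the two shape parameters of a Beta distribution induce a per-depth branching-width schedule. The sketch below illustrates that idea under stated assumptions; `width_schedule` and its parameters are names of my own, not AutoTTS's actual mechanism.

```python
def beta_pdf(x, a, b):
    """Unnormalized Beta(a, b) density; the normalizing constant
    cancels when the weights are renormalized below."""
    return x ** (a - 1) * (1 - x) ** (b - 1)

def width_schedule(a, b, depth, total_budget):
    """Hypothetical low-dim -> high-dim mapping: two continuous
    shape parameters (a, b) induce a discrete branching-width
    schedule over `depth` reasoning steps, roughly summing to
    total_budget (the per-step floor of 1 can overshoot it)."""
    xs = [(d + 0.5) / depth for d in range(depth)]  # step midpoints in (0, 1)
    weights = [beta_pdf(x, a, b) for x in xs]
    total = sum(weights)
    return [max(1, round(w / total * total_budget)) for w in weights]

# a > b front-loads exploration; a < b defers it to later steps.
print(width_schedule(2.0, 5.0, depth=6, total_budget=24))  # → [7, 9, 6, 2, 1, 1]
```

The point of such a mapping is that a search over two continuous knobs replaces a combinatorial search over per-step widths, which is what makes low-cost automated discovery feasible.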

Section 04

Evidence: Experimental Results and Cost-Effectiveness of AutoTTS

Experiments show that the strategies discovered by AutoTTS consistently outperform manually designed baselines on math-reasoning benchmarks: either higher accuracy at the same budget, or lower cost at the same accuracy. The strategies also generalize to unseen benchmarks and across model scales. The entire discovery run cost only $39.9 and took 160 minutes, a notable level of cost-effectiveness.

Section 05

Conclusion: Significance of AutoTTS for LLM Reasoning Optimization

AutoTTS marks a shift in LLM reasoning optimization from experience-driven to data-driven approaches, establishing a scalable, reproducible strategy-discovery process that could extend to broader settings such as multimodal reasoning. From an industrial perspective, it offers a new route to cost optimization for LLM inference services, with implications for the marginal cost and scalability of AI applications. The discovered strategies are also interpretable, providing material for understanding how LLMs reason.

Section 06

Outlook: Limitations of AutoTTS and Future Research Directions

AutoTTS has limitations: it mainly targets math reasoning, so its effectiveness on open-domain tasks remains to be verified; environment design still requires manual effort; and costs may remain high in resource-constrained settings. Future directions include more efficient search algorithms to reduce cost, extension to multi-agent collaboration, and the study of strategy composability, opening new possibilities for the self-improvement of LLMs.