Zing Forum

TTA*: A New Paradigm for Small Model Reasoning, Test-Time A* Search Algorithm Without Fine-Tuning

TTA* transforms multi-step reasoning problems into goal-oriented tree search, guiding small language models to self-improve during reasoning via the cost function of the A* algorithm. It enhances reasoning capabilities without fine-tuning or external reward models.

Tags: TTA*, A* search, small language models, test-time optimization, reasoning enhancement, self-criticism, GSM8K, multi-step reasoning, tree search, no fine-tuning
Published 2026-04-01 12:15 · Recent activity 2026-04-01 12:18 · Estimated read: 6 min

Section 01

TTA*: Guide to the New Paradigm of Small Model Reasoning Without Fine-Tuning

TTA* (Test-Time A* Search) is a reasoning-enhancement method for small language models. Its core idea is to transform multi-step reasoning into goal-oriented tree search, using the cost function of the A* algorithm to guide the model to self-improve at inference time. The method strengthens the complex-reasoning capabilities of small models without fine-tuning or external reward models, offering a new approach to model optimization in resource-constrained scenarios.


Section 02

Pain Points of Small Model Reasoning and Traditional Solutions

Large language models (such as GPT-4 and Claude) reason well but are costly to deploy; small models are resource-friendly but perform poorly on complex reasoning tasks. Traditional approaches to improving small-model reasoning rely on expensive fine-tuning or complex reinforcement-learning training, both of which raise the barrier to entry. TTA* shifts the optimization focus from the training phase to the inference phase, enabling capability improvement without changing model weights.


Section 03

TTA* Algorithm Mechanism and Self-Criticism Design

TTA* is built on the A* search framework. Its cost function is f(n) = g(n) + h(n) = w·depth(n) + (100 − Reward(n)):

  • Path cost g(n) = w·depth(n): penalizes lengthy reasoning and encourages concise solutions;
  • Heuristic evaluation h(n) = 100 − Reward(n): Reward(n) is the median of several self-evaluation scores (on a 0–100 scale), which reduces evaluation noise.

The search proceeds by selecting the frontier node with the smallest f value, generating improved candidates through self-criticism, and adding them to the search frontier after evaluation. The self-criticism mechanism takes the median of multiple evaluations and produces concrete critique text to guide each improvement.
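The cost computation described above can be sketched in a few lines (a minimal illustration; the function name and default weight are assumptions, while the 0–100 reward scale and the median over several self-evaluations follow the description in this section):

```python
import statistics

def f_cost(depth: int, self_eval_rewards: list, w: float = 1.0) -> float:
    """A*-style cost for a reasoning node: f(n) = g(n) + h(n).

    g(n) = w * depth(n)    -- penalizes long reasoning chains
    h(n) = 100 - Reward(n) -- distance-to-goal estimate, where Reward(n) is
                              the median of several self-evaluation scores
                              on a 0-100 scale (the median reduces noise).
    """
    g = w * depth
    reward = statistics.median(self_eval_rewards)  # noise-robust aggregate
    h = 100.0 - reward
    return g + h

# A shallow node with high self-evaluated reward gets a low f,
# so the search expands it first.
print(f_cost(depth=2, self_eval_rewards=[80, 85, 90]))  # 2*1 + (100-85) = 17.0
```

Under this scheme a perfectly scored node (Reward = 100) at depth 0 has f = 0, and every extra reasoning step adds w to the cost, which is what drives the search toward concise, high-reward solutions.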

Section 04

GSM8K Experimental Verification and Code Architecture

TTA* was validated on the GSM8K mathematical-reasoning dataset (about 8,500 grade-school math problems). Experimental configurations are controlled via parameters (e.g., max_iter sets the number of search iterations, num_children the number of child nodes expanded per step). The code adopts a modular design:

  • Core modules: LLMWrapper (model calling), Node (node logic), TTAStar (search main loop), evaluate.py (accuracy calculation);
  • Experimental script: run_gsm8k.py integrates the process and supports command-line parameter configuration.
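To make the modular layout concrete, here is a minimal sketch of how a best-first search main loop of this shape might fit together (illustrative only: the Node name comes from the module list above, but the fields, the expand callback, and all signatures are assumptions; the real implementation also wraps an LLM for generation and self-evaluation):

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(order=True)
class Node:
    f: float                            # A* priority: f(n) = g(n) + h(n)
    depth: int = field(compare=False)
    answer: str = field(compare=False)  # current reasoning draft

def tta_star(root: Node,
             expand: Callable[[Node, int], List[Node]],
             max_iter: int = 10,
             num_children: int = 3) -> Node:
    """Best-first loop: pop the lowest-f node, turn it into improved
    candidates via `expand` (which would prompt the LLM to self-criticize
    and rewrite), push the scored children, and return the best node seen."""
    frontier = [root]
    best = root
    for _ in range(max_iter):
        if not frontier:
            break
        node = heapq.heappop(frontier)         # node with the smallest f value
        if node.f < best.f:
            best = node
        for child in expand(node, num_children):  # critique -> improved drafts
            heapq.heappush(frontier, child)
    return best
```

In the real system, expand would prompt the model to critique node.answer, generate num_children improved drafts, and compute each child's f from its depth and its median self-evaluation reward.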

Section 05

Advantages and Limitations of TTA*

Advantages:

  1. Training-free: no fine-tuning or RL training required;
  2. Model-agnostic: applicable to any language model with basic generation capability;
  3. Interpretable: the search process is transparent, so improvement paths can be traced;
  4. Resource-friendly: runs on small models.

Limitations:

  1. Higher inference cost: requires multiple generations and evaluations per problem;
  2. Parameter tuning: search parameters must be adjusted per task;
  3. Task applicability: for simple tasks the extra search may not be worth the cost.

Section 06

Future Expansion Directions of TTA*

The TTA* team plans to support more challenging mathematical reasoning datasets (such as MATH500, MATH401, AIME) to further verify effectiveness in complex scenarios. In addition, this method can be extended to other fields:

  • Code generation: Gradually improve candidate code;
  • Logical reasoning: Apply to multi-step logical deduction tasks;
  • Creative writing: Iteratively generate high-quality text.

Section 07

Practical Significance and Open-Source Information of TTA*

TTA* represents a paradigm shift in reasoning optimization—from training resource investment to reasoning algorithm optimization, echoing the successful ideas of models like DeepSeek-R1. For enterprises, it can improve performance by increasing reasoning computation investment without retraining models, offering high flexibility. The TTA* code has been open-sourced; interested parties can visit the GitHub repository to obtain implementations and experimental scripts.