Zing Forum

HintMR: Enabling Small Language Models to Have Strong Mathematical Reasoning Ability via Prompt Assistance

This article introduces HintMR, a framework that distills large models into a specialized prompt generation model, which then gives the reasoning model local, step-by-step prompt guidance. The two models form a collaborative system that significantly improves the mathematical reasoning of small models without increasing the size of either model.

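Tags: HintMR, mathematical reasoning, small language models, prompt assistance, knowledge distillation, dual-model collaboration, multi-step reasoning, error propagation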
Published 2026-04-14 11:09 | Recent activity 2026-04-15 09:52 | Estimated read 6 min

Section 01

[Introduction] HintMR: Dual-Model Collaboration Enables Small Models to Have Strong Mathematical Reasoning Ability

HintMR trains a prompt generation model by distilling large models and pairs it with a reasoning model in a dual-model collaborative system. This significantly improves the mathematical reasoning of small models without increasing the size of either model. The framework targets two weaknesses of small models, difficulty sustaining long reasoning chains and cascading errors, and offers a new option for mathematical reasoning in resource-constrained scenarios.

Section 02

[Background] The Mathematical Reasoning Dilemma of Small Models

Large models perform well in mathematical reasoning, but small models face two core challenges: 1. Difficulty sustaining long reasoning chains: a limited context window and memory capacity make it hard to keep track of the overall structure of a solution; 2. Cascading early errors: without the ability to self-correct, one early mistake propagates through every later step. The traditional remedy, increasing model size, raises computational cost and makes deployment harder, so a different approach is needed.
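To make the cascading-error point concrete, here is a back-of-the-envelope sketch. It assumes each reasoning step is correct independently with the same probability, which is a simplification and not a model taken from the paper; the 0.95 per-step accuracy and the function name are illustrative only.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes steps succeed independently with the same accuracy and that
    no error is ever corrected later (an illustrative simplification).
    """
    return per_step_accuracy ** num_steps


# Illustrative numbers only: 95% per-step accuracy is not a figure from the paper.
for n in (1, 5, 10, 20):
    print(n, round(chain_success_probability(0.95, n), 3))
```

Under these assumptions, a model that gets 95% of individual steps right completes a 10-step chain correctly only about 60% of the time, and a 20-step chain about 36% of the time. This is why long-chain problems are disproportionately hard for models that cannot recover from an early slip.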

Section 03

[Method] The Dual-Model Collaborative Architecture of HintMR

HintMR builds a dual-model collaborative system: a prompt generation model, which produces local, targeted prompts, and a reasoning model, which carries out the reasoning under that guidance. The prompt generation model is trained by knowledge distillation from large models and generates each prompt dynamically from the problem statement and the reasoning history accumulated so far. The collaboration is iterative: receive problem → generate prompt → execute reasoning step → update history → repeat until the solution is complete.
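The iterative loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the callable interfaces, the `ANSWER:` stop marker, and the `max_steps` cap are all assumptions, and the two toy functions merely stand in for the real prompt generation and reasoning models.

```python
from typing import Callable, List

# Hypothetical interfaces: each model is treated as a plain function.
PromptModel = Callable[[str, List[str]], str]          # (problem, history) -> prompt
ReasoningModel = Callable[[str, List[str], str], str]  # (problem, history, prompt) -> step

FINAL_PREFIX = "ANSWER:"  # assumed stop marker, not taken from the paper


def solve_with_prompts(problem: str,
                       prompt_model: PromptModel,
                       reasoner: ReasoningModel,
                       max_steps: int = 8) -> List[str]:
    """Alternate between prompt generation and reasoning until an answer appears."""
    history: List[str] = []
    for _ in range(max_steps):
        prompt = prompt_model(problem, history)   # generate a local prompt
        step = reasoner(problem, history, prompt) # execute one reasoning step
        history.append(step)                      # update the shared history
        if step.startswith(FINAL_PREFIX):         # stop once a final answer is given
            break
    return history


# Toy stand-ins so the sketch runs without any real model.
def toy_prompt_model(problem: str, history: List[str]) -> str:
    plan = ["Add the two numbers.", "Double the result.", "State the final answer."]
    return plan[min(len(history), len(plan) - 1)]


def toy_reasoner(problem: str, history: List[str], prompt: str) -> str:
    if prompt.startswith("Add"):
        return "3 + 4 = 7"
    if prompt.startswith("Double"):
        return "7 * 2 = 14"
    return "ANSWER: 14"


steps = solve_with_prompts("Add 3 and 4, then double the result.",
                           toy_prompt_model, toy_reasoner)
print(steps)
```

Passing the models in as callables keeps the loop independent of any particular inference library, and the step cap bounds the number of model calls, which matters because each step costs two calls rather than one.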

Section 04

[Evidence] Experimental Validation: Performance Improvement of HintMR

On benchmarks such as GSM8K and MATH, HintMR significantly improves the reasoning accuracy of small models while keeping computational cost far below that of large models. It generalizes across areas such as algebra and geometry and reduces error propagation. Compared with baselines such as standard prompting, chain-of-thought, and self-consistency, HintMR performs better on complex problems that require long reasoning chains.

Section 05

[Innovation] Technical Highlights of HintMR

1. Decoupling strategy and execution: The prompt generation model handles strategy planning and the reasoning model handles execution, which reduces the complexity of each component.
2. Non-intrusive enhancement: The internal structure of the reasoning model does not need to be modified; deployment only requires fine-tuning the prompt generation model.
3. Interpretability: Explicit prompts make the reasoning process transparent, which helps with debugging and understanding.

Section 06

[Application] Potential Scenarios of HintMR

HintMR can be applied in: 1. Educational assistance: serving as an intelligent tutoring system that provides personalized prompts; 2. Edge device deployment: running in resource-constrained environments such as mobile phones and IoT devices; 3. Multilingual mathematical reasoning: supporting problems posed in different languages; 4. Specialized domains: mathematical reasoning in fields such as physics and engineering.

Section 07

[Limitations] Challenges Faced by HintMR

1. Dependence on prompt quality: System performance is bounded by the quality of the prompt generation model.
2. Interaction overhead: Repeated exchanges between the two models increase inference latency.
3. Complexity of prompt design: Building training data requires domain knowledge, and automated prompt generation and evaluation remain open problems.
4. Risk of error accumulation: Errors made by the prompt generation model can mislead the reasoning model.

Section 08

[Future] Research Directions and Conclusion

Future directions include multi-agent extensions, adaptive prompt strategies, reinforcement learning optimization, and cross-modal reasoning. Conclusion: HintMR represents a shift in approach: rather than scaling up a single model, it lets small models cooperate, pointing toward efficient and sustainable AI systems. Paper link: http://arxiv.org/abs/2604.12229v1