# Small Models, Big Wisdom: How Qwen3-1.7B Breaks Through the 'Reasoning Gap' in Vietnamese Mathematical Reasoning

> A groundbreaking study reveals the potential and challenges of small language models (SLMs) in non-English reasoning tasks. By constructing the Vietnamese elementary math dataset Vi-S1K and the evaluation benchmark Vi-Elementary-Bench, the study found that supervised fine-tuning (SFT) can unlock the hidden reasoning capabilities of models, while complex agent frameworks may instead become a cognitive burden.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-20T04:36:03.000Z
- Last activity: 2026-04-21T02:51:33.696Z
- Heat: 123.7
- Keywords: small language models, SLM, Vietnamese, mathematical reasoning, test-time scaling, supervised fine-tuning, SFT, Qwen3, edge AI, agent frameworks
- Page link: https://www.zingnex.cn/en/forum/thread/qwen3-1-7b
- Canonical: https://www.zingnex.cn/forum/thread/qwen3-1-7b
- Markdown source: floors_fallback

---

A groundbreaking study examines the potential and challenges of small language models (SLMs) in non-English reasoning tasks, taking Qwen3-1.7B as its subject. By constructing the Vietnamese elementary math dataset Vi-S1K and the evaluation benchmark Vi-Elementary-Bench, the study finds that supervised fine-tuning (SFT) can unlock the model's hidden reasoning capabilities, while complex agent frameworks (such as ReAct) instead impose a cognitive burden, pointing to a new path for complex reasoning on edge AI.

## Research Background: The Necessity and Challenges of Small Models + Non-English Reasoning

### Reasoning Dilemma of Edge AI
The vision of ubiquitous AI requires models that run on edge devices, but small language models (SLMs) face a "reasoning gap": they struggle to maintain a coherent chain of thought. Non-English settings (such as Vietnamese, with its distinctive grammar and tone system) add further complexity.
### Comparison Between Large and Small Models
Large models (such as GPT-4) have strong reasoning abilities but depend on the cloud, bringing high costs and data-security concerns; a 1.7B-parameter small model can run on ordinary devices, and if it can reason reliably, it advances AI democratization.
### Underestimated Challenges of Non-English Languages
Existing research is English-centric; in non-English languages, the impact of grammatical and cultural differences on reasoning goes far beyond translation issues.

## Research Methods: Constructing a Vietnamese Mathematical Reasoning Dataset and Evaluation Benchmark

### Vi-S1K Dataset
Contains 1,000 carefully curated Vietnamese elementary math problems, each with detailed solution steps and explanations. Problems were localized via a Gemini 2.5 Flash-Lite pipeline to ensure that terminology follows Vietnamese textbook standards, problems are culturally relevant, and solution steps align with local teaching traditions.
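To make the dataset's shape concrete, here is a minimal sketch of what one Vi-S1K record might look like, along with a basic completeness check. The field names (`problem`, `solution_steps`, `answer`) and the sample problem are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical Vi-S1K-style record: a Vietnamese elementary math problem
# with step-by-step solution. Field names are assumptions for illustration.
sample = {
    "problem": "Một cửa hàng có 48 quả cam, đã bán 19 quả. "
               "Hỏi cửa hàng còn lại bao nhiêu quả cam?",
    "solution_steps": [
        "Số cam còn lại bằng số cam ban đầu trừ đi số cam đã bán.",
        "48 - 19 = 29",
    ],
    "answer": "29 quả cam",
}

def validate_record(record: dict) -> bool:
    """Check that a record has a non-empty problem, at least one
    solution step, and a non-empty final answer."""
    return (
        bool(record.get("problem"))
        and len(record.get("solution_steps", [])) > 0
        and bool(record.get("answer"))
    )

print(validate_record(sample))  # → True
```

A check like this is the kind of filter a curation pipeline would run before a record is admitted to a fine-tuning set.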
### Vi-Elementary-Bench Benchmark
Two-dimensional evaluation: computational accuracy (whether the correct answer is obtained) and explanation quality (whether the problem-solving approach is explained clearly), reflecting the math-education goal of "knowing not only the result but also the reason".
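The two-axis idea can be sketched as a scorer that reports accuracy and explanation quality separately instead of collapsing them into one number. This is a toy illustration assuming a 0–5 scale on both axes (consistent with the 4.05/5 figure cited later); the benchmark's actual rubric and answer-matching logic are not specified in the article.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    accuracy: float     # 0-5: did the model reach the correct final answer?
    explanation: float  # 0-5: how clearly were the solution steps explained?

def score_response(predicted_answer: str, gold_answer: str,
                   explanation_rubric_score: float) -> EvalResult:
    """Toy two-dimensional scorer. Accuracy here is all-or-nothing exact
    match; explanation quality comes from an external rubric grade
    (e.g. a human or LLM judge). Both assumptions are illustrative."""
    accuracy = 5.0 if predicted_answer.strip() == gold_answer.strip() else 0.0
    return EvalResult(accuracy=accuracy, explanation=explanation_rubric_score)

result = score_response("29", "29", explanation_rubric_score=4.0)
print(result)  # → EvalResult(accuracy=5.0, explanation=4.0)
```

Keeping the two scores separate is what lets the study observe a model that is accurate but explains poorly, i.e. the "format gap" discussed below.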

## Key Findings: Unlocking Hidden Capabilities, Value of SFT, and Cognitive Burden of Complex Frameworks

### Hidden Reasoning Capabilities
The Qwen3-1.7B base model achieves a computational accuracy of 4.05/5 but exhibits a "format gap": it has the correct knowledge yet cannot present it in the format humans expect.
### Unlocking Effect of SFT
Supervised fine-tuning improves explanation quality by 77%, showing that SFT acts as a reasoning unlocker. A small, high-quality dataset (like Vi-S1K) outperforms large-scale low-quality data, and domain-specific fine-tuning yields significant gains.
### Cognitive Tax of Complex Frameworks
Agent frameworks such as ReAct degrade small-model performance through attention distraction, format overhead, and error accumulation; a plain Chain-of-Thought (CoT) + self-consistency strategy performs best.
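The winning CoT + self-consistency strategy is simple to state: sample several independent chains of thought, extract each chain's final answer, and take a majority vote. A minimal sketch of the voting step (the sampling itself would come from the model):

```python
from collections import Counter

def self_consistency(final_answers: list[str]) -> str:
    """Majority vote over the final answers extracted from N independently
    sampled CoT traces. Ties fall to the first-seen answer, since sorting
    by count is stable with respect to insertion order."""
    counts = Counter(a.strip() for a in final_answers)
    return counts.most_common(1)[0][0]

# e.g. five sampled chains, three of which converge on the same answer
votes = ["29", "28", "29", "30", "29"]
print(self_consistency(votes))  # → 29
```

Unlike ReAct, this adds no interleaved tool-call format for the small model to track; the only overhead is running N samples, which is easy to budget on an edge device.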

## Research Conclusions: Best Practices for Edge Deployment and Implications for AI Democratization

### Hierarchical Strategy for Edge Deployment
1. Supervised fine-tuning (essential; unlocks reasoning capabilities);
2. Simplified test-time scaling (CoT + self-consistency, with controllable overhead);
3. Avoid complex agent frameworks (better suited to 7B+ models).
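The tiered recommendation above can be expressed as a small dispatch function. The 7B threshold and the strategy names come from the article; the function signature and its exact decision logic are illustrative assumptions.

```python
def choose_strategy(model_params_b: float, fine_tuned: bool) -> str:
    """Pick an edge-deployment reasoning strategy from the study's tiers.
    Illustrative sketch: thresholds other than 7B are assumptions."""
    if not fine_tuned:
        # Tier 1: SFT is the prerequisite that unlocks reasoning at all.
        return "run SFT first"
    if model_params_b < 7.0:
        # Tier 2: simple test-time scaling; avoid agent frameworks.
        return "CoT + self-consistency"
    # Tier 3: agent frameworks (e.g. ReAct) only become viable at 7B+.
    return "agent framework viable"

print(choose_strategy(1.7, fine_tuned=True))  # → CoT + self-consistency
```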
### Implications for AI Democratization
- Language diversity: The Vietnamese experience can be extended to other underserved languages;
- Small model strategy: Well-fine-tuned small models are more effective in resource-constrained scenarios;
- Data engineering: High-quality domain-specific datasets are key.
### Big Future of Small Models
Small models are expected to allow non-English users to enjoy AI services without relying on the cloud, which is a key path to AI democratization.

## Limitations and Future Research Directions

### Research Limitations
- Evaluation only covers the field of Vietnamese elementary math;
- Only uses the single architecture of Qwen3-1.7B.
### Future Directions
- Expand to more non-English languages and subject areas;
- Explore the impact of model compression and quantization techniques on reasoning capabilities;
- Study whether multilingual joint training improves monolingual reasoning performance.
