Zing Forum

Small Models, Big Wisdom: How Qwen3-1.7B Breaks Through the 'Reasoning Gap' in Vietnamese Mathematical Reasoning

A groundbreaking study reveals the potential and challenges of small language models (SLMs) in non-English reasoning tasks. By constructing the Vietnamese elementary math dataset Vi-S1K and the evaluation benchmark Vi-Elementary-Bench, the study found that supervised fine-tuning (SFT) can unlock the hidden reasoning capabilities of models, while complex agent frameworks may instead become a cognitive burden.

Tags: small language models (SLM) · Vietnamese mathematical reasoning · test-time scaling · supervised fine-tuning (SFT) · Qwen3 · edge AI · agent frameworks
Published 2026-04-20 12:36 · Recent activity 2026-04-21 10:51 · Estimated read: 7 min

Section 01

Small Models, Big Wisdom: How Qwen3-1.7B Breaks Through the 'Reasoning Gap' in Vietnamese Mathematical Reasoning

A groundbreaking study focuses on the potential and challenges of small language models (SLMs) in non-English reasoning tasks, using Qwen3-1.7B as the research object. By constructing the Vietnamese elementary math dataset Vi-S1K and the evaluation benchmark Vi-Elementary-Bench, it was found that supervised fine-tuning (SFT) can unlock the hidden reasoning capabilities of the model, while complex agent frameworks (such as ReAct) instead become a cognitive burden, providing a new path for edge AI to achieve complex reasoning.

Section 02

Research Background: The Necessity and Challenges of Small Models + Non-English Reasoning

Reasoning Dilemma of Edge AI

The vision of ubiquitous AI requires models to run on edge devices, but small language models (SLMs) face a "reasoning gap" and struggle to maintain a coherent chain of thought. Non-English environments (such as Vietnamese's unique grammar and tones) add further complexity.

Comparison Between Large and Small Models

Large models (such as GPT-4) have strong reasoning ability but rely on the cloud, bringing high costs and data-security concerns; a 1.7B-scale small model can run on ordinary devices, and if it can reason reliably, it helps democratize AI.

Underestimated Challenges of Non-English Languages

Existing research is English-centric, and the impact of grammar and cultural differences in non-English languages on reasoning far exceeds translation issues.

Section 03

Research Methods: Constructing a Vietnamese Mathematical Reasoning Dataset and Evaluation Benchmark

Vi-S1K Dataset

Contains 1,000 carefully curated Vietnamese elementary math problems, each with detailed solution steps and explanations; localized via a Gemini 2.5 Flash-Lite pipeline to ensure that terminology follows Vietnamese textbook standards, problems are culturally relevant, and solution steps align with local teaching traditions.
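The dataset's exact schema is not given in this summary; as an illustration, a Vi-S1K-style record could be turned into an SFT training pair roughly like the sketch below (the field names `problem`, `steps`, and `answer` are assumptions, not the dataset's real schema):

```python
# Sketch: formatting a Vi-S1K-style record into a prompt/completion pair for SFT.
# Field names (problem, steps, answer) are illustrative, not the dataset's real schema.

def format_sft_pair(record: dict) -> dict:
    """Build a supervised fine-tuning example: the prompt asks for step-by-step
    reasoning; the target is the worked solution followed by the final answer."""
    prompt = (
        "Giải bài toán sau và trình bày từng bước:\n"  # "Solve and show each step"
        f"{record['problem']}\n"
    )
    completion = "\n".join(record["steps"]) + f"\nĐáp số: {record['answer']}"
    return {"prompt": prompt, "completion": completion}

example = {
    "problem": "Một cửa hàng có 24 quả táo, bán đi 9 quả. Hỏi còn lại bao nhiêu quả?",
    "steps": ["Số táo còn lại là: 24 - 9 = 15 (quả)"],
    "answer": "15 quả",
}
pair = format_sft_pair(example)
```

The "Đáp số" (final answer) line mirrors the convention of Vietnamese textbook solutions, which is the kind of local teaching tradition the localization pipeline aims to preserve.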

Vi-Elementary-Bench Benchmark

Two-dimensional evaluation: computational accuracy (whether the correct answer is reached) and explanation quality (whether the solution approach is explained clearly), reflecting the math-education goal of "knowing not only the result but also the reason".
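The paper's exact rubric is not reproduced here; since the summary quotes scores on a 0-5 scale, a two-dimensional score record might look like this sketch (the equal weighting in `overall` is an illustrative assumption, not the benchmark's formula):

```python
from dataclasses import dataclass

@dataclass
class BenchScore:
    """Two-dimensional score in the spirit of Vi-Elementary-Bench.
    Both axes use a 0-5 scale, matching the 4.05/5 figure quoted later in
    this article; the equal weighting in overall() is an assumption."""
    computational_accuracy: float  # did the model reach the right answer?
    explanation_quality: float     # is the solution approach clearly explained?

    def overall(self) -> float:
        return 0.5 * self.computational_accuracy + 0.5 * self.explanation_quality

# Toy example: strong arithmetic but weak explanations drag the overall score down.
base = BenchScore(computational_accuracy=4.05, explanation_quality=2.0)
```

Separating the two axes is what lets the study detect a model that computes correctly yet explains poorly, rather than collapsing both into a single accuracy number.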

Section 04

Key Findings: Unlocking Hidden Capabilities, Value of SFT, and Cognitive Burden of Complex Frameworks

Hidden Reasoning Capabilities

The Qwen3-1.7B base model already achieves a computational accuracy of 4.05/5, but shows a "format gap": it possesses the correct knowledge yet cannot present it in the format humans expect.

Unlocking Effect of SFT

Supervised fine-tuning improves explanation quality by 77%, proving that SFT is a reasoning unlocker. High-quality small-scale datasets (like Vi-S1K) are more effective than large-scale low-quality data, and domain-specific fine-tuning yields significant benefits.

Cognitive Tax of Complex Frameworks

Agent frameworks such as ReAct reduce small-model performance through attention distraction, formatting overhead, and error accumulation; a pure chain-of-thought (CoT) + self-consistency strategy performs best.
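The CoT + self-consistency strategy amounts to sampling several independent reasoning chains and majority-voting over their final answers. A minimal sketch, where `sample_answer` is a hypothetical stand-in for one stochastic model call that returns only the extracted final answer:

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_answer: Callable[[], str], n_samples: int = 5) -> str:
    """Sample n chain-of-thought completions and return the majority answer.
    sample_answer is a stand-in for a real (temperature > 0) model call."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Toy usage: a stub "model" that usually answers "15" but occasionally slips.
draws = iter(["15", "15", "17", "15", "12"])
result = self_consistency(lambda: next(draws), n_samples=5)
```

Unlike an agent loop, this adds no new output format for the small model to learn; the only overhead is the extra sampled completions, which is why the study finds it a better fit at the 1.7B scale.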

Section 05

Research Conclusions: Best Practices for Edge Deployment and Implications for AI Democratization

Hierarchical Strategy for Edge Deployment

  1. Supervised fine-tuning (essential; unlocks reasoning capabilities);
  2. Simplified test-time scaling (CoT + self-consistency; controllable overhead);
  3. Avoid complex agent frameworks (those suit 7B+ models).
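As a toy encoding of this hierarchy (the 7B cutoff for agent frameworks comes from the text above; the function itself is an illustration, not the paper's algorithm):

```python
def edge_inference_plan(model_params_billions: float, fine_tuned: bool) -> list[str]:
    """Sketch of the deployment hierarchy described above: fine-tune first,
    decode with CoT + self-consistency, and reserve agent frameworks for
    models of roughly 7B parameters and up."""
    plan = []
    if not fine_tuned:
        plan.append("run supervised fine-tuning first (essential)")
    plan.append("decode with chain-of-thought + self-consistency")
    if model_params_billions >= 7:
        plan.append("optionally layer an agent framework (e.g. ReAct)")
    return plan

small = edge_inference_plan(1.7, fine_tuned=True)
large = edge_inference_plan(7.0, fine_tuned=True)
```

For a fine-tuned 1.7B model the plan stops at CoT + self-consistency; the agent step only appears once the model is large enough to absorb the framework's cognitive tax.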

Implications for AI Democratization

  • Language diversity: The Vietnamese experience can be extended to other underserved languages;
  • Small model strategy: Well-fine-tuned small models are more effective in resource-constrained scenarios;
  • Data engineering: High-quality domain-specific datasets are key.

Big Future of Small Models

Small models are expected to allow non-English users to enjoy AI services without relying on the cloud, which is a key path to AI democratization.

Section 06

Limitations and Future Research Directions

Research Limitations

  • Evaluation covers only the domain of Vietnamese elementary mathematics;
  • Only a single model architecture, Qwen3-1.7B, is tested.

Future Directions

  • Expand to more non-English languages and subject areas;
  • Explore the impact of model compression and quantization techniques on reasoning capabilities;
  • Study whether multilingual joint training improves monolingual reasoning performance.