Two Lines of Jinja Template Optimization: Enabling 100% Chinese Chain of Thought for Qwen 3.5 Series

By modifying just two lines of Jinja template code, the author reduced the Chain-of-Thought (CoT) loop rate of the Qwen 3.5 series models from 5/8 (62.5%) to 1/22 (about 4.5%), a zero-cost optimization of Chinese reasoning that required no retraining.

Tags: Qwen · Large Language Models · Chain-of-Thought · Jinja Templates · Chinese Reasoning · Prompt Engineering · Model Optimization
Published 2026-05-04 15:42 · Recent activity 2026-05-04 15:50 · Estimated read: 6 min

Section 01

Introduction: Two Lines of Jinja Template Optimization Enable 100% Chinese Chain of Thought for the Qwen 3.5 Series

By modifying just two lines of Jinja template code, the author reduced the Chain-of-Thought (CoT) loop rate across all Qwen 3.5 series models from 5/8 (62.5%) to 1/22 (about 4.5%), with no retraining and no change to model weights. The fix resolves the logical loops caused by the model unconsciously switching to English mid-reasoning, significantly improving the stability of Chinese reasoning.


Section 02

Background: The Challenge of Language Preference in Large Model Reasoning

Although Qwen-series models have strong Chinese capabilities, they tend to drift into English thinking patterns during complex reasoning, producing logical loops (repeated switching between Chinese and English, and repeated self-correction). In multi-step reasoning, logical confusion (repeated explanations, topic jumps, drifting away from the problem) typically appears around steps 5-8, hurting user experience and reliability in production environments.


Section 03

Method: Subtle Adjustments to Jinja Templates

The core idea is to adjust the Jinja chat template that formats the input prompt, leaving model weights untouched and requiring no retraining. Key modifications include (an illustrative sketch follows the list):

  1. System role localization: Define the model's role and tasks in Chinese to establish a Chinese context;
  2. Thinking process guidance: Standardize Chinese thinking steps through examples;
  3. Special marker reinforcement: Use separators to solidify the structure of Chinese paragraphs.
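
The article does not reproduce the project's actual template, so the snippet below is only a minimal sketch of the idea: a Qwen-style chat template rendered with the `jinja2` library, with the two modified lines called out in comments. The `<|im_start|>` and `<think>` markers follow Qwen's real chat format; the Chinese strings and their exact placement are assumptions based on the three points above.

```python
# Minimal sketch, not the project's actual template. The Chinese strings and
# their placement are assumptions; <|im_start|>/<think> follow Qwen's format.
from jinja2 import Template

CHAT_TEMPLATE = Template(
    # Modified line 1: a Chinese system role establishes a Chinese context.
    "<|im_start|>system\n"
    "你是一个中文助手，请全程使用中文进行思考和回答。"  # "You are a Chinese assistant; think and answer in Chinese throughout."
    "<|im_end|>\n"
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
    # Modified line 2: open the assistant turn with a Chinese thinking anchor,
    # so the very first CoT tokens the model continues from are already Chinese.
    "<|im_start|>assistant\n<think>\n让我们一步一步地用中文分析：\n"  # "Let's analyze step by step in Chinese:"
)

prompt = CHAT_TEMPLATE.render(
    messages=[{"role": "user", "content": "9.11 和 9.9 哪个大？"}]  # "Which is larger, 9.11 or 9.9?"
)
print(prompt)
```

Because the anchor lives inside the template rather than in user text, every request passes through it automatically, which is what makes the fix an application-layer-only change.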

Section 04

Evidence: Verification of Significantly Reduced Loop Rate

Before optimization the loop rate was 5 loops in 8 test runs (62.5%); after optimization it dropped to 1 in 22 runs (about 4.5%). Benefits include (one way to score a run as looped is sketched after the list below):

  • Response stability increased from 37.5% to 95.5%;
  • Fewer wasted tokens, lowering inference costs;
  • No more English reasoning passages, improving the user experience;
  • Low barrier to adoption, applicable to all Qwen 3.5 series models.
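
The article reports loop rates as counts over test runs but does not describe the scoring procedure. One plausible way to reproduce such a measurement is a heuristic that flags a run whose chain of thought drifts into English or keeps repeating itself, the two failure modes described in Section 02; the function name and thresholds below are hypothetical.

```python
import re

def is_looped(cot: str, max_english_ratio: float = 0.3, max_repeats: int = 3) -> bool:
    """Flag a chain of thought that drifts into English or repeats itself.

    Hypothetical heuristic; the thresholds are illustrative, not from the article.
    """
    # Share of ASCII letters as a cheap proxy for English content.
    english_ratio = len(re.findall(r"[A-Za-z]", cot)) / max(len(cot), 1)

    # How often does any single sentence recur? Split on Chinese punctuation.
    sentences = [s.strip() for s in re.split(r"[。！？\n]", cot) if s.strip()]
    most_repeats = max((sentences.count(s) for s in set(sentences)), default=0)

    return english_ratio > max_english_ratio or most_repeats >= max_repeats

# Usage: collect CoT strings from repeated runs, then compute the loop rate.
runs: list[str] = []  # fill with <think>...</think> contents from the model
if runs:
    print(f"loop rate: {sum(is_looped(r) for r in runs) / len(runs):.1%}")
```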

Section 05

Technical Principle: Core Reasons for Effective Template Adjustments

  1. In-context learning: The model picks up the task framing from the Chinese prompt format, with no parameter updates;
  2. Autoregressive path dependence: Once the initial Chinese mode is established, subsequent tokens tend to follow Chinese collocations (demonstrated in the sketch after this list);
  3. Chain-of-Thought anchoring: A single, consistent Chinese framework helps the model follow its steps rather than getting lost in language switching.
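
Point 2 can be observed directly. The sketch below uses the Hugging Face transformers API (`apply_chat_template` and `generate` are real calls; the model name is illustrative) to append a short Chinese prefix to the generation prompt. Because each token is conditioned on everything before it, a Chinese opening makes a mid-thought switch to English far less likely.

```python
# Demonstrating autoregressive path dependence with a prefilled Chinese anchor.
# Real transformers API; the model name is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # any Qwen chat model with a <think> stage
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

messages = [{"role": "user", "content": "9.11 和 9.9 哪个大？"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Seed the chain of thought with Chinese tokens: the continuation is now
# conditioned on a Chinese prefix, so Chinese collocations dominate.
prompt += "让我们用中文一步一步思考："  # "Let's think step by step in Chinese:"

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```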

Section 06

Practical Significance: Value for Developers and the Chinese AI Ecosystem

  • Enterprise developers: Zero-cost optimization (only the application-layer template changes), suited to production environments where frequent model updates are not feasible;
  • Chinese AI ecosystem: Provides an end-to-end Chinese application path, improving the experience for Chinese-speaking users;
  • Prompt engineering: Demonstrates subtle guidance techniques, showcasing the core skill set of the prompt engineer.

Section 07

Limitations and Future Outlook

Limitations:

  1. Model specificity: Targeted at Qwen 3.5; other models require different strategies;
  2. Task dependence: English is more effective for tasks like code generation;
  3. Version compatibility: Template adjustments may be needed as the model iterates.

Future directions:

  • A universal multilingual reasoning framework;
  • Automated A/B testing for template optimization (sketched after this list);
  • Lightweight cross-model optimization applications.
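
As a starting point for the A/B-testing direction, here is a hypothetical sketch that renders the same questions through two candidate templates and compares their loop rates using the `is_looped()` heuristic from Section 04; `generate` is a caller-supplied function that sends a rendered prompt to the model.

```python
# Hypothetical A/B harness for template optimization; builds on is_looped()
# from Section 04. `generate(prompt) -> str` is supplied by the caller.
from jinja2 import Template

def ab_test(template_a: str, template_b: str, questions, generate, n_runs: int = 20):
    """Return the loop rate of each candidate template over repeated runs."""
    rates = {}
    for name, tmpl in (("A", Template(template_a)), ("B", Template(template_b))):
        outputs = [
            generate(tmpl.render(messages=[{"role": "user", "content": q}]))
            for q in questions
            for _ in range(n_runs)
        ]
        rates[name] = sum(is_looped(o) for o in outputs) / len(outputs)
    return rates  # e.g. {"A": 0.625, "B": 0.045}
```

Choosing the template with the lower rate automates the kind of 5/8 versus 1/22 comparison reported in Section 04.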

Section 08

Conclusion: A Simple and Elegant Large Model Optimization Case

Jerry-877's project demonstrates the effectiveness of simple solutions in large-model applications: a two-line template change solves a real problem, offers a fresh perspective on steering model behavior, and is an experience worth learning from for Chinese AI developers.