Zing Forum

Chitu: A Production-Grade Large Model Inference Engine Open-Sourced by Tsinghua Team, Fully Supporting Domestic Chips

The Chitu inference framework, open-sourced by Tsinghua University's PACMAN Lab, not only supports the full range of NVIDIA GPUs but also deeply adapts to domestic chips such as Huawei Ascend, Moore Threads, Muxi, and Hygon, enabling full-scenario deployment from single-card to cluster.

Tags: Chitu (赤兔), large model inference, Tsinghua PACMAN, domestic chips, Ascend, Moore Threads, Muxi, DeepSeek, Qwen
Published 2026-04-01 12:14 · Last activity 2026-04-01 12:17 · Estimated read: 7 min
Section 01

Introduction: Tsinghua Open-Sources Chitu Inference Engine, Fully Supporting Domestic Chips and Full-Scenario Deployment

The Chitu (赤兔) inference framework, open-sourced by Tsinghua University's PACMAN Lab, is positioned as a production-grade large model inference engine that combines high performance with stability. Its core advantages: support for the full range of NVIDIA GPUs as well as domestic chips such as Huawei Ascend, Moore Threads, Muxi, and Hygon; full-scenario deployment from pure-CPU and single-GPU setups to large-scale clusters; compatibility with mainstream large models such as DeepSeek, Qwen, and GLM; and technical highlights including FP4/FP8 quantization and CPU+GPU heterogeneous hybrid inference, allowing it to handle real concurrent business traffic.

Section 02

Project Background and Positioning

The Chinese name Chitu ('赤兔') connotes speed and power. Its design goal is an efficient, flexible, high-performance inference framework that is practical to use. Unlike engines optimized for a single hardware platform, it was designed from the outset around the progressive needs of enterprise AI adoption, offering solutions that scale from laboratory experiments to large-scale production. It is explicitly positioned as 'production-grade': it pursues not only peak performance but also long-term operational stability and reliability, so that it can handle real concurrent business traffic.

Section 03

Multi-Computing Power Adaptation: Deep Support for Domestic Chips

Chitu's comprehensive support for multi-computing power is one of its core features:

  • Full NVIDIA range: covers products from the Blackwell architecture back to older series;
  • Huawei Ascend: v0.3.5 supports native deployment on Ascend 910B, and v0.3.9 was the first to bring GLM-4.5 MoE model inference to Ascend;
  • Moore Threads: adaptation completed in v0.5.1;
  • Muxi and Hygon: performance and stability improved in v0.4.0.

This breadth lets enterprises choose computing platforms flexibly and avoid single-vendor lock-in.

Section 04

Full-Scenario Scalable Deployment Solutions

Chitu supports full-scenario deployment:

  • Pure-CPU deployment: lowers the hardware threshold, suited to lightweight inference scenarios;
  • Single-GPU deployment: through CPU+GPU heterogeneous hybrid inference (introduced in v0.2.2), a single card can run the DeepSeek-R1 671B super-large model; v0.3.0 added operators that convert FP4 online to FP8/BF16, supporting the FP4-quantized version of that model;
  • Large-scale cluster deployment: v0.5.0 improved cluster performance to meet enterprises' high-concurrency needs.
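To see why FP4 quantization plus CPU+GPU heterogeneous inference is what makes single-GPU deployment of a 671B-parameter model plausible, a back-of-the-envelope memory estimate helps. The parameter count comes from the article; the bytes-per-parameter figures are the standard sizes of those formats, and the calculation ignores KV cache, activations, and quantization metadata, so it is illustrative only:

```python
# Rough weight-memory footprint of a 671B-parameter model at
# different precisions (weights only; ignores KV cache, activations,
# and quantization scale metadata -- a rough illustration).
PARAMS = 671e9  # DeepSeek-R1 parameter count, per the article

BYTES_PER_PARAM = {
    "BF16": 2.0,   # 16-bit brain float
    "FP8": 1.0,    # 8-bit float
    "FP4": 0.5,    # 4-bit float
}

def weight_gib(precision: str) -> float:
    """Weight memory in GiB for the given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 2**30

for p in BYTES_PER_PARAM:
    print(f"{p}: ~{weight_gib(p):.0f} GiB of weights")
# BF16 ~1250 GiB, FP8 ~625 GiB, FP4 ~312 GiB
```

Even at FP4, roughly 312 GiB of weights far exceeds any single GPU's memory, which is presumably why Chitu pairs quantization with heterogeneous inference: the portion that does not fit on the GPU stays in host memory and runs on, or is streamed from, the CPU.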
Section 05

Model Ecosystem and Core Technical Highlights

Model ecosystem: Chitu supports mainstream large models such as DeepSeek, Qwen, GLM, and Kimi. v0.3.5 provides a high-performance solution for the Qwen3 series, and v0.3.9 was the first to bring GLM-4.5 MoE deployment to Ascend.

Technical highlights:

  1. Quantization Support: v0.1.0 supports FP8 to BF16 conversion; v0.3.0 added FP4 to FP8/BF16 conversion, reducing memory and computing overhead;
  2. Heterogeneous Hybrid Inference: Intelligently distributes CPU/GPU tasks, enabling single-card operation of super-large models;
  3. Production-Grade Stability: Emphasizes long-term stable operation and adapts to real business scenarios.
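The FP4-to-FP8/BF16 online conversion mentioned above can be pictured as a scaled table lookup: each 4-bit code indexes one of 16 representable values, and a per-block scale restores the original magnitude. The sketch below illustrates that general idea; the E2M1 codebook and block size of 32 are common conventions for 4-bit floats, not Chitu's actual kernel, and the helper `dequantize_fp4` is hypothetical:

```python
import numpy as np

# The 16 representable values of an E2M1-style 4-bit float
# (1 sign, 2 exponent, 1 mantissa bit) -- an illustrative codebook.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                 -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
                dtype=np.float32)

def dequantize_fp4(codes: np.ndarray, scales: np.ndarray,
                   block: int = 32) -> np.ndarray:
    """Expand 4-bit codes to float32 using one scale per block.

    codes:  uint8 array of 4-bit indices, one code per weight
    scales: float32 array, one scale per `block` consecutive codes
    """
    vals = E2M1[codes]                       # codebook lookup
    vals = vals.reshape(-1, block)           # group into blocks
    return (vals * scales[:, None]).ravel()  # apply per-block scale

# Example: 64 codes in 2 blocks of 32, each block with its own scale
codes = np.random.randint(0, 16, size=64).astype(np.uint8)
scales = np.array([0.1, 0.25], dtype=np.float32)
weights = dequantize_fp4(codes, scales)
```

In a real engine this expansion runs inside fused GPU operators at matmul time, so the weights are stored at 4 bits but computed in FP8/BF16, which is what cuts memory and compute overhead without retraining.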
Section 06

Rapid Deployment and Open-Source Ecosystem

Rapid deployment: multi-platform Docker images are provided for NVIDIA (arch 8.0/8.9/9.0), Muxi, Ascend (A2/A3), and others, lowering the entry threshold.

Open-source ecosystem: Chitu is released under the Apache License 2.0, with code hosted on GitHub. The team actively draws on projects such as DeepSeek and FlashAttention, and welcomes community contributions, providing detailed contribution guidelines.

Section 07

Application Value and Future Outlook

Chitu's value to enterprises:

  • Application value: adaptation to domestic chips carries strategic significance, and production-grade stability reduces technical risk;
  • Outlook: as large-model application scenarios expand, the inference engine grows in importance, and Chitu is well placed to play a key role in the domestic ecosystem;
  • Suggestion: teams that need to reduce inference costs, improve performance, or deploy large models on domestic chips should evaluate Chitu.