# Chitu: A Production-Grade Large Model Inference Engine Open-Sourced by Tsinghua Team, Fully Supporting Domestic Chips

> The Chitu inference framework, open-sourced by Tsinghua University's PACMAN Lab, supports the full range of NVIDIA GPUs and is also deeply adapted to domestic chips such as Huawei Ascend, Moore Threads, Muxi, and Hygon, enabling deployment at every scale from a single card to a cluster.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-01T04:14:27.000Z
- Last activity: 2026-04-01T04:17:44.144Z
- Popularity: 158.9
- Keywords: Chitu, 赤兔, large model inference, Tsinghua PACMAN, domestic chips, Ascend, Moore Threads, Muxi, DeepSeek, Qwen, quantized inference, production-grade deployment
- Page link: https://www.zingnex.cn/en/forum/thread/chitu
- Canonical: https://www.zingnex.cn/forum/thread/chitu

---

## Introduction: Tsinghua Open-Sources Chitu Inference Engine, Fully Supporting Domestic Chips and Full-Scenario Deployment

The Chitu (赤兔) inference framework, open-sourced by Tsinghua University's PACMAN Lab, is positioned as a production-grade large model inference engine that combines high performance with stability. Its core advantages are:

- support for the full range of NVIDIA GPUs and for domestic chips such as Huawei Ascend, Moore Threads, Muxi, and Hygon;
- deployment at every scale, from CPU-only and single-card GPU setups to large clusters;
- compatibility with mainstream large models such as DeepSeek, Qwen, and GLM;
- technical highlights such as FP4/FP8 quantization and CPU+GPU heterogeneous hybrid inference, which help it handle real concurrent business traffic.

## Project Background and Positioning

The Chinese name of Chitu ('赤兔') implies speed and power. Its design goal is an efficient, flexible, and usable high-performance inference framework. Unlike engines optimized for a single hardware platform, Chitu was designed from the start around the progressive needs of enterprise AI adoption, offering a path that scales from laboratory experiments to large-scale production. It is explicitly positioned as 'production-grade': beyond pursuing peak performance, it aims for long-term operational stability and reliability under real concurrent business traffic.

## Multi-Platform Compute Support: Deep Adaptation to Domestic Chips

Broad support for multiple compute platforms is one of Chitu's core features:
- **Full range of NVIDIA**: Covers products from the Blackwell architecture back to older series;
- **Huawei Ascend**: v0.3.5 supports native deployment on Ascend 910B, and v0.3.9 debuted GLM-4.5 MoE model inference on Ascend;
- **Moore Threads**: Adaptation completed in v0.5.1;
- **Muxi, Hygon**: Performance and stability improved in v0.4.0.

This allows enterprises to choose compute platforms flexibly and avoid single-vendor lock-in.

## Scalable Deployment Across All Scenarios

Chitu supports deployment at every scale:
- **CPU-only deployment**: Lowers the hardware barrier to entry and suits lightweight inference scenarios;
- **Single-card GPU deployment**: With CPU+GPU heterogeneous hybrid inference (v0.2.2), a single card can run the very large DeepSeek-R1 671B model; v0.3.0 added operators that convert FP4 to FP8/BF16 online, which enables the FP4-quantized version of this model;
- **Large-scale cluster deployment**: v0.5.0 improved cluster performance to meet enterprises' high-concurrency needs.
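
The idea behind CPU+GPU hybrid inference can be illustrated with a toy placement routine. This is a minimal sketch under stated assumptions, not Chitu's actual scheduler: the `Layer` type, the `place_layers` function, the layer names, and the sizes are all hypothetical, and the policy shown (put each layer on the GPU if it still fits in the memory budget, otherwise keep it on the CPU) is only one simple way such a split could be decided.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str       # hypothetical layer identifier
    size_gb: float  # weight size of this layer in GB


def place_layers(layers, gpu_budget_gb):
    """Assign each layer to "gpu" if it still fits in the budget, else "cpu".

    Layers are visited in order; a layer that does not fit is left on the
    CPU, and later (smaller) layers may still be placed on the GPU.
    Returns the placement map and the GPU memory used.
    """
    placement = {}
    used_gb = 0.0
    for layer in layers:
        if used_gb + layer.size_gb <= gpu_budget_gb:
            placement[layer.name] = "gpu"
            used_gb += layer.size_gb
        else:
            placement[layer.name] = "cpu"
    return placement, used_gb


# Illustrative sizes only: small attention blocks, large expert blocks.
demo_layers = [
    Layer("attn0", 2.0),
    Layer("experts0", 10.0),
    Layer("attn1", 2.0),
    Layer("experts1", 10.0),
]
demo_placement, demo_used = place_layers(demo_layers, gpu_budget_gb=8.0)
print(demo_placement, demo_used)
```

With these made-up numbers, the small blocks land on the GPU and the large blocks stay in host memory. A real engine would also have to account for the KV cache, activation memory, and CPU-GPU transfer cost.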

## Model Ecosystem and Core Technical Highlights

**Model Ecosystem**: Supports mainstream large models such as DeepSeek, Qwen, GLM, and Kimi; v0.3.5 provides high-performance solutions for the Qwen3 series, and v0.3.9 debuted GLM-4.5 MoE deployment on Ascend.

**Technical Highlights**:
1. **Quantization Support**: v0.1.0 supports FP8 to BF16 conversion; v0.3.0 added FP4 to FP8/BF16 conversion, reducing memory and compute overhead;
2. **Heterogeneous Hybrid Inference**: Distributes work between CPU and GPU so that a single card can run very large models;
3. **Production-Grade Stability**: Emphasizes long-term stable operation under real business workloads.
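
To make the memory saving from quantization concrete, the following back-of-the-envelope calculation estimates the weight-only footprint of a 671B-parameter model in each format. It is a rough sketch, not a measurement of Chitu: the parameter count of 671e9 is taken from the model name, the function name `weight_memory_gb` is made up for illustration, and the estimate ignores the KV cache, activations, and any per-block scaling factors that quantized formats carry.

```python
# Storage cost per parameter for each numeric format, in bytes.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weight_memory_gb(num_params, fmt):
    """Weight-only footprint in GB (10^9 bytes) for the given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


NUM_PARAMS = 671e9  # assumption: parameter count implied by "671B"

for fmt in ("BF16", "FP8", "FP4"):
    print(f"{fmt}: {weight_memory_gb(NUM_PARAMS, fmt):.1f} GB")
```

Halving the bits per weight halves the weight footprint: roughly 1342 GB in BF16, 671 GB in FP8, and 335.5 GB in FP4. Even the FP4 figure exceeds the memory of a single accelerator card, which is why quantization pairs naturally with CPU+GPU hybrid inference for single-card deployment.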

## Rapid Deployment and Open-Source Ecosystem

**Rapid Deployment**: Provides Docker images for multiple platforms, such as NVIDIA (arch8.0/8.9, 9.0), Muxi, and Ascend (A2/A3), lowering the barrier to entry.

**Open-Source Ecosystem**: Released under the Apache License 2.0, with code hosted on GitHub. The team draws inspiration from projects such as DeepSeek and FlashAttention, welcomes community contributions, and provides detailed contribution guidelines.

## Application Value and Future Outlook

**Value to enterprises**: Adaptation to domestic chips has strategic significance, and production-grade stability reduces technical risk.

**Outlook**: As large model use cases expand, inference engines become increasingly important, and Chitu is expected to play a key role in the domestic ecosystem.

**Suggestion**: Teams that need to reduce inference costs, improve performance, or deploy large models on domestic chips may want to evaluate Chitu.