Zing Forum

DeepSparkInference: A Comprehensive Analysis of the Open-Source Library with 216 AI Inference Models on Domestic GPUs

DeepSparkInference is a core project of the DeepSpark open-source community, offering 216 inference model examples running on domestic Iluvatar CoreX GPUs. It covers multiple domains including CV, NLP, speech synthesis, and large language models, supports mainstream inference frameworks like vLLM, TGI, and LMDeploy, and provides crucial support for the ecosystem development of domestic AI chips.

Tags: Domestic GPU · Iluvatar CoreX · AI Inference · Large Language Models · vLLM · Open Source · DeepSpark · Model Library · Domestic Chips
Published 2026-04-23 23:14 · Last activity 2026-04-23 23:55 · Estimated read: 6 min
Section 01

DeepSparkInference Project Guide

DeepSparkInference is a core project of the DeepSpark open-source community, offering 216 inference model examples running on domestic Iluvatar CoreX GPUs. It covers multiple domains including CV, NLP, speech synthesis, and large language models, supports mainstream inference frameworks like vLLM, TGI, and LMDeploy, and provides crucial support for the ecosystem development of domestic AI chips.

Section 02

Project Background and Significance

In the development of artificial intelligence, hardware support for model inference has been a key constraint. For a long time, the high-end AI chip market has been dominated by foreign vendors, and domestic GPUs have lagged in software ecosystem and model support. DeepSparkInference, open-sourced in March 2024, aims to fill this gap: by providing abundant model inference examples and a complete toolchain, it injects momentum into the domestic AI chip ecosystem.

Section 03

Technical Architecture and Core Engines

The project revolves around two inference engines from Iluvatar CoreX:

  • IGIE: A high-performance inference engine built on TVM. It supports importing models from multiple frameworks, INT8 quantization, graph optimization, adaptation to multiple operator libraries and backends, and automatic operator tuning, making it well suited to production deployment.
  • ixRT: A self-developed high-performance engine focused on extracting maximum performance from Iluvatar CoreX GPUs. It supports dynamic-shape inference, a plugin mechanism, and mixed-precision computation, making it suitable for scenarios with strict latency and throughput requirements.
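The article does not show either engine's actual API, but the division of labor it describes can be illustrated with a small selection helper. Everything below is a hypothetical sketch: only the engine names and the selection criteria come from the text above; the `DeploymentProfile` type and `choose_engine` function are invented for illustration.

```python
# Hypothetical sketch: routing a workload to IGIE or ixRT based on the
# characteristics described above. The real engines expose their own APIs;
# only the engine names and selection criteria mirror the article.
from dataclasses import dataclass

@dataclass
class DeploymentProfile:
    latency_critical: bool   # strict latency/throughput requirements
    dynamic_shapes: bool     # input shapes vary at runtime
    needs_int8: bool         # INT8 quantization for production deployment

def choose_engine(profile: DeploymentProfile) -> str:
    """Per the article's characterization: ixRT for latency-sensitive or
    dynamic-shape workloads; IGIE (TVM-based) for general production
    deployment with INT8 quantization and operator auto-tuning."""
    if profile.latency_critical or profile.dynamic_shapes:
        return "ixRT"
    return "IGIE"

print(choose_engine(DeploymentProfile(True, False, False)))   # ixRT
print(choose_engine(DeploymentProfile(False, False, True)))   # IGIE
```

In practice the choice would involve more factors (operator coverage, model format, precision targets); the sketch only captures the headline trade-off the article draws between the two engines.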
Section 04

Model Coverage and Classification

The 216 models are categorized by domain:

  • Computer Vision: models such as ResNet and YOLO, covering tasks like image classification and object detection for scenarios such as security and industrial quality inspection.
  • Natural Language Processing: models such as BERT and the GPT series, covering tasks like text classification, with dedicated optimizations for Chinese models.
  • Speech Recognition and Synthesis: models such as CosyVoice2-0.5B, supporting scenarios like intelligent customer service.
  • Large Language Models: support for the Baichuan, ChatGLM, DeepSeek, Llama, and Qwen series, with efficient inference via mainstream frameworks.
  • Multimodal Models: models such as Qwen-VL and GLM-4V, addressing complex scenarios like image-text understanding.
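A catalog organized this way is naturally modeled as a domain-indexed registry. The sketch below is illustrative only: the model names come from the list above, but the registry structure and the `models_in` helper are hypothetical, not DeepSparkInference's actual API.

```python
# Hypothetical sketch of a domain-indexed model registry mirroring the
# article's categorization. Model names are from the article; the data
# structure and lookup helper are invented for illustration.
MODEL_ZOO = {
    "computer_vision": ["ResNet", "YOLO"],
    "nlp": ["BERT", "GPT"],
    "speech": ["CosyVoice2-0.5B"],
    "llm": ["Baichuan", "ChatGLM", "DeepSeek", "Llama", "Qwen"],
    "multimodal": ["Qwen-VL", "GLM-4V"],
}

def models_in(domain: str) -> list[str]:
    """Return the example models listed for a domain (empty if unknown)."""
    return MODEL_ZOO.get(domain, [])

print(models_in("llm"))  # ['Baichuan', 'ChatGLM', 'DeepSeek', 'Llama', 'Qwen']
```

In the real repository each entry would additionally carry per-engine artifacts (IGIE/ixRT configurations, quantization settings, deployment docs); the sketch only shows the domain-first organization.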
Section 05

Community Activities and Practical Application Value

  • Community Activities: Co-hosted a hackathon with Baidu PaddlePaddle from March to June 2025, setting up check-in, advanced, and open-source contribution tracks to lower participation barriers.
  • Application Value:
    1. Lowers the barrier to enterprise AI deployment by providing verified models and deployment documentation.
    2. Supports the construction of independent, controllable domestic computing infrastructure.
    3. Promotes industry-university-research collaboration and accelerates the commercialization of research results.
Section 06

Future Outlook and Conclusion

Future Plans

  1. Expand the model library to more sub-fields.
  2. Deepen support for large models and multimodal models.
  3. Optimize inference performance.
  4. Improve the toolchain.
  5. Strengthen community building.

Conclusion

This project marks a milestone in domestic GPUs moving from "usable" to "easy to use": it gives AI developers a window for evaluating domestic hardware, offers enterprises an independent and controllable computing-power option, and advances the domestic AI industry as a whole.