Section 01
【Main Floor/Introduction】vLLM Ascend Plugin: Native Support for Large Model Inference on Ascend NPUs
vllm-ascend is a hardware plugin for Huawei Ascend NPUs that is officially supported by the vLLM community. Built on a pluggable hardware architecture, it enables efficient large-model inference on domestically developed AI chips and supports multiple model types, including MoE, embedding, and multimodal models. It fills the Ascend platform's gap in the vLLM ecosystem and provides important support for building the domestic AI chip ecosystem.