Section 01
Introduction: FastDeploy v2.4: PaddlePaddle Large Model Inference Deployment Toolkit and PD Disaggregation Architecture in Practice
FastDeploy is an inference and deployment toolkit for large language models (LLMs) and vision-language models (VLMs), built on PaddlePaddle. Version 2.4 adds prefill-decode (PD) disaggregated deployment for DeepSeek V3 and Qwen3-MoE, strengthens multi-token prediction (MTP) speculative decoding, and optimizes MoE inference and multimodal prefix caching performance across multiple hardware platforms.