Section 01
[Introduction] Exploring the Limits of Disaggregation in MoE Large Model Inference: A Study of the Design Space of Attention-FFN Disaggregation
The original author team posted the paper "How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving" to arXiv on May 27, 2026 (link: http://arxiv.org/abs/2605.28302v1). Through a systematic design-space exploration, the study analyzes where the benefits of successive disaggregation strategies reach their limits in MoE model serving (from Chunked-Prefill to Prefill-Decode and then to Attention-FFN Disaggregation (AFD)), and it offers practical guidance for designing large-scale inference infrastructure. Key findings include: AFD has significant advantages under strict latency constraints, and its benefits depend on how well workload characteristics, resource allocation, and interconnect topology are matched.