Section 01
[Introduction] SCIN: Switch-Centric In-Network Computing Architecture Accelerates Large Model Inference
This paper proposes SCIN (Switch-Centric In-Network Computing Architecture) to address the communication bottleneck in distributed inference of large models. Its core innovation is to make the switch the active initiator of computation: by integrating in-switch accelerators (ISAs), it eliminates the data return overhead of NVLink SHARP and supports in-network quantization. Experiments show that, on the LLaMA-2 model, SCIN improves time to first token (TTFT) by 1.74x and time per output token (TPOT) by 1.34x, and accelerates All-Reduce operations by up to 8.7x.
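To make the data flow concrete, the following is a minimal toy model of a switch-initiated All-Reduce with quantized transfers. It is a sketch under stated assumptions, not the paper's design: the function names (`quantize`, `dequantize`, `switch_all_reduce`), the symmetric int8 scheme, and the choice to quantize each GPU's contribution before it crosses the link are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization of one buffer (assumed scheme).

    Returns the integer codes and the scale needed to reconstruct them.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def switch_all_reduce(gpu_buffers: List[List[float]]) -> List[List[float]]:
    """Toy model: the switch pulls every GPU's buffer, sums, and writes back.

    The switch (not a GPU) drives the operation, so no GPU has to send
    data out and then fetch a reduced result in a separate step.
    """
    length = len(gpu_buffers[0])
    acc = [0.0] * length
    for buf in gpu_buffers:
        # Assumption: each contribution travels as int8 codes plus one scale.
        codes, scale = quantize(buf)
        for i, v in enumerate(dequantize(codes, scale)):
            acc[i] += v
    # The switch writes the same reduced vector to every GPU.
    return [list(acc) for _ in gpu_buffers]


if __name__ == "__main__":
    buffers = [[1.0, -2.0, 0.5], [0.25, 4.0, -1.0]]
    print(switch_all_reduce(buffers))
```

Each GPU ends up with the same vector, which approximates the element-wise sum of the inputs up to the quantization error. A real in-switch accelerator would operate on hardware buffers and link-level transactions; this model only mirrors the ordering of pull, reduce, and write-back.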