Section 01
Key Points of the SCIN Architecture
SCIN (Switch-Centric In-Network Architecture) is an in-network computing design for large-model inference that targets the communication bottleneck in distributed inference. Its core innovations are in-switch accelerators (ISA), a co-designed communication architecture, and support for in-network quantization (INQ). Together these can eliminate the redundant data transmission of NVLink SHARP, accelerate All-Reduce by 8.7x for small messages and 3.8x for large messages, improve time to first token (TTFT) in LLM inference by 1.74x, and reduce bandwidth requirements.
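To make the bandwidth argument concrete, the sketch below compares how many bytes each GPU sends in a standard ring All-Reduce with a reduction performed inside the switch, and then simulates a quantized in-switch reduction. It is a toy model under stated assumptions, not SCIN's actual design: the function names, the symmetric per-message 8-bit quantization scheme, and the 16-bit native element width are all illustrative choices that the text above does not specify.

```python
def ring_allreduce_bytes_sent(num_gpus, message_bytes):
    """Bytes each GPU sends in a ring All-Reduce (reduce-scatter + all-gather)."""
    return 2 * (num_gpus - 1) * message_bytes / num_gpus


def in_switch_allreduce_bytes_sent(message_bytes, wire_bits=16, native_bits=16):
    """Bytes each GPU sends when the switch reduces: a single upload.

    wire_bits < native_bits models quantizing the payload before it is sent
    (assumed scheme, for illustration only).
    """
    return message_bytes * wire_bits / native_bits


def quantize(values, bits=8):
    """Symmetric quantization of a list of floats to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def switch_allreduce(per_gpu_values, bits=8):
    """Toy in-switch reduction over quantized inputs.

    Each GPU quantizes its vector before sending; the (hypothetical) switch
    dequantizes, accumulates in floating point, and requantizes the sum
    before broadcasting it back.
    """
    total = [0.0] * len(per_gpu_values[0])
    for values in per_gpu_values:
        codes, scale = quantize(values, bits)
        for i, v in enumerate(dequantize(codes, scale)):
            total[i] += v
    codes, scale = quantize(total, bits)
    return dequantize(codes, scale)


if __name__ == "__main__":
    size = 1024  # message size in bytes
    print("ring, 8 GPUs:      ", ring_allreduce_bytes_sent(8, size))
    print("in-switch, 16-bit: ", in_switch_allreduce_bytes_sent(size))
    print("in-switch, 8-bit:  ", in_switch_allreduce_bytes_sent(size, wire_bits=8))
    print("quantized sum:     ", switch_allreduce([[1.0, -2.0], [0.5, 0.25]]))
```

In this model, a ring All-Reduce across 8 GPUs makes each GPU send 1.75 times the message size, an in-switch reduction makes it send the message once, and sending 8-bit codes instead of 16-bit values halves that again. The quantized sum is only approximately equal to the exact sum, which is the accuracy cost any quantized reduction has to manage.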