# SCIN: Switch-Centric In-Network Computing Architecture for Large Model Inference

> SCIN removes the redundant data transfers of NVLink SHARP by adding in-switch accelerators (ISA) and a co-designed communication architecture. It reports an 8.7x speedup for small-message All-Reduce, a 3.8x speedup for large-message All-Reduce, and a 1.74x improvement in TTFT, and it supports in-network quantization (INQ) to reduce bandwidth requirements.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-30T09:59:11.000Z
- Last activity: 2026-03-31T03:28:47.776Z
- Popularity: 131.5
- Keywords: in-network computing, All-Reduce, switch-centric, LLM inference, quantization, NVLink, distributed training
- Page link: https://www.zingnex.cn/en/forum/thread/scin
- Canonical: https://www.zingnex.cn/forum/thread/scin
- Markdown source: floors_fallback

---

## Key Points of the SCIN Architecture

SCIN (Switch-Centric In-Network Architecture) is an in-network computing architecture for large model inference that targets communication bottlenecks in distributed inference. Its core innovations are an in-switch accelerator (ISA), a co-designed communication architecture, and support for in-network quantization (INQ). Together they eliminate the redundant transmission of NVLink SHARP, deliver an 8.7x speedup for small-message All-Reduce and 3.8x for large-message All-Reduce, improve LLM inference TTFT by 1.74x, and reduce bandwidth requirements.

## Communication Bottlenecks in Large Model Inference and Limitations of Existing Technologies

Deploying large model inference at scale incurs heavy communication overhead, and All-Reduce operations in distributed systems often become the performance bottleneck. The existing NVLink SHARP technology offloads All-Reduce to switches, but it has two major limitations. First, it relies on GPUs to trigger the reduction, so the reduced data must be sent back to the source GPU before it is broadcast, which causes redundant transmission. Second, it cannot support operations outside memory semantics, such as INQ, so All-Reduce must run in FP16/BF16 precision, which wastes bandwidth.

## Design of SCIN's Switch-Centric Architecture

SCIN proposes a switch-centric paradigm that upgrades switches from passive forwarding nodes to active computing participants. Key innovations include:

1. In-switch Accelerator (ISA): the ISA actively initiates memory operations and broadcasts reduction results directly to the target nodes, eliminating the redundant transfer.
2. Co-designed Communication Architecture: synchronization logic is pushed down into the hardware layer to reduce software overhead.
3. INQ Support: the ISA integrates a quantization module that reduces precision to 8 bits, lowering bandwidth requirements with negligible precision loss.
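The switch-initiated flow described above can be illustrated with a toy software model. This is only a sketch under stated assumptions: the function name `isa_all_reduce`, the list-of-lists representation of GPU buffers, and the choice of sum as the reduction operator are all hypothetical and are not taken from the SCIN design itself.

```python
from typing import List


def isa_all_reduce(gpu_buffers: List[List[float]]) -> List[List[float]]:
    """Toy model of a switch-initiated All-Reduce (sum).

    Hypothetical illustration only: real hardware would issue memory
    reads and writes over the interconnect instead of touching lists.
    """
    if not gpu_buffers:
        return []
    length = len(gpu_buffers[0])
    if any(len(buf) != length for buf in gpu_buffers):
        raise ValueError("all GPU buffers must have the same length")

    # Step 1: the switch reads every GPU's buffer and reduces element-wise.
    reduced = [sum(values) for values in zip(*gpu_buffers)]

    # Step 2: the switch writes the result to every GPU directly,
    # with no detour through a source GPU.
    return [list(reduced) for _ in gpu_buffers]


result = isa_all_reduce([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In this model every GPU ends up with the same reduced buffer after two steps, which mirrors the idea of the switch broadcasting results directly to the target nodes.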

## SCIN Performance Optimization Mechanisms

SCIN optimizes performance through two major mechanisms:

1. Eliminating redundant transmission: SCIN adopts a single-hop mode in which the switch broadcasts the result directly after the reduction. This cuts the number of communication steps from 3 to 2 and lowers latency.
2. Improving bandwidth efficiency: INQ reduces precision from 16 bits to 8 bits, halving bandwidth requirements with negligible precision loss, which suits large model parameter synchronization scenarios.
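The bandwidth effect of 8-bit quantization can be shown with a short sketch. The source does not describe the quantization scheme used by INQ, so the symmetric per-buffer int8 scaling below is an assumption chosen for illustration, and the helper names `quantize_int8`, `dequantize_int8`, and `payload_bytes` are hypothetical. The byte count also ignores the small overhead of transmitting the scale factor.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization with one scale per buffer (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [q * scale for q in quantized]


def payload_bytes(num_elements: int, bits_per_element: int) -> int:
    """Bytes on the wire for the data payload only (scale factor excluded)."""
    return num_elements * bits_per_element // 8


values = [0.5, -1.0, 0.25, 0.75]
quantized, scale = quantize_int8(values)
restored = dequantize_int8(quantized, scale)

# Worst-case rounding error is about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(values, restored))

fp16_bytes = payload_bytes(len(values), 16)
int8_bytes = payload_bytes(len(values), 8)
```

For the same number of elements, the 8-bit payload is half the size of the 16-bit one, and the round-trip error stays within roughly half a quantization step. Whether that error is negligible for a given model depends on the value distribution, which this sketch does not evaluate.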

## SCIN Experimental Validation and Performance Results

The research team implemented an SCIN prototype on a multi-FPGA system. Experimental results show an 8.7x speedup for small-message All-Reduce and a 3.8x speedup for large-message All-Reduce. In an end-to-end evaluation of the LLaMA-2 model, TTFT (Time To First Token) improved by 1.74x and TPOT (Time Per Output Token) improved by 1.34x.
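To put these ratios in terms of time saved, the short calculation below converts each speedup into a relative latency reduction. It assumes that "improved by Nx" means the baseline time divided by the SCIN time, which the source does not state explicitly; the function name `latency_reduction` is hypothetical.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of time saved, assuming speedup = baseline_time / new_time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


reported = {
    "small-message All-Reduce": 8.7,
    "large-message All-Reduce": 3.8,
    "TTFT": 1.74,
    "TPOT": 1.34,
}

# Percentage of time saved for each reported metric, rounded to one decimal.
reductions = {
    name: round(100 * latency_reduction(s), 1) for name, s in reported.items()
}
```

Under that assumption, a 1.74x TTFT improvement corresponds to roughly 42.5% less time to the first token, and a 1.34x TPOT improvement to roughly 25.4% less time per output token.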

## Technical Significance and Industry Impact of SCIN

SCIN pushes network computing architecture from an endpoint-centric model toward a switch-centric one, in which switches act as active computing nodes. Industry implications include:

1. Programmable Networks: switches can integrate general computing capabilities.
2. Precision-Adaptive Transmission: network protocols can natively support multiple precisions.
3. Hardware-Software Co-design: every layer of the stack can be optimized for AI workloads.

## Limitations and Future Directions of SCIN

Current limitations: the FPGA prototype has limited performance, ecosystem compatibility remains a challenge, and the generality of the quantization strategies is still an open question. Future directions: extending SCIN to more complex in-network operations (such as All-Gather), adding dynamic precision adjustment, and combining it with optical network technology to further improve performance.
