Section 01
FusionCIM: Fusion-Driven In-Memory Computing Architecture Accelerates Large Model Inference (Introduction)
FusionCIM Introduction
FusionCIM is a fusion-driven in-memory computing (CIM) architecture for large model inference. To address the challenges that arise when CIM is applied to this workload, it introduces three key techniques: a hybrid CIM pipeline, a QO-stationary dataflow, and pattern-aware online softmax. Evaluated on LLaMA-3, it achieves a 3.86x improvement in energy efficiency and a system-level energy efficiency of 29.4 TOPS/W, offering a reference point for AI accelerator design.
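The introduction names online softmax without defining it. As background only, here is a minimal sketch of the standard single-pass (online) softmax from the literature. It keeps a running maximum and a running sum that is rescaled whenever the maximum grows, so attention scores can be streamed instead of read twice. This is not FusionCIM's pattern-aware variant, whose details are not given in this section, and the function name `online_softmax` is hypothetical.

```python
import math
from typing import Sequence


def online_softmax(scores: Sequence[float]) -> list[float]:
    """Generic online softmax (hypothetical helper, not FusionCIM's pattern-aware version)."""
    running_max = -math.inf  # largest score seen so far
    running_sum = 0.0        # sum of exp(s - running_max) over scores seen so far
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the accumulated sum to the new maximum, then add the new term.
        # On the first step exp(-inf) == 0.0, so the empty sum contributes nothing.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(s - new_max)
        running_max = new_max
    # Final normalization uses the global maximum and normalizer from the single pass.
    return [math.exp(s - running_max) / running_sum for s in scores]


probs = online_softmax([1.0, 2.0, 3.0])
print(sum(probs))  # 1.0 up to floating-point rounding
```

Subtracting the running maximum keeps every exponent at or below zero, which avoids overflow for large scores; the rescaling step is what lets the sum be accumulated in one pass.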