Section 01
RH+ Scheduling: A New Breakthrough in Row-Hit Optimization for LLM Inference on PIM Architectures (Introduction)
This article argues that the real bottleneck of LLM inference on PIM architectures is the DRAM row cycle time (nRC), not nCCDAB as previously assumed. It proposes the RH+ scheduling strategy, which uses a simple step adjustment so that 32 consecutive MAC operations execute within the same DRAM row, turning them into row hits. The reported result is an 8-12x speedup, an energy reduction of over 74%, and a 52x improvement in EDP (Energy Delay Product). The strategy remains compatible with the existing HBM3 specification and requires no hardware modifications.
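The intuition behind the summary above can be sketched with a toy timing model. This is only an illustration under stated assumptions, not the article's simulator: the function `total_cycles`, the single-bank view, and the cycle counts `N_RC = 48` and `N_CCD = 4` are hypothetical placeholders, not values taken from the article or from the HBM3 specification. The model assumes each row visit costs at least one row cycle (nRC) and that each MAC command on an open row costs one column-to-column delay.

```python
import math


def total_cycles(num_macs: int, macs_per_row: int, n_rc: int, n_ccd: int) -> int:
    """Toy estimate of cycles needed to issue num_macs MAC commands to one bank.

    Assumptions (illustrative only):
      - a row visit cannot be shorter than the row cycle time n_rc;
      - each MAC on the open row takes n_ccd cycles;
      - a partially filled final visit is charged as a full visit.
    """
    row_visits = math.ceil(num_macs / macs_per_row)
    cycles_per_visit = max(n_rc, macs_per_row * n_ccd)
    return row_visits * cycles_per_visit


# Hypothetical timing parameters, in clock cycles (not from the source).
N_RC = 48
N_CCD = 4
NUM_MACS = 1024

# One MAC per row activation: every MAC pays the full row cycle time.
baseline = total_cycles(NUM_MACS, 1, N_RC, N_CCD)

# 32 MACs per row activation: the row cycle time is amortized over row hits.
row_hit = total_cycles(NUM_MACS, 32, N_RC, N_CCD)

print(f"baseline cycles: {baseline}")
print(f"row-hit cycles:  {row_hit}")
print(f"ratio:           {baseline / row_hit:.1f}x")
```

With one MAC per activation the per-MAC cost in this model is nRC, which is why shortening the column-to-column delay alone would not help. Once many MACs share a row, the per-MAC cost falls toward the column delay instead. The ratio printed here depends entirely on the placeholder parameters and should not be read as reproducing the article's measurements.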