Section 01
[Introduction] Beyond FLOPs: A Groundbreaking Study on Practical Acceleration of LLM Pruning
Published on arXiv by the EIT-NLP team in June 2026, this study is the first to systematically evaluate how much LLM pruning methods actually speed up inference on real hardware, organizing the methods under a GEMM-centric taxonomy. It reveals that theoretical FLOPs reduction and actual inference speed are related in complex ways. Its key contributions are: (1) a GEMM taxonomy that unifies the evaluation of pruning strategies; (2) the PruningInferSim benchmark framework; (3) the findings that static depth pruning is Pareto-optimal and that the preferred strategy shifts in phases as quality loss increases; and (4) practical guidance for model compression.
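To make the gap between FLOPs and latency concrete, here is a minimal toy sketch. It is not the study's method and does not use PruningInferSim. It assumes a simple latency model (a fixed per-GEMM overhead plus the larger of compute time and memory time), and every hardware number and function name in it is hypothetical. It compares two ways of halving the FLOPs of a stack of identical GEMMs: shrinking each GEMM (width-style) and removing half of them (depth-style).

```python
# Toy latency model; all constants are illustrative assumptions, not measurements.
PEAK_FLOPS = 100e12      # assumed peak throughput, FLOP/s
MEM_BW = 1e12            # assumed memory bandwidth, bytes/s
LAUNCH_OVERHEAD = 5e-6   # assumed fixed cost per GEMM call, seconds
BYTES_PER_ELEM = 2       # fp16


def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs of an (m x k) @ (k x n) GEMM: one multiply and one add per term."""
    return 2 * m * n * k


def gemm_bytes(m: int, n: int, k: int) -> int:
    """Bytes moved, assuming each operand and the output are touched once."""
    return BYTES_PER_ELEM * (m * k + k * n + m * n)


def gemm_latency(m: int, n: int, k: int) -> float:
    """Fixed overhead plus the slower of the compute and memory phases."""
    compute = gemm_flops(m, n, k) / PEAK_FLOPS
    memory = gemm_bytes(m, n, k) / MEM_BW
    return LAUNCH_OVERHEAD + max(compute, memory)


def stack_flops(layers: int, m: int, n: int, k: int) -> int:
    return layers * gemm_flops(m, n, k)


def stack_latency(layers: int, m: int, n: int, k: int) -> float:
    return layers * gemm_latency(m, n, k)


# Hypothetical decode-time workload: 32 identical GEMMs with batch size 1.
dense = (32, 1, 4096, 4096)
width_pruned = (32, 1, 2048, 4096)   # halve the output dimension of every GEMM
depth_pruned = (16, 1, 4096, 4096)   # drop half of the GEMMs entirely

flops_ratio_width = stack_flops(*dense) / stack_flops(*width_pruned)
flops_ratio_depth = stack_flops(*dense) / stack_flops(*depth_pruned)
speedup_width = stack_latency(*dense) / stack_latency(*width_pruned)
speedup_depth = stack_latency(*dense) / stack_latency(*depth_pruned)

print(f"FLOPs reduction  width: {flops_ratio_width:.2f}x  depth: {flops_ratio_depth:.2f}x")
print(f"Modelled speedup width: {speedup_width:.2f}x  depth: {speedup_depth:.2f}x")
```

Under these assumptions both variants halve the FLOPs, yet the modelled speedups differ because the fixed per-call overhead does not shrink when a GEMM gets smaller. Real hardware behaviour depends on many more factors, so this only illustrates why FLOPs alone can be a misleading proxy.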