Section 01
【Introduction】HIT Open-Sources the WAS Framework (EMNLP 2025): A Retraining-Free Activation Sparsity Scheme for Accelerating LLM Inference
The research team from Harbin Institute of Technology (Shenzhen) has open-sourced the WAS (Weight-Aware Activation Sparsity) framework. The method significantly accelerates large language model (LLM) inference without any retraining by combining weight-aware activation sparsity with constrained Bayesian-optimization scheduling; the work has been accepted at EMNLP 2025. WAS pairs a weight-aware sparsification strategy with component-level greedy optimization and inter-layer Tree-structured Parzen Estimator (TPE) Bayesian optimization to balance efficiency against accuracy, offering a new path for LLM inference optimization (see the illustrative sketch below).
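To make the two named ingredients concrete, here is a minimal, illustrative sketch in PyTorch and Optuna. It is not the released WAS implementation: the weight-aware score |x_i|·‖W[:,i]‖, the helper names (`weight_aware_sparsify`, `search_layer_ratios`, `eval_fn`), and the penalty-based handling of the accuracy constraint are all assumptions chosen to mirror the description above.

```python
# Illustrative sketch only -- not the authors' released code.
# Assumed: weight-aware importance = |activation| * column norm of the
# downstream weight; the accuracy constraint is handled via a penalty term.
import torch
import optuna


def weight_aware_sparsify(x: torch.Tensor, weight: torch.Tensor,
                          keep_ratio: float) -> torch.Tensor:
    """Keep only the activations expected to matter most for the next matmul.

    Plain magnitude sparsity ranks channels by |x_i|; a weight-aware variant
    (assumed here) ranks them by |x_i| * ||W[:, i]||, so a small activation
    feeding large weights can still be kept.
    x: (batch, d_in), weight: (d_out, d_in).
    """
    col_norms = weight.norm(dim=0)                 # ||W[:, i]|| per input channel
    scores = x.abs() * col_norms                   # weight-aware importance
    k = max(1, int(keep_ratio * x.shape[-1]))
    topk = scores.topk(k, dim=-1).indices
    mask = torch.zeros_like(x, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return x * mask                                # zeroed channels can be skipped


def search_layer_ratios(eval_fn, n_layers: int, max_ppl: float,
                        n_trials: int = 100) -> list[float]:
    """Inter-layer schedule search with TPE (via Optuna's TPESampler).

    eval_fn(ratios) is a hypothetical callback that applies the per-layer
    keep ratios to the model and returns (perplexity, mean_keep_ratio).
    """
    def objective(trial: optuna.Trial) -> float:
        ratios = [trial.suggest_float(f"layer_{i}", 0.1, 1.0)
                  for i in range(n_layers)]
        ppl, mean_keep = eval_fn(ratios)
        if ppl > max_ppl:                          # accuracy budget violated
            return mean_keep + (ppl - max_ppl)     # penalize the violation
        return mean_keep                           # fewer kept channels = faster

    study = optuna.create_study(direction="minimize",
                                sampler=optuna.samplers.TPESampler(seed=0))
    study.optimize(objective, n_trials=n_trials)
    return [study.best_params[f"layer_{i}"] for i in range(n_layers)]
```

In this sketch, TPE searches per-layer keep ratios while the perplexity constraint is folded into the objective as a penalty, which is a common surrogate for constrained Bayesian optimization; the paper's actual scheduler and scoring function may differ.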