Section 01
Introduction: EnergyLens—An Energy Optimization Framework for Multi-GPU Large Model Inference
EnergyLens is an end-to-end, energy-aware optimization framework for large language model inference on multiple GPUs. It predicts energy consumption across the configuration space and selects Pareto-optimal configurations, using an einsum-based interface together with an energy model for multi-GPU communication. On Llama3 and Qwen3-MoE models, its energy prediction error ranges from 9.25% to 13.19%. The framework is intended to address the shortcomings of existing energy optimization tools.
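To illustrate what "Pareto-optimal selection" means here, the following minimal sketch filters a set of candidate configurations down to those not dominated on both predicted energy and predicted latency. It is not EnergyLens's implementation. The `Config` type, the configuration names (e.g. `tp4` for a hypothetical tensor-parallel degree of 4), and all numbers are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical candidate configuration with model-predicted costs.
    name: str
    energy_j: float   # predicted energy per request in joules (placeholder values)
    latency_s: float  # predicted latency per request in seconds (placeholder values)


def dominates(a: Config, b: Config) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.energy_j <= b.energy_j and a.latency_s <= b.latency_s
    strictly_better = a.energy_j < b.energy_j or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Illustrative candidates only; these are not EnergyLens outputs.
configs = [
    Config("tp2", energy_j=120.0, latency_s=0.80),
    Config("tp4", energy_j=150.0, latency_s=0.50),
    Config("tp8", energy_j=210.0, latency_s=0.45),
    Config("tp4_lowclk", energy_j=160.0, latency_s=0.60),  # dominated by tp4
]

front = pareto_front(configs)
print([c.name for c in front])  # prints ['tp2', 'tp4', 'tp8']
```

The quadratic scan is sufficient for a small candidate set. A larger configuration space would more likely sort by one objective and sweep the other.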