Zing Forum

From Computing Power Competition to Energy Efficiency: A New Paradigm for Large Model Inference Evaluation

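LLM inference, energy efficiency, Token Production Function, PUE, sustainable development, green AI, energy-to-token, large model deployment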
Published 2026-05-12 16:15 | Recent activity 2026-05-13 11:49 | Estimated read 6 min

Section 01

[Introduction] New Paradigm for Large Model Inference Evaluation: Shifting from Computing Power Competition to Energy Efficiency

Researchers propose viewing LLM inference as an "energy-to-token production" process and introduce the Token Production Function framework. They call on the industry to report energy metrics, such as joules per token and PUE-adjusted power, alongside accuracy when evaluating inference systems, in order to support the sustainable development of AI.

Section 02

Limitations of Current LLM Inference Evaluation Systems

The evaluation of large language model inference has long focused on accuracy, latency, throughput, and hardware utilization. As LLMs are deployed at scale, however, these metrics show their limits. In real-world production, the core output is tokens of a given quality, and producing them is constrained by physical factors such as effective computing power, power supply capacity, cooling capacity, PUE, and utilization. Inference has therefore become a problem of energy-constrained production.

Section 03

Energy-to-Token Paradigm and Token Production Function Framework

The new paradigm treats inference as "energy-to-token production" and introduces the Token Production Function framework. Under this framework, the token generation rate is bounded by two upper limits: a per-token compute limit (determined by model architecture, parameter scale, and hardware computing power) and a per-token energy limit (determined by data center power supply, cooling efficiency, and PUE). Optimization should start by identifying which of the two is the "active constraint" for the current system, since relaxing the limit that is not binding does not increase token output.
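
The article does not give an explicit formula, so the following is only one plausible formalization under our own assumptions. Let C be sustained hardware compute (FLOP/s), F the compute per generated token (FLOP), P the total facility power budget (W), and E the IT-side energy per token (J). The compute ceiling is then C / F tokens/s; since PUE is total facility power divided by IT power, only P / PUE reaches the servers, giving an energy ceiling of (P / PUE) / E tokens/s. The achievable rate is the smaller of the two, and whichever is smaller is the active constraint. The names InferenceSystem and token_rate_ceilings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InferenceSystem:
    """Hypothetical description of a deployment; every field is an assumption."""
    sustained_flops: float      # effective hardware compute, FLOP/s
    flops_per_token: float      # model compute cost per generated token, FLOP
    facility_power_w: float     # total power available to the site, W
    pue: float                  # power usage effectiveness (total power / IT power)
    joules_per_token_it: float  # IT-side energy per generated token, J


def token_rate_ceilings(s: InferenceSystem) -> dict:
    """Return both token-rate ceilings (tokens/s) and the active constraint."""
    # Compute ceiling: tokens/s the hardware could generate if power were unlimited.
    compute_ceiling = s.sustained_flops / s.flops_per_token
    # Energy ceiling: only facility_power / PUE reaches the IT equipment.
    it_power_w = s.facility_power_w / s.pue
    energy_ceiling = it_power_w / s.joules_per_token_it
    # The achievable rate is bounded by whichever ceiling is lower.
    active = "compute" if compute_ceiling <= energy_ceiling else "energy"
    return {
        "compute_ceiling": compute_ceiling,
        "energy_ceiling": energy_ceiling,
        "token_rate": min(compute_ceiling, energy_ceiling),
        "active_constraint": active,
    }
```

With illustrative numbers that are not from the paper, InferenceSystem(1e15, 1.4e10, 10_000, 1.25, 0.5) yields an energy ceiling of 16,000 tokens/s against a compute ceiling of roughly 71,400 tokens/s, so energy is the active constraint: adding accelerators alone would not raise output, while lowering joules per token or improving PUE would.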

Section 04

System Optimization: Key Levers to Improve Energy Efficiency

Many system optimization techniques can serve as energy-to-token levers. KV cache compression reduces memory bandwidth requirements and thus energy consumption; sparse and compressed attention reduce per-token FLOPs and memory traffic; quantization reduces the energy cost of computation; routing and mixture-of-experts (MoE) models activate only part of the network for each token, allocating compute on demand; and difficulty-adaptive inference adjusts inference depth to the difficulty of each request, avoiding wasted computation.
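
To illustrate why KV cache compression and quantization act as energy levers, a rough sketch can estimate how many bytes of KV cache are read while decoding one token and convert them to energy. It uses the standard KV cache size per token (2 for keys and values × layers × KV heads × head dimension × bytes per element). The pj_per_byte memory-access cost is a hypothetical placeholder that would have to be measured on real hardware, and the model only covers KV reads, ignoring weights, activations, and arithmetic.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: float) -> float:
    """KV cache footprint of one cached token: keys + values across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_read_energy_j(context_len: int, n_layers: int, n_kv_heads: int,
                     head_dim: int, bytes_per_elem: float,
                     pj_per_byte: float) -> float:
    """Energy (J) to stream the whole KV cache once while decoding one token.

    pj_per_byte is a hypothetical memory-access cost in picojoules per byte.
    Quantizing the cache lowers bytes_per_elem; sparse attention that reads
    only some cached positions effectively lowers context_len.
    """
    total_bytes = context_len * kv_bytes_per_token(
        n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return total_bytes * pj_per_byte * 1e-12  # picojoules -> joules
```

For a hypothetical model with 32 layers, 8 KV heads, and head dimension 128, kv_bytes_per_token(32, 8, 128, 2) gives 131,072 bytes per token in 16-bit precision. Storing the cache in 8 bits halves that figure and, in this simple model, halves the read energy per decoded token at any context length.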

Section 05

Call for New Energy Reporting Standards in Inference Evaluation

The paper calls on inference research and benchmarks to report the following metrics: joules per token (the core energy efficiency metric), the active constraint (to make the system bottleneck explicit), PUE-adjusted actual power (to account for data center overhead), and utilization-adjusted token output (effective production capacity).
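
The article names these metrics without defining formulas. A minimal sketch, assuming that joules per token is measured IT energy divided by tokens generated, PUE-adjusted power is average IT power multiplied by PUE, and utilization-adjusted output is peak token throughput multiplied by average utilization, could compute three of them from one measurement window. The active constraint is a label (compute or energy) rather than a number, so it is left out here. MeasurementWindow and energy_report are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MeasurementWindow:
    """Hypothetical measurements collected over one benchmarking window."""
    duration_s: float          # length of the window, s
    it_energy_j: float         # energy drawn by servers/accelerators, J
    tokens_generated: int      # output tokens produced in the window (assumed > 0)
    pue: float                 # facility PUE for the same period
    peak_tokens_per_s: float   # throughput at full utilization
    avg_utilization: float     # fraction of capacity doing useful work, 0..1


def energy_report(w: MeasurementWindow) -> dict:
    """Compute energy-aware metrics under the assumed definitions above."""
    it_power_w = w.it_energy_j / w.duration_s
    j_per_token_it = w.it_energy_j / w.tokens_generated
    return {
        "joules_per_token_it": j_per_token_it,
        # Facility-level view: each IT joule also carries cooling/distribution overhead.
        "joules_per_token_facility": j_per_token_it * w.pue,
        "pue_adjusted_power_w": it_power_w * w.pue,
        "utilization_adjusted_tokens_per_s": w.peak_tokens_per_s * w.avg_utilization,
    }
```

Reporting both the IT-level and facility-level joules per token keeps results comparable across sites with different PUE values.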

Section 06

Significance of the New Paradigm for Sustainable AI Development

From an environmental perspective, high energy consumption enlarges AI's carbon footprint, which must be addressed in the context of climate change. From an economic perspective, energy costs have become the main operating cost of LLM services, so improving efficiency is key to business competitiveness. From a technical perspective, energy constraints drive the search for more efficient architectures and algorithms.

Section 07

Practical Recommendations for Energy Efficiency in Enterprise LLM Service Deployment

Recommendations for enterprises deploying LLM services: establish an energy baseline (measure current joules per token), identify the active constraint (determine whether compute or energy is the bottleneck), prioritize investment in the energy levers that address that constraint, and monitor continuously (make energy metrics part of regular operational processes).
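
The first and last recommendations, establishing a baseline and then monitoring against it, can be sketched as a small tracker that fixes a joules-per-token baseline from a reference period and flags later windows that regress beyond a tolerance. The class name, the 10% default tolerance, and the (energy_joules, tokens) sample format are assumptions for illustration; the energy readings themselves would come from whatever power telemetry the deployment already has.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EnergyBaseline:
    """Hypothetical tracker comparing joules per token against a fixed baseline."""
    tolerance: float = 0.10                       # allowed relative regression
    baseline_j_per_token: Optional[float] = None
    history: List[float] = field(default_factory=list)

    def set_baseline(self, samples: List[Tuple[float, int]]) -> float:
        """samples: (energy_joules, tokens) pairs from a reference period."""
        total_j = sum(e for e, _ in samples)
        total_tokens = sum(t for _, t in samples)
        # Pool energy and tokens instead of averaging per-window ratios,
        # so that longer windows carry proportionally more weight.
        self.baseline_j_per_token = total_j / total_tokens
        return self.baseline_j_per_token

    def check(self, energy_j: float, tokens: int) -> bool:
        """Record one window; return True if it exceeds baseline * (1 + tolerance)."""
        if self.baseline_j_per_token is None:
            raise RuntimeError("call set_baseline() first")
        j_per_token = energy_j / tokens
        self.history.append(j_per_token)
        limit = self.baseline_j_per_token * (1 + self.tolerance)
        return j_per_token > limit
```

For example, a baseline built from [(1000.0, 2000), (500.0, 1000)] is 0.5 J/token; a later window of 1300 J over 2000 tokens (0.65 J/token) would be flagged, while 1000 J over 2000 tokens would not.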

Section 08

Conclusion: Paradigm Shift Drives Green AI Development

The shift from "computing power to tokens" to "energy to tokens" is a change in mindset. LLM inference is constrained by physical laws. In the phase of large-scale AI deployment, energy efficiency is key to technical feasibility and commercial sustainability. We look forward to the industry adopting the new paradigm to promote green and responsible AI development.