# GAR: A Carbon-Aware Routing Optimization Framework for LLM Inference

> The Google Research team proposes the GAR framework, which integrates carbon emissions into LLM inference routing decisions. It achieves significant carbon reduction while maintaining accuracy and latency SLOs, providing a theoretical foundation and practical solutions for green AI inference.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Posted: 2026-05-12T06:32:31.000Z
- Last activity: 2026-05-13T02:24:31.573Z
- Popularity: 127.1
- Keywords: green AI, LLM inference, carbon-aware routing, multi-objective optimization, sustainable development, model routing
- Page URL: https://www.zingnex.cn/en/forum/thread/gar-llm
- Canonical: https://www.zingnex.cn/forum/thread/gar-llm
- Markdown source: floors_fallback

---

## Introduction: The GAR Carbon-Aware Routing Optimization Framework for LLM Inference

The Google Research team proposes GAR (Green-Aware Routing), a framework that incorporates carbon emissions into LLM inference routing decisions. It minimizes CO₂ emissions per request while meeting a minimum accuracy threshold and a p95 latency Service Level Objective (SLO), providing a theoretical foundation and practical solutions for green AI inference.

## Background: Energy Consumption and Carbon Emission Challenges of LLM Inference

The deployment scale of Large Language Models (LLMs) is growing rapidly. Existing routing methods mostly balance response quality against compute cost, but rarely treat sustainable energy use and CO₂ emissions as optimization objectives, even though grid carbon intensity varies by time and region, and energy consumption differs significantly between models. With the explosive growth of AI inference demand, its carbon footprint is accumulating rapidly, posing a severe environmental challenge.

## Core Design of the GAR Framework: Adaptive Constraints and Lightweight Estimators

GAR is a constrained multi-objective optimization framework, with the core goal of minimizing carbon emissions while meeting the minimum accuracy threshold and p95 latency SLO. Its key innovations include:
1. Adaptive constraint optimization: adjusts the minimum accuracy threshold per dataset, dynamically adapting to task requirements;
2. Lightweight estimators: jointly estimate correctness, tail latency, and carbon emissions, supporting real-time routing decisions without additional inference overhead;
3. Online primal-dual algorithm (GAR-PD): designed for rolling carbon-budget scenarios, allocating resources dynamically and efficiently.
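
To make the GAR-PD idea concrete, here is a minimal sketch of one online primal-dual routing step under a rolling carbon budget. The function names, the toy model pool, and the exact update rule are assumptions for illustration, not the paper's implementation:

```python
# Illustrative sketch of an online primal-dual routing step under a rolling
# carbon budget. Names, numbers, and the update rule are hypothetical.

def primal_dual_route(models, budget_rate, lam, eta=0.1):
    """Pick the model minimizing quality-loss + lam * carbon (primal step),
    then update the dual variable lam against the per-request carbon budget."""
    # Each candidate is (name, expected_quality_loss, carbon_grams).
    best = min(models, key=lambda m: m[1] + lam * m[2])
    # Dual step: raise lam when we overspend the carbon budget, lower otherwise.
    lam = max(0.0, lam + eta * (best[2] - budget_rate))
    return best[0], lam

# Toy model pool: (name, expected quality loss, grams CO2 per request).
pool = [("llm-7b", 0.20, 1.0), ("llm-70b", 0.05, 8.0)]

lam = 0.0
choices = []
for _ in range(5):
    choice, lam = primal_dual_route(pool, budget_rate=2.0, lam=lam)
    choices.append(choice)
```

The dual variable acts as a carbon "price": after an expensive routing decision the price rises, steering subsequent requests toward lower-carbon models until the rolling budget recovers.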

## Technical Implementation: Multi-Objective Constrained Optimization and Heuristic Variants

GAR models the routing problem as a constrained multi-objective optimization problem, considering three dimensions simultaneously:
1. Carbon emission minimization: Prioritize models and regions with lower carbon intensity;
2. Accuracy guarantee: Ensure response quality does not fall below the preset threshold;
3. Latency constraint: Meet the p95 latency SLO requirements.
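
Under illustrative notation (the symbols are assumptions, not the paper's), the three dimensions above can be written compactly as a constrained program:

```latex
% Illustrative formulation; pi: routing policy, q: query, C: carbon per
% request, A: accuracy, L: latency, alpha_min: accuracy floor,
% tau_SLO: p95 latency target.
\min_{\pi} \; \mathbb{E}_{q}\!\left[ C(\pi(q), q) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{q}\!\left[ A(\pi(q), q) \right] \ge \alpha_{\min},
\qquad
\mathrm{p95}_{q}\!\left[ L(\pi(q), q) \right] \le \tau_{\mathrm{SLO}}
```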

In addition, the research team developed three heuristic variants: Strict Mode (prioritizes accuracy and latency), Balanced Mode (balances all three objectives), and Green Mode (prioritizes carbon minimization), offering flexible options for different scenarios.
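
The three modes can be pictured as different selection rules applied after filtering out candidates that violate the hard constraints. The following sketch is hypothetical; model names, numbers, and the scoring rules are illustrative, not taken from the paper:

```python
# Hypothetical sketch of the three heuristic routing modes.
# All model names, metrics, and thresholds are illustrative.

def route(candidates, mode, acc_min=0.7, p95_slo_ms=800):
    """candidates: list of dicts with 'accuracy', 'p95_ms', 'carbon_g'."""
    # Every mode first drops candidates that violate the hard constraints.
    feasible = [c for c in candidates
                if c["accuracy"] >= acc_min and c["p95_ms"] <= p95_slo_ms]
    if not feasible:
        # Fall back to the most accurate model if nothing is feasible.
        return max(candidates, key=lambda c: c["accuracy"])
    if mode == "strict":    # prioritize accuracy, break ties on latency
        return max(feasible, key=lambda c: (c["accuracy"], -c["p95_ms"]))
    if mode == "green":     # prioritize lowest carbon
        return min(feasible, key=lambda c: c["carbon_g"])
    # balanced: minimize carbon spent per unit of accuracy
    return min(feasible, key=lambda c: c["carbon_g"] / c["accuracy"])

pool = [
    {"name": "llm-7b",  "accuracy": 0.72, "p95_ms": 300, "carbon_g": 1.0},
    {"name": "llm-13b", "accuracy": 0.80, "p95_ms": 450, "carbon_g": 2.5},
    {"name": "llm-70b", "accuracy": 0.90, "p95_ms": 700, "carbon_g": 8.0},
]
```

With this toy pool, Strict Mode picks the 70B model while Green Mode picks the 7B model, showing how the same feasible set yields different routes depending on the operator's priorities.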

## Experimental Validation: GAR Achieves Significant Carbon Reduction While Maintaining Service Quality

The research team evaluated GAR on standard NLP benchmarks using a heterogeneous LLM pool (7B-70B parameter scale). The results show:
1. Carbon reduction: Achieves considerable CO₂ reduction compared to traditional routing strategies;
2. Accuracy: Meets the minimum accuracy threshold, with performance loss controlled within an acceptable range;
3. Latency: Reliably meets the p95 latency SLO;
4. Scalability: Performs well across model pools from 7B to 70B parameters, with strong generalization ability.

## Practical Deployment Value: Significance for Cloud Service Providers, Enterprises, and the Industry

The practical deployment value of the GAR framework is reflected in multiple aspects:
- Cloud service providers: Helps meet environmental regulations and ESG requirements, reduces data center carbon footprint and energy costs, and enhances green brand image;
- Enterprise users: Enables sustainable AI deployment without affecting service quality, meets internal carbon neutrality goals, and optimizes inference costs (green energy is usually cheaper);
- AI industry: Drives the industry toward a more sustainable direction, provides references for green AI standard formulation, and promotes the popularization of carbon-aware AI infrastructure.

## Limitations and Future Directions: Data Dependencies and Extended Scenarios

GAR has the following limitations and future exploration directions:
1. Real-time carbon data dependency: Effectiveness relies on accurate real-time grid carbon intensity data; data quality affects optimization results;
2. Model energy consumption modeling: Currently based on offline measured model energy consumption data; future exploration can include online energy consumption estimation;
3. Multi-tenant scenarios: Fair allocation of carbon budgets in shared infrastructure requires further research;
4. Edge deployment: Extend to edge computing scenarios, considering device-level energy consumption and on-site use of renewable energy.
