# SIS-LLM: A Unified Framework for Evaluating the Sustainability of Large Language Model Inference

> SIS-LLM is a unified framework for evaluating the sustainability of large language model (LLM) inference. It integrates performance, efficiency, and environmental metrics to generate a single interpretable Sustainability Index Score (SIS).

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-15T22:46:15.000Z
- Last activity: 2026-06-15T22:49:12.549Z
- Popularity: 165.9
- Keywords: LLM, sustainability, energy efficiency, carbon emissions, inference optimization, green AI, SIS, Qwen, Mistral, LLaMA, Phi
- Page link: https://www.zingnex.cn/en/forum/thread/sis-llm
- Canonical: https://www.zingnex.cn/forum/thread/sis-llm

---

## SIS-LLM: A Unified Framework for LLM Inference Sustainability Evaluation

SIS-LLM is a unified framework for evaluating the sustainability of large language model (LLM) inference, developed by Urooj Asgher (Technological University Dublin) and released on GitHub (project name: SIS-LLM-InferenceTool) on June 15, 2026. It integrates performance, efficiency, and environmental metrics into a single interpretable Sustainability Index Score (SIS), helping developers and enterprises make informed decisions in model selection.

## Background & Motivation

As LLMs are deployed across industries, the energy consumption and environmental impact of inference are a growing concern. Current evaluations focus on accuracy and speed and largely ignore sustainability metrics such as energy efficiency and carbon emissions. A single-dimensional evaluation of this kind neither reflects real deployment costs nor guides green AI development. SIS-LLM addresses this gap by unifying multiple metrics into a single SIS.

## Core Concept: SIS Score & Key Metrics

### SIS Score Definition
SIS (Sustainability Index Score) is a 0-1 score where lower values indicate better sustainability.
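
This post does not give the formula behind the SIS. To illustrate how several metrics can be folded into one 0-1 score where lower is better, here is a minimal sketch that assumes min-max normalization across models followed by a weighted average. The function names, the weights, and the numbers are hypothetical and are not taken from SIS-LLM.

```python
from typing import Dict


def min_max_normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale raw values to [0, 1] across models (0 = lowest observed, 1 = highest)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All models are identical on this metric, so it cannot separate them.
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def composite_index(
    costs: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted mean of normalized cost metrics (every metric is lower-is-better).

    costs maps metric name -> {model name: raw value}.
    """
    total_weight = sum(weights.values())
    models = list(next(iter(costs.values())).keys())
    scores = {m: 0.0 for m in models}
    for metric, per_model in costs.items():
        normalized = min_max_normalize(per_model)
        for m in models:
            scores[m] += weights[metric] * normalized[m] / total_weight
    return scores


# Illustrative numbers only; these are not measurements from the study.
costs = {
    "energy_j": {"model_a": 100.0, "model_b": 300.0},
    "time_s": {"model_a": 2.0, "model_b": 1.0},
}
weights = {"energy_j": 3.0, "time_s": 1.0}  # hypothetical weighting
scores = composite_index(costs, weights)
print(scores)  # model_a: 0.25, model_b: 0.75
```

A higher-is-better metric such as accuracy would need to be converted to a cost first (for example `1 - accuracy`) so that every input points in the same direction.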

### SIS Rating Levels
| SIS Range | Sustainability Level |
|-----------|----------------------|
| 0.0-0.3   | Low Impact           |
| 0.3-0.7   | Medium Impact        |
| 0.7-1.0   | High Impact          |
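
The rating table can be expressed as a small lookup. The table lists 0.3 and 0.7 in two bands each, so this sketch assumes that a boundary value falls into the higher-impact band; `sis_level` is a hypothetical helper name, not part of the tool.

```python
def sis_level(sis: float) -> str:
    """Map an SIS value in [0, 1] to the rating levels from the table above.

    Assumption: boundary values (0.3, 0.7) belong to the higher-impact band.
    """
    if not 0.0 <= sis <= 1.0:
        raise ValueError(f"SIS must be within [0, 1], got {sis}")
    if sis < 0.3:
        return "Low Impact"
    if sis < 0.7:
        return "Medium Impact"
    return "High Impact"


print(sis_level(0.12))  # Low Impact
print(sis_level(0.55))  # Medium Impact
```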

### Key Metrics
- **Energy & Environment**: Energy consumption (J/query), carbon emissions (g CO₂eq/query), token energy efficiency (tokens/J)
- **Performance**: Execution time (s/query), throughput (tokens/s), accuracy (benchmark performance)
- **Resource Efficiency**: Model efficiency (accuracy/energy), hardware efficiency (accuracy/CPU hours), memory usage (GB), FLOPs (operations/inference), model size (MB)
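
Several of these metrics follow directly from three per-query measurements: energy, wall-clock time, and generated tokens. The sketch below shows the arithmetic. The class and function names are hypothetical, and the grid carbon intensity is a placeholder parameter, since the post does not say which value the framework uses.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


@dataclass
class QueryMeasurement:
    energy_j: float    # energy consumed by the query, in joules
    duration_s: float  # wall-clock execution time, in seconds
    tokens: int        # tokens generated for the query


def tokens_per_joule(m: QueryMeasurement) -> float:
    """Token energy efficiency (tokens/J)."""
    return m.tokens / m.energy_j


def throughput(m: QueryMeasurement) -> float:
    """Throughput (tokens/s)."""
    return m.tokens / m.duration_s


def carbon_g(m: QueryMeasurement, grid_g_per_kwh: float) -> float:
    """Carbon emissions (g CO2eq/query) = energy in kWh x grid intensity."""
    return m.energy_j / JOULES_PER_KWH * grid_g_per_kwh


# Illustrative values only; 300 g CO2eq/kWh is a placeholder intensity.
m = QueryMeasurement(energy_j=3600.0, duration_s=10.0, tokens=180)
print(tokens_per_joule(m))     # tokens per joule
print(throughput(m))           # tokens per second
print(carbon_g(m, 300.0))      # grams CO2eq for this query
```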

## Evaluation Setup

### Evaluated Models
| Model Name | Parameters | Quantization |
|------------|------------|--------------|
| Qwen2.5-7B-Instruct | 7B | GGUF Q4_K_M |
| Mistral-7B-Instruct-v0.3 | 7B | GGUF Q4_K_M |
| Meta-Llama-3.1-8B-Instruct | 8B | GGUF Q4_K_M |
| Phi-3.5-mini-Instruct | 3.8B | GGUF Q4_K_M |

### Datasets
- GSM8K (500 samples, math reasoning)
- MMLU (500 samples, multi-disciplinary knowledge)
- TruthfulQA (500 samples, factual accuracy)

All tests use seed=42 for reproducibility.
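
A fixed seed makes the 500-sample subsets repeatable across runs. The following is a minimal sketch of seeded subsampling with Python's standard library; it is an assumption about the general technique, and the repository's `build_eval_dataset.py` may implement sampling differently. `sample_subset` is a hypothetical name.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_subset(items: Sequence[T], k: int = 500, seed: int = 42) -> List[T]:
    """Draw k distinct items using a private, seeded random generator."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(items, k)


# Stand-in for a benchmark's question IDs; the same seed yields the same subset.
question_ids = list(range(2000))
first = sample_subset(question_ids)
second = sample_subset(question_ids)
print(first == second)  # True
```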

### Hardware & Software
- **Hardware**: 2× Intel Xeon Gold 6430 (64 cores/128 threads), CPU-only (GPU disabled), Adcewatt power meter for real energy measurement.
- **Software**: llama.cpp framework, core scripts (main runner, dataset builder, power monitoring, metric collection).
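
With an external power meter, energy per query is typically obtained by integrating the sampled power over the query's duration. The sketch below uses the trapezoidal rule over `(timestamp, watts)` pairs. The sample format and the function name are assumptions; the tool's own power-monitoring script and the meter's output format are not described in this post.

```python
from typing import Sequence, Tuple


def energy_joules(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate power samples into energy using the trapezoidal rule.

    samples: (timestamp_s, power_w) pairs sorted by timestamp (assumed format).
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be sorted by timestamp")
        total += 0.5 * (p0 + p1) * (t1 - t0)  # average power x elapsed time
    return total


# Illustrative readings: 100 W for one second, then a ramp from 100 W to 200 W.
readings = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0)]
print(energy_joules(readings))  # 250.0
```

Subtracting an idle-power baseline before integrating would isolate the energy attributable to inference. Whether SIS-LLM does this is not stated here.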

## Practical Application Value

- **Developers**: Objective model selection tool (consider sustainability alongside performance), especially useful for edge/resource-limited environments.
- **Enterprises**: Reduce operational costs (lower energy use), fulfill ESG responsibilities (quantify carbon footprint), optimize resource allocation.
- **Research**: Standardized evaluation framework, open-source toolchain, and benchmark dataset for reproducible sustainability research.

## Usage & Deployment Guide

1. **Clone Repository**: `git clone https://github.com/urooj88/SIS-LLM-InferenceTool.git && cd SIS-LLM-InferenceTool`
2. **Install Dependencies**: `pip install -r requirements.txt`
3. **Build Dataset**: `python3 build_eval_dataset.py --reason 500 --mcq 500 --truth 500 --force-rebuild`
4. **Run Evaluation**: `python3 main_sustainability_runner_LLM_CPU.py`

### Required Models
Download GGUF models from HuggingFace: Qwen2.5-7B-Instruct-GGUF, Mistral-7B-Instruct-v0.3-GGUF, Meta-Llama-3.1-8B-Instruct-GGUF, Phi-3.5-mini-instruct-GGUF.

## Limitations & Future Work

### Limitations
- Hardware dependency: Requires Adcewatt power meter for real energy measurement.
- CPU-only: GPU inference evaluation is under development.
- Limited model coverage: Only four models (3.8B to 8B parameters, all GGUF Q4_K_M) were evaluated.

### Future Directions
- Extend to GPU inference evaluation.
- Support more model architectures and quantization schemes.
- Develop cloud deployment energy estimation models.
- Establish industry-standard SIS benchmark database.

## Conclusion & Insights

SIS-LLM pioneers a unified approach to LLM inference sustainability evaluation. By integrating performance, efficiency, and environmental metrics into an interpretable score, it helps balance model performance with sustainability. This framework emphasizes that sustainability should be a core consideration in model design and selection, paving the way for greener AI systems.