SIS-LLM: A Unified Framework for Evaluating the Sustainability of Large Language Model Inference

SIS-LLM is a unified framework for evaluating the sustainability of large language model (LLM) inference. It integrates performance, efficiency, and environmental metrics to generate a single interpretable Sustainability Index Score (SIS).

Tags: LLM, sustainability, energy efficiency, carbon emissions, inference optimization, green AI, SIS, Qwen, Mistral, LLaMA

Published 2026-06-16 06:46 | Recent activity 2026-06-16 06:49 | Estimated read 7 min

Section 01

SIS-LLM: A Unified Framework for LLM Inference Sustainability Evaluation

SIS-LLM is a unified framework for evaluating the sustainability of large language model (LLM) inference, developed by Urooj Asgher (Technological University Dublin) and released on GitHub (project name: SIS-LLM-InferenceTool) on June 15, 2026. It integrates performance, efficiency, and environmental metrics into a single interpretable Sustainability Index Score (SIS), helping developers and enterprises make informed decisions in model selection.

Section 02

Background & Motivation

With LLMs now widely used across industries, the energy consumption and environmental impact of inference are a growing concern. Current evaluations focus on accuracy and speed and largely overlook sustainability metrics such as energy efficiency and carbon emissions. This narrow view fails to reflect real deployment costs and gives little guidance for green AI development. SIS-LLM addresses this gap by unifying multiple metrics into a single SIS score.

Section 03

Core Concept: SIS Score & Key Metrics

SIS Score Definition

SIS (Sustainability Index Score) is a 0-1 score where lower values indicate better sustainability.
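
The article does not give the formula behind the SIS, so the following is only a sketch of one common way to fold several metrics into a single 0-1 score where lower is better: min-max normalize each metric across the compared models, invert the metrics where higher is better, and take a weighted average. The metric names, the weights, and the function `composite_scores` are illustrative assumptions, not the SIS-LLM implementation.

```python
from typing import Dict, Tuple

# Hypothetical metric table: name -> (weight, higher_is_worse).
# Names and weights are illustrative assumptions, not taken from SIS-LLM.
METRICS: Dict[str, Tuple[float, bool]] = {
    "energy_j_per_query": (0.4, True),
    "co2_g_per_query": (0.3, True),
    "latency_s_per_query": (0.2, True),
    "accuracy": (0.1, False),
}


def composite_scores(models: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return {model: score in [0, 1]}; a lower score means lower impact."""
    total_weight = sum(weight for weight, _ in METRICS.values())
    scores = {name: 0.0 for name in models}
    for metric, (weight, higher_is_worse) in METRICS.items():
        column = [values[metric] for values in models.values()]
        lo, hi = min(column), max(column)
        if hi == lo:
            # Every model ties on this metric, so it cannot separate them.
            continue
        for name, values in models.items():
            scaled = (values[metric] - lo) / (hi - lo)  # min-max to [0, 1]
            if not higher_is_worse:
                scaled = 1.0 - scaled  # benefit metric: high value -> low penalty
            scores[name] += (weight / total_weight) * scaled
    return scores
```

A score built this way is relative to the set of models being compared: adding or removing a model changes the normalization range and can shift every other model's score.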

SIS Rating Levels

SIS Range	Sustainability Level
0.0-0.3	Low Impact
0.3-0.7	Medium Impact
0.7-1.0	High Impact
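
The rating bands above can be expressed as a small lookup. This is a minimal sketch: the function name `sis_level` is hypothetical, and because the table lists 0.3 and 0.7 in two bands each, the code assumes a boundary value belongs to the higher-impact band.

```python
def sis_level(sis: float) -> str:
    """Map an SIS value in [0, 1] to the rating band from the table.

    Assumption: boundary values 0.3 and 0.7 fall into the higher-impact band,
    since the source table does not say which side is inclusive.
    """
    if not 0.0 <= sis <= 1.0:
        raise ValueError(f"SIS must lie in [0, 1], got {sis}")
    if sis < 0.3:
        return "Low Impact"
    if sis < 0.7:
        return "Medium Impact"
    return "High Impact"
```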

Key Metrics

Energy & Environment: Energy consumption (J/query), carbon emissions (g CO₂eq/query), token energy efficiency (tokens/J)
Performance: Execution time (s/query), throughput (tokens/s), accuracy (benchmark performance)
Resource Efficiency: Model efficiency (accuracy/energy), hardware efficiency (accuracy/CPU hours), memory usage (GB), FLOPs (operations/inference), model size (MB)
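
Several of these metrics are simple ratios of raw measurements. The sketch below shows how they could be derived from a per-query energy, duration, and token count. The `QueryMeasurement` type and the function names are hypothetical, and the carbon estimate assumes a grid carbon intensity (g CO₂eq per kWh) supplied by the user; the article does not say how SIS-LLM converts energy to emissions.

```python
from dataclasses import dataclass

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


@dataclass
class QueryMeasurement:
    """Raw measurements for one query (hypothetical record layout)."""
    energy_j: float        # energy consumed, in joules
    duration_s: float      # wall-clock execution time, in seconds
    tokens_generated: int  # number of output tokens


def tokens_per_joule(m: QueryMeasurement) -> float:
    """Token energy efficiency (tokens/J)."""
    return m.tokens_generated / m.energy_j


def throughput_tokens_per_s(m: QueryMeasurement) -> float:
    """Throughput (tokens/s)."""
    return m.tokens_generated / m.duration_s


def carbon_g_co2eq(m: QueryMeasurement, grid_g_per_kwh: float) -> float:
    """Carbon estimate (g CO2eq/query) for an assumed grid carbon intensity."""
    return m.energy_j / JOULES_PER_KWH * grid_g_per_kwh


def model_efficiency(accuracy: float, mean_energy_j: float) -> float:
    """Model efficiency as accuracy per joule (accuracy/energy)."""
    return accuracy / mean_energy_j
```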

Section 04

Evaluation Setup

Evaluated Models

Model Name	Parameters	Quantization
Qwen2.5-7B-Instruct	7B	GGUF Q4_K_M
Mistral-7B-Instruct-v0.3	7B	GGUF Q4_K_M
Meta-Llama-3.1-8B-Instruct	8B	GGUF Q4_K_M
Phi-3.5-mini-Instruct	3.8B	GGUF Q4_K_M

Datasets

GSM8K (500 samples, math reasoning)
MMLU (500 samples, multi-disciplinary knowledge)
TruthfulQA (500 samples, factual accuracy)

All tests use seed=42 for reproducibility.
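
A fixed seed makes the 500-sample subsets repeatable across runs. The sketch below shows the general idea using Python's standard library; the function `sample_fixed` is hypothetical and does not describe how build_eval_dataset.py actually selects its samples.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_fixed(items: Iterable[T], k: int = 500, seed: int = 42) -> List[T]:
    """Draw k distinct items; the same seed and input order give the same subset."""
    rng = random.Random(seed)  # private generator, leaves global random state alone
    return rng.sample(list(items), k)
```

Using a private `random.Random` instance rather than `random.seed` keeps the subset stable even if other code in the same process also draws random numbers.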

Hardware & Software

Hardware: 2× Intel Xeon Gold 6430 (64 cores/128 threads in total), CPU-only (GPU disabled), Adcewatt power meter for real energy measurement.
Software: llama.cpp framework, core scripts (main runner, dataset builder, power monitoring, metric collection).
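
Energy per query is the integral of measured power over the time the query runs. The sketch below applies the trapezoidal rule to timestamped power readings. The sample format (seconds, watts) and both function names are assumptions, since the article does not describe the power meter's output or the project's monitoring script; subtracting an idle baseline is an optional step, not something the source confirms.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate power over time with the trapezoidal rule (W * s = J)."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("samples must be ordered by timestamp")
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def net_energy_joules(samples: Sequence[Sample], idle_power_w: float) -> float:
    """Energy above an assumed constant idle baseline over the same window."""
    duration_s = samples[-1][0] - samples[0][0]
    return energy_joules(samples) - idle_power_w * duration_s
```

With readings taken only while a single query runs, `energy_joules` gives the J/query figure directly; a faster sampling rate reduces the error introduced by the trapezoidal approximation.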

Section 05

Practical Application Value

Developers: Objective model selection tool (consider sustainability alongside performance), especially useful for edge/resource-limited environments.
Enterprises: Reduce operational costs (lower energy use), fulfill ESG responsibilities (quantify carbon footprint), optimize resource allocation.
Research: Standardized evaluation framework, open-source toolchain, and benchmark dataset for reproducible sustainability research.

Section 06

Usage & Deployment Guide

Clone Repository: git clone https://github.com/urooj88/SIS-LLM-InferenceTool.git && cd SIS-LLM-InferenceTool
Install Dependencies: pip install -r requirements.txt
Build Dataset: python3 build_eval_dataset.py --reason 500 --mcq 500 --truth 500 --force-rebuild
Run Evaluation: python3 main_sustainability_runner_LLM_CPU.py

Required Models

Download the GGUF models from Hugging Face: Qwen2.5-7B-Instruct-GGUF, Mistral-7B-Instruct-v0.3-GGUF, Meta-Llama-3.1-8B-Instruct-GGUF, Phi-3.5-mini-instruct-GGUF.

Section 07

Limitations & Future Work

Limitations

Hardware dependency: Requires Adcewatt power meter for real energy measurement.
CPU-only: GPU inference evaluation is under development.
Limited model coverage: Only four models were evaluated, all in the 3.8B to 8B parameter range and all using GGUF Q4_K_M quantization.

Future Directions

Extend to GPU inference evaluation.
Support more model architectures and quantization schemes.
Develop cloud deployment energy estimation models.
Establish industry-standard SIS benchmark database.

Section 08

Conclusion & Insights

SIS-LLM pioneers a unified approach to LLM inference sustainability evaluation. By integrating performance, efficiency, and environmental metrics into an interpretable score, it helps balance model performance with sustainability. This framework emphasizes that sustainability should be a core consideration in model design and selection, paving the way for greener AI systems.
