ebb-ai: Intelligently Scheduling Agent Workloads to Make LLM Inference Cheaper and Greener

An open-source AI workload scheduling system that delays non-urgent LLM tasks to low-grid-load periods, achieving about 50% cost savings while reducing carbon emissions and providing auditable carbon footprint receipts.

Tags: AI Scheduling, LLM Inference, Carbon Footprint, Batch API, MCP, Agentic AI, Cost Optimization, Green Computing

Published 2026-05-15 07:42 | Recent activity 2026-05-15 07:48 | Estimated read 6 min

Section 01

Introduction: ebb-ai, Intelligently Scheduling Agent Workloads to Make LLM Inference Cheaper and Greener

Section 02

Background: AI Inference Is Devouring the Power Grid

According to U.S. Department of Energy projections, data centers could consume 6.7% to 12% of U.S. electricity by 2028, with AI workloads as a major driver. This is a striking figure: data center electricity usage has doubled since 2020, and the trend is accelerating as agentic AI workloads scale up.

However, current agent code triggers LLM calls synchronously by default, even when the task could be deferred entirely. Tasks like "Summarize my inbox tonight" or "Rewrite these 5000 product descriptions by Friday" don't need an immediate response, yet they still consume computing resources during peak grid hours.

This status quo brings three core problems: high costs, enormous grid pressure, and uncontrollable carbon emissions.

Section 03

What Is ebb-ai?

ebb-ai is a workload scheduling system designed specifically for the agentic AI economy. Its core idea is simple: automatically delay non-urgent LLM tasks to periods of low grid load and cheap electricity prices, while generating auditable cost and carbon emission receipts.

This project was developed by Vitalii Borovyk and is open-sourced under the Apache 2.0 license. It is not just a cost optimization tool, but also an innovative attempt to integrate AI infrastructure with sustainable energy management.

Section 04

1. Cost Optimization: Automated Use of Batch APIs

Both Anthropic and OpenAI offer Batch APIs that give deferred tasks a fixed 50% discount. Almost no agent code uses them, though, because the choice has to be made manually at each call site.

ebb-ai makes this decision automatically via the defer() API and an intelligent scheduler. When the system detects that a task can wait, it routes the task to the Batch API path, so users don't need to modify their existing code logic.
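The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual @ebb-ai/core implementation: `DeferrableTask`, `chooseRoute`, and `estimateCostUsd` are hypothetical names, and the 24-hour batch window is an assumption about how long a batch job may take to complete.

```typescript
// Hypothetical sketch: names and thresholds are illustrative, not the real @ebb-ai/core API.
type Route = "sync" | "batch";

interface DeferrableTask {
  prompt: string;
  /** Latest acceptable completion time. Omitted means "answer now". */
  deadline?: Date;
}

// Assumption: a batch job may need up to 24 hours to complete.
const BATCH_WINDOW_MS = 24 * 60 * 60 * 1000;
// The fixed 50% Batch API discount cited in the text.
const BATCH_DISCOUNT = 0.5;

/** Route to batch only when the deadline leaves enough slack for the batch window. */
function chooseRoute(task: DeferrableTask, now: Date): Route {
  if (!task.deadline) return "sync";
  const slackMs = task.deadline.getTime() - now.getTime();
  return slackMs >= BATCH_WINDOW_MS ? "batch" : "sync";
}

/** Apply the batch discount to a known synchronous price. */
function estimateCostUsd(syncCostUsd: number, route: Route): number {
  return route === "batch" ? syncCostUsd * (1 - BATCH_DISCOUNT) : syncCostUsd;
}

const now = new Date("2026-05-15T09:00:00Z");
const urgent: DeferrableTask = { prompt: "Answer this customer now" };
const relaxed: DeferrableTask = {
  prompt: "Rewrite these 5000 product descriptions",
  deadline: new Date("2026-05-22T17:00:00Z"),
};

console.log(chooseRoute(urgent, now));  // "sync"
console.log(chooseRoute(relaxed, now)); // "batch"
console.log(estimateCostUsd(10, chooseRoute(relaxed, now))); // 5
```

The point of the sketch is that the caller only states a deadline; the choice between the synchronous and batch paths is derived from it rather than hard-coded at the call site.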

Section 05

2. Grid Load Smoothing: Time-Shifting Strategy

AI computing in data centers is concentrated in a few U.S. regions: PJM (Mid-Atlantic/Virginia), ERCOT (Texas), and CAISO (California). During peak hours, AI workloads compete with hospitals, industrial users, and residents for already strained power capacity. Virginia regulators have listed data center load growth as a Level 1 reliability concern.

ebb-ai effectively reduces the peak load that the grid needs to plan for by shifting deferrable workloads to off-peak hours. This time-shifting strategy not only benefits grid stability but also brings tangible economic benefits to users.
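One way to picture the time-shifting step is a function that picks the lowest-load hour between now and the task's deadline. The sketch below assumes an hourly load forecast is available; `HourlyForecast`, `pickOffPeakSlot`, and the load figures are hypothetical and chosen only for illustration, and the source does not describe how ebb-ai obtains or weighs grid data.

```typescript
// Hypothetical sketch: the forecast shape and numbers are illustrative assumptions.
interface HourlyForecast {
  startsAt: Date;
  /** Forecast grid load for the hour, in megawatts. */
  loadMw: number;
}

/** Return the lowest-load slot that starts at or after `now` and before `deadline`. */
function pickOffPeakSlot(
  forecast: HourlyForecast[],
  now: Date,
  deadline: Date
): HourlyForecast | undefined {
  let best: HourlyForecast | undefined;
  for (const slot of forecast) {
    const t = slot.startsAt.getTime();
    if (t < now.getTime() || t >= deadline.getTime()) continue; // outside the allowed window
    if (best === undefined || slot.loadMw < best.loadMw) best = slot;
  }
  return best;
}

const forecast: HourlyForecast[] = [
  { startsAt: new Date("2026-05-15T18:00:00Z"), loadMw: 95000 }, // evening peak
  { startsAt: new Date("2026-05-15T21:00:00Z"), loadMw: 88000 },
  { startsAt: new Date("2026-05-16T03:00:00Z"), loadMw: 61000 }, // overnight trough
  { startsAt: new Date("2026-05-16T10:00:00Z"), loadMw: 58000 }, // lower, but past the deadline
];

const slot = pickOffPeakSlot(
  forecast,
  new Date("2026-05-15T18:00:00Z"),
  new Date("2026-05-16T08:00:00Z")
);
console.log(slot ? slot.startsAt.toISOString() : "no slot before deadline");
// "2026-05-16T03:00:00.000Z"
```

The deadline acts as a hard constraint: the cheapest hour overall is ignored if it falls after the time the user asked for.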

Section 06

3. Carbon Footprint Tracking: Auditable Green Receipts

Grid carbon intensity can fluctuate by 30% to 60% in a single day. The same scheduling decision that saves costs and smooths loads also reduces CO₂ emissions.

ebb-ai generates an auditable receipt for each scheduling event, including cost, carbon emissions, provider, and execution duration. This is of great value for ESG reporting, cost accounting, and upcoming computing disclosure regulations.
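As a rough illustration of what such a receipt might contain, the sketch below models the four fields named above and estimates emissions as energy multiplied by grid carbon intensity. The `Receipt` shape, the `buildReceipt` helper, and all numeric inputs (energy per job, intensity values) are assumptions made for this example; the source does not specify ebb-ai's receipt schema or its carbon accounting method.

```typescript
// Hypothetical sketch: field names and the carbon formula are illustrative assumptions.
interface Receipt {
  taskId: string;
  provider: "anthropic" | "openai";
  costUsd: number;
  /** Estimated emissions in grams of CO2. */
  carbonGrams: number;
  durationMs: number;
}

interface ReceiptInput {
  taskId: string;
  provider: "anthropic" | "openai";
  costUsd: number;
  /** Assumed energy used by the job, in kWh. */
  energyKwh: number;
  /** Grid carbon intensity at execution time, in gCO2 per kWh. */
  gridIntensityGPerKwh: number;
  startedAt: Date;
  finishedAt: Date;
}

function buildReceipt(input: ReceiptInput): Receipt {
  return {
    taskId: input.taskId,
    provider: input.provider,
    costUsd: input.costUsd,
    carbonGrams: input.energyKwh * input.gridIntensityGPerKwh,
    durationMs: input.finishedAt.getTime() - input.startedAt.getTime(),
  };
}

// Same job, run at peak (400 g/kWh) versus off-peak (250 g/kWh); numbers are made up.
const base = {
  taskId: "job-1",
  provider: "anthropic" as const,
  costUsd: 5,
  energyKwh: 0.5,
  startedAt: new Date("2026-05-16T03:00:00Z"),
  finishedAt: new Date("2026-05-16T03:20:00Z"),
};
const peak = buildReceipt({ ...base, gridIntensityGPerKwh: 400 });
const offPeak = buildReceipt({ ...base, gridIntensityGPerKwh: 250 });

console.log(peak.carbonGrams, offPeak.carbonGrams); // 200 125
console.log(JSON.stringify(offPeak));
```

Because the receipt is plain data, it can be serialized and stored alongside billing records for later audit.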

Section 07

Technical Architecture: MCP-Native Design

ebb-ai is designed natively with the Model Context Protocol (MCP) and can be seamlessly integrated as a plugin into various agent host environments:

Claude Desktop
Claude Code
Cursor
Cline
Continue
Zed
Windsurf
OpenClaw
OpenAI Codex CLI

The system architecture includes the following core components:

Section 08

Core Library (@ebb-ai/core)

The TypeScript core library provides the defer() API, AnthropicAdapter, OpenAIAdapter, and an optional SQLite persistent queue. The scheduler supports both in-memory and persistent modes; the latter retains task state across system restarts.
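The difference between the two modes can be illustrated with a scheduler that writes through to a pluggable store. This sketch does not use SQLite or the real @ebb-ai/core classes: `TaskStore`, `MemoryStore`, `SnapshotStore`, and `Scheduler` are hypothetical, and a JSON string stands in for durable storage so the example stays self-contained.

```typescript
// Hypothetical sketch: a JSON snapshot stands in for the SQLite queue described above.
interface QueuedTask {
  id: string;
  prompt: string;
  runAt: string; // ISO timestamp of the scheduled execution
  status: "pending" | "done";
}

interface TaskStore {
  save(tasks: QueuedTask[]): void;
  load(): QueuedTask[];
}

/** In-memory mode: state is lost when the process (and this object) goes away. */
class MemoryStore implements TaskStore {
  private tasks: QueuedTask[] = [];
  save(tasks: QueuedTask[]): void {
    this.tasks = tasks.map((t) => ({ ...t }));
  }
  load(): QueuedTask[] {
    return this.tasks.map((t) => ({ ...t }));
  }
}

/** Persistent mode stand-in: `snapshot` is what would be written to disk. */
class SnapshotStore implements TaskStore {
  snapshot = "[]";
  save(tasks: QueuedTask[]): void {
    this.snapshot = JSON.stringify(tasks);
  }
  load(): QueuedTask[] {
    return JSON.parse(this.snapshot) as QueuedTask[];
  }
}

class Scheduler {
  private store: TaskStore;
  private tasks: QueuedTask[];
  constructor(store: TaskStore) {
    this.store = store;
    this.tasks = store.load(); // recover whatever the store retained
  }
  enqueue(task: QueuedTask): void {
    this.tasks.push(task);
    this.store.save(this.tasks); // write through on every change
  }
  pending(): QueuedTask[] {
    return this.tasks.filter((t) => t.status === "pending");
  }
}

// First "process": enqueue a task in persistent mode.
const disk = new SnapshotStore();
new Scheduler(disk).enqueue({
  id: "job-1",
  prompt: "Summarize my inbox",
  runAt: "2026-05-16T03:00:00Z",
  status: "pending",
});

// Simulated restart: a fresh store is rebuilt from the saved snapshot.
const restored = new SnapshotStore();
restored.snapshot = disk.snapshot;
console.log(new Scheduler(restored).pending().length);          // 1
console.log(new Scheduler(new MemoryStore()).pending().length); // 0
```

Loading from the store in the constructor is what lets a restarted scheduler pick up pending work without the caller re-submitting it.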
