
WhatCanIRun: An MCP-based LLM Inference Budget Planning Tool

This article introduces WhatCanIRun, a practical tool that uses the Model Context Protocol (MCP) to turn large language model (LLM) inference budgets into actionable plans. It helps users choose the best model configuration within their budget constraints.

Tags: MCP, LLM, Budget, Model Selection, Cost Optimization, API Pricing, Local Deployment, Inference Planning, Large Language Models

Published 2026-05-26 09:45 | Recent activity 2026-05-26 09:53 | Estimated read: 7 min

Section 01

Introduction

WhatCanIRun is an open-source project maintained by maheshbabugorantla (GitHub: https://github.com/maheshbabugorantla/whatcanirun; released 2026-05-26T01:45:45Z). It is a budget planning tool for LLM inference, exposed through the Model Context Protocol (MCP), and it helps developers and enterprises make cost decisions when deploying LLMs. It combines model, pricing, and hardware data in one place and turns budget constraints into concrete model configuration plans. Supported scenarios include API budget planning and local deployment evaluation. Its core value is shortening the path from a budget to an actionable plan.

Section 02

Project Background: Cost Dilemmas in LLM Deployment

As large language models grow more capable, developers and enterprises face complex cost decisions. Which API usage strategy fits a given budget? What hardware does local deployment require? How should capability be balanced against latency? Rule-of-thumb estimates and trial and error are slow and unreliable. WhatCanIRun offers a systematic way to turn a budget into a specific configuration plan.

Section 03

Core Features and Technical Architecture

MCP Integration

WhatCanIRun runs as an MCP server. MCP clients such as Claude Desktop and Cursor can therefore call its tools directly, with no custom integration code.
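MCP clients talk to servers using JSON-RPC 2.0 messages, and invoking a server tool is a `tools/call` request that carries the tool name and its arguments. Clients such as Claude Desktop build these messages automatically. The sketch below only illustrates the message shape. The tool name `plan_budget` and its argument names are hypothetical; the article does not document WhatCanIRun's actual tool schema.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_ids = count(1)

def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool name and argument schema; WhatCanIRun's real tools may differ.
request = tools_call_request(
    "plan_budget",
    {
        "monthly_budget_usd": 500,
        "requests_per_day": 2000,
        "tokens_per_request": 500,
        "min_accuracy": 0.90,
    },
)
print(request)
```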

Budget Conversion Logic

The tool maintains a model database that records model specifications (parameter count, context window), benchmark performance, API pricing, hardware requirements, and latency characteristics. It uses this data to generate candidate plans and rank them.
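The sketch below shows one possible shape for a database record and the first step of plan generation, which discards models that violate any hard constraint. The field names, the single blended price, and the sample records are placeholders invented for illustration, not WhatCanIRun's actual schema or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRecord:
    """One entry in a hypothetical model database (field names are illustrative)."""
    name: str
    params_billion: float          # parameter count, in billions
    context_window: int            # maximum context length, in tokens
    benchmark_score: float         # normalized benchmark score in [0, 1]
    usd_per_million_tokens: float  # blended API price per million tokens
    min_gpu_memory_gb: float       # memory needed to self-host
    p50_latency_ms: float          # typical (median) latency

def feasible(models, *, max_usd_per_million, min_context, min_score):
    """Keep only the models that satisfy every hard constraint."""
    return [
        m for m in models
        if m.usd_per_million_tokens <= max_usd_per_million
        and m.context_window >= min_context
        and m.benchmark_score >= min_score
    ]

# Placeholder records, not real model data.
catalog = [
    ModelRecord("model-a", 8, 128_000, 0.91, 0.60, 16, 300),
    ModelRecord("model-b", 70, 8_000, 0.95, 3.00, 140, 900),
]
survivors = feasible(catalog, max_usd_per_million=1.0, min_context=32_000, min_score=0.9)
print([m.name for m in survivors])
```

The filter runs before any ranking, so the ranker only sees options that already satisfy the hard limits.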

Section 04

Use Cases and Practical Examples

Use Case 1: API Budget Planning

A startup has a budget of $500/month, serves 2,000 requests/day at about 500 tokens per request, and needs at least 90% accuracy. The tool returns several plans. One is a cost-effective option (GPT-3.5, $420/month, 92% accuracy). Another is a balanced option (a mix of GPT-3.5 and GPT-4, $480/month, 95% accuracy).
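The arithmetic behind this use case can be checked by hand. The sketch assumes a 30-day month and treats the 500 tokens per request as input and output combined. Under those assumptions the workload is 30 million tokens per month, and the quoted $420/month corresponds to an average of $14 per million tokens. Real APIs usually price input and output tokens separately, so the single blended rate used here is a simplification. The $2.00 price in the last line is an arbitrary placeholder.

```python
def monthly_tokens(requests_per_day: int, tokens_per_request: int, days: int = 30) -> int:
    """Total tokens processed in a billing month (assumes a 30-day month by default)."""
    return requests_per_day * tokens_per_request * days

def monthly_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """API cost for a month, given a single blended price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens

def implied_price_per_million(monthly_cost: float, tokens: int) -> float:
    """Back out the average price per million tokens behind a quoted monthly cost."""
    return monthly_cost / (tokens / 1_000_000)

tokens = monthly_tokens(2000, 500)               # 30,000,000 tokens per month
print(tokens)
print(implied_price_per_million(420, tokens))    # average $/M tokens behind the $420 plan
print(monthly_cost_usd(tokens, 2.0))             # cost at a placeholder $2.00/M tokens
```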

Use Case 2: Local Deployment Evaluation

For an enterprise deploying Llama 3 70B privately, the tool reports several figures:

- the minimum configuration (2x A100 80GB)
- the hardware cost ($15,000 one-time)
- the monthly operating cost ($500)
- a comparison with the cost of equivalent API usage
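Two back-of-the-envelope checks support a plan like this. The first is weight memory. At 16-bit precision (2 bytes per parameter), a 70B-parameter model needs about 140 GB for its weights alone. That fits in 2x 80 GB and leaves roughly 20 GB for the KV cache and activations. The estimate ignores framework overhead, and quantization would change it. The second is the break-even point against an API bill. The $2,000/month API figure below is a hypothetical comparison point, not a number from the tool.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone (decimal GB); excludes KV cache and activations."""
    return num_params * bytes_per_param / 1e9

def breakeven_months(hardware_usd: float, opex_usd_per_month: float,
                     api_usd_per_month: float) -> float:
    """Months until self-hosting becomes cheaper than an equivalent API bill."""
    monthly_savings = api_usd_per_month - opex_usd_per_month
    if monthly_savings <= 0:
        return float("inf")  # self-hosting never pays off
    return hardware_usd / monthly_savings

fp16_weights = weight_memory_gb(70e9, 2)   # 16-bit weights for a 70B model
cluster_memory = 2 * 80                     # 2x A100 80GB
headroom = cluster_memory - fp16_weights    # left for KV cache and activations
print(fp16_weights, headroom)

# Hypothetical API bill of $2,000/month for the same workload.
print(breakeven_months(15_000, 500, 2_000))
```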

Use Case 3: Capacity Planning

For an AI writing assistant, the tool suggests a phased strategy. During cold start it uses the API only. During growth it adds caching to the API. At scale it moves to a hybrid deployment or a self-built cluster.
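Caching makes the growth phase cheaper because repeated requests never reach the paid API. The sketch below uses a simple model of that effect. It assumes that cache hits cost nothing at the API, that misses cost the same as before, and that running the cache has a flat monthly cost. The dollar figures and hit rate in the usage line are placeholders.

```python
def cached_api_cost(base_monthly_usd: float, cache_hit_rate: float,
                    cache_usd_per_month: float = 0.0) -> float:
    """API bill when a fraction of requests is served from a response cache.

    Assumes cache hits are free at the API and misses cost the same as before;
    cache_usd_per_month is the flat cost of running the cache itself.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return base_monthly_usd * (1.0 - cache_hit_rate) + cache_usd_per_month

# Placeholder figures: $1,000/month API bill, 25% hit rate, $50/month cache.
print(cached_api_cost(1_000, 0.25, 50))
```

Caching pays off only when the hit rate saves more than the cache costs to run. A low hit rate can therefore make the growth phase more expensive than pure API usage.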

Section 05

Technical Implementation Details

Model Database Maintenance

The database is kept current in four ways: automatically scraping official pricing pages, importing data from Hugging Face and Papers With Code, referencing cloud providers' hardware costs, and accepting community contributions.
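When several sources report a price for the same model, the database has to reconcile them without losing track of where each number came from. The sketch below applies a hypothetical "newest observation wins" policy and keeps the source alongside the value. The article does not describe WhatCanIRun's actual reconciliation rules. A real implementation might rank official pricing pages above community reports, for example. The records, dates, and source labels are placeholders.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class PriceObservation:
    """A single price data point together with its provenance."""
    model: str
    usd_per_million_tokens: float
    observed_on: date
    source: str  # e.g. a pricing-page URL or a community thread

def latest_prices(observations):
    """Keep the most recent observation per model, preserving its source."""
    latest = {}
    for obs in observations:
        current = latest.get(obs.model)
        if current is None or obs.observed_on > current.observed_on:
            latest[obs.model] = obs
    return latest

# Placeholder observations, not real prices.
observations = [
    PriceObservation("model-a", 0.60, date(2026, 4, 1), "official-pricing-page"),
    PriceObservation("model-a", 0.50, date(2026, 5, 1), "community-submission"),
]
current = latest_prices(observations)["model-a"]
print(current.usd_per_million_tokens, current.source)
```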

Ranking Algorithm

The ranking algorithm scores each plan on four criteria: budget compliance, how well the plan meets performance requirements, reliability, and implementation complexity. Users can adjust the weight given to each criterion.
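A weighted sum is one common way to implement adjustable weights. The sketch below normalizes every criterion to [0, 1], with higher values being better. Complexity cost is therefore represented as "simplicity". The weights and per-plan scores are hypothetical, and the article does not specify WhatCanIRun's actual scoring formula. The usage lines show how moving weight toward performance can flip the ranking.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlanScores:
    """Per-criterion scores in [0, 1], where higher is better."""
    name: str
    cost_compliance: float
    performance: float
    reliability: float
    simplicity: float  # 1 - normalized complexity cost

# Hypothetical default weights.
DEFAULT_WEIGHTS = {"cost_compliance": 0.4, "performance": 0.3,
                   "reliability": 0.2, "simplicity": 0.1}

def rank_plans(plans, weights=DEFAULT_WEIGHTS):
    """Sort plans by the weighted average of their criterion scores, best first."""
    total = sum(weights.values())

    def score(plan):
        return sum(w * getattr(plan, key) for key, w in weights.items()) / total

    return sorted(plans, key=score, reverse=True)

# Placeholder scores for two plans.
plans = [
    PlanScores("cost-effective", 1.0, 0.6, 0.8, 0.9),
    PlanScores("balanced", 0.7, 0.9, 0.8, 0.6),
]
print([p.name for p in rank_plans(plans)])

performance_first = {"cost_compliance": 0.1, "performance": 0.6,
                     "reliability": 0.2, "simplicity": 0.1}
print([p.name for p in rank_plans(plans, performance_first)])
```

Dividing by the weight total lets users supply weights that do not sum to 1 without changing the resulting order.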

Traceable Sources

Each plan cites its data sources, so every figure can be traced back to a benchmark result, a pricing page, or a community discussion.

Section 06

Limitations and Notes

Data Timeliness: Models and prices change rapidly, so verify the latest figures before making decisions.
Scenario Coverage: The tool currently focuses on text generation; support for multimodal and domain-specific workloads is still limited.
Actual Performance Differences: Latency and throughput estimates reflect typical scenarios; validate them at small scale before going to production.

Section 07

Practical Application Recommendations

Clarify constraints: Pin down hard requirements such as budget, performance, and latency.
Compare multiple plans: Understand the trade-offs behind each option.
Small-scale verification: Run proof-of-concept (PoC) tests on the candidate plans.
Continuous monitoring: Set up a mechanism for tracking costs.
Feedback and contribution: Share your experience with the community.
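A cost-tracking mechanism for the "continuous monitoring" step can start very small. The sketch below records daily spend and projects the monthly total linearly from the average so far. It raises a flag once that projection passes a share of the budget. The linear projection, the 30-day month, and the 90% alert threshold are hypothetical choices, not features attributed to WhatCanIRun.

```python
from dataclasses import dataclass, field

@dataclass
class BudgetTracker:
    """Accumulates daily spend and flags when the month is on pace to overrun."""
    monthly_budget_usd: float
    days_in_month: int = 30
    daily_spend: list[float] = field(default_factory=list)

    def record(self, usd: float) -> None:
        """Add one day's spend."""
        self.daily_spend.append(usd)

    def projected_month_usd(self) -> float:
        """Linear projection from the average daily spend so far."""
        if not self.daily_spend:
            return 0.0
        avg = sum(self.daily_spend) / len(self.daily_spend)
        return avg * self.days_in_month

    def over_budget_pace(self, alert_ratio: float = 0.9) -> bool:
        """True when the projection exceeds alert_ratio of the monthly budget."""
        return self.projected_month_usd() > alert_ratio * self.monthly_budget_usd

# Placeholder spend for the first three days against a $500/month budget.
tracker = BudgetTracker(monthly_budget_usd=500)
for usd in (14, 16, 18):
    tracker.record(usd)
print(tracker.projected_month_usd(), tracker.over_budget_pace())
```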

Section 08

Summary and Future Development Directions

Summary

WhatCanIRun simplifies LLM budget decisions by narrowing the set of viable options. Its recommendations still need to be validated against real workloads, and it cannot replace human judgment.

Future Directions

Future plans include expanding multimodal support, fine-tuning cost calculation, carbon footprint estimation, and contract negotiation assistance.
