Zing Forum

WhatCanIRun: An MCP-based LLM Inference Budget Planning Tool

WhatCanIRun is a practical tool that converts large language model (LLM) inference budgets into actionable plans over the Model Context Protocol (MCP), helping users choose model configurations that fit their budget constraints.

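Tags: MCP, LLM, budget, model selection, cost optimization, API pricing, local deployment, inference planning, large language models
Published 2026-05-26 09:45 | Recent activity 2026-05-26 09:53 | Estimated read 7 min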

Section 01

[Introduction] WhatCanIRun: An MCP-based LLM Inference Budget Planning Tool

WhatCanIRun is an open-source project maintained by maheshbabugorantla (GitHub: https://github.com/maheshbabugorantla/whatcanirun, release date: 2026-05-26T01:45:45Z). It is an MCP-based planning tool for LLM inference budgets that helps developers and enterprises make cost decisions when deploying LLMs. Drawing on a database of model specifications, pricing, and hardware requirements, it converts budget constraints into actionable model configuration plans for scenarios such as API budget planning and local deployment evaluation. Its core value is shortening the path from a budget figure to a concrete plan.

Section 02

Project Background: Cost Dilemmas in LLM Deployment

As large language models grow more capable, developers and enterprises face complex cost decisions: Which API usage strategy fits a given budget? What hardware does local deployment require? How should capability be balanced against latency? Estimating from experience or by trial and error is inefficient. WhatCanIRun offers a systematic way to convert a budget into specific configuration plans.

Section 03

Core Features and Technical Architecture

MCP Protocol Integration

WhatCanIRun runs as an MCP server, so MCP clients such as Claude Desktop and Cursor can call it directly from within their own workflows.
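To make the client-server exchange concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a server tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `plan_budget` and its argument keys are hypothetical placeholders, not WhatCanIRun's documented interface.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument keys, chosen only for illustration.
request = build_tool_call(
    1,
    "plan_budget",
    {"monthly_budget_usd": 500, "requests_per_day": 2000, "tokens_per_request": 500},
)
print(request)
```

In practice the client application builds and sends this message itself; the sketch only shows what crosses the wire.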

Budget Conversion Logic

The tool maintains a model database covering model specifications (parameter count, context window), performance benchmarks, cost data (API pricing), hardware requirements, and latency characteristics. It generates candidate plans from this data and ranks them.
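One way to picture this step is a record per model plus a filter that keeps only the entries that fit the budget. The sketch below is an assumption about how such a conversion could work, not the project's actual schema: the `ModelEntry` fields, the function names, and every number in the sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical database record; field names are illustrative."""
    name: str
    params_billion: float
    context_window: int
    benchmark_score: float          # normalized to the range 0..1
    usd_per_million_tokens: float
    typical_latency_ms: float


def monthly_cost(entry: ModelEntry, tokens_per_month: int) -> float:
    """API cost in USD for a given monthly token volume."""
    return entry.usd_per_million_tokens * tokens_per_month / 1_000_000


def candidates_within_budget(entries, tokens_per_month, budget_usd, min_score):
    """Keep entries that meet the budget and quality floor, cheapest first."""
    fits = [
        e for e in entries
        if monthly_cost(e, tokens_per_month) <= budget_usd
        and e.benchmark_score >= min_score
    ]
    return sorted(fits, key=lambda e: monthly_cost(e, tokens_per_month))


# Placeholder models with invented numbers, not real pricing or benchmarks.
catalog = [
    ModelEntry("model-small", 8, 8_192, 0.85, 5.0, 300),
    ModelEntry("model-mid-a", 30, 16_384, 0.92, 14.0, 600),
    ModelEntry("model-mid-b", 20, 16_384, 0.91, 10.0, 500),
    ModelEntry("model-large", 200, 128_000, 0.97, 40.0, 1200),
]

plans = candidates_within_budget(catalog, 30_000_000, 500, 0.90)
print([p.name for p in plans])
```

Here `model-small` is dropped for missing the quality floor and `model-large` for exceeding the budget, which leaves the two mid-sized placeholders ordered by cost.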

Section 04

Use Cases and Practical Examples (Evidence)

Use Case 1: API Budget Planning

A startup team has a $500/month budget, serves 2000 requests/day (500 tokens per request), and requires 90% accuracy. The tool returns plans such as a cost-effective plan (GPT-3.5, $420/month, 92% accuracy) and a balanced plan (a mix of GPT-3.5 and GPT-4, $480/month, 95% accuracy).
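The workload in this example can be checked with simple arithmetic. The sketch below assumes a 30-day month and derives the blended per-token rate that each quoted monthly cost implies; these rates follow from the example's own figures and are not taken from any provider's price list.

```python
DAYS_PER_MONTH = 30  # assumption; the article does not state the month length


def monthly_tokens(requests_per_day: int, tokens_per_request: int,
                   days: int = DAYS_PER_MONTH) -> int:
    """Total tokens processed per month for a steady daily workload."""
    return requests_per_day * tokens_per_request * days


def implied_rate_per_million(monthly_cost_usd: float, tokens: int) -> float:
    """Blended USD price per million tokens implied by a monthly cost."""
    return monthly_cost_usd / (tokens / 1_000_000)


tokens = monthly_tokens(2000, 500)                        # 30,000,000 tokens
rate_cost_effective = implied_rate_per_million(420, tokens)
rate_balanced = implied_rate_per_million(480, tokens)
headroom = 500 - 420                                      # unused budget in USD

print(tokens, rate_cost_effective, rate_balanced, headroom)
```

Under these assumptions the workload is 30 million tokens per month, so the $420 and $480 plans correspond to blended rates of $14 and $16 per million tokens, leaving $80 and $20 of budget headroom respectively.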

Use Case 2: Local Deployment Evaluation

An enterprise wants to deploy Llama3 70B privately. The tool reports the minimum configuration (2x A100 80GB), the hardware cost ($15,000 one-time), the monthly operating cost ($500), and a comparison with the equivalent API cost.
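The comparison with API costs comes down to a break-even calculation. The following sketch uses the hardware and operating figures from the example; the monthly API cost is not given in the article, so the $2,000 value below is a hypothetical input. The memory check assumes FP16 weights at 2 bytes per parameter and ignores KV cache and activation memory.

```python
import math


def break_even_months(hardware_usd: float, monthly_ops_usd: float,
                      monthly_api_usd: float):
    """Months until local deployment becomes cheaper than the API.

    Returns None when the API is not more expensive than local operation,
    in which case the hardware cost is never recovered.
    """
    savings = monthly_api_usd - monthly_ops_usd
    if savings <= 0:
        return None
    return math.ceil(hardware_usd / savings)


def fp16_weight_gb(params_billion: float) -> float:
    """Approximate weight memory in GB at 2 bytes per parameter."""
    return params_billion * 2


# $15,000 hardware and $500/month operations are from the example above;
# the $2,000/month API cost is a hypothetical comparison point.
months = break_even_months(15_000, 500, 2_000)
weights_gb = fp16_weight_gb(70)
fits_in_two_a100 = weights_gb <= 2 * 80

print(months, weights_gb, fits_in_two_a100)
```

With these inputs the hardware pays for itself in 10 months, and the roughly 140 GB of FP16 weights fit within the 160 GB offered by two 80 GB cards, though with limited room left for the KV cache.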

Use Case 3: Capacity Planning

A phased strategy for an AI writing assistant: cold start (API only), growth (API plus caching), and scale (hybrid deployment or a self-built cluster).

Section 05

Technical Implementation Details

Model Database Maintenance

The database is kept current by automatically scraping official pricing pages, integrating data from Hugging Face and Papers With Code, referencing cloud vendors' hardware costs, and accepting community contributions.

Ranking Algorithm

The ranking algorithm orders plans by four criteria: cost compliance, performance satisfaction, reliability score, and complexity cost. Users can adjust the weight of each criterion.
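The article does not specify how the four criteria are combined. A weighted sum is one common choice, sketched below as an assumption: the three benefit criteria add to the score and complexity cost subtracts from it. The class, the default weights, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanScores:
    """Per-plan criterion scores, each normalized to 0..1 (illustrative)."""
    name: str
    cost_compliance: float
    performance_satisfaction: float
    reliability: float
    complexity_cost: float  # higher means harder to operate, so it is penalized


# Hypothetical default weights; the article only says users can adjust them.
DEFAULT_WEIGHTS = {
    "cost_compliance": 0.4,
    "performance_satisfaction": 0.3,
    "reliability": 0.2,
    "complexity_cost": 0.1,
}


def score(plan: PlanScores, weights: dict) -> float:
    """Weighted sum of benefits minus the weighted complexity penalty."""
    return (
        weights["cost_compliance"] * plan.cost_compliance
        + weights["performance_satisfaction"] * plan.performance_satisfaction
        + weights["reliability"] * plan.reliability
        - weights["complexity_cost"] * plan.complexity_cost
    )


def rank(plans, weights=DEFAULT_WEIGHTS):
    """Return plans sorted from best to worst score."""
    return sorted(plans, key=lambda p: score(p, weights), reverse=True)


plans = [
    PlanScores("api-only", 0.9, 0.7, 0.8, 0.1),
    PlanScores("hybrid", 0.7, 0.9, 0.7, 0.6),
]

default_order = [p.name for p in rank(plans)]

# Shifting weight toward performance changes which plan comes first.
performance_first = dict(DEFAULT_WEIGHTS, cost_compliance=0.1,
                         performance_satisfaction=0.7, reliability=0.1)
tuned_order = [p.name for p in rank(plans, performance_first)]

print(default_order, tuned_order)
```

With the default weights the cheaper, simpler plan ranks first; with performance weighted heavily the order flips, which illustrates why adjustable weights matter.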

Traceable Sources

Each plan cites its data sources, so its figures can be traced back to benchmark results, pricing pages, or community discussions.

Section 06

Limitations and Notes

  • Data Timeliness: The LLM field changes rapidly, so verify the latest data before making decisions.
  • Scenario Coverage: The tool currently focuses on text generation; support for multimodal and domain-specific models needs improvement.
  • Actual Performance Differences: Latency and throughput figures are based on typical scenarios, so validate them at small scale before going to production.

Section 07

Practical Application Recommendations

  1. Clarify constraints: List hard requirements such as budget, performance, and latency.
  2. Compare multiple plans: Understand the trade-off logic of each option.
  3. Small-scale verification: Conduct PoC tests on candidate plans.
  4. Continuous monitoring: Establish a cost tracking mechanism.
  5. Feedback and contribution: Share usage experience with the community.

Section 08

Summary and Future Development Directions

Summary

WhatCanIRun simplifies LLM budget decisions and narrows the set of options to consider, but its recommendations still need to be validated against real workloads and cannot replace human judgment.

Future Directions

Future plans include expanded multimodal support, cost calculation for fine-tuning, carbon footprint estimation, and contract negotiation assistance.