# llm-speedtest-mcp: Zero-Telemetry LLM Inference Speed Benchmark MCP Server

> A lightweight MCP server tool that lets users run standardized inference speed tests against multiple LLM providers directly from local AI tools, measuring key metrics such as TTFT (Time to First Token) and TPS (Tokens Per Second). At fewer than 500 lines of code, it features zero telemetry, zero data collection, and full privacy protection.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-05T05:05:55.000Z
- Last activity: 2026-05-05T05:23:04.616Z
- Popularity: 154.7
- Keywords: llm-speedtest-mcp, MCP server, LLM benchmarking, inference speed, zero telemetry, privacy protection, TTFT, TPS, Claude Desktop, Cursor integration
- Page URL: https://www.zingnex.cn/en/forum/thread/llm-speedtest-mcp-llmmcp
- Canonical: https://www.zingnex.cn/forum/thread/llm-speedtest-mcp-llmmcp
- Markdown source: floors_fallback

---

## Introduction: llm-speedtest-mcp, a Zero-Telemetry LLM Inference Speed Benchmark Tool

llm-speedtest-mcp is a lightweight MCP server tool that lets users run standardized inference speed tests against multiple LLM providers from within local AI tools. It measures key metrics such as TTFT (Time to First Token) and TPS (Tokens Per Second). At fewer than 500 lines of code, it adheres to the principles of zero telemetry and zero data collection to protect user privacy. The tool integrates seamlessly into AI tools that support the MCP protocol, such as Claude Desktop and Cursor, addressing a common pain point: LLM users struggle to obtain reliable, comparable inference speed data.

## Project Background and Motivation

As the LLM ecosystem grows, developers and users face a dilemma when choosing providers: beyond price, quality, and context length, inference speed is crucial for real-time interaction scenarios such as chatbots and code completion. However, providers typically publish theoretical figures in their documentation, while actual performance is affected by network latency, load, and other factors, making reliable data hard to obtain. llm-speedtest-mcp borrows the idea behind speedtest.net, bringing LLM speed testing into AI workflows while prioritizing privacy protection.

## MCP Protocol and Tool Positioning

MCP (Model Context Protocol) is an open protocol introduced by Anthropic that standardizes how AI models integrate with external tools, allowing AI assistants (such as Claude Desktop and Cursor) to call external functions through a uniform interface. As an MCP server, llm-speedtest-mcp exposes LLM speed-testing capabilities to any MCP-compatible AI tool, letting users trigger tests from the chat interfaces they already use.
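
For orientation, here is a minimal sketch of how a speed-test tool could be exposed through the TypeScript MCP SDK. This is not the project's actual source; the tool name, parameters, and report text are illustrative assumptions.

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "llm-speedtest", version: "0.1.0" });

// Illustrative tool registration: llm-speedtest-mcp may define names and parameters differently.
server.tool(
  "benchmark",
  { model: z.string().optional(), prompt: z.string().optional() },
  async ({ model, prompt }) => {
    // A real implementation would stream a completion here and time it.
    const report = `Benchmarked ${model ?? "all configured models"} with prompt "${prompt ?? "default"}"`;
    return { content: [{ type: "text", text: report }] };
  }
);

// MCP servers typically talk to the host (Claude Desktop, Cursor) over stdio.
await server.connect(new StdioServerTransport());
```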

## Core Features and Usage Guide

**Installation and Configuration**: The tool can be installed globally via npm (`npm install -g llm-speedtest-mcp`) or run directly via npx. Claude Desktop and Cursor users add the MCP server entry to the corresponding configuration file and then type `benchmark my models` in the chat to trigger a test.
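
For reference, Claude Desktop reads MCP servers from its `claude_desktop_config.json` file. A sketch of an entry for this tool might look like the following; the server name, the environment variable names, and the use of `npx -y` are assumptions, so check the project README for the exact configuration.

```json
{
  "mcpServers": {
    "llm-speedtest": {
      "command": "npx",
      "args": ["-y", "llm-speedtest-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "ANTHROPIC_API_KEY": "sk-ant-..."
      }
    }
  }
}
```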

**Supported Providers**: Built-in support for mainstream vendors including OpenAI, Anthropic, Groq, OpenRouter, DeepSeek, MiniMax, Zhipu AI, and Kimi.

**Key Metrics**: TTFT (Time to First Token, ms), TPS (Tokens Per Second), total latency, total number of tokens.
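
To make the metric definitions concrete, here is a minimal TypeScript sketch of how TTFT and TPS can be derived from a streamed response. `streamTokens` is a hypothetical async iterable, and note that tools differ on whether the TPS window includes the time before the first token; this sketch excludes it.

```ts
// Illustrative only: derive TTFT, TPS, and total latency from a token stream.
async function measure(streamTokens: AsyncIterable<string>) {
  const start = performance.now();
  let firstTokenAt: number | null = null;
  let tokenCount = 0;

  for await (const _token of streamTokens) {
    if (firstTokenAt === null) firstTokenAt = performance.now(); // first chunk arrives
    tokenCount++;
  }
  const end = performance.now();

  const ttftMs = (firstTokenAt ?? end) - start;       // Time to First Token
  const totalLatencyMs = end - start;                 // full request wall-clock time
  const generationMs = end - (firstTokenAt ?? end);   // time spent producing tokens
  const tps = generationMs > 0 ? (tokenCount / generationMs) * 1000 : 0; // Tokens Per Second
  return { ttftMs, tps, totalLatencyMs, tokenCount };
}
```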

**Result Display**: Formatted tables list data for each provider/model, with the fastest option automatically marked.

## Privacy and Security Design Details

The tool is privacy-first by design:

1. **Local key storage**: API keys are read only from environment variables and are never logged to the console or included in error messages.
2. **Direct connection to providers**: API calls go straight from the local machine to the target endpoint, with no proxies or relays.
3. **Zero data persistence**: no databases, no logs, no file writes.
4. **Minimal dependencies**: only the MCP SDK, in fewer than 500 lines of code, keeping the project easy to audit.

Sensitive information such as API keys never leaves the user's machine.
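
As an illustration of points 1 and 2 above (not the project's actual code), key handling in this style usually looks like the following sketch; the variable and function names are assumptions.

```ts
// Keys are read from environment variables only; nothing is written to disk.
const apiKey = process.env.OPENAI_API_KEY;

// Strip the key from any text that might surface in logs or error messages.
function redact(text: string): string {
  return apiKey ? text.split(apiKey).join("[REDACTED]") : text;
}

async function callProvider(url: string, body: unknown): Promise<Response> {
  // The request goes directly from the local machine to the provider endpoint.
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(redact(`Request failed: ${res.status} ${await res.text()}`));
  return res;
}
```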

## Technical Highlights and Application Scenarios

**Technical Highlights**: a standardized test procedure (the same prompt for every provider, metrics measured via the streaming API, and a capped number of output tokens); automatic detection of configured providers via environment variables; and support for custom tests of a single model (user-specified model and prompt).
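
A plausible shape for the provider auto-detection described above is sketched here; the mapping of environment variable names is an assumption, and the project may use different names or cover more providers.

```ts
// Map providers to the environment variable that holds their API key.
const PROVIDER_ENV_KEYS: Record<string, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  groq: "GROQ_API_KEY",
  openrouter: "OPENROUTER_API_KEY",
  deepseek: "DEEPSEEK_API_KEY",
};

// Only providers whose key is present in the environment get benchmarked.
const configuredProviders = Object.entries(PROVIDER_ENV_KEYS)
  .filter(([, envVar]) => Boolean(process.env[envVar]))
  .map(([provider]) => provider);

console.log(`Detected providers: ${configuredProviders.join(", ") || "none"}`);
```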

**Application Scenarios**: provider/model selection (comparing speeds), network quality assessment (comparing results at different times), troubleshooting (identifying the cause of slow responses), and cost-benefit analysis (combined with pricing data).

## Limitations and Future Outlook

**Limitations**: Test results are affected by prompt complexity, output length, concurrent load, and geographic location, so running several tests and averaging the results is recommended; providers may update their models or infrastructure at any time; and the tool is not intended for production monitoring.
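
One way to follow the averaging recommendation, sketched with a hypothetical `runBenchmark` function that performs a single run:

```ts
// Run the benchmark several times and average TTFT/TPS to smooth out
// network jitter and provider load. `runBenchmark` is hypothetical.
type RunResult = { ttftMs: number; tps: number };

async function averagedRun(runBenchmark: () => Promise<RunResult>, runs = 3): Promise<RunResult> {
  const results: RunResult[] = [];
  for (let i = 0; i < runs; i++) results.push(await runBenchmark());
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return {
    ttftMs: mean(results.map((r) => r.ttftMs)),
    tps: mean(results.map((r) => r.tps)),
  };
}
```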

**Future Directions**: Support more LLM providers, add quality assessment dimensions, track historical data, and enable custom test scenario configuration.
