# tps.sh: A Performance Benchmarking Tool for Local and Cloud Large Language Models

> tps.sh is an open-source tool focused on performance testing of large language models (LLMs). It compares the tokens per second (TPS) performance of local Ollama models and cloud services like the Claude API through 147 tests, helping users make optimal deployment decisions on Apple Silicon devices.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-04-30T00:14:14.000Z
- Last activity: 2026-04-30T02:06:55.580Z
- Popularity: 153.1
- Keywords: tps.sh, LLM benchmarking, tokens per second, Ollama, Claude API, Apple Silicon, local deployment, cloud API, performance testing, LLM evaluation
- Page link: https://www.zingnex.cn/en/forum/thread/tps-sh-f15e9553
- Canonical: https://www.zingnex.cn/forum/thread/tps-sh-f15e9553
- Markdown source: floors_fallback

---

## [Introduction] tps.sh: Core Overview of the Performance Benchmarking Tool for Local and Cloud LLMs

tps.sh is an open-source tool for benchmarking large language models (LLMs). It runs 147 tests to compare the tokens-per-second (TPS) throughput of local Ollama models with that of the cloud-based Claude API, helping users decide how to deploy models on Apple Silicon devices. By wrapping the testing logic in a concise command-line interface that runs on more than one platform, it lowers the technical barrier to performance evaluation and gives developers and users data on which to base deployment decisions.

## Tool Background and Design Positioning

Performance is a key consideration in practical LLM applications, but measuring it has traditionally required custom scripts and a technical background. tps.sh has a clear design goal: to give users a simple, intuitive way to compare the performance of different LLMs. The tool is described as optimized for Apple Silicon, taking advantage of its Neural Engine and unified memory architecture. It also supports Windows, which widens its reach and further lowers the barrier to performance evaluation.

## Testing System and Multi-Dimensional Evaluation

tps.sh ships with 147 built-in test cases that cover scenarios such as text generation, code completion, and logical reasoning, driven by 21 representative sample prompts. Its core function is to run side-by-side tests against local Ollama models (on Apple Silicon devices) and the cloud-based Claude API. The comparison spans several dimensions: processing speed (TPS), generation quality, cloud cost, and resource usage (CPU, memory, and GPU).
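The TPS metric itself is simple arithmetic: generated tokens divided by generation time. A minimal sketch follows, assuming the timing fields that Ollama's generate API reports (`eval_count` for generated tokens and `eval_duration` in nanoseconds). Whether tps.sh reads these exact fields is an assumption, and the helper names are hypothetical.

```python
def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Return generation throughput in tokens per second."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    return token_count / (duration_ns / 1e9)


def ollama_tps(response: dict) -> float:
    """Compute TPS from the timing fields of an Ollama generate response.

    Assumes `eval_count` (tokens generated) and `eval_duration`
    (nanoseconds spent generating) are present in the response.
    """
    return tokens_per_second(response["eval_count"], response["eval_duration"])


# Made-up numbers for illustration: 256 tokens generated in 4 seconds.
sample = {"eval_count": 256, "eval_duration": 4_000_000_000}
print(ollama_tps(sample))  # 64.0
```

For a cloud API the same formula would typically be applied to the output token count and a wall-clock time measured by the client. That figure includes network latency, so local and cloud numbers are not measured under identical conditions.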

## Technical Implementation and Architecture Optimization

On Apple Silicon, tps.sh is described as automatically using the Neural Engine to accelerate inference and relying on the unified memory architecture to reduce data transfer overhead. The Windows version can test various local LLM runtimes or cloud APIs; its system requirements are Windows 10 or later, 8 GB of RAM, and a 2 GHz processor. The tool tests 7 models, and its configuration is extensible: model paths, API parameters, and similar settings are specified in configuration files.
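The source does not show the configuration file format, so the following is only an illustrative sketch of how a file-based model configuration could be parsed and validated. The JSON layout, the `ModelConfig` class, the field names, and the backend labels are all hypothetical and not taken from tps.sh.

```python
import json
from dataclasses import dataclass


@dataclass
class ModelConfig:
    name: str      # label used in reports
    backend: str   # hypothetical labels: "ollama" (local) or "claude" (cloud)
    params: dict   # backend-specific options, e.g. temperature


def load_models(text: str) -> list[ModelConfig]:
    """Parse a JSON config string into a list of ModelConfig entries."""
    raw = json.loads(text)
    models = []
    for entry in raw["models"]:
        if entry["backend"] not in ("ollama", "claude"):
            raise ValueError(f"unknown backend: {entry['backend']}")
        models.append(
            ModelConfig(entry["name"], entry["backend"], entry.get("params", {}))
        )
    return models


# Hypothetical config content with placeholder model names.
EXAMPLE = """
{"models": [
  {"name": "local-model-a", "backend": "ollama", "params": {"temperature": 0.2}},
  {"name": "cloud-model-b", "backend": "claude"}
]}
"""

for model in load_models(EXAMPLE):
    print(model.name, model.backend, model.params)
```

Rejecting unknown backends at load time means a typo in the configuration is reported before any benchmark starts, rather than partway through a long run.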

## Usage Workflow and Result Interpretation

- Installation: Download a precompiled binary or installation package from GitHub. Windows users can choose between an .exe and a .zip file.
- Running: Enter `tps.sh` on the command line. The tool loads its configuration automatically, runs the 147 tests across the 7 models, and displays progress in real time.
- Results: The tool generates a detailed report containing an overall performance ranking, an analysis by task type, a cost-benefit tradeoff, and raw data tables, which helps users quickly spot performance bottlenecks and weigh the pros and cons of each deployment option.
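The figures quoted above are consistent with each of the 21 sample prompts being run against each of the 7 models (21 × 7 = 147), although the source does not state this breakdown explicitly. Under that assumption, the test matrix and the overall ranking could be sketched as follows. The model and prompt names are placeholders, and the function names are hypothetical.

```python
from itertools import product
from statistics import mean

# Placeholder names; the source does not list the actual models or prompts.
MODELS = [f"model-{i}" for i in range(1, 8)]     # 7 models
PROMPTS = [f"prompt-{i}" for i in range(1, 22)]  # 21 prompts


def build_matrix(models, prompts):
    """Return every (model, prompt) pair to be benchmarked."""
    return list(product(models, prompts))


def rank_by_mean_tps(results):
    """Turn (model, tps) measurements into (model, mean_tps), fastest first."""
    by_model = {}
    for model, tps in results:
        by_model.setdefault(model, []).append(tps)
    ranking = [(model, mean(values)) for model, values in by_model.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)


matrix = build_matrix(MODELS, PROMPTS)
print(len(matrix))  # 147

# Made-up measurements for illustration only.
print(rank_by_mean_tps([("a", 10.0), ("b", 30.0), ("a", 20.0)]))
# [('b', 30.0), ('a', 15.0)]
```

Averaging per model is only one possible aggregation. A per-task-type breakdown, as mentioned in the report description, would group by prompt category instead.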

## Practical Application Scenarios and Value

tps.sh is useful in four main scenarios: hardware selection (assessing whether existing devices can handle local deployment), model selection (objectively comparing the performance, quality, and cost of open-source and commercial models), deployment mode evaluation (judging whether local or cloud deployment suits a given scenario), and performance monitoring and optimization (establishing baselines and checking regularly for performance changes).
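The monitoring scenario amounts to comparing a new run against a stored baseline. Here is a minimal sketch of that comparison, assuming results are kept as a mapping from model name to mean TPS. The function name, the data layout, and the 10% tolerance are illustrative choices, not part of tps.sh.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.10) -> dict:
    """Return models whose TPS fell more than `tolerance` (a fraction) below baseline."""
    regressions = {}
    for model, base_tps in baseline.items():
        now = current.get(model)
        if now is None or base_tps <= 0:
            continue  # no comparable measurement for this model
        drop = (base_tps - now) / base_tps
        if drop > tolerance:
            regressions[model] = round(drop, 3)
    return regressions


# Made-up numbers: model-a is within tolerance, model-b dropped by 25%.
baseline = {"model-a": 50.0, "model-b": 40.0}
current = {"model-a": 49.0, "model-b": 30.0}
print(find_regressions(baseline, current))  # {'model-b': 0.25}
```

A relative threshold keeps the check meaningful across models with very different absolute speeds. The tolerance should be set above the normal run-to-run variation on the test machine.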

## Community Ecosystem and Future Outlook

As an open-source project, tps.sh has an active GitHub community where users can submit issues, share results, and contribute code. Future plans include support for more models and platforms, broader test coverage, and more advanced performance analysis features. The tool aims to fill a gap in LLM performance evaluation and to encourage optimization across the ecosystem, and it could become one of the standard tools for this kind of benchmarking.
