Zing Forum

Reading

Verbodus: A Lightweight Tool for Performance Benchmarking of Local Large Language Models

Verbodus is a desktop application developed with Tauri and Vue.js, specifically designed for real-time benchmarking of key performance metrics of large language models, including first-token latency, generation speed, and throughput.

大语言模型基准测试性能优化TauriVue.js本地部署Ollama
Published 2026-05-21 03:40Recent activity 2026-05-21 03:51Estimated read 5 min
Verbodus: A Lightweight Tool for Performance Benchmarking of Local Large Language Models
1

Section 01

[Introduction] Verodus: A Lightweight Tool for Performance Benchmarking of Local LLMs

Verbodus is a desktop application developed using Tauri and Vue.js, specifically for real-time benchmarking of key performance metrics of large language models (including first-token latency, generation speed, and throughput). It helps users objectively evaluate the performance of locally deployed LLMs and optimize deployment decisions.

2

Section 02

Background: Why Do We Need a Specialized LLM Benchmarking Tool?

With the explosive growth of open-source large language models, more and more developers are choosing to run LLMs locally (e.g., Ollama, LM Studio, vLLM). However, they face three major challenges: How to objectively evaluate the performance of different models? How to balance latency and throughput? How to compare the performance of different hardware configurations? Verbodus is a lightweight desktop application designed to address these issues.

3

Section 03

Analysis of Core Performance Metrics

Verbodus tracks three industry-standard metrics with clear grading:

  1. Time to First Token (TTFT):The time it takes for the model to process the prompt and generate the first token. Excellent if below 250ms, good between 250-800ms, and slow if over 800ms.
  2. Time per Token (TPOT):The average generation speed of subsequent tokens. Excellent if below 22ms per token (over 45 tokens per second).
  3. Throughput (TPS):Total number of tokens generated per second. Fluctuations and bottlenecks are displayed via real-time charts.
4

Section 04

Technical Architecture: A Lightweight and Efficient Modern Desktop Application

Verbodus uses a modern tech stack: The frontend uses Vue3 Composition API + native CSS to implement glassmorphism design, and Chart.js supports real-time data stream visualization. The underlying layer uses Tauri v2 (Rust backend + native WebView), which significantly reduces memory usage and startup time compared to Electron, making it suitable for long-term benchmarking.

5

Section 05

Rich Testing Scenarios

Verbodus offers multiple testing modes:

  • Performance Playground: Customize prompts and view response streams and telemetry statistics in real time.
  • Engine Comparison Dashboard: Compare up to 4 historical tests simultaneously; dual-axis bar charts intuitively compare TTFT and average TPS.
  • Metadata Inspector: Display complete parameters and token breakdown information for each test.
6

Section 06

Flexible API Configuration and Data Persistence

API Configuration: Supports endpoints compatible with the OpenAI API (local/remote). Presets default configurations for Ollama (11434), LM Studio (1234), and vLLM (8000). Allows customization of API address, model, temperature, maximum tokens, etc., and supports input of API keys for remote services. Data Persistence: All test history and configurations are automatically saved in browser native storage; they are not lost when the application is closed, facilitating long-term tracking and analysis.

7

Section 07

Conclusion: The Value of Verbodus and Recommendations

Verbodus fills a gap in the local LLM deployment ecosystem and serves as a bridge between model capabilities and actual application experience. For developers who value LLM performance optimization, it is recommended to add Verbodus to their toolbox.