Zing Forum


InferenceX Scraper: An Open-Source LLM Inference Performance Data Collection and Analysis Platform

A fully-featured open-source project that integrates three data sources—InferenceX, OpenRouter, and Artificial Analysis—providing automated collection, trend analysis, and visual display of LLM inference performance benchmark data.

Tags: LLM · Large Language Models · Performance Benchmarks · Data Collection · OpenRouter · InferenceX · Artificial Analysis · Data Visualization · Trend Analysis · Python
Published 2026-05-09 21:13 · Recent activity 2026-05-09 21:18 · Estimated read: 8 min

Section 01

InferenceX Scraper Project Guide: Open-Source LLM Inference Performance Data Integration Platform

InferenceX Scraper is a fully-featured open-source project designed to address the pain points of scattered LLM performance data sources, inconsistent formats, and varying update frequencies. It integrates three data sources—InferenceX, OpenRouter, and Artificial Analysis—to provide automated collection, trend analysis, and visual display of LLM inference performance benchmark data, offering reliable references for developers and enterprises in model selection.


Section 02

Project Background: Addressing the Pain Point of Scattered LLM Performance Data

In today's era of rapid LLM development, model performance evaluation is an important basis for model selection by developers and enterprises. However, scattered data sources, inconsistent formats, and varying update frequencies make reliable comparison difficult. InferenceX Scraper addresses this by building a unified data collection and analysis platform that integrates three data sources (SemiAnalysis's InferenceX platform, OpenRouter model call statistics, and Artificial Analysis model evaluation data) into a comprehensive data view covering performance benchmarks, actual usage volume, and composite scores.


Section 03

Core Architecture: A Complete Data Solution with Four Separated Layers

The project adopts a layered architecture, separating data collection, processing, storage, and display components:

  • Data Collection Layer: Includes three modules—OpenRouter (collects call volume, application distribution, model details), Artificial Analysis (collects Intelligence Index, inference speed, price), and InferenceX (collects performance benchmarks)—all following a unified storage interface.
  • Data Analysis Layer: Offers three services—trend analysis (7-day/30-day moving average), anomaly detection (Z-Score for anomaly identification), and cluster analysis (application scenario clustering).
  • Data Storage Layer: Supports JSON (raw data), Excel (business-friendly), CSV (tool integration), and SQLite (structured query). Usage volume data is uniformly measured in billions.
  • Web Service Layer: The backend provides RESTful APIs based on FastAPI, while the frontend uses React + ECharts + Ant Design, including modules such as the overview page and model comparison.
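The analysis-layer techniques above (moving-average trend smoothing and Z-Score anomaly detection) can be sketched in a few lines. The function names below are illustrative, not taken from the project's codebase:

```python
# Minimal sketch of the analysis layer: a trailing moving average for
# trend smoothing and Z-Score-based anomaly detection on call volumes.
from statistics import mean, stdev

def moving_average(values, window=7):
    """Trailing moving average over at most `window` points."""
    return [mean(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]

def zscore_anomalies(values, threshold=3.0):
    """Indices of points whose |Z-Score| exceeds `threshold`."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs((v - mu) / sigma) > threshold]

# Toy daily call-volume series with one obvious spike at index 6.
daily_calls = [100, 102, 98, 101, 99, 103, 500, 100, 97]
smoothed = moving_average(daily_calls)
spikes = zscore_anomalies(daily_calls, threshold=2.0)
```

In practice a threshold of 2–3 standard deviations is common; the spike above would be flagged for review while normal day-to-day variation passes.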

Section 04

Data Scale: Over 38k Records Covering Mainstream LLM Models

As of now, the database contains over 38,000 records:

  • Daily model call volume: 17,634 entries
  • Application usage distribution: 2,090 entries
  • Model metadata: 713 entries
  • OpenRouter application information: 28 entries
  • OpenRouter model details: 12,945 entries
  • Artificial Analysis performance data: 1,995 entries
  • InferenceX benchmark data: 3,292 entries

The database covers mainstream models such as Llama-3.3-70B, DeepSeek-R1, Kimi-K2.5, MiniMax-M2.5, Qwen-3.5, and GLM-5.
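As a hedged illustration of the SQLite storage path, the snippet below builds a toy `daily_model_calls` table and queries record counts; the table name and schema are assumptions for demonstration, not the project's actual layout:

```python
# Toy example of structured queries over an SQLite store like the one
# the project uses; schema and table name are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_model_calls (
        model          TEXT,
        day            TEXT,
        calls_billions REAL   -- usage volume measured in billions
    )
""")
conn.executemany(
    "INSERT INTO daily_model_calls VALUES (?, ?, ?)",
    [("Llama-3.3-70B", "2026-05-08", 1.2),
     ("DeepSeek-R1",   "2026-05-08", 0.9)],
)
# Record count and total volume, analogous to the per-table counts above.
row = conn.execute(
    "SELECT COUNT(*), SUM(calls_billions) FROM daily_model_calls"
).fetchone()
```

The same `COUNT(*)` pattern applied per table would reproduce the entry counts listed in this section.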

Section 05

Technical Highlights: Modular Design and Flexible Operation Modes

The project's technical highlights include:

  • Modular Design: Each functional module (crawler, analysis, storage, Web) has clear responsibilities and can be developed, tested, and deployed independently.
  • Unified Data Management: The DataStorage class encapsulates operations for multiple storage formats, simplifying data persistence.
  • Flexible Operation Modes: Supports three modes—one-time collection (once), API service (api), and continuous collection (collector).
  • Comprehensive Documentation: A detailed README covers installation configuration, usage methods, and API descriptions, lowering the entry barrier.
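The three operation modes could be wired up with a small CLI dispatcher along these lines; the handler functions are placeholders, not the project's real entry points:

```python
# Sketch of a mode dispatcher for the three operation modes described
# above (once / api / collector). Handlers are illustrative stubs.
import argparse

def run_once():
    return "collected one snapshot"

def run_api():
    return "started FastAPI server"  # e.g. launched via uvicorn

def run_collector():
    return "started continuous collection loop"

MODES = {"once": run_once, "api": run_api, "collector": run_collector}

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="InferenceX Scraper entry point (sketch)")
    parser.add_argument("mode", choices=sorted(MODES))
    args = parser.parse_args(argv)
    return MODES[args.mode]()
```

Keeping the modes behind a single entry point is what lets the same codebase serve one-off runs, a long-lived API, and a scheduled collector.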

Section 06

Application Scenarios: Assisting Model Selection and Market Trend Research

The project's application scenarios include:

  • Model Selection Decision: Provides comprehensive data such as performance, price, and usage rate to assist enterprises and developers in model selection.
  • Market Trend Research: Analyzes call volume time series to gain insights into market competition patterns and application scenario preferences.
  • Performance Benchmark Tracking: InferenceX data helps model developers identify optimization directions and compare with competitors.
  • Teaching and Demonstration: Serves as a data engineering case to demonstrate the construction of a complete data pipeline.

Section 07

Future Outlook: Expanding Data Sources and Improving Real-Time Performance

Future optimization directions for the project:

  • Data Source Expansion: Integrate platforms like LMSYS Chatbot Arena and Hugging Face Leaderboard.
  • Real-Time Performance Improvement: Explore stream data processing to enhance data real-time performance.
  • Prediction Capability Enhancement: Build prediction models based on historical data to provide forward-looking insights.
  • Deepened Community Collaboration: Establish a data contribution mechanism so that data can be shared and community efforts pooled.

Section 08

Conclusion: Exploration of LLM Data Infrastructure in the Open-Source Ecosystem

InferenceX Scraper demonstrates the open-source community's active exploration in building LLM data infrastructure. In an era of scattered information, the attempt to integrate multi-source data and provide a unified view is particularly valuable, offering a window for technical researchers, product decision-makers, and developers to understand the full picture of the LLM ecosystem.