# InferenceX Scraper: An Open-Source LLM Inference Performance Data Collection and Analysis Platform

> A fully-featured open-source project that integrates three data sources—InferenceX, OpenRouter, and Artificial Analysis—providing automated collection, trend analysis, and visual display of LLM inference performance benchmark data.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-09T13:13:31.000Z
- Last activity: 2026-05-09T13:18:24.180Z
- Popularity: 163.9
- Keywords: LLM, large language model, performance benchmark, data collection, OpenRouter, InferenceX, Artificial Analysis, data visualization, trend analysis, Python
- Page link: https://www.zingnex.cn/en/forum/thread/inferencex-scraper-llm
- Canonical: https://www.zingnex.cn/forum/thread/inferencex-scraper-llm
- Markdown source: floors_fallback

---

## InferenceX Scraper Project Guide: Open-Source LLM Inference Performance Data Integration Platform

InferenceX Scraper is a fully-featured open-source project designed to address the pain points of scattered LLM performance data sources, inconsistent formats, and varying update frequencies. It integrates three data sources—InferenceX, OpenRouter, and Artificial Analysis—to provide automated collection, trend analysis, and visual display of LLM inference performance benchmark data, giving developers and enterprises a reliable reference for model selection.

## Project Background: Addressing the Pain Point of Scattered LLM Performance Data

In today's era of rapid LLM development, performance evaluation is an important basis for developers and enterprises when selecting models. However, scattered data sources, inconsistent formats, and varying update frequencies make that evaluation difficult. InferenceX Scraper builds a unified data collection and analysis platform, integrating three data sources (SemiAnalysis's InferenceX platform, OpenRouter model call statistics, and Artificial Analysis model evaluation data) to form a comprehensive data view covering performance benchmarks, actual usage volume, and composite scores.

## Core Architecture: A Complete Data Solution with Four Separated Layers

The project adopts a layered architecture, separating data collection, processing, storage, and display components:
- **Data Collection Layer**: Includes three modules—OpenRouter (collects call volume, application distribution, model details), Artificial Analysis (collects Intelligence Index, inference speed, price), and InferenceX (collects performance benchmarks)—all following a unified storage interface.
- **Data Analysis Layer**: Offers three services—trend analysis (7-day/30-day moving average), anomaly detection (Z-Score for anomaly identification), and cluster analysis (application scenario clustering).
- **Data Storage Layer**: Supports JSON (raw data), Excel (business-friendly), CSV (tool integration), and SQLite (structured query). Usage volume data is uniformly measured in billions.
- **Web Service Layer**: The backend provides RESTful APIs based on FastAPI, while the frontend uses React+ECharts+Ant Design, including modules like the overview page and model comparison.
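The analysis layer's two simplest services can be sketched in a few lines. The sketch below is a minimal illustration of the 7-day moving average and Z-Score anomaly detection described above; the function names and sample data are assumptions for illustration, not the project's actual API.

```python
"""Sketch of the analysis layer: trailing moving average and
Z-Score anomaly detection. Names and data are illustrative
assumptions, not the project's real interface."""
import statistics

def moving_average(series, window=7):
    """Trailing moving average; the first window-1 points have no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def zscore_anomalies(series, threshold=2.5):
    """Indices whose absolute Z-Score exceeds the threshold."""
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) / stdev > threshold]

# Hypothetical daily call volumes with one spike on day 8
daily_calls = [10, 11, 9, 10, 12, 11, 10, 55, 10, 11]
print(moving_average(daily_calls, window=7))
print(zscore_anomalies(daily_calls))  # flags the spike at index 7
```

Windowing over 7 or 30 days smooths daily noise, while the Z-Score check surfaces sudden call-volume spikes or drops worth investigating.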

## Data Scale: Over 38k Records Covering Mainstream LLM Models

As of now, the database contains over 38,000 records:
- Daily model call volume: 17,634 entries
- Application usage distribution: 2,090 entries
- Model metadata: 713 entries
- OpenRouter application information: 28 entries
- OpenRouter model details: 12,945 entries
- Artificial Analysis performance data: 1,995 entries
- InferenceX benchmark data: 3,292 entries

It covers mainstream models such as Llama-3.3-70B, DeepSeek-R1, Kimi-K2.5, MiniMax-M2.5, Qwen-3.5, and GLM-5.
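Once the records are in the SQLite store, standard SQL can aggregate them. The snippet below is a hypothetical query over an in-memory stand-in database; the table and column names (`model_daily_calls`, `calls_billion`) are assumptions for illustration, so check the actual schema in the repository.

```python
"""Hypothetical aggregation query over the SQLite store. Schema names
are illustrative assumptions; consult the repo for the real schema."""
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the project's .db file
conn.execute(
    "CREATE TABLE model_daily_calls (model TEXT, day TEXT, calls_billion REAL)"
)
conn.executemany(
    "INSERT INTO model_daily_calls VALUES (?, ?, ?)",
    [
        ("DeepSeek-R1", "2026-05-07", 1.8),
        ("DeepSeek-R1", "2026-05-08", 2.1),
        ("Llama-3.3-70B", "2026-05-08", 0.9),
    ],
)

# Total call volume per model, in billions (the unified unit noted above)
rows = conn.execute(
    "SELECT model, SUM(calls_billion) FROM model_daily_calls "
    "GROUP BY model ORDER BY 2 DESC"
).fetchall()
print(rows)
```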

## Technical Highlights: Modular Design and Flexible Operation Modes

The project's technical highlights include:
- **Modular Design**: Each functional module (crawler, analysis, storage, Web) has clear responsibilities and can be developed, tested, and deployed independently.
- **Unified Data Management**: The DataStorage class encapsulates operations for multiple storage formats, simplifying data persistence.
- **Flexible Operation Modes**: Supports three modes—one-time collection (once), API service (api), and continuous collection (collector).
- **Comprehensive Documentation**: A detailed README covers installation configuration, usage methods, and API descriptions, lowering the entry barrier.
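The unified-storage idea above can be sketched as a small class that fans one list of records out to several backends. This is a minimal stdlib-only sketch (Excel omitted, since it needs a third-party library); the class name matches the `DataStorage` mentioned above, but the method names and signatures are assumptions, not the project's actual interface.

```python
"""Minimal DataStorage-style sketch unifying JSON, CSV, and SQLite
backends. Method names are illustrative assumptions."""
import csv
import json
import sqlite3
from pathlib import Path

class DataStorage:
    def __init__(self, base_dir="data"):
        self.base = Path(base_dir)
        self.base.mkdir(exist_ok=True)

    def save_json(self, name, records):
        # Raw-data format: human-readable, preserves nesting
        (self.base / f"{name}.json").write_text(
            json.dumps(records, ensure_ascii=False, indent=2)
        )

    def save_csv(self, name, records):
        # Flat format for spreadsheet and tool integration
        with open(self.base / f"{name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
            writer.writeheader()
            writer.writerows(records)

    def save_sqlite(self, name, records):
        # Structured-query format; one table per record type
        cols = list(records[0].keys())
        with sqlite3.connect(self.base / "store.db") as db:
            db.execute(f"CREATE TABLE IF NOT EXISTS {name} ({', '.join(cols)})")
            db.executemany(
                f"INSERT INTO {name} VALUES ({', '.join('?' * len(cols))})",
                [tuple(r[c] for c in cols) for r in records],
            )

storage = DataStorage()
records = [{"model": "DeepSeek-R1", "tokens_per_sec": 92.5}]
storage.save_json("benchmarks", records)
storage.save_csv("benchmarks", records)
storage.save_sqlite("benchmarks", records)
```

Encapsulating all backends behind one class means each collector module only calls a single interface, which is what makes the collection layer's "unified storage interface" possible.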

## Application Scenarios: Assisting Model Selection and Market Trend Research

The project's application scenarios include:
- **Model Selection Decision**: Provides comprehensive data such as performance, price, and usage rate to assist enterprises and developers in model selection.
- **Market Trend Research**: Analyzes call volume time series to gain insights into market competition patterns and application scenario preferences.
- **Performance Benchmark Tracking**: InferenceX data helps model developers identify optimization directions and compare with competitors.
- **Teaching and Demonstration**: Serves as a data engineering case to demonstrate the construction of a complete data pipeline.

## Future Outlook: Expanding Data Sources and Improving Real-Time Performance

Future optimization directions for the project:
- **Data Source Expansion**: Integrate platforms like LMSYS Chatbot Arena and the Hugging Face Leaderboard.
- **Real-Time Improvement**: Explore stream processing to reduce data latency.
- **Prediction Capability**: Build forecasting models on the historical data to provide forward-looking insights.
- **Community Collaboration**: Establish a data-contribution mechanism so the community can share data and grow the dataset together.

## Conclusion: Exploration of LLM Data Infrastructure in the Open-Source Ecosystem

InferenceX Scraper demonstrates the open-source community's active exploration in building LLM data infrastructure. In an era of scattered information, the attempt to integrate multi-source data and provide a unified view is particularly valuable, offering a window for technical researchers, product decision-makers, and developers to understand the full picture of the LLM ecosystem.
