# inference-research: Automated LLM Inference Engine Nightly Tracking and Benchmarking System

> Inspired by Andrej Karpathy's autoresearch, inference-research automatically collects updates from mainstream inference engines such as vLLM and SGLang every night, uses Claude Opus to filter them, and generates executable benchmark plans for DGX Spark clusters.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-14T13:45:24.000Z
- Last activity: 2026-04-14T13:51:06.304Z
- Popularity: 141.9
- Keywords: LLM inference, vLLM, SGLang, TensorRT-LLM, automated research, benchmarking, DGX Spark, Claude Opus
- Page link: https://www.zingnex.cn/en/forum/thread/inference-research-llm-nightly
- Canonical: https://www.zingnex.cn/forum/thread/inference-research-llm-nightly

---

## inference-research: Automated LLM Inference Engine Nightly Tracking and Benchmarking System Guide

inference-research is an automated tool, inspired by Andrej Karpathy's autoresearch, for nightly tracking and benchmarking of LLM inference engines. It targets three problems that inference system engineers face: keeping up with technical progress, evaluating the impact of new features, and turning those findings into executable experiment plans. Each night it collects updates from five mainstream inference engines (vLLM, SGLang, TensorRT-LLM, llm-d, and Dynamo), uses Claude Opus to filter them, and generates executable benchmark plans for DGX Spark clusters.

## Project Background and Design Philosophy

### Background
Andrej Karpathy's autoresearch demonstrated a way to track the machine learning frontier automatically. inference-research borrows this idea but focuses on inference system optimization. Engines like vLLM and SGLang change daily, so manual tracking easily misses key updates; an automated solution is needed.

### Design Principles
- **Comprehensive Coverage**: Monitor 5 major mainstream inference engines
- **Intelligent Filtering**: Claude Opus ranks updates by impact and explains each ranking
- **Action-Oriented**: Convert insights into executable benchmark plans for real hardware

## Monitored Engines and Hardware Infrastructure

### Five Major Inference Engines
| Project | Repository | Core Technical Focus |
|------|------|--------------|
| vLLM | vllm-project/vllm | PagedAttention, Chunked Prefilling, Speculative Decoding |
| SGLang | sgl-project/sglang | RadixAttention, Prefix Caching, Constrained Decoding |
| TensorRT-LLM | NVIDIA/TensorRT-LLM | Quantization, Dynamic Batching, Blackwell Kernels |
| llm-d | llm-d/llm-d | Kubernetes-Native Serving, Prefill/Decode Disaggregation |
| Dynamo | ai-dynamo/dynamo | KV Routing, NIXL, Disaggregated Inference OS |

### Hardware Cluster
| Node | IP Address | Configuration |
|------|--------|------|
| spark-01 | 192.168.1.76 | DGX Spark 128GB Unified Memory (NVLink-C2C) |
| spark-02 | 192.168.1.77 | DGX Spark 128GB Unified Memory (NVLink-C2C) |
| controller | 192.168.1.75 | CPU-only Orchestration Node |

## Automated Workflow

The system runs daily at 2 AM:

### Data Collection
- GitHub API: Fetch PRs and releases from the 5 repositories
- arXiv: Retrieve the day's inference-related papers

Raw data is saved as JSON to support auditing.
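
The collection step could look roughly like the sketch below. It assumes the public GitHub REST API listing endpoints for pull requests and releases and the arXiv export API. The function names, the arXiv search query, and the on-disk layout are illustrative assumptions, not the project's actual code. Note that arXiv returns an Atom feed, so this sketch stores that response as XML rather than JSON.

```python
import urllib.parse
import urllib.request
from datetime import date
from pathlib import Path
from typing import Dict, Optional

# Repositories taken from the table of monitored engines.
REPOS = [
    "vllm-project/vllm",
    "sgl-project/sglang",
    "NVIDIA/TensorRT-LLM",
    "llm-d/llm-d",
    "ai-dynamo/dynamo",
]

GITHUB_API = "https://api.github.com"
ARXIV_API = "http://export.arxiv.org/api/query"


def github_urls(repo: str, per_page: int = 50) -> Dict[str, str]:
    """Build the pull-request and release listing URLs for one repository."""
    query = urllib.parse.urlencode(
        {"state": "all", "sort": "updated", "direction": "desc", "per_page": per_page}
    )
    return {
        "pulls": f"{GITHUB_API}/repos/{repo}/pulls?{query}",
        "releases": f"{GITHUB_API}/repos/{repo}/releases?per_page={per_page}",
    }


def arxiv_url(search: str = 'all:"LLM inference"', max_results: int = 25) -> str:
    """Build an arXiv query URL; the default search string is an assumption."""
    query = urllib.parse.urlencode(
        {
            "search_query": search,
            "sortBy": "submittedDate",
            "sortOrder": "descending",
            "max_results": max_results,
        }
    )
    return f"{ARXIV_API}?{query}"


def fetch(url: str, token: Optional[str] = None) -> bytes:
    """HTTP GET. A GitHub token, if given, is only sent to GitHub URLs."""
    headers = {"User-Agent": "inference-research-sketch"}
    if url.startswith(GITHUB_API):
        headers["Accept"] = "application/vnd.github+json"
        if token:
            headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def save_raw(out_dir: Path, day: date, name: str, payload: bytes) -> Path:
    """Write one raw response under out_dir/<YYYY-MM-DD>/ for later auditing."""
    target = out_dir / day.isoformat() / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def collect(out_dir: Path, day: date, token: Optional[str] = None) -> None:
    """Fetch and store PRs and releases for every repo, plus the arXiv feed."""
    for repo in REPOS:
        for kind, url in github_urls(repo).items():
            name = f"{repo.replace('/', '__')}.{kind}.json"
            save_raw(out_dir, day, name, fetch(url, token))
    save_raw(out_dir, day, "arxiv.atom.xml", fetch(arxiv_url()))
```

Calling `collect(Path("data"), date.today())` would write one file per repository and endpoint into a dated directory. Unauthenticated GitHub requests are rate limited, so a token is likely needed for regular nightly runs.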

### Intelligent Curation
Claude Opus analyzes the collected updates:
- Impact Ranking: Order updates by technical importance
- Interpretation: Explain why each change matters
- Impact Rating: 🔴 (High), 🟡 (Medium), 🟢 (Low)
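
The ranking and rating described above can be sketched as follows. The sketch assumes the model is prompted to reply with a JSON array whose entries carry `title`, `impact` (`high`, `medium`, or `low`), and `why` fields. That response schema, the prompt wording, and all names here are hypothetical, and the actual call to Claude Opus is left out.

```python
import json
from dataclasses import dataclass
from typing import List

# Lower number sorts first, so high-impact items lead the report.
IMPACT_ORDER = {"high": 0, "medium": 1, "low": 2}
IMPACT_EMOJI = {"high": "🔴", "medium": "🟡", "low": "🟢"}


@dataclass
class CuratedItem:
    title: str
    impact: str  # one of the IMPACT_ORDER keys
    why: str     # the model's explanation of the change's value


def build_prompt(raw_items: List[dict]) -> str:
    """Assemble a prompt asking for the assumed JSON schema (wording is hypothetical)."""
    return (
        "Rank these inference-engine updates by technical importance. "
        'Reply with a JSON array of objects with keys "title", "impact" '
        '("high", "medium" or "low") and "why".\n\n'
        + json.dumps(raw_items, ensure_ascii=False, indent=2)
    )


def parse_response(text: str) -> List[CuratedItem]:
    """Parse the model's JSON reply and sort by impact, high first."""
    items = []
    for entry in json.loads(text):
        impact = str(entry.get("impact", "")).lower()
        if impact not in IMPACT_ORDER:
            continue  # skip entries with an unrecognised rating
        items.append(CuratedItem(entry["title"], impact, entry.get("why", "")))
    # sorted() is stable, so the model's order is kept within each rating.
    return sorted(items, key=lambda item: IMPACT_ORDER[item.impact])


def render_report(items: List[CuratedItem]) -> str:
    """Render the curated items as a Markdown bullet list."""
    return "\n".join(
        f"- {IMPACT_EMOJI[item.impact]} **{item.title}**: {item.why}" for item in items
    )
```

Validating the reply before sorting keeps a malformed or unexpected rating from breaking the nightly report.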

### Benchmark Plan Generation
Generate a sequence of executable bash commands for DGX Spark clusters.
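
One way to turn a plan into a bash script is sketched below. The node names and IP addresses come from the hardware table; the `Step` structure, the `bench` SSH user, and any benchmark wrapper script passed in as a command are hypothetical placeholders rather than the project's real output format.

```python
import shlex
from dataclasses import dataclass
from typing import List

# Worker nodes from the hardware cluster table.
NODES = {"spark-01": "192.168.1.76", "spark-02": "192.168.1.77"}


@dataclass
class Step:
    node: str         # key into NODES
    description: str  # one-line, human-readable summary
    command: str      # shell command to run on that node


def render_plan(steps: List[Step], user: str = "bench") -> str:
    """Render steps as a bash script that runs each command over SSH.

    The SSH user name is an assumption; the remote command is shell-quoted
    so it reaches the node as a single argument.
    """
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", ""]
    for number, step in enumerate(steps, start=1):
        host = NODES[step.node]
        lines.append(f"# Step {number} ({step.node}): {step.description}")
        lines.append(f"ssh {user}@{host} {shlex.quote(step.command)}")
    return "\n".join(lines) + "\n"
```

For example, `render_plan([Step("spark-01", "Record GPU state", "nvidia-smi")])` yields a three-line preamble followed by a comment and one `ssh` line. Because of `set -euo pipefail`, the script stops at the first failing step, which suits a plan that a person reviews and runs by hand.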

### Versioned Commit
All outputs (reports, data, plans, logs) are committed to Git, forming a traceable history.
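
A minimal sketch of the commit step, assuming the outputs live in `reports`, `data`, `plans`, and `logs` directories inside one Git repository. The directory names and the commit message format are assumptions made for illustration.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import List, Sequence

# Assumed output layout; the project may organise its files differently.
OUTPUT_DIRS = ("reports", "data", "plans", "logs")


def commit_commands(day: date, dirs: Sequence[str] = OUTPUT_DIRS) -> List[List[str]]:
    """Build the git commands that stage and commit one night's outputs."""
    message = f"nightly: {day.isoformat()}"
    return [
        ["git", "add", "--", *dirs],
        ["git", "commit", "-m", message],
    ]


def commit_outputs(repo: Path, day: date) -> None:
    """Run the commands inside the repository, raising on the first failure.

    `git add` fails if a listed directory does not exist, and `git commit`
    exits non-zero when there is nothing to commit, so a caller may want to
    catch subprocess.CalledProcessError for quiet nights.
    """
    for argv in commit_commands(day):
        subprocess.run(argv, cwd=repo, check=True)
```

One commit per night with a dated message makes it easy to diff two runs or to find the report for a given day with `git log`.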

## Technical Highlights and Application Scenarios

### Technical Highlights
1. **Intelligent Automation**: A clear division of labor in which machines collect, the AI interprets, and humans decide
2. **Hardware-Software Integration**: Tight integration with DGX Spark clusters, turning insights into concrete measurement plans
3. **Ecosystem Panorama**: Covers 5 engines that follow different technical approaches
4. **Extensible Architecture**: Easy to add repositories, adjust strategies, or swap the LLM

### Application Scenarios
- Inference R&D Teams: Track developments in competing engines
- AI Infra Engineers: Discover performance optimization opportunities
- Technical Decision-Makers: Follow trends to inform engine selection
- Academic Researchers: Understand industrial progress
- Hardware Vendors: Optimize hardware to match software requirements

## Limitations and Improvement Directions

### Limitations
- Limited Data Sources: Does not cover Hugging Face or Papers with Code
- Lack of Community Voices: Does not track issues or discussions
- Manual Benchmark Execution: Plans are generated automatically but must be run by hand
- Single Hardware Target: Only DGX Spark is supported

### Improvement Directions
- Expand data sources to Hugging Face and others
- Add community discussion tracking
- Implement automatic benchmark execution
- Support more hardware configurations