Zing Forum


inference-research: Automated LLM Inference Engine Nightly Tracking and Benchmarking System

Inspired by Andrej Karpathy's autoresearch, it automatically crawls updates from mainstream inference engines like vLLM and SGLang every night, uses Claude Opus for intelligent filtering, and generates executable benchmark plans for DGX Spark clusters.

Tags: LLM Inference · vLLM · SGLang · TensorRT-LLM · Automated Research · Benchmarking · DGX Spark · Claude Opus
Published 2026-04-14 21:45 · Recent activity 2026-04-14 21:51 · Estimated read 7 min

Section 01

inference-research: Automated LLM Inference Engine Nightly Tracking and Benchmarking System Guide

inference-research is an automated tool inspired by Andrej Karpathy's autoresearch, focused on nightly tracking and benchmarking of LLM inference engines. It addresses the challenges inference system engineers face in tracking technical progress, evaluating the impact of new features, and converting both into executable experimental plans. Its core loop: every night it crawls updates from five mainstream inference engines, including vLLM and SGLang, filters them intelligently with Claude Opus, and generates executable benchmark plans for DGX Spark clusters.


Section 02

Project Background and Design Philosophy

Background

Andrej Karpathy's autoresearch demonstrated methods for automated tracking of machine learning frontiers. inference-research draws on this concept but focuses on inference system optimization. Since engines like vLLM and SGLang evolve daily, manual tracking easily misses key updates, requiring an automated solution.

Design Principles

  • Comprehensive Coverage: Monitor 5 major mainstream inference engines
  • Intelligent Filtering: Claude Opus ranks updates by influence and provides explanations
  • Action-Oriented: Convert insights into executable benchmark plans for real hardware

Section 03

Monitored Engines and Hardware Infrastructure

Five Major Inference Engines

| Project | Repository | Core Technical Focus |
| --- | --- | --- |
| vLLM | vllm-project/vllm | PagedAttention, chunked prefill, speculative decoding |
| SGLang | sgl-project/sglang | RadixAttention, prefix caching, constrained decoding |
| TensorRT-LLM | NVIDIA/TensorRT-LLM | Quantization, dynamic batching, Blackwell kernels |
| llm-d | llm-d/llm-d | Kubernetes-native serving, prefill/decode disaggregation |
| Dynamo | ai-dynamo/dynamo | KV routing, NIXL, disaggregated inference OS |

Hardware Cluster

| Node | IP Address | Configuration |
| --- | --- | --- |
| spark-01 | 192.168.1.76 | DGX Spark, 128 GB unified memory (NVLink-C2C) |
| spark-02 | 192.168.1.77 | DGX Spark, 128 GB unified memory (NVLink-C2C) |
| controller | 192.168.1.75 | CPU-only orchestration node |

Section 04

Automated Workflow

The pipeline runs nightly at 2:00 AM and proceeds through four stages:

Data Collection

  • GitHub API: crawl PRs and releases from the 5 monitored repositories
  • arXiv: retrieve the day's inference-related papers

Raw data is saved as JSON to support auditing.
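As a rough illustration of the collection step, the sketch below pulls recent releases for the monitored repositories through the public GitHub REST API and persists the raw responses as JSON. The repository list comes from the engines table above; the function names and output layout are hypothetical, not taken from the project.

```python
import json
import pathlib
import urllib.request

# The five repositories tracked nightly (from the engines table).
REPOS = [
    "vllm-project/vllm",
    "sgl-project/sglang",
    "NVIDIA/TensorRT-LLM",
    "llm-d/llm-d",
    "ai-dynamo/dynamo",
]

def release_url(repo: str, per_page: int = 10) -> str:
    """Build the GitHub REST API URL for a repo's recent releases."""
    return f"https://api.github.com/repos/{repo}/releases?per_page={per_page}"

def fetch_json(url: str) -> list:
    """Fetch and decode a JSON payload from the GitHub API."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def save_raw(payload: list, repo: str, out_dir: str = "data/raw") -> pathlib.Path:
    """Persist the raw API response as JSON so every run stays auditable."""
    path = pathlib.Path(out_dir) / f"{repo.replace('/', '__')}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2))
    return path
```

A nightly driver would simply loop `save_raw(fetch_json(release_url(repo)), repo)` over `REPOS`; keeping the unmodified responses on disk is what makes later curation decisions auditable.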

Intelligent Curation

Claude Opus then analyzes the collected items along three axes:

  • Influence ranking: grade each update by technical importance
  • Meaning interpretation: explain why the change matters
  • Impact rating: 🔴 (high), 🟡 (medium), 🟢 (low)
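The curation output can be post-processed into a ranked, emoji-tagged list. The sketch below assumes the model returns a 0-10 importance score plus a one-line rationale per update; the score thresholds and all names here are illustrative, not the project's actual schema.

```python
from dataclasses import dataclass

# Emoji tiers used in the nightly report (thresholds are illustrative).
TIERS = [(8, "🔴"), (5, "🟡"), (0, "🟢")]

@dataclass
class Update:
    title: str
    score: int      # 0-10 importance score assigned by the LLM
    rationale: str  # one-line explanation returned alongside the score

def impact_emoji(score: int) -> str:
    """Map a 0-10 importance score onto the report's three-tier rating."""
    for threshold, emoji in TIERS:
        if score >= threshold:
            return emoji
    return "🟢"

def curate(updates: list) -> list:
    """Sort updates by descending importance and attach the tier emoji."""
    ranked = sorted(updates, key=lambda u: u.score, reverse=True)
    return [(impact_emoji(u.score), u.title, u.rationale) for u in ranked]
```

Keeping the ranking logic outside the LLM call has one practical benefit: the tier boundaries can be tuned without re-querying the model.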

Benchmark Plan Generation

Generate a sequence of executable bash commands for DGX Spark clusters.
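A plan generator of this kind might look like the sketch below, which emits a bash script targeting the two DGX Spark nodes from the hardware table. The command names, flags, and `bench_client.py` helper are hypothetical placeholders, not the project's real commands.

```python
# Node IPs from the hardware cluster table.
NODES = {"spark-01": "192.168.1.76", "spark-02": "192.168.1.77"}

def plan_for(engine: str, model: str, port: int = 8000) -> str:
    """Render an executable bash benchmark plan for both Spark nodes."""
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", ""]
    for node, ip in NODES.items():
        lines.append(f"# Launch {engine} serving {model} on {node}")
        lines.append(
            f"ssh {ip} 'docker run --gpus all -p {port}:{port} "
            f"{engine}-bench --model {model} --port {port}'"
        )
    lines.append("# Drive load against both nodes, collect latency/throughput")
    targets = ",".join(NODES.values())
    lines.append(f"python bench_client.py --targets {targets} --port {port}")
    return "\n".join(lines) + "\n"
```

Emitting a plain bash script, rather than executing commands directly, matches the project's human-in-the-loop design: an engineer can review and edit the plan before running it on the cluster.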

Versioned Commit

All outputs (reports, data, plans, logs) are committed to Git, forming a traceable history.
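The commit step can be reduced to a small, inspectable command sequence. The sketch below builds (rather than executes) the git commands that would stage and commit a night's outputs; the directory names and commit-message format are assumptions, not the project's actual layout.

```python
import datetime
import shlex

# Output directories committed after each nightly run (illustrative names).
OUTPUT_DIRS = ["reports", "data", "plans", "logs"]

def commit_commands(date: datetime.date) -> list:
    """Build the git commands that version one night's outputs."""
    msg = f"nightly: {date.isoformat()} report, raw data, and benchmark plans"
    return [
        f"git add {' '.join(OUTPUT_DIRS)}",
        f"git commit -m {shlex.quote(msg)}",
    ]
```

Because every run produces exactly one commit, `git log` doubles as the system's run history, and any report can be traced back to the raw JSON it was generated from.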


Section 05

Technical Highlights and Application Scenarios

Technical Highlights

  1. Intelligent Automation: machine collection, AI interpretation, and human decision-making form an efficient division of labor
  2. Hardware-Software Integration: Deep integration with DGX clusters, converting insights into actual measurement plans
  3. Ecosystem Panorama: Covers 5 engines with different technical routes
  4. Scalable Architecture: Easy to add repositories, adjust strategies, or replace LLMs

Application Scenarios

  • Inference R&D Teams: Track competitor dynamics
  • AI Infra Engineers: Discover performance optimization opportunities
  • Technical Decision-Makers: Grasp trends to support selection
  • Academic Researchers: Understand industrial progress
  • Hardware Vendors: Optimize hardware to match software requirements

Section 06

Limitations and Improvement Directions

Limitations

  • Limited Data Sources: does not cover Hugging Face or Papers with Code
  • Lack of Community Voices: Does not track issues and discussions
  • Benchmark Execution Requires Manual Effort: Not fully automated
  • Single Hardware Support: Only DGX Spark

Improvement Directions

  • Expand data sources to Hugging Face and others
  • Add community discussion tracking
  • Implement automatic benchmark execution
  • Support more hardware configurations