# CUDA 90-Day Intensive Challenge: Building Production-Grade LLM Inference Infrastructure with Rust and C++

> A systematic 90-day learning plan for writing native GPU kernels in Rust and CUDA C++ and building memory-safe, high-concurrency AI inference systems.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-10T07:15:20.000Z
- Last activity: 2026-06-10T07:23:25.571Z
- Popularity: 156.9
- Keywords: CUDA, GPU programming, Rust, AI infrastructure, LLM inference, high-performance computing, cuda-oxide, SGLang, Candle, PyTorch, systems programming
- Page link: https://www.zingnex.cn/en/forum/thread/cuda-90-rustc-llm
- Canonical: https://www.zingnex.cn/forum/thread/cuda-90-rustc-llm

---

## Introduction to the CUDA 90-Day Intensive Challenge Project

This project is a 90-day AI infrastructure challenge started by wenfeizou to document a transition from systems development into AI infrastructure and high-performance inference engines. It is practice-oriented: through runnable code, benchmarks, and performance analysis, it explores how to build memory-safe, high-concurrency, production-grade LLM inference systems with Rust and CUDA C++. The posts in this thread cover the project background, technical roadmap, learning roadmap, repository structure, and key takeaways.

## Project Background: Transition from System Development to AI Infrastructure

With the rapid development of LLMs, AI infrastructure has become a popular field, but engineers who can build high-performance inference systems remain scarce. This project documents the author's move from systems development into AI infrastructure with a practice-first approach: fewer vague notes, and more runnable code, benchmarks, and profiling records. It is meant as a log of engineering experiments, not just a set of study notes.

## Core Technical Roadmap: Rust and CUDA C++ on Parallel Tracks

**Why Rust**: memory safety (whole classes of memory errors, such as use-after-free and data races, are rejected at compile time), zero-cost abstractions (performance comparable to C++), a modern toolchain (Cargo), and FFI for interoperating with existing C/C++ code. The core experiments use the cuda-oxide crate to write GPU kernels natively in Rust.
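
The FFI and memory-safety points can be illustrated with a small host-side sketch: a safe Rust function does the work, and a thin `unsafe extern "C"` wrapper turns raw pointers from a C++ caller into slices. This does not use cuda-oxide or touch the GPU; the function name `vec_add_f32` and its 0/-1 return convention are hypothetical. To link it from C++ you would also mark it `#[unsafe(no_mangle)]` (plain `#[no_mangle]` in editions before 2024) and declare it on the C++ side as `extern "C" int vec_add_f32(const float*, const float*, float*, size_t);`.

```rust
/// Safe core: element-wise vector addition on the host.
/// Rejects mismatched lengths instead of reading out of bounds.
pub fn vec_add(a: &[f32], b: &[f32], out: &mut [f32]) -> Result<(), &'static str> {
    if a.len() != b.len() || a.len() != out.len() {
        return Err("length mismatch");
    }
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
    Ok(())
}

/// Hypothetical C-ABI entry point for a C++ host program.
/// Returns 0 on success and -1 if any pointer is null.
///
/// # Safety
/// `a` and `b` must be valid for `n` reads, `out` must be valid for `n` writes,
/// and `out` must not overlap `a` or `b`.
pub unsafe extern "C" fn vec_add_f32(
    a: *const f32,
    b: *const f32,
    out: *mut f32,
    n: usize,
) -> i32 {
    // Null checks happen before any slice is created.
    if a.is_null() || b.is_null() || out.is_null() {
        return -1;
    }
    // SAFETY: the caller upholds the contract in the doc comment above.
    let (a, b, out) = unsafe {
        (
            std::slice::from_raw_parts(a, n),
            std::slice::from_raw_parts(b, n),
            std::slice::from_raw_parts_mut(out, n),
        )
    };
    match vec_add(a, b, out) {
        Ok(()) => 0,
        Err(_) => -1,
    }
}
```

The design keeps all `unsafe` code confined to the pointer-to-slice conversion at the language boundary; the arithmetic itself is ordinary, bounds-checked safe Rust.
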

**Why CUDA C++**: understanding GPU architecture and reusing the large body of existing CUDA code requires mastering the thread hierarchy (grids, blocks, threads), the memory hierarchy (global memory, shared memory, registers), the warp execution model, and optimization techniques such as coalesced global-memory access and avoiding shared-memory bank conflicts.
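
To make the thread-hierarchy and memory-access points concrete, the following host-side Rust sketch computes a global thread index and models two access patterns for one 32-lane warp. It assumes a simplified model: global loads are counted in 128-byte segments (real hardware tracks 32-byte sectors and caches), and shared memory has 32 banks of 4-byte words. The function names are illustrative.

```rust
/// Global 1-D thread index, the host-side analogue of
/// `blockIdx.x * blockDim.x + threadIdx.x` in a CUDA C++ kernel.
pub fn global_thread_index(block_idx: u32, block_dim: u32, thread_idx: u32) -> u32 {
    block_idx * block_dim + thread_idx
}

const WARP_SIZE: u32 = 32;
// Assumed shared-memory layout: 32 banks, each one 4-byte word wide.
const NUM_BANKS: u32 = 32;

/// Distinct 128-byte segments touched when lane `i` of a warp loads the 4-byte
/// element at index `i * stride` (simplified model: fewer segments = better coalescing).
pub fn segments_touched(stride: u32) -> usize {
    let mut segments: Vec<u64> = (0..WARP_SIZE as u64)
        .map(|lane| lane * stride as u64 * 4 / 128)
        .collect();
    segments.sort_unstable();
    segments.dedup();
    segments.len()
}

/// Worst-case number of lanes that hit the same shared-memory bank when lane `i`
/// reads 4-byte word `i * stride`; 1 means conflict-free. Simplification: lanes
/// reading the *same* word are broadcast by hardware, which this does not model.
pub fn bank_conflict_degree(stride: u32) -> u32 {
    let mut hits = [0u32; NUM_BANKS as usize];
    for lane in 0..WARP_SIZE {
        // Wrapping multiply is exact modulo 32 because 2^32 is a multiple of 32.
        let bank = lane.wrapping_mul(stride) % NUM_BANKS;
        hits[bank as usize] += 1;
    }
    *hits.iter().max().unwrap()
}
```

Under this model a stride of 1 touches one segment and is conflict-free, while a stride of 32 touches 32 segments and serializes on a single bank; padding the stride to 33 (the familiar "+1" padding for shared-memory tiles) makes the bank accesses conflict-free again.
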

## 90-Day Roadmap: From Kernels to the Full Inference Stack

The roadmap is divided into three phases:
1. **CUDA Kernel Basics**: Vector addition, matrix multiplication, memory optimization (shared memory/coalesced access), reduction algorithms, convolution operations.
2. **Rust GPU Programming**: cuda-oxide basics, GPU memory management, Rust-C++ CUDA interoperability, asynchronous execution (async/await + CUDA streams).
3. **LLM Inference Infrastructure**: Transformer operator optimization, KV Cache management, dynamic batch scheduling, distributed inference architecture.
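
For the reduction topic in the first phase, here is a CPU simulation of the common shared-memory tree reduction with sequential addressing, the variant that keeps the active threads contiguous. It is a sketch for reasoning about the algorithm, not a kernel: each pass of the outer loop stands for one step between `__syncthreads()` barriers, and the inner loop stands for the threads `tid < s` that would run in parallel. The power-of-two block size and zero-padding of the last block are assumptions of this sketch.

```rust
/// CPU simulation of a shared-memory tree reduction (sequential addressing) for one
/// thread block. `shared` stands in for the block's shared-memory array; its length
/// is the block size and must be a power of two.
pub fn block_reduce_sum(shared: &mut [f32]) -> f32 {
    assert!(shared.len().is_power_of_two(), "block size must be a power of two");
    let mut s = shared.len() / 2;
    while s > 0 {
        // One step between two __syncthreads() barriers: on the GPU, every thread
        // with tid < s executes this body in parallel.
        for tid in 0..s {
            shared[tid] += shared[tid + s];
        }
        s /= 2;
    }
    shared[0]
}

/// Two-pass pattern: each block reduces its own chunk to a partial sum
/// (zero-padding the last chunk), then the partial sums are combined.
pub fn grid_reduce_sum(input: &[f32], block_size: usize) -> f32 {
    input
        .chunks(block_size)
        .map(|chunk| {
            let mut shared = vec![0.0f32; block_size];
            shared[..chunk.len()].copy_from_slice(chunk);
            block_reduce_sum(&mut shared)
        })
        .sum()
}
```

Running the inner loop sequentially is equivalent to the parallel version because, within one step, thread `tid` only reads index `tid + s >= s`, which no thread writes during that step.
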

## Repository Structure and Supporting Skill Layers

**Repository Structure**: organized by concern, with directories such as days (daily experiments), kernels (C++/Rust kernels), frameworks (PyTorch/Candle), runtime (SGLang), infra (supporting layer), and benchmarks (performance tests), among others.

**Support Layers**:
- Linux: Driver installation, Nsight tools, shared-library management, performance monitoring.
- C++: CMake build, memory model, template programming, Host/Device code organization.
- Rust: Unsafe code, ownership management, FFI, asynchronous runtime.
- Python: PyTorch baseline verification, data generation, correctness checking.
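
For the correctness-checking layer, a Rust-side comparison against reference outputs (for example, results exported from a PyTorch baseline) can use the same tolerance rule as `torch.allclose`, namely `|actual - expected| <= atol + rtol * |expected|`; PyTorch's defaults are `rtol=1e-05` and `atol=1e-08`. How the reference data is exported and loaded is left out, and the function name is hypothetical. Unlike `torch.allclose`, this sketch reports the first failing index, and it does not special-case infinities.

```rust
/// Tolerance check using the same rule as `torch.allclose`:
/// |actual - expected| <= atol + rtol * |expected|, element-wise.
/// Returns the index of the first failing element, or None if all pass.
/// NaN always fails (like the default equal_nan=False); infinities are not special-cased.
pub fn first_mismatch(actual: &[f32], expected: &[f32], rtol: f32, atol: f32) -> Option<usize> {
    assert_eq!(actual.len(), expected.len(), "output and reference lengths differ");
    actual.iter().zip(expected).position(|(a, e)| {
        let diff = (a - e).abs();
        // Written as a negated `<=` so that a NaN difference counts as a mismatch.
        !(diff <= atol + rtol * e.abs())
    })
}
```

Returning the failing index rather than a bare boolean makes it easier to trace a numerical bug back to a specific thread or tile in the kernel under test.
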

## Key Tools and Experimental Environment Configuration

**Key Tools**:
- SGLang: a high-performance inference runtime featuring structured generation, RadixAttention (reuse of KV Cache across shared prompt prefixes), and request scheduling; studying it teaches serving-system design, KV Cache management, and more.
- PyTorch: serves as the correctness baseline and performance reference, and as a way to learn CUDA Extensions and PyTorch's compiler stack.
- Candle: Hugging Face's Rust-native ML framework, useful for learning tensor operations, model loading, and CUDA backend integration.
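
The core idea behind RadixAttention, reusing the KV Cache of a shared prompt prefix, can be sketched as longest-prefix matching over token IDs. This is a deliberately simplified, hypothetical structure, not SGLang's implementation: it uses an uncompressed trie rather than a radix tree, stores no KV tensors, and has no eviction policy.

```rust
use std::collections::HashMap;

/// Simplified prefix cache in the spirit of RadixAttention: an uncompressed trie
/// over token IDs. A real radix tree compresses single-child chains, keeps KV Cache
/// handles at each node, and evicts entries; none of that is modeled here.
#[derive(Default)]
pub struct PrefixCache {
    children: HashMap<u32, PrefixCache>,
}

impl PrefixCache {
    /// Record that the KV Cache for this token sequence (and all its prefixes) is cached.
    pub fn insert(&mut self, tokens: &[u32]) {
        let mut node = self;
        for &t in tokens {
            node = node.children.entry(t).or_default();
        }
    }

    /// Length of the longest cached prefix of `tokens`.
    pub fn match_prefix(&self, tokens: &[u32]) -> usize {
        let mut node = self;
        let mut matched = 0;
        for t in tokens {
            match node.children.get(t) {
                Some(child) => {
                    node = child;
                    matched += 1;
                }
                None => break,
            }
        }
        matched
    }
}
```

In a serving system, the matched length is the number of prompt tokens whose KV entries could be reused, so prefill only needs to run on the remaining suffix.
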

**Experimental Environment**: Ubuntu 26.04 LTS, CUDA 13.3, Rust 1.98+, plus Nsight Systems and Nsight Compute.

## Learning Insights and Project Summary

**Learning Insights**:
1. Practice first: write code, run experiments, analyze the results, and understand performance through benchmarking and profiling.
2. Systems thinking: the work demands full-stack knowledge, along with attention to performance, engineering quality, and continuous learning.
3. Rust's potential: memory safety, high performance, and strong concurrency support; combined with frameworks such as Candle, Rust has broad prospects in AI infrastructure.
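
In the practice-first spirit, a typical first benchmark reports effective memory bandwidth and compares it with the GPU's theoretical peak. The helpers below are a minimal sketch with hypothetical names: `median_time` measures host wall-clock time, which for GPU work is only meaningful if the device is synchronized before the timer stops (CUDA events or Nsight Systems are the usual tools), and the bandwidth formula assumes an f32 vector addition that reads two inputs and writes one output per element.

```rust
use std::time::{Duration, Instant};

/// Median wall-clock time of `iters` timed runs of `f`, after one untimed warm-up run.
/// For even `iters` this returns the upper of the two middle samples.
pub fn median_time<F: FnMut()>(iters: usize, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one timed run");
    f(); // warm-up: the first call often pays one-time costs
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort_unstable();
    samples[iters / 2]
}

/// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for f32 vector addition
/// `c[i] = a[i] + b[i]`: two 4-byte reads and one 4-byte write per element.
pub fn vec_add_bandwidth_gbs(n: u64, elapsed_secs: f64) -> f64 {
    let bytes_moved = 3 * 4 * n;
    bytes_moved as f64 / elapsed_secs / 1e9
}
```

For example, 100 million elements processed in 10 ms move 1.2e9 bytes, which works out to about 120 GB/s; dividing that by the device's peak bandwidth shows how close the kernel is to being memory-bound-optimal.
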

**Summary**: The project offers a clear roadmap, a practice-driven learning method, and a complete technology stack, which makes it a valuable reference for anyone learning AI infrastructure. It is worth following for readers who want to explore what Rust and CUDA can do together through an intensive, hands-on challenge.
