Section 01
GPUCache Project Overview: A PB-scale Ultra-low Latency Distributed GPU Cache System
GPUCache is an open-source PB-scale ultra-low latency distributed GPU cache system developed by the rustfs team, specifically designed for AI inference scenarios. The project was released on May 26, 2026, and its source code is hosted on GitHub (link: https://github.com/rustfs/GPUCache).
Using Rust language, NVIDIA DOCA framework, RDMA network protocol, and BF-4 DPU technology, this system builds a high-speed bridge between GPU HBM and NVMe storage, aiming to solve memory bottleneck issues in LLM inference, eliminate redundant computation overhead, and balance cost and performance.