Section 01
[Introduction] GPU Direct Storage Cold Start Optimization: LLM Serverless Inference Acceleration Solution
This project aims to reduce cold-start latency for serverless LLM inference by combining NVIDIA GPUDirect Storage (GDS), CRIU container snapshots, and CUDA Checkpoint/Restore, with the goal of sub-second GPU state initialization. It is maintained by avaneesh1830 and open-sourced on GitHub (link: https://github.com/avaneesh1830/gpu-direct-storage-coldstarts), where it was released on June 4, 2026. The project is currently in Week 1, which is devoted to researching the NV Stack.