Section 01
[Introduction] llama-hdd.cpp: Overview of the Disk-Persisted Inference Checkpoint Solution
llama-hdd.cpp is a soft fork of llama.cpp, published on GitHub by developer LuminaNAO (repository: https://github.com/LuminaNAO/llama-hdd.cpp, MIT License). Its core feature is writing prompt checkpoints, including the KV cache and other inference state, to disk during inference. This targets three problems in conventional LLM inference: memory limits on how much state can be kept, loss of state, and redundant recomputation of prompts that were already processed. It also supports long-context processing and makes saved state recoverable.
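To make the idea of a disk-persisted prompt checkpoint concrete, here is a minimal, self-contained sketch. It is not taken from llama-hdd.cpp: the function names, the `.ckpt` file naming scheme, and the use of an opaque byte string in place of a real KV cache are all assumptions chosen for illustration. It only shows the general pattern of keying saved state by the token prefix that produced it, then restoring the longest matching prefix so that only the remaining tokens need to be recomputed.

```python
import hashlib
import os
import tempfile
from array import array


def checkpoint_path(cache_dir, tokens):
    """Derive a file name from the exact token prefix (hypothetical scheme)."""
    digest = hashlib.sha256(array("i", tokens).tobytes()).hexdigest()
    return os.path.join(cache_dir, digest + ".ckpt")


def save_checkpoint(cache_dir, tokens, state):
    """Write the opaque state blob for this token prefix to disk."""
    path = checkpoint_path(cache_dir, tokens)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(state)
    # Rename into place so a reader never opens a half-written checkpoint.
    os.replace(tmp, path)
    return path


def load_longest_prefix(cache_dir, tokens):
    """Return (n_tokens_restored, state) for the longest stored prefix,
    or (0, None) when no prefix of the prompt has a checkpoint."""
    for n in range(len(tokens), 0, -1):
        path = checkpoint_path(cache_dir, tokens[:n])
        if os.path.exists(path):
            with open(path, "rb") as f:
                return n, f.read()
    return 0, None


# Usage: checkpoint the first three tokens, then restore for a longer prompt.
cache_dir = tempfile.mkdtemp()
prompt = [1, 15, 27, 42, 99]
save_checkpoint(cache_dir, prompt[:3], b"fake-kv-state")  # stand-in for real state
restored, state = load_longest_prefix(cache_dir, prompt)
# Only prompt[restored:] would still need to be evaluated by the model.
```

Because the checkpoint lives in a file rather than in process memory, it survives a restart and does not count against RAM while unused. A real implementation would also have to store model and context parameters alongside the state so that a checkpoint is never restored into an incompatible session.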