Chapter 01
RDMA KV Cache: A Disaggregated LLM Inference Acceleration Scheme Using GPUDirect RDMA
The rdma-kv-cache project implements a disaggregated LLM inference architecture. It uses GPUDirect RDMA to transfer the KV cache between GPUs with zero copies, significantly reducing inference latency for large models.
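To make the architecture concrete, here is a minimal, hardware-free sketch of the disaggregated flow the description implies: a prefill worker builds the KV cache, hands it to a decode worker, and decoding continues from that cache. Everything below is a hypothetical illustration in plain Python; the `transport` dict, function names, and token shapes are stand-ins for the project's actual GPUDirect RDMA transfer path, which would move GPU memory directly over the NIC instead.

```python
def prefill(prompt_tokens):
    """Prefill stage: pretend to compute one KV pair per prompt token."""
    return [(f"k{t}", f"v{t}") for t in prompt_tokens]

def transfer_kv(kv_cache, transport):
    """Stand-in for the zero-copy RDMA write of the KV cache.

    With GPUDirect RDMA the NIC reads the prefill GPU's memory and writes
    it directly into the decode GPU's memory, bypassing host RAM. Here we
    just pass the reference through to model the copy-free handoff.
    """
    transport["kv"] = kv_cache  # no copy: same object on both "nodes"

def decode(transport, steps):
    """Decode stage: consume the transferred cache and emit new tokens."""
    kv_cache = transport["kv"]
    return [f"tok{len(kv_cache) + i}" for i in range(steps)]

transport = {}
cache = prefill([0, 1, 2])
transfer_kv(cache, transport)
print(decode(transport, steps=2))   # tokens produced after the handoff
print(transport["kv"] is cache)     # True: no copy was made
```

The point of the sketch is the separation of stages: prefill and decode could run on different GPUs (or machines), with only the KV cache crossing the boundary. In the real system that crossing is the expensive step, which is why a zero-copy GPU-to-GPU path matters.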