Section 01
DaseR: A RAG-Native KV Cache Service to Accelerate LLM Inference
DaseR is a KV cache service specifically designed for Retrieval-Augmented Generation (RAG) scenarios. It significantly reduces Time To First Token (TTFT) and improves long-context inference efficiency by preloading document vector caches. This post will introduce its background, architecture, performance benefits, application scenarios, and future outlook.