Section 01
[Introduction] Client-Assisted LLM: Reducing Cloud LLM Cost and Latency with Client-Side Assisted Inference
This project explores a hybrid inference scheme that brings client-side devices into the LLM inference loop: a local draft model proposes candidate tokens, and a cloud-based verification model confirms them. Because the cloud model can check an entire batch of drafted tokens in a single forward pass, several tokens can be accepted per network round trip. This cuts server GPU cost per generated token and end-to-end latency, while putting the otherwise-idle compute of modern client devices to work.
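To make the draft-and-verify loop concrete, below is a minimal, self-contained Python sketch. The cloud side is emulated locally with a stub (`verify_on_cloud`); in a real deployment that call would be a single network request running one batched forward pass on the target model. All function names and the accept/correct protocol here are illustrative assumptions, not the project's actual API.

```python
from typing import Callable

def verify_on_cloud(context: list[int], draft: list[int],
                    target_next: Callable[[list[int]], int]) -> tuple[list[int], int]:
    """Stand-in for the cloud verifier (assumed protocol, not from the source).

    Returns the longest prefix of `draft` that matches the target model's
    greedy choices, plus one correction/continuation token from the target.
    """
    accepted: list[int] = []
    ctx = list(context)
    for t in draft:
        expected = target_next(ctx)
        if expected != t:
            return accepted, expected   # first disagreement: reject the rest
        accepted.append(t)
        ctx.append(t)
    return accepted, target_next(ctx)   # whole draft accepted; extend by one

def generate(prompt: list[int], draft_next: Callable[[list[int]], int],
             target_next: Callable[[list[int]], int],
             max_new: int = 16, k: int = 4) -> list[int]:
    """Client loop: draft k tokens locally, verify them in one round trip."""
    ids = list(prompt)
    produced = 0
    while produced < max_new:
        ctx = list(ids)
        draft: list[int] = []
        for _ in range(k):              # cheap local drafting on the client
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        accepted, correction = verify_on_cloud(ids, draft, target_next)
        ids += accepted + [correction]  # >=1 target-quality token per round trip
        produced += len(accepted) + 1
    return ids

# Toy stand-ins: the draft model agrees with the target most of the time.
def target(ids: list[int]) -> int:
    return (ids[-1] * 7 + 3) % 50

def drafty(ids: list[int]) -> int:
    return target(ids) if ids[-1] % 5 else 0

print(generate([1], drafty, target))
```

Note that every round trip yields at least one token sampled by the target model, so output quality matches cloud-only decoding; the higher the draft model's agreement rate, the more tokens are accepted per trip and the lower the per-token cost and latency.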