Section 01
[Introduction] Exploring the Limits of Framework Desktop Large Model Inference: Practical Optimization on the Strix Halo Platform
This research project focuses on the Framework Desktop platform, built on AMD's Strix Halo architecture and paired with an NVIDIA RTX 3090, to optimize large-model inference through llama.cpp's RPC backend. Across 34 completed tasks covering cutting-edge techniques such as KV cache compression, speculative decoding, and heterogeneous RPC inference, it explores the limits of desktop-class LLM inference and challenges the traditional reliance on data-center GPUs.
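For readers unfamiliar with the heterogeneous setup mentioned above, llama.cpp's RPC backend lets a main inference process offload layers to a worker on another machine. The sketch below shows the general shape of such a setup; the hostname, port, and model path are illustrative placeholders, not values taken from this project.

```shell
# Illustrative sketch only: host, port, and model path are placeholders.

# On the GPU worker (e.g. the RTX 3090 box), build llama.cpp with the
# RPC backend enabled, then start a worker process:
#   cmake -B build -DGGML_RPC=ON && cmake --build build
./build/bin/rpc-server -H 0.0.0.0 -p 50052

# On the Strix Halo host, point the main process at the worker so
# model layers can be offloaded over the network:
./build/bin/llama-cli -m model.gguf --rpc 192.168.1.10:50052 -ngl 99 -p "Hello"
```

In this arrangement the Strix Halo machine coordinates inference while the RPC worker serves its GPU over the network, which is what makes mixing an integrated-APU platform with a discrete NVIDIA card possible in one inference run.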