Section 01
SMEPilot: An Optimized LLM Inference Engine Using Scalable Matrix Extensions
SMEPilot is an LLM inference engine for CPUs that support Scalable Matrix Extensions (SME). It applies Roofline model analysis to the features of SME-enabled CPUs and, for each operator, selects one of three execution modes: CPU, SME, or collaborative CPU/SME execution. On models such as Llama-3.2-3B and Qwen3-4B, it improves end-to-end inference performance by up to 3.94x across mobile, PC, and server platforms.
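The Roofline-based mode selection described above can be illustrated with a minimal sketch. This is not SMEPilot's actual implementation: the `Backend` type, the `select_mode` function, the peak-throughput and bandwidth figures, and the way the collaborative mode is modeled (summed compute peaks behind a shared memory bandwidth) are all assumptions made for illustration. The sketch computes the arithmetic intensity of a GEMM operator, bounds each mode's attainable throughput by `min(peak, intensity * bandwidth)`, and picks the mode with the highest bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    peak_gflops: float    # hypothetical peak compute throughput (GFLOP/s)
    bandwidth_gbs: float  # hypothetical sustained memory bandwidth (GB/s)


# Made-up hardware numbers, chosen only to make the example concrete.
CPU = Backend("cpu", peak_gflops=100.0, bandwidth_gbs=50.0)
SME = Backend("sme", peak_gflops=400.0, bandwidth_gbs=50.0)


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 4) -> float:
    """Arithmetic intensity (FLOP/byte) of C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_gflops(backend: Backend, intensity: float) -> float:
    """Roofline bound: limited by either compute peak or memory traffic."""
    return min(backend.peak_gflops, intensity * backend.bandwidth_gbs)


def select_mode(intensity: float, cpu: Backend = CPU, sme: Backend = SME) -> str:
    """Pick the execution mode with the highest Roofline bound.

    Assumption: collaborative execution adds the two compute peaks but
    shares a single memory system, so its bandwidth does not add up.
    """
    collaborative = Backend(
        "collaborative",
        peak_gflops=cpu.peak_gflops + sme.peak_gflops,
        bandwidth_gbs=max(cpu.bandwidth_gbs, sme.bandwidth_gbs),
    )
    # max() returns the first maximal element, so ties go to the
    # simpler mode listed earlier (cpu, then sme, then collaborative).
    candidates = [cpu, sme, collaborative]
    return max(candidates, key=lambda b: attainable_gflops(b, intensity)).name


if __name__ == "__main__":
    decode = gemm_intensity(1, 4096, 4096)     # GEMV-like, decode phase
    prefill = gemm_intensity(512, 4096, 4096)  # large GEMM, prefill phase
    print("decode :", select_mode(decode))
    print("prefill:", select_mode(prefill))
```

With these illustrative numbers, the decode-phase GEMV is memory-bound on every mode, so the tie falls to plain CPU execution, while the prefill-phase GEMM is compute-bound and favors collaborative execution. A real engine would calibrate the peaks and bandwidths per platform and would also account for dispatch and synchronization overheads, which this sketch ignores.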