Section 01
PipeSD: A Speculative Decoding Acceleration Framework for Cloud-Edge Collaborative Inference (Introduction)
PipeSD is a speculative decoding acceleration framework for cloud-edge collaborative inference. It combines a pipeline scheduling mechanism with a Bayesian optimization-based validation triggering strategy to address two problems in existing cloud-edge collaborative speculative decoding: low resource utilization and poorly timed validation. It achieves up to 2.16x end-to-end speedup and 25.3% lower energy consumption, which makes it suitable for edge computing, privacy-sensitive applications, and similar scenarios.
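For readers new to the underlying technique, the following is a minimal sketch of generic greedy speculative decoding (draft, then validate), not PipeSD's implementation. The two toy models, the function names, and the fixed `draft_len` are all assumptions made for illustration: in a cloud-edge setting the draft model would run on the edge device and the target model in the cloud, and the point at which validation is triggered (here simply "after `draft_len` tokens") is what PipeSD is described as tuning with Bayesian optimization. Pipelining of the two sides is not modeled.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical large (cloud-side) model: a deterministic toy rule.
    return (sum(prefix) + len(prefix)) % 7


def draft_model(prefix: List[int]) -> int:
    # Hypothetical small (edge-side) model: agrees with the target
    # except at every fourth position, where it guesses wrong.
    guess = target_model(prefix)
    return (guess + 1) % 7 if len(prefix) % 4 == 3 else guess


def greedy_decode(prompt: List[int], target: Model, num_tokens: int) -> List[int]:
    # Baseline: one target-model call per generated token.
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], draft: Model, target: Model, num_tokens: int, draft_len: int
) -> Tuple[List[int], int]:
    # Returns the generated sequence and the number of validation rounds.
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    rounds = 0
    while len(tokens) < goal:
        # Draft phase: the small model proposes draft_len tokens.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))

        # Validation phase: keep the longest prefix the target agrees with.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token
                break
            accepted.append(tok)
        else:
            # Every drafted token was accepted: the target adds one more.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
        rounds += 1
    return tokens[:goal], rounds
```

Because every accepted token is checked against the target model's own greedy choice, the output matches plain greedy decoding with the target model; the potential gain is that one validation round can commit several tokens at once. In a real system the validation loop would be a single batched forward pass of the target model rather than one call per token.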