Section 01
Introduction to the PPOW Framework: A New Paradigm for Performance-Driven Speculative Decoding Optimization
PPOW (Performance-Driven Policy Optimization with Adaptive Windowing) is a reinforcement learning framework designed to address the fundamental mismatch between token-level optimization and window-level utility in speculative decoding. Its core innovation is to shift draft-model training from token-level imitation learning to window-level performance optimization. Three key components, namely cost-aware acceleration rewards, distribution proximity rewards, and an adaptive divergence-aware window, let PPOW directly optimize for the actual end-to-end speedup of speculative decoding rather than a proxy objective. Across multiple model families and benchmarks, PPOW achieves a 3.39-4.36x inference speedup, offering a new paradigm for large language model (LLM) inference optimization.
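To make the three components concrete, the following is a minimal sketch of how a window-level reward and an adaptive window size might be combined. The function names, cost model, and the exponential mapping from divergence to window size are illustrative assumptions, not PPOW's actual formulation:

```python
import math

def window_reward(accepted, window_size, draft_cost, verify_cost, kl_div,
                  alpha=1.0, beta=0.5):
    """Hypothetical window-level reward for a speculative-decoding draft policy.

    accepted    -- draft tokens accepted by the target model in this window
    window_size -- draft tokens proposed in this window
    draft_cost  -- relative cost of one draft-model forward pass
    verify_cost -- relative cost of one target-model verification pass
    kl_div      -- estimated divergence between draft and target distributions
    """
    # Cost-aware acceleration reward: tokens gained per unit of compute spent
    # (the +1 accounts for the bonus token emitted by the verification step).
    compute = window_size * draft_cost + verify_cost
    accel = (accepted + 1) / compute

    # Distribution proximity reward: higher when the draft tracks the target.
    proximity = math.exp(-kl_div)

    return alpha * accel + beta * proximity

def adaptive_window(kl_div, w_min=2, w_max=8, tau=1.0):
    """Shrink the speculation window as draft/target divergence grows
    (one plausible realization of a divergence-aware window)."""
    frac = math.exp(-kl_div / tau)
    return max(w_min, min(w_max, round(w_min + (w_max - w_min) * frac)))
```

The key design point this sketch illustrates is that the reward is assigned to the whole window, not to individual tokens: a draft policy can then learn to trade per-token fidelity against acceptance length and compute cost, which token-level imitation cannot express.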