Section 01
【Introduction】Hybrid Verified Decoding: A New Paradigm for Speculative Decoding Acceleration in Agent Workflows
This article introduces Hybrid Verified Decoding (HVD), a speculative decoding method optimized for agent workflows. HVD learns to predict the expected acceptance length of a cached draft and uses that prediction to choose dynamically between verifying the cached draft and invoking a model drafter. This addresses a key weakness of parameter-free (cached) drafts: their benefit is uncertain, because a cached draft that the target model rejects early yields few tokens for the cost of a verification pass. Experiments show an average speedup of 2.73x over EAGLE3 in agent workflow scenarios, offering a new path for reducing LLM inference latency.
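The selection idea can be illustrated with a small sketch. The article does not specify the predictor's inputs, the latency model, or the exact decision rule, so everything below is an assumption: the learned predictor is taken to output per-token acceptance probabilities for the cached draft, expected acceptance length is computed under the usual "stop at the first rejection" rule, and the strategy with more expected tokens per millisecond wins. The names `expected_acceptance_length`, `StrategyCost`, and `choose_strategy`, and all latency figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


def expected_acceptance_length(token_accept_probs: Sequence[float]) -> float:
    """Expected number of accepted draft tokens when verification stops at
    the first rejection: sum over j of prod_{i<=j} p_i."""
    expected, survive = 0.0, 1.0
    for p in token_accept_probs:
        survive *= p      # probability that draft tokens 1..j are all accepted
        expected += survive
    return expected


@dataclass
class StrategyCost:
    # Hypothetical per-step latency model in milliseconds (not from the article).
    verify_ms: float      # one target-model verification pass
    draft_ms: float       # extra cost of producing the draft


def tokens_per_ms(expected_accepted: float, cost: StrategyCost) -> float:
    # A verification step emits the accepted draft tokens plus one token
    # produced by the target model itself.
    return (expected_accepted + 1.0) / (cost.verify_ms + cost.draft_ms)


def choose_strategy(
    cached_accept_probs: Sequence[float],   # hypothetical learned-predictor output
    drafter_expected_len: float,            # assumed estimate for the model drafter
    cached_cost: StrategyCost,
    drafter_cost: StrategyCost,
) -> str:
    """Return "cached" or "drafter", whichever yields more tokens per ms."""
    cached_rate = tokens_per_ms(
        expected_acceptance_length(cached_accept_probs), cached_cost
    )
    drafter_rate = tokens_per_ms(drafter_expected_len, drafter_cost)
    return "cached" if cached_rate >= drafter_rate else "drafter"


if __name__ == "__main__":
    cheap = StrategyCost(verify_ms=20.0, draft_ms=0.5)    # cache lookup is cheap
    model = StrategyCost(verify_ms=20.0, draft_ms=6.0)    # drafter forward passes
    # A confident cached draft beats the drafter; a weak one does not.
    print(choose_strategy([0.9, 0.9, 0.9, 0.5], 3.0, cheap, model))
    print(choose_strategy([0.3, 0.2], 3.0, cheap, model))
```

With these made-up numbers, the first cached draft has an expected acceptance length of about 2.80 tokens and is chosen; the second has about 0.36, so the model drafter is used instead. Because the cached draft costs almost nothing to produce, it wins whenever its predicted acceptance length is close to the drafter's.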