Section 01
Introduction: K-Token Merging: Compressing Sequences in Latent Embedding Space for Efficient Inference of Large Language Models
K-Token Merging is a prompt compression method that merges blocks of k consecutive tokens in the latent embedding space. By significantly shortening the input sequence while preserving model performance, it opens a new path toward efficient inference for large language models.
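The core idea can be sketched as pooling over blocks of k consecutive token embeddings. The sketch below uses simple mean pooling as an illustrative assumption; the method's actual merge function, padding strategy, and block size may differ.

```python
import numpy as np

def merge_k_tokens(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Mean-pool each block of k consecutive token embeddings.

    embeddings: (seq_len, hidden_dim) array of token embeddings.
    Returns a (ceil(seq_len / k), hidden_dim) compressed sequence.
    """
    seq_len, hidden_dim = embeddings.shape
    # Pad with zeros so the sequence length is a multiple of k.
    pad = (-seq_len) % k
    if pad:
        embeddings = np.concatenate(
            [embeddings, np.zeros((pad, hidden_dim), dtype=embeddings.dtype)]
        )
    # Reshape into (num_blocks, k, hidden_dim) and average within each block.
    blocks = embeddings.reshape(-1, k, hidden_dim)
    merged = blocks.mean(axis=1)
    if pad:
        # The last block contains only (k - pad) real tokens; rescale so its
        # average is taken over real tokens rather than the zero padding.
        merged[-1] *= k / (k - pad)
    return merged

# Example: 10 tokens with hidden size 4, merged in blocks of k=4.
emb = np.arange(40, dtype=np.float32).reshape(10, 4)
compressed = merge_k_tokens(emb, 4)
print(compressed.shape)  # (3, 4): sequence length reduced from 10 to 3
```

Each merged vector then stands in for its block when the compressed sequence is fed to the model, reducing attention cost roughly by a factor of k.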