Section 01
G²TR: Introduction to the Generation-Guided Visual Token Compression Framework
Core Overview of the G²TR Framework
G²TR is a generation-guided visual token compression framework for multimodal models that use separate encoders. It scores the importance of each visual token by its consistency in the VAE latent space, then uses those scores to perform balanced token selection and to merge redundant tokens. In the reported experiments, the method reduces the number of visual tokens and the prefill computation by 1.94x while maintaining inference accuracy and editing quality.
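To make the select-then-merge idea concrete, here is a minimal sketch in plain Python. It is not the G²TR implementation: the importance scores are taken as given inputs (in the framework they would come from VAE latent space consistency), plain top-k selection stands in for balanced selection, and cosine similarity with mean pooling stands in for the merging rule. The names `compress_tokens`, `cosine`, and `keep` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compress_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens and fold every other token
    into its most similar kept token by averaging.

    Assumption: top-k plus cosine/mean merging is an illustrative stand-in,
    not the selection or merging rule used by G2TR itself.
    """
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be between 1 and the number of tokens")

    # Rank token indices by importance, highest first.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])   # restore original token order
    dropped = order[keep:]

    # Each kept token starts a group; dropped tokens join the closest group.
    groups = {i: [tokens[i]] for i in kept}
    for j in dropped:
        target = max(kept, key=lambda i: cosine(tokens[i], tokens[j]))
        groups[target].append(tokens[j])

    # Represent each group by the element-wise mean of its members.
    merged = []
    for i in kept:
        members = groups[i]
        dim = len(members[0])
        merged.append([sum(v[d] for v in members) / len(members) for d in range(dim)])
    return merged


# Toy input: four 2-D "visual tokens" with made-up importance scores.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [0.9, 0.2, 0.8, 0.1]

compressed = compress_tokens(tokens, scores, keep=2)
ratio = len(tokens) / len(compressed)  # token reduction factor for this toy input
```

In this toy input the two low-scoring tokens are averaged into their nearest high-scoring neighbors, halving the sequence. To target a reduction factor such as the 1.94x quoted above, `keep` would be set to roughly the original token count divided by 1.94.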