Section 01
[Introduction] Image Refinement via Regeneration: The RvR Framework Enhances Unified Multimodal Model Performance
This paper proposes the RvR framework, which transforms image refinement from an editing paradigm (RvE) to a conditional regeneration paradigm. The core is to use semantic tokens instead of pixel-level retention to guide generation, achieving performance improvements (0.78→0.91, 84.02→87.21, 61.53→77.41) on the three major benchmarks Geneval, DPGBench, and UniGenBench++ respectively. This framework breaks through the limitations of traditional editing paradigms and brings significant improvements to the image refinement capabilities of unified multimodal models.