Section 01
UNO Framework: Enhancing Visual Generation Capabilities of Unified Multimodal Models with Understanding Supervision
To address the decoupling between the understanding and generation components of unified multimodal models, this paper proposes UNO, an understanding-guided post-training framework. UNO uses understanding tasks as direct supervision signals for generation, and verifies on image generation and editing tasks that understanding capability improves generation quality. This offers a new path toward jointly improving understanding and generation in unified multimodal models.