Section 01
AlphaGRPO: Unlocking Self-Reflective Generation Capabilities of Multimodal Models via Decomposable Verifiable Rewards (Introduction)
AlphaGRPO applies Group Relative Policy Optimization (GRPO) to autoregressive-diffusion unified multimodal models. Open-domain image generation lacks a readily checkable reward signal, and AlphaGRPO addresses this with a decomposable verifiable reward mechanism, which enables reasoning-based text-to-image generation and self-reflective optimization. It reports significant improvements on multiple multimodal generation benchmarks and points to a new direction for the development of multimodal AI.
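
To make the two ideas in this summary concrete, the following is a minimal sketch, not the authors' implementation. It assumes a decomposable reward can be modeled as a set of atomic yes/no checks on a generated image, averaged into one scalar, and that GRPO then scores each sample relative to the other samples drawn for the same prompt. The function names, the averaging rule, the use of the population standard deviation, and the example checks are all hypothetical.

```python
from statistics import mean, pstdev


def decomposed_reward(check_results):
    """Fraction of atomic checks passed (hypothetical aggregation rule)."""
    if not check_results:
        return 0.0
    return sum(1.0 for ok in check_results if ok) / len(check_results)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value function is needed. Population std is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled images for one prompt, each judged on three hypothetical
# yes/no checks (for example: object present, count correct, color correct).
group_checks = [
    [True, True, True],
    [True, False, True],
    [False, False, True],
    [False, False, False],
]

rewards = [decomposed_reward(checks) for checks in group_checks]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f} advantage={a:+.3f}")
```

Samples that pass more checks than the group average receive a positive advantage and are reinforced; samples below the average receive a negative one. Because the reward is a sum of individually checkable parts, each part can be verified separately rather than relying on a single opaque score.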