Section 01
Introduction: ARM — An Autoregressive Multimodal Model Unifying Image Understanding, Generation, and Editing
ARM: Autoregressive Multimodal Model Based on Discrete Representation, Unifying Image Understanding, Generation, and Editing
Core Insights: ARM unifies image understanding, generation, and editing within a single autoregressive framework, using a semantic visual tokenizer and reinforcement learning optimization. The work also finds cross-task synergy effects.
Original Author/Team: Paper author team (arXiv:2606.11188v1)
Source Platform: arXiv
Original Paper Link: http://arxiv.org/abs/2606.11188v1
Code Repository: https://github.com/wdrink/ARM
Publication Date: June 9, 2026