Section 01
【Introduction】JD Open-Sources JoyAI-Image: Unified Multimodal Model Enables Closed-Loop Collaboration of Image Understanding, Generation, and Editing
JoyAI-Image, open-sourced by JD, is a 24B-parameter unified multimodal foundation model. It integrates three core capabilities (image understanding, text-to-image generation, and instruction-guided image editing) by pairing an 8B multimodal large language model (MLLM) with a 16B multimodal diffusion Transformer (MMDiT), forming an "Understand-Generate-Edit" closed loop. Its stated strengths are strong spatial understanding, long-text rendering, and controllable spatial editing. The model is released under the Apache-2.0 license.
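To make the division of labor between the two components concrete, here is a minimal Python sketch. It is not JoyAI-Image's actual API: the names `Request`, `components_for`, and `total_params_b` are hypothetical, and the routing (understanding handled by the MLLM alone, generation and editing handled by the MLLM conditioning the MMDiT) is an assumption about how such a pairing is typically wired, not something confirmed here. Only the 8B and 16B parameter counts come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Parameter counts in billions, as stated in the introduction.
MLLM_PARAMS_B = 8
MMDIT_PARAMS_B = 16


@dataclass
class Request:
    """Hypothetical request type; not part of any real JoyAI-Image API."""
    task: str                    # "understand", "generate", or "edit"
    prompt: str                  # question, caption, or edit instruction
    image: Optional[str] = None  # placeholder for an input image path


def components_for(request: Request) -> List[str]:
    """Return which components a task would involve (assumed routing)."""
    if request.task == "understand":
        if request.image is None:
            raise ValueError("understanding needs an input image")
        return ["MLLM"]
    if request.task == "generate":
        # Assumption: the MLLM encodes the prompt, the MMDiT renders pixels.
        return ["MLLM", "MMDiT"]
    if request.task == "edit":
        if request.image is None:
            raise ValueError("editing needs an input image")
        # Assumption: the MLLM interprets image + instruction, the MMDiT edits.
        return ["MLLM", "MMDiT"]
    raise ValueError(f"unknown task: {request.task!r}")


def total_params_b() -> int:
    """Sum of the two component sizes, in billions of parameters."""
    return MLLM_PARAMS_B + MMDIT_PARAMS_B
```

Under these assumptions, the closed loop amounts to chaining requests: an `understand` call on an image can inform the prompt of a later `edit` call on the same image, and `total_params_b()` recovers the 24B total from the 8B and 16B components.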