Section 01
InterleaveThinker: Guide to the Multi-Agent Framework for Text-Image Interleaved Generation
InterleaveThinker is a multi-agent framework that lets existing image generators perform text-image interleaved generation through collaboration between planner and critic agents. Optimized via GRPO reinforcement learning, the method achieves performance comparable to GPT-5 on interleaved generation benchmarks and significantly improves the performance of base models on reasoning tasks.

Keywords: image generation, multi-agent, text-image interleaving, reinforcement learning, GRPO, visual narrative, multimodality.

Original source: arXiv, June 11, 2026, link: http://arxiv.org/abs/2606.13679v1
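
As a rough illustration of the GRPO optimization mentioned in the overview, the sketch below shows the group-relative advantage step that GRPO is generally known for: several outputs are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name `group_relative_advantages`, the reward values, the `eps` constant, and the use of the population standard deviation are all assumptions made for this example. The overview does not describe InterleaveThinker's reward design or training code, so this should not be read as the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps).

    `rewards` holds one scalar reward per output sampled for the same prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled for a single prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Outputs that score above their group's average receive a positive advantage and those below receive a negative one, so no separate learned value model is needed to provide a baseline.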