VeRL-Omni: A Reinforcement Learning Training Framework for Diffusion Models and Omni-Modal Generative Models

VeRL-Omni is a reinforcement learning training framework specifically designed for multi-modal generative models. It supports RL post-training for diffusion models (e.g., Qwen-Image, Wan2.2) and omni-modal models (e.g., Qwen3-Omni), enables efficient inference based on vLLM-Omni, and provides implementations of various RL algorithms and an asynchronous reward calculation mechanism.

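Tags: VeRL-Omni, reinforcement learning, diffusion models, multi-modal generation, RL training framework, Qwen-Image, vLLM-Omni, FlowGRPO, video generation, Ascend NPU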

Published 2026-06-12 17:16 | Recent activity 2026-06-12 17:21 | Estimated read 6 min

Section 01

Introduction

VeRL-Omni is a reinforcement learning training framework designed specifically for multi-modal generative models. It supports RL post-training for diffusion models (e.g., Qwen-Image, Wan2.2) and omni-modal models (e.g., Qwen3-Omni), uses vLLM-Omni for efficient inference, and provides a range of RL algorithms together with an asynchronous reward calculation mechanism. The project is maintained by verl-project, is open source on GitHub, and was released on June 12, 2026.

Section 02

Background: Unique Challenges in RL Training for Multi-Modal Generative Models

RLHF and DPO have proven effective at improving alignment in LLMs. Multi-modal generative models (image, video, and audio generation, as well as omni-modal understanding) differ substantially in architecture: diffusion models sample through multi-step iteration, while flow-matching and autoregressive models follow different generation strategies. These differences make existing RL frameworks hard to adapt, because inference is complex, reward calculation has high latency, and preprocessing workflows vary widely across modalities. This creates demand for a specialized framework.

Section 03

Core Architecture and Technical Features

Optimized Inference Backend: Adopts vLLM-Omni (a multi-modal extension of vLLM) for high-throughput sample generation;
Asynchronous Reward Service: Supports an HTTP scorer interface and overlaps reward calculation with rollout to reduce waiting time;
Modular Training Backend: Supports VeOmni and FSDP2, and allows parallel strategies (USP/TP/DP) to be combined;
Stability Enhancement: Introduces mechanisms such as rollout correction and deterministic rollout to address instability in RL training of diffusion models.
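
The overlap between reward calculation and rollout described above can be illustrated with a minimal sketch. This is not VeRL-Omni's implementation: `generate_batch` and `score_batch` are hypothetical stand-ins (a real rollout would call the inference backend, and a real scorer would be an HTTP request), and the sleeps only simulate latency. The point is that scoring for one batch is launched as a background task while the next batch is being generated.

```python
import asyncio


async def generate_batch(step: int, batch_size: int = 4) -> list[str]:
    """Hypothetical rollout stand-in; a real one would call the inference backend."""
    await asyncio.sleep(0.05)  # simulated generation latency
    return [f"sample-{step}-{i}" for i in range(batch_size)]


async def score_batch(samples: list[str]) -> list[float]:
    """Hypothetical reward scorer stand-in; a real one would be a remote HTTP call."""
    await asyncio.sleep(0.05)  # simulated scoring latency
    return [float(len(s)) for s in samples]  # toy reward: string length


async def run(num_steps: int) -> list[list[float]]:
    pending: list[asyncio.Task] = []
    for step in range(num_steps):
        samples = await generate_batch(step)
        # Start scoring without awaiting it, so the next rollout begins
        # while this batch is still being scored.
        pending.append(asyncio.create_task(score_batch(samples)))
    # Collect all rewards once every rollout has been issued.
    return await asyncio.gather(*pending)


rewards = asyncio.run(run(3))
```

With equal rollout and scoring latency, a fully sequential loop would wait for both on every step, whereas here only the final scoring call is left outstanding after the last rollout.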

Section 04

Supported Models and Algorithm Matrix

Qwen-Image (Text-to-Image): FlowGRPO (CPS/SDE), MixGRPO, GRPO-Guard, DiffusionNFT, DPO (all verified);
Wan2.2 (Text-to-Video): DanceGRPO (verified);
SD3.5 (Text-to-Image): DPO (verified);
LTX2.3 (Text-to-Video+Audio): FlowGRPO (in development);
BAGEL (Unified Understanding + Generation): FlowGRPO (in development);
HunyuanImage-3.0: MixGRPO, SRPO (planned);
Qwen3-Omni-Thinker (Omni-Modal): GSPO (in development).
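
Several of the algorithms listed above (FlowGRPO, MixGRPO, DanceGRPO) belong to the GRPO family, whose shared idea is to score a group of samples generated from the same prompt and normalize each reward against the group's statistics. The sketch below shows only that normalization step, under assumptions: the function name and `eps` are illustrative, the population standard deviation is used (implementations differ on this), and none of it is taken from VeRL-Omni's code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's group of samples.

    Each advantage is (reward - group mean) / (group std + eps); eps avoids
    division by zero when all rewards in the group are identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four images generated from the same prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.9, 0.4])
```

Samples scoring above the group mean receive positive advantages and are reinforced; those below the mean receive negative advantages.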

Section 05

Performance Advantages and Domestic Hardware Support

Performance Improvement: In Qwen-Image FlowGRPO tests, end-to-end throughput is 25% higher than the diffusers-based implementation, attributed to optimizations such as vLLM-Omni inference, FSDP2 training, and asynchronous reward calculation;
Domestic Hardware Support: Natively supports Ascend NPUs and provides quick start guides, lowering the barrier to multi-modal RL training on domestic (Chinese-made) chips.
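
To put the 25% throughput figure in terms of training time, the short calculation below converts a throughput gain into the fraction of baseline wall time needed for the same number of samples. The function name is illustrative, and the calculation assumes the gain holds uniformly over the whole run.

```python
def relative_wall_time(throughput_gain: float) -> float:
    """Fraction of baseline wall time needed to process the same number of samples."""
    return 1.0 / (1.0 + throughput_gain)


ratio = relative_wall_time(0.25)  # 1 / 1.25 = 0.8
```

A 25% throughput increase therefore corresponds to roughly 20% less wall time for a fixed workload, not 25% less.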

Section 06

Application Scenarios and Practical Significance

Researchers: Stable and efficient baselines that lower the barrier to reproducing results;
Developers: A modular architecture that makes it easy to integrate new models and reward functions, with rich documentation and examples;
Enterprise Users: Performance optimizations and Ascend support reduce training costs, and asynchronous reward calculation accommodates external evaluation services.

Section 07

Summary and Future Outlook

VeRL-Omni addresses the challenges specific to RL training of multi-modal generative models. Its broad model-algorithm matrix, performance advantages, and compatibility with domestic hardware make it a notable tool in this field. The project integrates with the verl and vLLM-Omni ecosystems and is continuously updated (e.g., with the addition of DiffusionNFT and DPO), which positions it to play a key role in multi-modal AI applications.
