
DARE: A Training Framework for Alignment and Reinforcement Learning of Diffusion Large Language Models

A flexible and efficient training framework designed specifically for diffusion large language models (dLLMs), supporting supervised fine-tuning, reinforcement learning, and comprehensive evaluation to advance dLLM technology from research to practical applications.

Diffusion Models · Large Language Models · Reinforcement Learning · Supervised Fine-Tuning · LLaDA · Training Framework
Published 2026-04-13 02:57 · Recent activity 2026-04-13 03:22 · Estimated read 5 min

Section 01

DARE Framework: Infrastructure for Training and Evaluation of Diffusion Large Language Models

DARE is the first systematic training and evaluation platform for diffusion large language models (dLLMs), designed specifically to address the unique challenges of dLLM training optimization. It supports supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), and reinforcement learning (RL), and integrates inference acceleration with a comprehensive evaluation system, with the aim of lowering the barrier to dLLM research and application and moving the technology from academia into practical use.

Section 02

The Rise and Challenges of Diffusion Models and dLLMs

Since ChatGPT set off the LLM boom in 2022, autoregressive architectures have dominated the market, but diffusion models (which originated in the image domain) are changing the landscape. dLLMs generate text in a 'coarse-to-fine' multi-step denoising mode, with advantages such as parallel generation, flexible editing, and global consistency (models like LLaDA, Dream, and SDAR have demonstrated this potential). However, training pipelines built for autoregressive models cannot be transferred directly, and dLLM training optimization poses its own challenges; hence the DARE framework was born.
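To make the 'coarse-to-fine' multi-step denoising mode concrete, here is a toy Python sketch (not DARE's code; the model, its confidence scores, and the reveal schedule are all stand-ins): generation starts from a fully masked sequence, and each step fills in the most confident positions in parallel, leaving harder positions for later refinement.

```python
MASK = "[MASK]"

def toy_denoise(model, length, steps):
    """Coarse-to-fine generation: start fully masked, then over several
    denoising steps fill in the most confident positions in parallel."""
    seq = [MASK] * length
    per_step = max(1, length // steps)  # positions revealed per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # the model proposes a token and a confidence for every masked slot
        proposals = [(i,) + model(seq, i) for i in masked]
        # reveal the highest-confidence slots in parallel; easy tokens are
        # fixed early, later steps refine the rest
        proposals.sort(key=lambda p: p[2], reverse=True)
        for i, token, _conf in proposals[:per_step]:
            seq[i] = token
    return seq

def dummy_model(seq, pos):
    """Stand-in predictor: a deterministic token plus a pseudo-confidence."""
    return f"tok{pos}", (pos * 37 % 97) / 97.0

print(toy_denoise(dummy_model, length=8, steps=4))
```

Note the contrast with autoregressive decoding: several positions are committed per step, and they need not be adjacent or left-to-right.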

Section 03

Technical Architecture and Core Capabilities of DARE

DARE adopts a modular architecture. Its core capabilities include:

1. Basic training: SFT (full-parameter or PEFT), RL (online RL, with optimization algorithms such as Coupled-GRPO), and preference optimization (MDPO, VRPO);
2. Inference acceleration: block caching (2.2x rollout speedup), integration with lmdeploy/SGLang (2-4x speedup), and sequence parallelism (extending generation length);
3. Attention optimization: FlashAttention-family backends to reduce computational overhead.
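The rollout speedup attributed to block caching comes from not re-encoding blocks that are already finished. The following toy sketch (an assumed structure for illustration, not DARE's implementation) shows the idea: hidden states for completed blocks are computed once and reused, while only the active block, whose tokens are still changing during denoising, is recomputed.

```python
class BlockCache:
    """Toy block cache for block-diffusion rollouts: finished blocks are
    encoded once and reused; only the active block is re-encoded."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.cache = []        # one list of hidden states per finished block
        self.encode_calls = 0  # counts encoder passes, to show the saving

    def _encode(self, block_tokens):
        self.encode_calls += 1
        # stand-in for a transformer forward pass over one block
        return [hash(t) % 1000 for t in block_tokens]

    def states(self, tokens, active_start):
        """Return hidden states for tokens[:active_start + block_size]."""
        n_done = active_start // self.block_size
        # encode any finished blocks that are not yet cached
        while len(self.cache) < n_done:
            b = len(self.cache)
            lo = b * self.block_size
            self.cache.append(self._encode(tokens[lo:lo + self.block_size]))
        # the active block is still being denoised, so always re-encode it
        active = self._encode(tokens[active_start:active_start + self.block_size])
        return [h for blk in self.cache for h in blk] + active
```

With this scheme, each denoising step on the third block costs one encoder pass instead of three, which is where a constant-factor rollout speedup of the kind DARE reports can come from.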

Section 04

Model Families Supported by DARE and Evaluation System

DARE supports two major dLLM families:

1. Masked diffusion models: LLaDA 8B Instruct and the LLaDA 2.X series, Dream 7B Instruct;
2. Block diffusion models: SDAR 8B Chat / 30B A3B Chat, LLaDA2.0.

The evaluation system is built on OpenCompass, covering knowledge (MMLU/C-Eval), mathematical reasoning (GSM8K/MATH, with answer-verification tools), code generation (HumanEval/MBPP), and reasoning and planning (BBH), while accounting for the specific characteristics of dLLMs.
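Benchmarks such as GSM8K are typically scored by extracting the final numeric answer from a free-form response and checking it against the reference. A minimal sketch of what such a verification tool might do (an assumption about the general technique, not DARE's actual verifier):

```python
import re

def extract_final_number(text):
    """Return the last number in a model response, or None if there is none.
    Math benchmarks such as GSM8K are scored on the final numeric answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(nums[-1]) if nums else None

def verify(response, gold, tol=1e-6):
    """Check a response's final number against the reference answer."""
    pred = extract_final_number(response)
    return pred is not None and abs(pred - float(gold)) < tol
```

For example, `verify("Each box holds 8, so 3 boxes hold 24. The answer is 24.", "24")` accepts the response even though intermediate numbers appear earlier in the text.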

Section 05

Latest Updates and Community Value of DARE

DARE has iterated continuously since its release in December 2025. The March 2026 update added support for the d-TreeRPO, BGPO, and EBPO algorithms, fixed SDAR issues, and introduced sequence parallelism. Its significance lies in lowering the entry barrier for dLLMs so that researchers can focus on algorithmic innovation, promoting standardization and reproducibility of research, and encouraging community contributions through its modular design to drive ecosystem building.

Section 06

Multimodal Expansion and the Potential of dLLMs

DARE's roadmap extends to multimodal and omni-modal models, leveraging the strengths of the diffusion architecture in image, audio, and video generation to build a unified multimodal generation model. Although autoregressive models still dominate, the unique advantages of dLLMs (parallel generation, flexible control) give them great potential. As infrastructure, DARE will help dLLM technology mature, and the community is invited to contribute to its development.