# DARE: Alignment and Reinforcement Learning Execution Framework for Diffusion Large Language Models

> DARE is a supervised fine-tuning and reinforcement learning training framework designed specifically for diffusion large language models (dLLMs). It supports multiple RL algorithms and benchmark evaluation, with the aim of supporting research on dLLMs.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-11T17:32:57.000Z
- Last activity: 2026-06-11T17:48:44.267Z
- Popularity: 154.7
- Keywords: diffusion language models, reinforcement learning, large language models, DARE, LLaDA, SDAR, supervised fine-tuning, model alignment, open-source framework, machine learning
- Page link: https://www.zingnex.cn/en/forum/thread/dare-66a5861a
- Canonical: https://www.zingnex.cn/forum/thread/dare-66a5861a

---

## Core Introduction to the DARE Framework: An Alignment and Reinforcement Learning Execution Tool for Diffusion Large Language Models

DARE (Diffusion Large Language Models Alignment and Reinforcement Executor) is an open-source framework published on GitHub by the yjyddq team. Designed specifically for diffusion large language models (dLLMs), it supports supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), and reinforcement learning (RL) training, and it includes an evaluation suite. It targets a gap in existing tooling: current RL frameworks cannot be applied to dLLMs directly. The project was released in June 2026; the source repository is https://github.com/yjyddq/DARE.

## Project Background and Motivation: Filling the Gap in RL Frameworks for Diffusion Language Models

In recent years, diffusion language models (dLLMs) have emerged as a new architectural paradigm that generates text through iterative denoising rather than left-to-right token prediction. Most existing RL frameworks, however, are designed for autoregressive models and cannot be applied to dLLMs directly. DARE was developed to close this gap by addressing the training and evaluation needs of dLLMs.
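
To make "iterative denoising" concrete, here is a toy sketch of one common decoding scheme for masked diffusion models: start from a fully masked sequence and, at each step, commit the most confident predictions. The function names, the `<mask>` token, and the random stand-in for the model are all assumptions made for illustration; this is not DARE's code or any specific model's sampler.

```python
import random

MASK = "<mask>"  # hypothetical mask token used only in this sketch


def toy_denoiser(tokens, vocab, rng):
    """Stand-in for a dLLM forward pass (assumption: random predictions).

    Returns {position: (token, confidence)} for every masked position.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(tokens)
        if tok == MASK
    }


def iterative_unmask(length, steps, vocab, seed=0):
    """Fill a fully masked sequence over a fixed number of denoising steps."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division: tokens committed per step
    for _ in range(steps):
        proposals = toy_denoiser(tokens, vocab, rng)
        if not proposals:
            break  # nothing left to denoise
        # Commit only the most confident proposals; the rest stay masked
        # and are predicted again on the next step.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:per_step]:
            tokens[pos] = tok
    return tokens


if __name__ == "__main__":
    print(iterative_unmask(length=8, steps=4, vocab=["a", "b", "c"]))
```

Unlike autoregressive decoding, positions are filled in an order determined by confidence rather than from left to right, which is one reason rollout code written for autoregressive models does not transfer directly.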

## Core Architecture of DARE and Supported Model Types

DARE consists of two core components:

1. **Training framework**: built on verl, supporting SFT, PEFT (e.g., LoRA), and multiple RL algorithms;
2. **Evaluation framework**: built on OpenCompass, providing inference acceleration, comprehensive benchmarking, and SGLang integration.

Supported models fall into two families:

- Masked diffusion models: LLaDA series (e.g., LLaDA-8B-Instruct), Dream models;
- Block diffusion models: SDAR series (e.g., SDAR-30B-A3B-Chat), LLaDA2.X series.

## Rich RL Algorithm Support: Covering Basic to Advanced Technical Routes

DARE implements a "zoo" of RL algorithms, each targeting particular model families:

| Algorithm Name | Applicable Models |
|---------|---------|
| d1 | General |
| Coupled-GRPO | LLaDA/Dream |
| VRPO | LLaDA/Dream |
| MDPO | LLaDA/Dream |
| CJ-GRPO | LLaDA/Dream |
| BGPO | LLaDA2.X |
| SPG | SDAR series |
| EBPO | SDAR/LLaDA2.X |
| d-TreeRPO | LLaDA/Dream |

Researchers can choose the algorithm that matches their model family and task.
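
As an illustration of how the table could be used programmatically, the sketch below encodes it as a plain dictionary and filters by model family. The names `ALGORITHM_TARGETS` and `algorithms_for` are hypothetical and are not part of DARE's API; treating "General" as matching every family is an assumption.

```python
# Mapping copied from the table above; keys are algorithm names,
# values are the model families each algorithm is listed for.
ALGORITHM_TARGETS = {
    "d1": {"General"},
    "Coupled-GRPO": {"LLaDA", "Dream"},
    "VRPO": {"LLaDA", "Dream"},
    "MDPO": {"LLaDA", "Dream"},
    "CJ-GRPO": {"LLaDA", "Dream"},
    "BGPO": {"LLaDA2.X"},
    "SPG": {"SDAR"},
    "EBPO": {"SDAR", "LLaDA2.X"},
    "d-TreeRPO": {"LLaDA", "Dream"},
}


def algorithms_for(model_family):
    """Return algorithms listed for a model family, in table order.

    Assumption: an algorithm marked "General" applies to every family.
    """
    return [
        name
        for name, families in ALGORITHM_TARGETS.items()
        if "General" in families or model_family in families
    ]


if __name__ == "__main__":
    for family in ("LLaDA", "Dream", "SDAR", "LLaDA2.X"):
        print(family, "->", algorithms_for(family))
```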

## Technical Highlights and Usage Guide

**Technical Highlights**:

- Sequence parallelism: splits long sequences across multiple devices to extend the usable context length;
- SGLang integration: accelerates rollout and evaluation; the team has contributed PRs that fix sampling parameters;
- Multi-node training: example configurations are provided for large-scale distributed training.
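
The sequence-parallelism idea in the list above can be illustrated with a minimal sketch. It only shows how one long token sequence might be cut into contiguous, equally sized per-device chunks; `shard_sequence`, `world_size`, and `pad_id` are hypothetical names, and this is not DARE's or verl's actual implementation, which also has to exchange data between devices during attention.

```python
def shard_sequence(token_ids, world_size, pad_id=0):
    """Split one sequence into `world_size` contiguous, equal-length chunks.

    The tail is padded with `pad_id` so every device receives the same
    number of tokens (an assumption made to keep the sketch simple).
    """
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    chunk = -(-len(token_ids) // world_size)  # ceiling division
    padded = list(token_ids) + [pad_id] * (chunk * world_size - len(token_ids))
    return [padded[rank * chunk:(rank + 1) * chunk] for rank in range(world_size)]


if __name__ == "__main__":
    shards = shard_sequence(list(range(10)), world_size=4)
    for rank, shard in enumerate(shards):
        print(f"rank {rank}: {shard}")
```

Each device then holds only a fraction of the sequence, so per-device activation memory shrinks roughly in proportion to the number of devices, which is what allows longer contexts.
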

**Usage Guide**:

- Training environment: create a DARE virtual environment, then install the requirements and flash-attn;
- Evaluation environment: create an opencompass environment, then install DARE/opencompass;
- SGLang: install from the compatible PR branch;
- Training examples: scripts are provided for SFT (including PEFT), RL, and multi-node training.

## Evaluation System and Community Collaboration

**Evaluation System**: Supports HumanEval (code generation), mathematical reasoning benchmarks (which require additional dependencies), and the broader OpenCompass benchmark suite. Multimodal evaluation is planned.

**Community Collaboration**: The project is a work in progress, and feedback and contributions are welcome. It builds on verl and OpenCompass, and its open-source license allows others to extend it.

## Practical Significance and Future Outlook of DARE

**Practical Significance**: A unified interface lowers the barrier to entry for dLLM research, makes algorithm comparisons fairer, speeds up model iteration, and encourages community collaboration.

**Future Outlook**: The team plans to support more models and algorithms. DARE may become a standard training framework in the dLLM field as this emerging direction develops.
