Zing Forum

Awex: A Reinforcement Learning Training and Inference Framework Enabling Second-level Weight Synchronization for Trillion-Parameter Models

Awex is an open-source, high-performance reinforcement learning weight synchronization framework developed by InclusionAI. It can fully synchronize the weights of a trillion-parameter model in about 10 seconds on a 1,000-GPU cluster, addressing the parameter-update latency between the training and inference sides of RLHF training.

Reinforcement Learning · RLHF · Weight Synchronization · Large Language Models · Distributed Training · NCCL · RDMA · Megatron · vLLM · Inference Optimization
Published 2026-04-10 21:40 · Recent activity 2026-04-10 21:45 · Estimated read: 7 min

Section 01

[OP / Introduction] Awex: An RL Training and Inference Framework Enabling Second-level Weight Synchronization for Trillion-Parameter Models

Awex is an open-source, high-performance reinforcement learning weight synchronization framework developed by InclusionAI. Its core goal is to eliminate the parameter-update latency between the training and inference sides in reinforcement learning pipelines such as RLHF. The framework has been validated on a 1,000-GPU cluster, fully synchronizing the weights of a trillion-parameter model in about 10 seconds, and provides efficient train-infer collaboration for large-scale reinforcement learning training.


Section 02

Background: Weight Synchronization Bottleneck in RL Training

In the reinforcement learning training of large language models (RLHF, DPO, etc.), traditional weight synchronization first writes the weights to a storage system and then has the inference side load them back, which takes minutes or longer. This latency severely limits algorithm iteration efficiency; in online RL scenarios in particular, where the inference side must frequently generate responses with the latest model, the synchronization bottleneck noticeably reduces training throughput and convergence speed.
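A rough back-of-envelope calculation shows why the checkpoint path is slow. The numbers below are illustrative assumptions (bf16 weights, a shared filesystem at ~5 GB/s, ~25 GB/s RDMA per GPU), not measurements from the Awex project:

```python
# Back-of-envelope: checkpoint-based sync vs. direct GPU-to-GPU transfer.
PARAMS = 1e12            # trillion-parameter model
BYTES_PER_PARAM = 2      # bf16
total_bytes = PARAMS * BYTES_PER_PARAM          # 2 TB of weights

storage_bw = 5e9         # ~5 GB/s to a shared filesystem (assumed)
rdma_bw_per_gpu = 25e9   # ~25 GB/s per GPU over RDMA (assumed)
num_gpus = 1000

# Checkpoint path: write the weights once, then read them back on load.
checkpoint_seconds = 2 * total_bytes / storage_bw

# Direct path: shards move in parallel across all GPU pairs.
direct_seconds = total_bytes / (rdma_bw_per_gpu * num_gpus)

print(f"checkpoint round-trip: ~{checkpoint_seconds:.0f} s")
print(f"direct P2P transfer:  ~{direct_seconds:.2f} s")
```

The idealized direct-transfer figure is far below Awex's reported single-digit seconds; the gap is where metadata exchange, layout conversion, and scheduling overhead go, but either way it is orders of magnitude faster than round-tripping through storage.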


Section 03

Core Technical Features of Awex

Awex's core technical features include:

  1. Extreme Synchronization Speed: in NCCL mode, a 10-billion-parameter model synchronizes in 0.8 s and a trillion-parameter model in 20 s; in RDMA mode, a trillion-parameter model takes only 6 s;
  2. Unified Weight Adaptation Layer: automatically bridges the parallel-strategy and tensor-layout differences between training engines (e.g., Megatron) and inference engines (e.g., vLLM);
  3. Zero-Redundancy Transmission & In-Place Update: only the necessary weight shards are transmitted, and the inference side updates GPU memory in place, avoiding extra allocation overhead;
  4. Multi-Mode Transmission Support: works over high-speed interconnects such as NCCL, RDMA, and shared memory;
  5. Heterogeneous Deployment Compatibility: supports co-located and separated deployment, fitting both synchronous and asynchronous RL algorithms.
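The in-place update idea in point 3 can be illustrated with a small stdlib-only sketch. The buffer and offsets below are stand-ins for a resident GPU tensor, not Awex's actual API:

```python
# In-place update: copy each received shard into a preallocated buffer
# instead of allocating a fresh tensor, avoiding extra memory churn.
gpu_weights = bytearray(1024)            # stands in for a resident GPU tensor
view = memoryview(gpu_weights)           # writable window, no copy made

def apply_shard(offset: int, payload: bytes) -> None:
    """Overwrite the weight region [offset, offset + len) in place."""
    view[offset:offset + len(payload)] = payload

apply_shard(0, b"\x01" * 256)            # first shard arrives
apply_shard(256, b"\x02" * 256)          # second shard, different region
```

The key property is that `gpu_weights` keeps its identity and address across updates; in the real setting this is what lets the inference engine keep serving from the same tensors while new weights stream in.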

Section 04

Awex Architecture Design and Core Workflow

Architecture Components

  • WeightWriter: Training nodes collect weight shard metadata, convert formats, and build transmission plans;
  • WeightReader: Inference instances receive weight data and complete local updates;
  • MetaServer: Global metadata exchange and coordination hub.
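The MetaServer's coordination role can be sketched as a shard-metadata registry: writers register which slice of each tensor they own, and readers query the registry to plan transfers. Class and method names here are invented for illustration; the real Awex interface will differ:

```python
# Toy MetaServer: WeightWriters register shard metadata, WeightReaders
# query it to learn which rank owns which slice of each tensor.
from collections import defaultdict

class MetaServer:
    def __init__(self):
        # tensor name -> list of (writer_rank, start, end) shard records
        self.shards = defaultdict(list)

    def register(self, tensor: str, rank: int, start: int, end: int) -> None:
        self.shards[tensor].append((rank, start, end))

    def lookup(self, tensor: str):
        # Return shards ordered by offset so readers can plan transfers.
        return sorted(self.shards[tensor], key=lambda s: s[1])

meta = MetaServer()
meta.register("layer0.qkv", rank=0, start=0, end=512)     # writer on rank 0
meta.register("layer0.qkv", rank=1, start=512, end=1024)  # writer on rank 1
print(meta.lookup("layer0.qkv"))  # → [(0, 0, 512), (1, 512, 1024)]
```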

Weight Exchange Workflow

  1. Unified format conversion: Convert weights from different engines (Megatron, vLLM, etc.) into a standard format;
  2. Global metadata exchange: Collect shard metadata and report to MetaServer;
  3. P2P transmission plan construction: Generate peer-to-peer transmission plans based on metadata;
  4. Transmission execution: Use NCCL/RDMA for data transmission;
  5. Tensor-level verification: compare the transmitted weights against weights loaded from file to confirm correctness.
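Step 3 above amounts to intersecting writer-side and reader-side shard ranges. A stdlib-only sketch of that plan construction (the ranks and ranges are hypothetical; the real planner must also account for tensor layout differences):

```python
# Build a peer-to-peer plan: for each reader shard, find the writer shards
# that overlap it and emit (src_rank, dst_rank, start, end) transfers.
def build_plan(writer_shards, reader_shards):
    plan = []
    for dst, r_start, r_end in reader_shards:
        for src, w_start, w_end in writer_shards:
            start, end = max(r_start, w_start), min(r_end, w_end)
            if start < end:                      # the ranges overlap
                plan.append((src, dst, start, end))
    return plan

# Training holds two shards; inference wants a different split of the tensor.
writers = [(0, 0, 600), (1, 600, 1024)]
readers = [(8, 0, 512), (9, 512, 1024)]
print(build_plan(writers, readers))
# → [(0, 8, 0, 512), (0, 9, 512, 600), (1, 9, 600, 1024)]
```

Because every reader fetches exactly the byte ranges it is missing and nothing more, this is also where the "zero-redundancy transmission" property comes from.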

Section 05

Performance Verification and Application Scenarios

Awex leads in benchmark tests and effectively removes the synchronization bottleneck. Applicable scenarios include:

  • Online RLHF training: frequently synchronizing the latest model to generate high-quality training data;
  • Multi-round iterative optimization: shortening training cycles in fast-iteration scenarios;
  • Large-scale cluster training: efficient collaboration at 1,000- to 10,000-GPU scale;
  • Real-time inference services: quickly rolling out the latest model version in production environments.

Section 06

Summary and Outlook

Awex solves the weight synchronization bottleneck in large-scale RL training through an innovative architecture and efficient transmission mechanisms. Its second-level synchronization capability makes online reinforcement learning training of trillion-parameter models practical, providing solid support for the continuous optimization of large language models. As model scale continues to grow, such specialized weight synchronization frameworks will play an increasingly important role in AI infrastructure.