MCore-Bridge: A Model Definition Library to Make Megatron-Core Training as Simple as Transformers

MCore-Bridge, launched by the ModelScope community, provides Megatron-Core model definitions for more than 300 large language models (LLMs) and more than 200 multimodal large language models (MLLMs). It supports both LoRA and full-parameter training, is compatible with the PEFT ecosystem, and makes distributed large-model training simpler and more efficient.

Tags: Megatron-Core, ModelScope, Large Model Training, Distributed Training, LoRA, Multimodal, MoE, PyTorch, GPU Training, Qwen

Published 2026-05-25 16:43 | Recent activity 2026-05-25 16:49 | Estimated read 7 min

Section 01

Introduction: MCore-Bridge, an Open-Source Tool to Simplify Megatron-Core Large Model Training

MCore-Bridge, launched by the ModelScope community, provides Megatron-Core model definitions for more than 300 large language models (LLMs) and more than 200 multimodal large language models (MLLMs). It supports both LoRA and full-parameter training, is compatible with the PEFT ecosystem, bridges HuggingFace and Megatron-Core, and makes distributed large-model training simpler and more efficient.

Section 02

Background: Engineering Challenges in Large Model Training

As LLMs and MLLMs grow rapidly in scale, training places heavy demands on engineering infrastructure. NVIDIA's Megatron-Core offers advanced parallelism strategies such as tensor parallelism and pipeline parallelism, which make efficient use of multi-GPU clusters. The barrier to entry is high, however: developers must hand-write complex model definition code and handle low-level details such as weight loading and distributed communication. This repeated reinvention of the wheel slows research iteration.

Section 03

Birth and Positioning of MCore-Bridge

MCore-Bridge is developed and maintained by the ModelScope community and was released on March 30, 2026. It addresses the pain points of using Megatron-Core by providing out-of-the-box Megatron-Core model definitions. Its core goal is to make Megatron training as simple as training with Transformers, and it is positioned as a complete engineering solution.

Section 04

Core Capabilities and Technical Architecture

Extensive Model Coverage

Supports more than 300 text-only LLMs (such as the Qwen, DeepSeek, and GLM series) and more than 200 multimodal models (such as Qwen multimodal, Gemma4, and GLM-4V).

Comprehensive Hardware Compatibility

Supports NVIDIA GPUs (A10, A100, H100, B200, and others) as well as Ascend NPUs, and is compatible with CUDA 12.8/13.0 and PyTorch 2.0+.

Flexible Parallel Strategies

Inherits Megatron-Core's parallelism capabilities, including tensor, pipeline, sequence, context, expert, and virtual pipeline parallelism.
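As a rough illustration of how these dimensions combine: in Megatron-style setups, the product of the tensor, pipeline, and context parallel sizes must divide the total number of devices, and the quotient becomes the data-parallel size. The helper below is a hypothetical sketch (the function name and arguments are not part of MCore-Bridge or Megatron-Core), and it does not model expert or virtual pipeline parallelism.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size left after the model-parallel dimensions.

    Hypothetical helper for illustration only.
    tp = tensor parallel size, pp = pipeline parallel size, cp = context parallel size.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # 16 devices split into tensor-parallel groups of 2 and 2 pipeline stages
    print(data_parallel_size(16, tp=2, pp=2))  # prints 4
```

Checking this arithmetic before launching a job catches configurations that cannot be laid out on the available devices.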

Training Modes and Ecosystem Compatibility

Supports both full-parameter training and LoRA training, is fully compatible with the HuggingFace PEFT ecosystem, supports the safetensors weight format, and interfaces with frameworks such as Transformers and vLLM.
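To make the difference between the two training modes concrete: LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The sketch below only counts parameters for a single linear layer. The function names are hypothetical, bias terms are ignored, and nothing here reflects how MCore-Bridge or PEFT implement LoRA internally.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for the two LoRA factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8
    full = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)
    print(full, lora)            # prints 16777216 65536
    print(f"{lora / full:.2%}")  # prints 0.39%
```

For this layer size and rank, the LoRA factors hold well under 1% of the weights that full-parameter training would update, which is why adapter checkpoints are small.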

Section 05

Cutting-Edge Features for Multimodal Training

Optimized for multimodal model requirements:

FP8 training support: Uses FP8 precision on NVIDIA Hopper-architecture GPUs to accelerate training and improve throughput
MTP (Multi-Token Prediction): Trains the model to predict several future tokens at once, which can improve inference efficiency
No sequence padding: Eliminates memory waste from sequence alignment within batches
Packing feature: Packs multiple short sequences to improve GPU utilization
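The packing idea in the list above can be sketched with a simple greedy bin-packing routine. This is a minimal illustration under stated assumptions, not MCore-Bridge's actual algorithm: the function names are hypothetical, it uses a first-fit-decreasing heuristic, and it ignores the attention masking that a real implementation needs to stop packed sequences from attending to each other.

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit-decreasing packing of sequence lengths into bins of max_len."""
    bins: list[list[int]] = []
    for length in sorted(lengths, reverse=True):
        if length > max_len:
            raise ValueError(f"sequence of length {length} exceeds max_len={max_len}")
        for current in bins:
            # Place the sequence in the first bin that still has room.
            if sum(current) + length <= max_len:
                current.append(length)
                break
        else:
            # No existing bin fits, so open a new one.
            bins.append([length])
    return bins


def padded_tokens(lengths: list[int]) -> int:
    """Tokens occupied when every sequence is padded to the longest one in the batch."""
    return len(lengths) * max(lengths)


if __name__ == "__main__":
    lengths = [512, 128, 300, 64, 900, 200]
    packed = pack_sequences(lengths, max_len=1024)
    print(packed)                  # prints [[900, 64], [512, 300, 200], [128]]
    print(padded_tokens(lengths))  # prints 5400
    print(len(packed) * 1024)      # prints 3072
```

In this toy batch, padding to the longest sequence occupies 5400 token slots for 2104 real tokens, while three packed rows of 1024 occupy 3072.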

Section 06

Practical Usage Examples

Basic Model Loading and Saving

Initialize the distributed environment via code, download the model, convert the configuration, create the model, and load/save weights (example code omitted).

LoRA Fine-Tuning Example

Integrate with PEFT, define LoRA configuration, wrap the model, and save LoRA weights (example code omitted).

Section 07

Ecosystem Integration and Installation Guide

Deep Integration with ms-swift

When combined with ModelScope's ms-swift training framework, MCore-Bridge keeps ms-swift's ease of use while adding Megatron-Core's distributed training performance, and supports multiple task types.

Dependency Requirements

Component	Minimum Version	Recommended Version
Python	>=3.10	3.12
PyTorch	>=2.0	2.8.0/2.11.0
megatron-core	>=0.15,<0.18	0.17.0
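The version ranges in the table can be checked before training starts. The sketch below is a hypothetical helper, not something MCore-Bridge ships: it only handles plain dotted integer versions such as 0.17.0 (pre-release suffixes like rc1 would need a real version parser), and it encodes the megatron-core range from the table above.

```python
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.17.0' into (0, 17, 0). Only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def megatron_core_supported(version: str) -> bool:
    """Check the table's range for megatron-core: >=0.15,<0.18."""
    return (0, 15) <= parse_version(version) < (0, 18)


def python_supported(version_info: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Check the table's minimum Python version: >=3.10."""
    return version_info >= (3, 10)


if __name__ == "__main__":
    print(megatron_core_supported("0.17.0"))  # prints True
    print(megatron_core_supported("0.18.0"))  # prints False
    print(python_supported((3, 9)))           # prints False
```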

Installation Methods

pip installation: pip install mcore-bridge -U
uv accelerated installation: uv pip install mcore-bridge -U --torch-backend=auto
Source code installation: Clone the repository and run pip install -e .

Section 08

Summary and Outlook

MCore-Bridge combines the high performance of Megatron-Core with the ease of use of the Transformers ecosystem, letting developers focus on model innovation. Its broad model support, hardware compatibility, and ecosystem integration position it as a production-ready solution. Going forward, it will continue to add support for new models under a Day-0 strategy and aims to play an important role in large-model infrastructure.
