Section 01
Introduction: MCPO (Multi-Domain Contrastive Policy Optimization) Gives Large Models Cross-Domain Knowledge Sharing and Eliminates Domain Interference
This article introduces MCPO (Multi-Domain Contrastive Policy Optimization), a method that tackles domain interference when large reasoning models are trained on several domains at once. Through a contrastive learning mechanism, MCPO turns cross-domain interactions from harmful competition into beneficial transfer. It improves reasoning in multiple domains at the same time, including mathematics, code, and logical reasoning, and in some scenarios it even outperforms single-domain training. The paper, by the author team Maricalce, was published on arXiv on May 25, 2026, and the code has been open-sourced.