Zing Forum


MAny Framework: Solving the Dual Forgetting Problem in Multimodal Continual Learning

The research team proposes the MAny framework, which addresses perceptual drift and reasoning collapse in multimodal large models during continual learning through two core mechanisms: cross-modal projection fusion and low-rank parameter fusion. The framework requires no additional training and achieves a maximum accuracy improvement of 8.57% over state-of-the-art (SOTA) methods on the UCIT benchmark.

Multimodal large models · Continual learning · Catastrophic forgetting · MAny framework · Knowledge fusion
Published 2026-04-15 23:57 · Recent activity 2026-04-16 10:18 · Estimated read 6 min

Section 01

MAny Framework: A New Solution to the Dual Forgetting Problem in Multimodal Continual Learning

The research team proposes the MAny (Merge Anything) framework, which specifically addresses the dual forgetting problem (perceptual drift and reasoning collapse) in multimodal large models during continual learning through two core mechanisms: Cross-Modal Projection Fusion (CPM) and Low-Rank Parameter Fusion (LPM). This framework requires no additional training and completes knowledge fusion solely via efficient CPU algebraic operations. It achieves a maximum accuracy improvement of 8.57% over existing SOTA methods on the UCIT benchmark, providing a practical and efficient solution for multimodal continual learning.


Section 02

Dilemma of Multimodal Continual Learning: Analysis of the Dual Forgetting Phenomenon

Sequential task adaptation of Multimodal Large Language Models (MLLMs) relies on Multimodal Continual Instruction Tuning (MCIT), but catastrophic forgetting remains a key limiting factor. Existing studies mostly focus on the language reasoning backbone while ignoring the dual forgetting phenomenon: perceptual drift in the cross-modal projection space (drift of vision-language aligned features) and reasoning collapse in the low-rank parameter space (mutual interference among reasoning parameters) occur simultaneously. Traditional methods struggle to handle both at once, leading to unstable continual learning performance.


Section 03

Core Mechanisms of the MAny Framework: Cross-Modal Projection and Low-Rank Parameter Fusion

The MAny framework achieves effective task knowledge fusion through two mechanisms:

  1. Cross-Modal Projection Fusion (CPM): using visual prototypes as anchors, it adaptively fuses cross-modal visual representations, aligns the visual features of old and new tasks, and mitigates perceptual drift;
  2. Low-Rank Parameter Fusion (LPM): recursively merges low-rank weight matrices, using recursive least squares to obtain a closed-form solution; this eliminates interference between task-specific low-rank modules, ensures reasoning stability, and resolves the reasoning collapse problem.
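As an illustration, the prototype-anchored fusion idea behind CPM can be sketched in a few lines of NumPy. The paper's exact formulation is not reproduced here; the function name, the use of cosine similarity, and the softmax weighting below are all assumptions, meant only to show how per-task projection matrices could be fused with weights anchored on visual prototypes:

```python
import numpy as np

def fuse_projections(proj_mats, prototypes, query_feat):
    """Hypothetical sketch of prototype-anchored projection fusion.

    proj_mats  : list of per-task cross-modal projection matrices
    prototypes : list of per-task visual prototype vectors
    query_feat : visual feature of the current input

    Each task is weighted by the cosine similarity between the query
    feature and that task's prototype, normalized with a softmax.
    """
    sims = np.array([
        query_feat @ p / (np.linalg.norm(query_feat) * np.linalg.norm(p))
        for p in prototypes
    ])
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over tasks
    # Convex combination of the per-task projection matrices
    return sum(w * W for w, W in zip(weights, proj_mats))
```

Because the weights sum to 1, the fused matrix is always a convex combination of the per-task projections, so it stays within the span of the knowledge already learned.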

Section 04

Paradigm Innovation of the MAny Framework: Training-Free Efficient Knowledge Fusion

A key highlight of the MAny framework is that it is training-free: unlike traditional methods that require additional gradient optimization, MAny completes knowledge fusion solely through efficient CPU algebraic operations. After the initial fine-tuning of each task, no further training is needed, cutting computational cost while preserving adaptability to new tasks and the performance of old ones.
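To make the "CPU algebraic operations" point concrete, a closed-form least-squares merge of low-rank updates can be written as a plain matrix expression, with no gradient steps. This is a RegMean-style stand-in, not the paper's exact recursive formulation; the function name, variable layout, and regularization constant are assumptions:

```python
import numpy as np

def merge_low_rank(deltas, acts):
    """Hypothetical closed-form merge of task-specific low-rank updates.

    deltas[i] : task i's low-rank weight update B_i @ A_i, shape (out, in)
    acts[i]   : task i's input activations, shape (in, n_samples)

    Minimizes sum_i ||W x_i - delta_i x_i||^2 over the merged W, giving
    W = (sum_i delta_i X_i X_i^T) (sum_i X_i X_i^T)^{-1}.
    """
    num = sum(d @ (x @ x.T) for d, x in zip(deltas, acts))
    den = sum(x @ x.T for x in acts)
    # Small ridge term keeps the Gram matrix invertible
    return num @ np.linalg.inv(den + 1e-6 * np.eye(den.shape[0]))
```

A recursive least-squares variant of the same objective folds tasks in one at a time, updating the solution without revisiting old activations, which matches the framework's sequential, training-free setting.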


Section 05

Experimental Validation: Significant Performance Improvement of the MAny Framework on the UCIT Benchmark

The research team evaluated the MAny framework on multiple MLLMs and benchmarks. On the UCIT benchmark, MAny improved final average accuracy by up to 8.57% and 2.85% on two different MLLMs, significantly outperforming existing SOTA methods. It also showed good cross-model transferability, performing stably across different multimodal architectures, which demonstrates the method's generality.


Section 06

Technical Contributions: Methodological Significance and Theoretical Value of the MAny Framework

The technical contributions of the MAny framework include:

  1. For the first time, systematically revealing the dual forgetting problem in multimodal continual learning, providing a theoretical perspective for subsequent research;
  2. The collaborative design of CPM and LPM provides a scalable knowledge fusion paradigm that balances perceptual feature alignment and reasoning parameter stability;
  3. The training-free implementation proves the potential of algebraic operations in knowledge fusion, opening a new path for efficient model updates.

Section 07

Application Prospects: Potential and Future Outlook of the MAny Framework in Real-World Scenarios

The MAny framework provides a practical and efficient solution for the continual learning of MLLMs, suitable for scenarios requiring frequent knowledge base updates (e.g., personalized assistants, domain-specific models). It can quickly adapt to new tasks without sacrificing historical performance. In the future, it is expected to integrate with more model architectures, expand to more complex multimodal tasks, and promote the development of continual learning technology.