# AMD-Proj: An Adaptive Memory-Driven Selective Gradient Projection Method for Continual Learning in Document Understanding

> This article introduces AMD-Proj, a novel framework for continual learning in the field of document understanding. Through an adaptive memory-driven selective gradient projection mechanism, this method prevents catastrophic forgetting while maintaining model plasticity, effectively addressing the stability-plasticity dilemma faced by multimodal document understanding models when sequentially learning new tasks.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-04-23T00:00:00.000Z
- Last activity: 2026-04-25T10:24:19.225Z
- Popularity: 96.6
- Keywords: continual learning, document understanding, gradient projection, catastrophic forgetting, multimodal learning, LayoutLM, adaptive memory, parameter-efficient fine-tuning, visual document understanding, Transformer models
- Page URL: https://www.zingnex.cn/en/forum/thread/amd-proj
- Canonical: https://www.zingnex.cn/forum/thread/amd-proj
- Markdown source: floors_fallback

---

## Introduction to AMD-Proj

AMD-Proj is a continual learning framework for multimodal document understanding. Its adaptive memory-driven selective gradient projection mechanism prevents catastrophic forgetting while preserving the plasticity needed to learn new tasks, directly targeting the stability-plasticity dilemma that arises when document understanding models learn tasks sequentially.

## Background and Challenges of Continual Learning in Document Understanding

Document understanding sits at the intersection of computer vision and natural language processing and underpins applications such as invoice parsing and form recognition. Models in these settings must learn new tasks sequentially, and conventional fine-tuning suffers from catastrophic forgetting: adapting to a new task degrades performance on earlier ones. Existing continual learning methods (e.g., EWC, LwF) perform well on general visual tasks, but document understanding tightly couples visual layout with textual semantics and places heavier demands on multimodal fusion, so these methods face unique challenges in this domain.

## Core Ideas of the AMD-Proj Method

AMD-Proj combines task memory with gradient projection. Its core innovation is the adaptive memory-driven selective gradient projection mechanism. Where traditional gradient projection methods apply a fixed protection strategy, AMD-Proj maintains a memory representation for each learned task (recording parameter directions, task importance, and inter-task relationships) and adaptively selects which parameter subspaces to protect based on, among other factors, the similarity between the current task and historical tasks. This improves parameter utilization and balances stability against plasticity.
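The article does not reproduce the exact update rule, but the core operation can be sketched in a few lines. `adaptive_strength` and `project_gradient` are hypothetical names: the idea is that the fraction of the gradient removed from a protected subspace scales with how strongly the current task's gradient overlaps that subspace (a minimal NumPy sketch, not the authors' implementation):

```python
import numpy as np

def adaptive_strength(grad, basis, lo=0.2, hi=1.0, eps=1e-12):
    """Pick a protection strength in [lo, hi] from the overlap between
    the current gradient and a stored task subspace: high overlap means
    high interference risk, so project more aggressively; low overlap
    leaves the model plastic."""
    overlap = np.linalg.norm(basis.T @ grad) / (np.linalg.norm(grad) + eps)
    return lo + (hi - lo) * overlap

def project_gradient(grad, basis, strength):
    """Remove `strength` times the component of `grad` lying in the
    protected subspace spanned by the orthonormal columns of `basis`."""
    return grad - strength * (basis @ (basis.T @ grad))
```

With `strength = 1.0` this reduces to classic gradient projection (the update is forced orthogonal to the protected subspace); smaller strengths trade some protection for plasticity.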

## In-depth Analysis of the AMD-Proj Technical Mechanism

### Hierarchical Gradient Projection Strategy
For different layers of Transformer document understanding models (e.g., LayoutLMv2/v3), independent parameter subspaces are maintained. Shallow layers (low-level features) retain high plasticity, while deep layers (high-level semantics) are more protected, achieving refined control.
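One simple way to realize this depth-dependent protection is a strength schedule over layer indices. The helper below is hypothetical (the article does not specify the schedule; linear interpolation over depth is an assumption):

```python
def layerwise_protection(layer_idx, num_layers, min_strength=0.1, max_strength=0.9):
    """Map a Transformer layer index to a projection strength: shallow
    layers (generic visual/text features) stay plastic, deeper layers
    (task-specific semantics) are protected more strongly."""
    depth = layer_idx / max(num_layers - 1, 1)
    return min_strength + (max_strength - min_strength) * depth
```

Each layer would then apply gradient projection against its own subspace, scaled by this per-layer strength.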
### Truncated SVD and Spectral Analysis
Truncated SVD is used to approximate parameter subspaces, reducing storage and filtering noise; spectral analysis is used to determine task complexity and specificity, assisting gradient projection decisions.
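A minimal sketch of the subspace construction (NumPy; `energy`, the fraction of spectral energy the truncated basis must retain, is a hypothetical hyperparameter):

```python
import numpy as np

def truncated_subspace(matrix, energy=0.95):
    """Return the top left-singular directions of `matrix` (e.g., stacked
    gradients or activations for one task) that capture at least `energy`
    of the total spectral energy, plus the kept singular values."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cum, energy)) + 1
    return u[:, :k], s[:k]
```

Discarding small singular values both shrinks the stored basis and filters noise; the retained spectrum can also serve as the signal for the spectral analysis of task complexity mentioned above.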
### Task Incremental Learning Setup
Optimized for task incremental scenarios, the model sequentially learns clearly defined tasks (e.g., different document types) and uses task identity signals for adaptive decision-making.
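Putting the pieces together, a task-incremental loop might look like the following. Everything here is a schematic under assumed interfaces (`memory` maps task ids to orthonormal bases, gradients arrive as a list per task, and plain SGD stands in for the real optimizer):

```python
import numpy as np

def train_task(params, grads, memory, task_id, lr=0.01):
    """Learn one task while projecting every gradient against all stored
    subspaces, then record an orthonormal basis for the directions this
    task used so that future tasks protect it in turn."""
    for grad in grads:
        g = grad.astype(float)
        for basis in memory.values():
            g = g - basis @ (basis.T @ g)  # drop components that would interfere
        params = params - lr * g
    q, _ = np.linalg.qr(np.column_stack(grads))
    memory[task_id] = q  # protect this task's subspace from future updates
    return params
```

The task identity signal enters in two places: it keys the memory entry written after training, and (in the full method) it would drive the adaptive choice of which stored subspaces to consult and how strongly.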

## Experimental Validation and Result Analysis of AMD-Proj

### Evaluation Datasets and Benchmarks
Evaluated on four datasets: FUNSD (forms), CORD (receipts), SROIE (invoices), and BuDDIE (business documents). Compared with classic methods (EWC, LwF), document understanding-specific methods (CUBER), and original gradient projection methods (GPM, TRGP).
### Key Findings
AMD-Proj significantly outperforms existing methods in F1 scores across all datasets, with an average improvement of 3-5 percentage points; it has strong anti-forgetting ability, with extremely low performance decay for the earliest tasks.
### Ablation Experiments
Removing the adaptive selection strategy reduces parameter efficiency; removing the memory mechanism causes severe forgetting; hierarchical projection outperforms a global strategy.

## Practical Application Value and Deployment Considerations of AMD-Proj

### Enterprise Document Automation
Supports progressive learning of new document types, avoiding model fragmentation or high costs of retraining, and reducing system maintenance costs.
### Parameter Efficiency and Computational Overhead
Truncated SVD and selective projection keep the additional storage requirements low, and inference involves no extra computation, so latency is unchanged.
### Interpretability and Controllability
The structure of the memory subspaces makes task representations inspectable, and manual intervention interfaces (e.g., adjusting task weights) are provided to meet the needs of high-risk scenarios.

## Limitations and Future Outlook of AMD-Proj

### Limitations
Currently targets task-incremental learning; effectiveness in class- and domain-incremental scenarios remains to be verified. The method also assumes tasks are of similar importance and provides no explicit priority control.
### Future Directions
Combine with parameter-efficient fine-tuning techniques (e.g., LoRA, Adapter); extend to continual learning for multimodal large models (e.g., GPT-4V, Gemini); explore task priority control mechanisms.
