Zing Forum

delta-Mem: An Efficient Online Memory System for Large Language Models

The delta-Mem framework, developed by Declare Lab at the Singapore University of Technology and Design, addresses the context forgetting issue in long conversations with large language models (LLMs) through an incremental memory update mechanism. It significantly improves the coherence and accuracy of multi-turn dialogues while maintaining low computational overhead.

Tags: large language model · memory augmentation · incremental update · long conversation · LLM · memory · RAG · Singapore University of Technology and Design
Published 2026-05-13 23:19 · Recent activity 2026-05-13 23:29 · Estimated read: 9 min

Section 01

Introduction: delta-Mem—An Efficient Solution to LLM Long Conversation Memory Dilemma

The delta-Mem framework, launched by Declare Lab at the Singapore University of Technology and Design, targets the context forgetting problem faced by large language models (LLMs) in long conversations. It adopts an incremental memory update mechanism, significantly enhancing the coherence and accuracy of multi-turn dialogues while keeping computational overhead low. This framework provides an efficient and feasible solution for memory enhancement of LLMs.

Section 02

Background: Memory and Efficiency Challenges of LLM Long Conversations

Large language models (LLMs) face a fundamental challenge from context-window limits when handling long conversations: as the number of dialogue turns grows, so does the amount of history that must be maintained. The computational cost of standard attention, however, grows quadratically with sequence length, so ultra-long sequences cause sharp increases in response latency and memory consumption, and early important information is easily forgotten. Existing remedies all have shortcomings: expanding the context window is costly, and most external memory mechanisms require full re-encoding, which is inefficient. Efficient, reliable long-term memory has therefore become a key engineering bottleneck for LLMs.
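To make the quadratic scaling concrete, a back-of-the-envelope sketch (illustrative figures, not numbers from the paper):

```python
def attention_matrix_entries(n_tokens: int) -> int:
    """Standard self-attention scores every token against every other,
    so the score matrix holds n * n entries per head per layer."""
    return n_tokens * n_tokens

# Doubling the context from 4k to 8k tokens quadruples that cost:
# 4_000 tokens -> 16_000_000 entries, 8_000 tokens -> 64_000_000 entries.
```

This is why simply widening the window scales poorly compared with keeping history in an external memory.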

Section 03

Core Innovations and Technical Architecture of delta-Mem

delta-Mem is an incremental online memory framework. Its core idea draws on database incremental update strategies, storing only the delta (change) of new information instead of rewriting the entire memory state. Its technical architecture consists of three key components:

  1. Memory Encoder: A lightweight encoding network compresses dialogue history into fixed-dimensional vectors, supporting incremental updates. A new dialogue segment generates a delta vector through a single forward pass;
  2. Memory Storage Layer: Uses vector databases like FAISS/Milvus to store memory embeddings. Each entry is attached with a timestamp and importance score, supporting hybrid retrieval based on semantic similarity and temporal decay;
  3. Memory Fusion Module: Dynamically retrieves relevant memory when generating responses and fuses it with the current context attention. It introduces a difference-aware mechanism to resolve conflicts between new and old information.
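The paper's exact scoring rule is not reproduced here; below is a minimal sketch of how the storage layer's hybrid retrieval could combine semantic similarity, the importance score, and temporal decay. The entry layout, names, and the half-life parameter are all illustrative assumptions:

```python
import math
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    vector: list              # fixed-dimensional embedding of a dialogue segment
    timestamp: float          # creation time, drives temporal decay
    importance: float = 1.0   # importance score attached at write time
    access_count: int = 0     # metadata refreshed on retrieval hits

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(entry, query_vec, now, half_life=3600.0):
    """Rank entries by semantic similarity, discounted by age and
    weighted by importance; newer, more relevant memories win."""
    decay = 0.5 ** ((now - entry.timestamp) / half_life)
    return cosine(query_vec, entry.vector) * entry.importance * decay
```

Under this scheme, two equally relevant entries are separated by recency: one written a full half-life ago scores half as much as one written just now.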

Section 04

Technical Principle of the Incremental Update Mechanism

The incremental update mechanism works as follows: when the t-th dialogue turn produces new content, the new text is first vectorized to obtain v_t, and the difference Δ_t between v_t and the most similar entries in the existing memory bank is computed. If the difference exceeds a threshold, v_t is stored as a new entry; otherwise only the metadata (access frequency, last access time) of the existing entry is updated. Experiments show that when processing 100 dialogue turns, encoding overhead is only 12% of full re-encoding while retrieval accuracy stays above 95%, so the memory state can be maintained in real time without offline batch processing.
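The update rule described above can be sketched as follows; the memory bank is a plain list for illustration, and the 0.15 threshold and dict layout are assumptions, not values from the paper:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def delta_update(memory, v_t, turn, threshold=0.15):
    """Store v_t only if it differs enough from its nearest neighbour;
    otherwise just refresh that neighbour's metadata (no re-encoding)."""
    best, best_sim = None, -1.0
    for entry in memory:
        sim = cosine(entry["vector"], v_t)
        if sim > best_sim:
            best, best_sim = entry, sim
    delta = 1.0 - best_sim if best is not None else 1.0  # Δ_t as semantic distance
    if delta > threshold:
        memory.append({"vector": v_t, "last_turn": turn, "hits": 0})
        return "inserted"
    best["last_turn"] = turn  # metadata-only update keeps the per-turn cost low
    best["hits"] += 1
    return "refreshed"
```

A near-duplicate turn touches only metadata rather than triggering re-encoding, which is where the savings over full re-encoding come from.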

Section 05

Experimental Validation: Performance Advantages of delta-Mem

The research team evaluated delta-Mem on datasets including Multi-Session Chat, LongContext Benchmark, and a custom customer service dialogue dataset, comparing it with baseline methods like RAG, MemGPT, and Kosmos-2.5. The results show:

  • Retrieval Accuracy: In tests with 1000 historical dialogues, the recall rate of relevant memory is 92.3%, which is 8 percentage points higher than MemGPT;
  • Response Quality: Higher scores in information accuracy and context coherence in human evaluations;
  • Computational Efficiency: Single memory update latency ≤50ms, meeting real-time interaction requirements;
  • Memory Usage: Incremental compression reduces the memory growth rate during long-term operation by 60%;
  • Conflict Handling: Can identify information conflicts corrected by users, prioritizing the latest memory to avoid contradictory responses.

Section 06

Application Scenarios and Deployment Considerations

delta-Mem is built for engineering deployment: it provides integration interfaces for Hugging Face Transformers and vLLM and is compatible with mainstream open-source models such as Llama, Qwen, and ChatGLM. For production environments, optional Redis/PostgreSQL storage backends and a Prometheus metrics exporter are available. Typical application scenarios include:

  • Intelligent Customer Service: Maintain customer historical work orders and preferences to provide personalized services;
  • Educational Tutoring: Track students' learning progress to adjust teaching strategies;
  • Personal Knowledge Management: Accumulate reading notes and support cross-time associative retrieval;
  • Code Development Assistant: Maintain project context to keep coding consistency.
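The concrete integration interfaces live in the project's GitHub repository; as a generic illustration of the final step shared by all four scenarios, retrieved memories are typically spliced into the prompt ahead of the current turn. The function name and prompt format below are hypothetical, not delta-Mem's actual API:

```python
def build_prompt(user_turn: str, retrieved: list, max_items: int = 3) -> str:
    """Prepend the top-ranked retrieved memories to the current user turn.
    `retrieved` is assumed to be sorted by relevance already."""
    memory_block = "\n".join(f"[memory] {m}" for m in retrieved[:max_items])
    if memory_block:
        return f"{memory_block}\nUser: {user_turn}\nAssistant:"
    return f"User: {user_turn}\nAssistant:"
```

The assembled string would then be passed to the model through the Transformers or vLLM interface as usual.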

Section 07

Limitations and Future Directions

delta-Mem has limitations: the memory encoder's compression loses some semantic detail, and the conflict-resolution strategy, which relies only on timestamps and access frequency, is relatively simple. Future directions include structured memory representations combined with knowledge graphs, a unified memory framework for multi-modal inputs, and lightweight memory-compression algorithms for edge devices. The project code and pre-trained checkpoints have been open-sourced on GitHub, and the accompanying paper elaborates on the technical details and experimental settings, supporting reproduction and secondary development.