Zing Forum

ASEM: A Self-Evolving Memory Framework for Large Language Model Agents

A five-stage memory framework that enables LLM agents to achieve cross-session knowledge evolution while keeping the base model frozen, through structured atomic notes, a reinforcement learning-trained memory manager, and value-aware retrieval.

Tags: LLM agents, memory framework, RAG, reinforcement learning, GRPO, self-evolving, retrieval
Published 2026-06-05 00:41 | Recent activity 2026-06-05 00:52 | Estimated read 5 min

Section 01

Introduction: ASEM, a Self-Evolving Memory Framework for Large Language Model Agents

A five-stage memory framework that enables LLM agents to achieve cross-session knowledge evolution while keeping the base model frozen, through structured atomic notes, a reinforcement learning-trained memory manager, and value-aware retrieval.

Section 03

Problem Background: The Dilemma of LLM Memory

Large language models reason well, but they hit severe memory bottlenecks in long conversations and across sessions. A limited context window makes it hard for the model to recall information from far back, while plain vector retrieval has no way to judge how valuable a memory is. More critically, most existing solutions require fine-tuning model parameters, which is costly in practical deployment. ASEM (Agentic Self-Evolving Memory) takes a different route: the memory system itself learns, and the base model is left unmodified.

Section 04

Core Architecture: Five-Stage Memory Lifecycle

ASEM abstracts memory management into five collaborative stages, forming a complete cognitive loop.

Section 05

1. Multi-Attribute Atomic Notes

Unlike traditional plain-text memories, ASEM encodes each memory as a multi-attribute structure containing keywords, tags, a description, and a vector embedding. This structured representation lets memories be retrieved semantically and also filtered precisely on metadata. For example, the system can retrieve "memories related to Python whose tags contain debug" rather than relying on similarity matching alone.
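To make the note structure concrete, here is a minimal sketch of a multi-attribute note with a metadata filter applied before similarity ranking. The class name `AtomicNote`, the `search` helper, and the toy two-dimensional embeddings are all assumptions made for illustration; the post does not specify ASEM's actual schema or API.

```python
import math
from dataclasses import dataclass


@dataclass
class AtomicNote:
    """Hypothetical note layout: the four attributes named in the post."""
    description: str
    keywords: set[str]
    tags: set[str]
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(notes, query_embedding, keyword=None, tag=None, top_k=3):
    """Filter on metadata first, then rank the survivors by similarity."""
    candidates = [
        n for n in notes
        if (keyword is None or keyword in n.keywords)
        and (tag is None or tag in n.tags)
    ]
    candidates.sort(key=lambda n: cosine(n.embedding, query_embedding), reverse=True)
    return candidates[:top_k]


notes = [
    AtomicNote("Fixed off-by-one in pagination loop", {"python"}, {"debug"}, [0.9, 0.1]),
    AtomicNote("Notes on Python packaging", {"python"}, {"howto"}, [0.8, 0.2]),
    AtomicNote("Traced a Rust borrow error", {"rust"}, {"debug"}, [0.95, 0.05]),
]

# "Memories related to Python and with tags containing debug"
hits = search(notes, [1.0, 0.0], keyword="python", tag="debug")
```

Note that the Rust note has the highest raw similarity to the query, yet the metadata filter excludes it, which is the behavior the example in the text describes.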

Section 06

2. Reinforcement Learning-Trained Memory Manager (GRPO)

This is ASEM's most innovative design. Memory writing operations (when to write, what to write, how to organize) are controlled by a dedicated model trained via GRPO (Group Relative Policy Optimization). This model learns to evaluate the value of each potential memory and decides whether to store it in long-term memory, keep it in a short-term cache, or discard it. Through reinforcement learning, the memory manager can adapt to the memory preferences of specific domains and users.
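For readers unfamiliar with GRPO-style training, the core idea is that a group of sampled outputs is scored and each one is judged relative to the rest of its group, with no separate value network. The sketch below shows only that normalization step, applied to a hypothetical group of write decisions for one candidate memory. The action names and reward values are invented for illustration, and the use of the population standard deviation is an assumption (implementations differ); none of this is taken from ASEM's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: three write decisions sampled for the same candidate memory,
# each paired with a made-up downstream reward.
actions = ["store_long_term", "store_short_term", "discard"]
rewards = [1.0, 0.5, 0.0]

advantages = dict(zip(actions, group_relative_advantages(rewards)))
```

Decisions that beat the group average get a positive advantage and are reinforced; those below it are discouraged. In a full training loop these advantages would weight the policy-gradient update, which is omitted here.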

Section 07

3. Two-Stage Hybrid Retrieval and Value-Aware Re-ranking

The retrieval process is divided into two stages: first, candidate memories are recalled via vector similarity, then re-ranked by a value-aware module. This re-ranker considers factors such as the context of the current task, the historical usage frequency of memories, and timeliness to ensure the most relevant memories are prioritized for the LLM.
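The two stages can be sketched roughly as follows. This is a simplified illustration, not ASEM's re-ranker: the linear scoring formula, the weights, the half-life decay for timeliness, and the dictionary fields `uses` and `last_used` are all assumptions, and query similarity stands in for "context of the current task".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(memories, query_embedding, now, recall_k=3, final_k=2,
             weights=(0.6, 0.2, 0.2), half_life=7.0):
    # Stage 1: recall candidates purely by vector similarity.
    scored = [(cosine(m["embedding"], query_embedding), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = scored[:recall_k]

    # Stage 2: re-rank candidates with a hypothetical value score that mixes
    # similarity, normalized usage frequency, and an exponential recency decay.
    w_sim, w_use, w_time = weights
    max_uses = max((m["uses"] for _, m in candidates), default=0) or 1

    def value(sim, m):
        usage = m["uses"] / max_uses
        recency = 0.5 ** ((now - m["last_used"]) / half_life)
        return w_sim * sim + w_use * usage + w_time * recency

    ranked = sorted(candidates, key=lambda pair: value(*pair), reverse=True)
    return [m for _, m in ranked[:final_k]]


memories = [
    {"id": "a", "embedding": [1.0, 0.0], "uses": 0, "last_used": 0.0},
    {"id": "b", "embedding": [0.9, 0.3], "uses": 10, "last_used": 29.0},
    {"id": "c", "embedding": [0.0, 1.0], "uses": 50, "last_used": 30.0},
    {"id": "d", "embedding": [0.8, 0.6], "uses": 2, "last_used": 10.0},
]

top = retrieve(memories, query_embedding=[1.0, 0.0], now=30.0)
top_ids = [m["id"] for m in top]
```

In this toy data, memory "a" is the closest match by similarity alone, but "b" overtakes it after re-ranking because it is nearly as similar, frequently used, and recent. Memory "c" never reaches the re-ranker despite heavy usage, since it fails the similarity recall in stage 1.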

Section 08

4. Non-Parametric Utility Update (EMA)

ASEM uses Exponential Moving Average (EMA) to track the long-term utility of each memory without gradient updates. When a memory is successfully used (helping generate a good answer), its utility score increases; otherwise, it decreases. This lightweight update mechanism allows the memory system to evolve continuously without increasing inference overhead.
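A standard EMA update looks like the sketch below. The smoothing factor `alpha`, the starting utility of 0.5, and the convention of encoding a helpful use as reward 1.0 and an unhelpful one as 0.0 are assumptions for illustration; the post does not give ASEM's actual constants.

```python
def ema_update(utility, reward, alpha=0.2):
    """Blend the new outcome into the running utility; no gradients involved."""
    return (1 - alpha) * utility + alpha * reward


utility = 0.5  # hypothetical starting score for a fresh memory
history = []
for reward in (1.0, 1.0, 0.0):  # helped, helped, did not help
    utility = ema_update(utility, reward)
    history.append(utility)
```

With these numbers the score climbs from 0.5 to 0.6 and then 0.68 after two helpful uses, and falls back to 0.544 after one unhelpful use. Each update is a single multiply-add per memory, which is why this kind of bookkeeping adds essentially nothing to inference cost.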