
MemTrace: An Open-Source Framework for Tracking Memory System Errors in Large Language Models

MemTrace is an open-source framework for debugging LLM memory systems, developed by the NLP team at Zhejiang University. It converts memory processes into executable memory evolution graphs, which enables fine-grained, operation-level error attribution and supports automatic prompt optimization to improve task performance.

LLM, memory system, error tracing, debugging, MemTrace, ZJUNLP, RAG, Mem0, EverMemOS, prompt optimization

Published 2026-06-09 17:40 | Recent activity 2026-06-09 17:48 | Estimated read 6 min

Section 01

[Introduction] MemTrace: An Open-Source Debugging Framework for Tracking LLM Memory System Errors

MemTrace is an open-source framework for debugging LLM memory systems, developed by the NLP team at Zhejiang University (ZJUNLP). It converts memory processes into executable memory evolution graphs (operation-variable execution graphs), which enables fine-grained, operation-level error attribution and supports automatic prompt optimization to improve task performance. The paper was submitted on May 27, 2026, and the framework was open-sourced on June 9, 2026. The code repository is on GitHub (https://github.com/zjunlp/MemTrace), and the paper is available at https://arxiv.org/abs/2605.28732.

Section 02

Background: Core Pain Points in LLM Memory System Debugging

The memory system is a key LLM component that supports long-range reasoning and multi-turn dialogue; solutions in this space include RAG, Mem0, and EverMemOS. Locating errors in such systems is difficult because a wrong answer may stem from facts omitted during extraction, memories overwritten during updates, irrelevant retrieval results, or misinterpretation during generation. Traditional logs present only text-level call records and cannot reveal the data dependencies and information flow between operations. MemTrace addresses this by converting memory execution processes into traceable, structured graphs.

Section 03

MemTrace Core Architecture: Operation-Variable Execution Graph

The core innovation of MemTrace is the Operation-Variable Execution Graph:

Variables: Represent data entities such as user messages, extracted facts, stored memories, and retrieval results
Operations: Represent computational steps like fact extraction, memory update, retrieval, and generation

The framework includes four components:

Smartcomment tracking layer (non-intrusively records execution graphs)
MemTraceBench benchmark dataset (covers labeled failure cases for four types of memory systems)
Graph-level automatic attribution algorithm (locates faulty operations and error types)
Diagnostic report and automatic optimization module (outputs suggestions and optimizes prompts)
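
To make the operation-variable execution graph more concrete, the following is a minimal Python sketch of such a structure. It is an illustration only: the class and method names (`Variable`, `Operation`, `ExecutionGraph`, `upstream_operations`) and the example data are hypothetical and are not taken from MemTrace's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Variable:
    """A data entity: user message, extracted fact, stored memory, etc."""
    name: str
    value: str


@dataclass
class Operation:
    """A computational step that reads input variables and writes outputs."""
    name: str
    kind: str  # e.g. "extract", "update", "retrieve", "generate"
    inputs: list[str]
    outputs: list[str]


@dataclass
class ExecutionGraph:
    """Hypothetical container linking variables through operations."""
    variables: dict[str, Variable] = field(default_factory=dict)
    operations: list[Operation] = field(default_factory=list)

    def add_variable(self, name: str, value: str) -> None:
        self.variables[name] = Variable(name, value)

    def add_operation(self, name: str, kind: str,
                      inputs: list[str], outputs: list[str]) -> None:
        self.operations.append(Operation(name, kind, list(inputs), list(outputs)))

    def producer_of(self, var_name: str) -> Optional[Operation]:
        """Return the operation that wrote a variable, if any."""
        for op in self.operations:
            if var_name in op.outputs:
                return op
        return None

    def upstream_operations(self, var_name: str) -> list[str]:
        """Names of operations the variable depends on, nearest first."""
        seen: set[str] = set()
        order: list[str] = []
        frontier = [var_name]
        while frontier:
            op = self.producer_of(frontier.pop(0))
            if op is None or op.name in seen:
                continue
            seen.add(op.name)
            order.append(op.name)
            frontier.extend(op.inputs)
        return order


def build_example_graph() -> ExecutionGraph:
    """Toy trace of one question answered from memory (made-up values)."""
    g = ExecutionGraph()
    g.add_variable("user_msg", "I moved from Berlin to Lisbon last month.")
    g.add_variable("question", "Where does the user live?")
    g.add_variable("fact", "User lives in Berlin.")
    g.add_variable("memory", "User lives in Berlin.")
    g.add_variable("retrieved", "User lives in Berlin.")
    g.add_variable("answer", "The user lives in Berlin.")
    g.add_operation("extract", "extract", ["user_msg"], ["fact"])
    g.add_operation("update", "update", ["fact"], ["memory"])
    g.add_operation("retrieve", "retrieve", ["question", "memory"], ["retrieved"])
    g.add_operation("generate", "generate", ["question", "retrieved"], ["answer"])
    return g
```

Calling `build_example_graph().upstream_operations("answer")` lists the operations that feed the final answer, starting with the one closest to the output. This is the kind of dependency information that a plain text log does not expose.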

Section 04

Error Attribution Mechanism: Counterfactual Analysis for Root Cause Localization

MemTrace adopts an iterative subgraph tracing strategy. Starting from the output node, it traverses the execution graph backward and uses counterfactual analysis to measure each operation's impact on the final error: if changing an operation's output corrects the answer, that operation is implicated. This distinguishes root-cause operations from errors that merely propagate downstream. According to the project's analysis, memory system failures are mostly systemic issues such as information loss, retrieval misalignment, and update conflicts.
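
The backward, counterfactual idea can be sketched on a toy pipeline. The code below is a simplified illustration, not MemTrace's algorithm: it assumes a linear chain of deterministic steps rather than a general graph, a single fault, and the availability of a reference output for every step. All function names and data are hypothetical.

```python
from typing import Callable, Optional

Step = tuple[str, Callable[[str], str]]


def extract(message: str) -> str:
    # Toy bug: keeps the first city mentioned, not the most recent one.
    words = message.replace(".", "").split()
    cities = [w for w in words if w in {"Berlin", "Lisbon"}]
    return f"lives_in={cities[0]}"


def update(fact: str) -> str:
    return f"memory:{fact}"


def retrieve(memory: str) -> str:
    return memory.split(":", 1)[1]


def generate(retrieved: str) -> str:
    return f"The user lives in {retrieved.split('=')[1]}."


def run_from(steps: list[Step], start: int, value: str) -> str:
    """Re-execute steps[start:] on `value` and return the final output."""
    for _, fn in steps[start:]:
        value = fn(value)
    return value


def attribute_root_cause(steps: list[Step],
                         reference_outputs: dict[str, str],
                         expected: str) -> Optional[str]:
    """Walk backward from the output, intervening on one step at a time."""
    root = None
    for i in range(len(steps) - 1, -1, -1):
        name, _ = steps[i]
        # Counterfactual: pretend step i had produced its reference output,
        # then re-run everything downstream of it.
        final = run_from(steps, i + 1, reference_outputs[name])
        if final != expected:
            break  # a step after i is faulty on its own; stop tracing
        root = name  # fixing step i repairs the answer; keep going upstream
    return root


STEPS: list[Step] = [("extract", extract), ("update", update),
                     ("retrieve", retrieve), ("generate", generate)]
MESSAGE = "I moved from Berlin to Lisbon last month."
EXPECTED = "The user lives in Lisbon."
REFERENCE = {
    "extract": "lives_in=Lisbon",
    "update": "memory:lives_in=Lisbon",
    "retrieve": "lives_in=Lisbon",
    "generate": EXPECTED,
}

observed = run_from(STEPS, 0, MESSAGE)
root_cause = attribute_root_cause(STEPS, REFERENCE, EXPECTED)
```

In this toy run the wrong answer surfaces at the generation step, but correcting the output of `extract` and re-running the remaining steps is enough to repair it, so `extract` is reported as the root cause while the later steps are treated as propagating the error.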

Section 05

Automatic Optimization: From Attribution to Performance Improvement

MemTrace uses attribution signals to optimize prompts in a targeted way (for example, strengthening fact extraction guidance or improving retrieval relevance judgment), forming a closed-loop optimization mechanism. Reported experimental results show that this mechanism improves end-to-end task performance by up to 7.62% without manual intervention.
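
A closed loop of this kind can be sketched as a greedy "attribute, patch, re-evaluate" cycle. The source does not describe MemTrace's optimizer in detail, so the following is only an assumed shape: `optimize_prompts`, the toy attribution and evaluation functions, and the guidance strings are all hypothetical stand-ins for real LLM-backed components.

```python
from typing import Callable, Optional

Prompts = dict[str, str]


def optimize_prompts(prompts: Prompts,
                     attribute: Callable[[Prompts], Optional[str]],
                     evaluate: Callable[[Prompts], float],
                     propose_patch: Callable[[str, str], str],
                     max_rounds: int = 3) -> tuple[Prompts, float]:
    """Patch the prompt of the attributed operation; keep it only if it helps."""
    best_score = evaluate(prompts)
    for _ in range(max_rounds):
        faulty_op = attribute(prompts)
        if faulty_op is None:
            break  # nothing left to blame
        candidate = dict(prompts)  # copy so the caller's dict is untouched
        candidate[faulty_op] = propose_patch(faulty_op, prompts[faulty_op])
        score = evaluate(candidate)
        if score <= best_score:
            break  # the patch did not improve the score; stop
        prompts, best_score = candidate, score
    return prompts, best_score


# Toy stand-ins: a prompt counts as "good" once it contains its guidance.
GUIDANCE = {
    "extract": "Prefer the most recent fact when statements conflict.",
    "retrieve": "Return only memories relevant to the question.",
}


def toy_attribute(prompts: Prompts) -> Optional[str]:
    for op, hint in GUIDANCE.items():
        if hint not in prompts[op]:
            return op
    return None


def toy_evaluate(prompts: Prompts) -> float:
    hits = sum(hint in prompts[op] for op, hint in GUIDANCE.items())
    return hits / len(GUIDANCE)


def toy_patch(op: str, prompt: str) -> str:
    return f"{prompt} {GUIDANCE[op]}"


initial = {
    "extract": "Extract facts from the message.",
    "retrieve": "Find memories for the question.",
}
optimized, final_score = optimize_prompts(
    initial, toy_attribute, toy_evaluate, toy_patch
)
```

The design choice worth noting is that each patch targets only the operation named by attribution and is accepted only when the evaluation score rises, which keeps the loop from drifting when a proposed change does not help.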

Section 06

Quick Start and Ecosystem Integration

MemTrace can be installed via pip or uv and requires Python ≥3.12. It includes built-in loading of the MemTraceBench dataset and an AgentScope Studio visualization interface. It also provides ready-to-use integration with MemBase: MemBase users can track the memory lifecycle via smartcomment and generate execution graph data.

Section 07

Conclusion and Outlook: Promoting Memory System Controllability

MemTrace is an important advance in LLM memory system observability. It turns black-box memory processes into white-box execution graphs, giving developers the means to debug and optimize them. As LLMs are increasingly deployed in scenarios such as long conversations and personalized assistants, the reliability of memory systems becomes more critical. MemTrace is expected to become infrastructure in this field, moving memory systems from 'usable' to 'controllable'.
