Zing Forum

Reading

MemAgent: A Reinforcement Learning-Based Memory Agent Framework for Ultra-Long Contexts

MemAgent trains memory agents via end-to-end reinforcement learning, handling ultra-long contexts of up to 3.5 million tokens without modifying the model architecture and achieving over 95% accuracy on the 512K RULER test.

long context · memory agent · reinforcement learning · RLVR · agent workflow · context window
Published 2026-05-12 23:41 · Recent activity 2026-05-12 23:48 · Estimated read 6 min
Section 01

MemAgent: Introduction to the Reinforcement Learning-Based Memory Agent Framework for Ultra-Long Contexts

This article introduces the MemAgent framework, which trains memory agents via end-to-end reinforcement learning. It handles ultra-long contexts of up to 3.5 million tokens without modifying the model architecture and achieves over 95% accuracy on the 512K RULER test. By tackling the core computational bottlenecks and information-loss problems of long-context processing, it opens up a new direction for long-text processing.

Section 02

Challenges in Ultra-Long Context Processing

The context window of large language models remains a practical bottleneck. Standard attention scales quadratically with sequence length, so processing million-token inputs is extremely costly even with extension techniques such as positional-encoding extrapolation or sliding-window attention, while simple truncation or chunking easily loses cross-chunk information and degrades task performance.
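The quadratic-versus-linear distinction is easy to quantify: doubling the sequence length quadruples the pairwise attention work, but only doubles a pass that uses a fixed-size working context. A toy cost model (illustrative unit costs, not real FLOP counts):

```python
def full_attention_cost(n_tokens: int) -> int:
    """Pairwise attention: every token attends to every other token, O(n^2)."""
    return n_tokens * n_tokens

def fixed_window_cost(n_tokens: int, window: int = 8192) -> int:
    """Fixed-size working context: total cost grows linearly with length."""
    return n_tokens * window

# Doubling the input from 1M to 2M tokens:
print(full_attention_cost(2_000_000) / full_attention_cost(1_000_000))  # -> 4.0
print(fixed_window_cost(2_000_000) / fixed_window_cost(1_000_000))      # -> 2.0
```

This gap is why a fixed-memory design like MemAgent's keeps total compute linear in document length.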

Section 03

Core Architecture and Innovations of MemAgent

MemAgent trains memory agents via end-to-end reinforcement learning without modifying the underlying model architecture. Key innovations include: linear time complexity (compute grows linearly with text length); Reinforcement Learning with Verifiable Rewards (RLVR), used to optimize the multi-turn, context-independent dialogue workflow; and strong extrapolation (a model trained on 8K contexts extrapolates to 32K, and after RL training the performance loss on 3.5-million-token QA is under 5%). In the multi-turn, context-independent scheme, each turn starts from a fresh context and the agent actively overwrites a bounded memory; an asynchronous agent framework (RayActor parallelism) avoids blocking.
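The multi-turn scheme described above can be sketched as a loop in which each chunk is processed in a fresh context and only a bounded memory string is carried forward. This is a minimal illustration of the idea, not the real MemAgent policy: `update_memory` and `answer_from` stand in for trained model calls, and the toy keyword-matching versions below are placeholders.

```python
def run_memory_agent(document, question, chunk_size, update_memory, answer_from):
    """Process an arbitrarily long document in O(n) total work:
    each turn sees only (question, memory, one chunk), never the full text."""
    memory = ""
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        # Fresh context each turn: the agent overwrites its own bounded memory.
        memory = update_memory(question, memory, chunk)
    return answer_from(question, memory)

# Toy stand-ins for the model calls (assumptions for illustration only):
def toy_update(question, memory, chunk):
    key = question.strip("?").split()[-1]                   # naive keyword pick
    hits = [s for s in chunk.split(".") if key in s]        # keep relevant sentences
    return ". ".join(filter(None, [memory] + hits))[:200]   # memory stays bounded

def toy_answer(question, memory):
    return memory or "unknown"

doc = "Filler text. " * 50 + "The capital of France is Paris. " + "More filler. " * 50
print(run_memory_agent(doc, "What is the capital of France?", 64, toy_update, toy_answer))
```

The key property is that memory size and per-turn context are constant, so total cost scales linearly with document length regardless of how long `doc` grows.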

Section 04

Performance Validation

MemAgent performs strongly on ultra-long-context tasks: the 14B model handles 3.5-million-token QA with almost no loss; the 7B model exceeds 95% accuracy on the 512K RULER test; and extrapolating from an 8K training context to 3.5 million tokens keeps performance degradation within 5%, demonstrating the architecture's effectiveness and the scalability of RL training.

Section 05

Deployment and Training Guide

Quick Deployment: For local use, serve the model with vLLM and then run the demo script:

vllm serve BytedTsinghua-SIA/RL-MemoryAgent-14B --tensor_parallel_size 2
python quickstart.py

Alternatively, configure environment variables to connect to an online model.

Training Framework: General end-to-end RL training with support for multi-step agent workflows. Data is built from HotpotQA by synthesizing long-context multi-hop samples and filtering out samples that do not require the context; supported models include the Qwen2.5-Instruct series (YaRN must be configured to activate long context); both single-node and multi-node Ray cluster training are supported.
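The data-processing idea described above can be sketched in two steps: pad each QA sample's gold supporting documents with shuffled distractors until the context reaches a target length, and drop samples that are answerable without any context. This is a minimal sketch under assumed field names (`supporting_docs`, etc.); the `answers_without_context` predicate is a placeholder for an actual no-context model check.

```python
import random

def synthesize_long_context(sample, distractor_pool, target_chars, rng):
    """Pad gold supporting docs with shuffled distractors until the context
    reaches roughly target_chars (HotpotQA-style multi-hop synthesis)."""
    docs = list(sample["supporting_docs"])
    pool = list(distractor_pool)
    rng.shuffle(pool)
    while sum(len(d) for d in docs) < target_chars and pool:
        docs.append(pool.pop())
    rng.shuffle(docs)  # gold evidence should not sit at a fixed position
    return {"question": sample["question"], "answer": sample["answer"],
            "context": "\n\n".join(docs)}

def filter_context_dependent(samples, answers_without_context):
    """Drop samples the model can already answer with no context at all."""
    return [s for s in samples if not answers_without_context(s["question"], s["answer"])]

rng = random.Random(0)
sample = {"question": "Which city hosts the festival founded by X?",
          "answer": "ExampleCity",
          "supporting_docs": ["X founded the festival.",
                              "The festival takes place in ExampleCity."]}
long_sample = synthesize_long_context(sample, ["distractor text " * 16] * 8, 500, rng)
print(len(long_sample["context"]), "ExampleCity" in long_sample["context"])
```

Shuffling the final document order matters: if the gold evidence always sat at the start or end, the agent could learn a positional shortcut instead of genuine memory management.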

Section 06

Application Scenarios and Significance

MemAgent can be applied to: document understanding (entire books, legal contracts), code analysis (global understanding of large codebases), scientific research (long papers/multi-document reviews), and dialogue systems (long-term memory of conversation history). Its release is a milestone in the field of long text processing, breaking through traditional context limitations.

Section 07

Summary and Community Contributions

MemAgent breaks through context length limitations via memory agent architecture and RL training; its linear complexity and extrapolation capability open up a new direction for long text processing. The project is built on verl, open-sourcing the training framework, evaluation tools, and pre-trained models (7B/14B), providing the community with a complete toolchain. Future plans include exploring multimodal extensions and more application scenarios.