Zing Forum

In-Place Test-Time Training: Enabling Large Language Models to Evolve During Inference

This article proposes the In-Place TTT framework, which enables large language models (LLMs) to dynamically update parameters during inference by using the final projection matrix of MLP blocks as adaptable fast weights and designing an objective function optimized for autoregressive language modeling. Experiments show that this method allows a 4B-parameter model to achieve excellent performance on tasks with up to 128k context, opening a new path for the continuous learning of LLMs.

Tags: Test-Time Training, LLM, Continual Learning, Fast Weights, Transformer, Dynamic Adaptation, Inference-Time Training
Published 2026-04-08 01:59 · Recent activity 2026-04-08 10:51 · Estimated read 7 min

Section 01

Introduction: In-Place TTT, a New Framework for Enabling LLMs to Evolve During Inference

As outlined above, In-Place TTT lets an LLM update its own parameters while it runs: the final projection matrix of each MLP block serves as adaptable fast weights, trained during inference with an objective built for autoregressive language modeling. With this mechanism, a 4B-parameter model handles tasks with up to 128k of context, pointing to a new path for continuous learning in LLMs. The sections below cover the motivation, method, experiments, and open questions.

Section 02

Background: Limitations of Static LLMs and Challenges of TTT

The mainstream paradigm for LLMs today is 'train first, then deploy': once deployed, a static model cannot adjust to new information. Test-Time Training (TTT) addresses this by updating fast weights during inference to adapt to new context, but applying existing TTT methods to LLMs faces three obstacles: architectural incompatibility (they rely on bespoke layer designs that do not fit standard Transformers), computational cost (gradient updates during inference add substantial overhead), and misaligned objectives (traditional reconstruction losses do not optimize autoregressive next-token prediction).
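To make the cost and misalignment concrete, here is a minimal sketch of a generic TTT step in the style of prior work, with all names hypothetical: the fast weights are updated by gradient descent on a self-supervised reconstruction loss at inference time, which requires an extra backward pass per step and optimizes something other than next-token prediction.

```python
import torch


def naive_ttt_step(fast_w, hidden, lr=1e-2):
    """One generic test-time-training step: reconstruct the hidden
    states through the fast-weight matrix and take a gradient step.
    Illustrates the two obstacles above: a backward pass during
    inference (cost) and a reconstruction objective that is not
    next-token prediction (misalignment)."""
    fast_w = fast_w.detach().requires_grad_(True)
    recon = hidden @ fast_w @ fast_w.T   # simple autoencoding read-out
    loss = torch.nn.functional.mse_loss(recon, hidden)
    loss.backward()                      # gradient pass at inference time
    with torch.no_grad():
        fast_w -= lr * fast_w.grad       # in-place descent step
    return fast_w.detach()


h = torch.randn(16, 64)          # 16 tokens, hidden size 64
w = torch.randn(64, 32) * 0.1    # fast weights
w = naive_ttt_step(w, h)
```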

Section 03

Method: Three Key Design Innovations of In-Place TTT

The core innovations of In-Place TTT include:

  1. Plug-and-play Fast Weights: The final projection matrix of each MLP block serves as the fast weights. This choice is architecture-agnostic and parameter-efficient, and it requires no modification to the existing Transformer structure.
  2. Theoretically Driven Objective Function: An objective designed for autoregressive language modeling that explicitly accounts for local context dependencies, long-range consistency, and a stability constraint, directly optimizing next-token prediction accuracy.
  3. Efficient Block-wise Update Mechanism: Long inputs are split into blocks whose fast-weight updates run independently, reducing memory requirements and enabling parallelization while maintaining cross-block coherence.
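The three designs above can be sketched together as follows. This is illustrative only: the class and function names are hypothetical, and a plain prediction loss plus an L2 pull toward the pre-trained weights stands in for the paper's full objective (local context, long-range consistency, stability).

```python
import torch
import torch.nn.functional as F


class InPlaceMLP(torch.nn.Module):
    """Transformer MLP block whose final (down-)projection is treated
    as fast weights, as the article proposes; the up-projection stays
    frozen as slow weights. Names here are illustrative."""

    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.up = torch.nn.Linear(d_model, d_ff)    # slow weights (frozen)
        self.down = torch.nn.Linear(d_ff, d_model)  # fast weights (adapted)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))


def blockwise_update(mlp, hidden, targets, block=512, lr=1e-3, lam=0.1):
    """Process the sequence block by block, updating only mlp.down.
    The loss is a stand-in: a prediction term plus an L2 penalty
    toward the pre-trained weights as a stability constraint."""
    w0 = mlp.down.weight.detach().clone()
    opt = torch.optim.SGD(mlp.down.parameters(), lr=lr)
    outs = []
    for s in range(0, hidden.size(0), block):
        out = mlp(hidden[s:s + block])
        loss = (F.mse_loss(out, targets[s:s + block])
                + lam * (mlp.down.weight - w0).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()                     # in-place fast-weight update
        outs.append(out.detach())
    return torch.cat(outs)
```

Because only `mlp.down` receives optimizer steps, the rest of the network is untouched, which is what makes the mechanism plug-and-play.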

Section 04

Experiments: Validation of In-Place TTT's Effectiveness

The research team validated the method through two groups of experiments plus an ablation study:

  1. Plug-and-play Enhancement Experiments: Applied to a 4B-parameter pre-trained model, it significantly improved performance on long document understanding (128k tokens), few-shot learning, and domain adaptation tasks, even surpassing baseline models with larger parameter sizes.
  2. From-Scratch Pre-training Experiments: Models using this mechanism outperformed comparison methods in language modeling perplexity and downstream task performance, with more stable training.
  3. Ablation Study: The combination of MLP projection matrices as fast weights, the new objective function, and moderate block sizes (512-1024 tokens) yielded the best results.

Section 05

Technical Details: Computational Overhead and Compatibility

The computational overhead of In-Place TTT is manageable: inference latency rises by 20-30%, memory usage by 10-15%, and the overhead grows sublinearly with sequence length. The framework is also compatible with common LLM optimization techniques, including INT8/INT4 quantization, speculative decoding, and KV caching, and it introduces no additional caching requirements.
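A back-of-the-envelope check using the figures quoted in this section and the ablation (128k context, 512-1024-token blocks, 20-30% latency overhead); the function name and the 0.25 midpoint are assumptions for illustration, not measurements:

```python
def ttt_budget(seq_len, block_size, base_latency_ms, overhead=0.25):
    """Rough planner: how many independent block updates a context
    needs, and latency under the 20-30% overhead quoted above
    (0.25 used as a midpoint assumption)."""
    n_blocks = -(-seq_len // block_size)  # ceiling division
    return n_blocks, base_latency_ms * (1 + overhead)


blocks, latency_ms = ttt_budget(128_000, 1024, base_latency_ms=1000.0)
# 128k context at block size 1024 -> 125 independent block updates
```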

Section 06

Applications: Potential Valuable Scenarios for In-Place TTT

The application scenarios of this framework include:

  • Personalized Assistants: Adjusting style preferences in real time based on user interaction history.
  • Long Document Analysis: Accurately answering questions that synthesize the full text in fields like law and finance.
  • Continuous Learning: Adapting to new data through local updates after deployment, without the need for full retraining.
  • Edge Device Deployment: Only updating a small number of parameters, suitable for local adaptation on resource-constrained devices.

Section 07

Limitations and Outlook: Next Steps for In-Place TTT

Current limitations: Update stability needs optimization, multi-turn dialogue state management remains to be solved, and the update process lacks interpretability. Future directions: Exploring hierarchical adaptation strategies, combining with meta-learning, extending to multimodal architectures, and conducting in-depth theoretical analysis of the dynamic characteristics of fast weights.

Section 08

Conclusion: Towards a New Paradigm of Dynamic Intelligence

In-Place TTT marks a shift for LLMs from the static 'train, then deploy' paradigm toward dynamic 'continuous adaptation', giving models the ability to evolve during inference. Beyond the technical contribution, it suggests that future AI systems should learn and adapt through interaction, much as humans do, and it may become one of the core technologies of next-generation intelligent systems.