Doc2Atom: A Compositional Parametric Memory Framework Revolutionizing Long-Document Reasoning

This paper proposes Doc2Atom, which decomposes documents into semantically typed knowledge atoms and compiles them into independent micro-LoRA adapters to enable query-specific dynamic composition. It outperforms the Doc-to-LoRA baseline on six QA benchmarks while reducing the memory cost of document internalization.

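Tags: Context distillation, LoRA, Long-document processing, Knowledge atoms, Parametric memory, Document QA, Compositional reasoning, Memory optimization, LLM efficiency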

Published 2026-06-11 01:58 | Recent activity 2026-06-11 11:30 | Estimated read 10 min

Section 01

Introduction: Core Breakthroughs of Doc2Atom in Revolutionizing Long-Document Reasoning

Original Authors and Source

Original Authors/Maintainers: Paper author team (standard arXiv authorship)
Source Platform: arXiv
Original Title: Doc-to-Atom: Learning to Compile and Compose Memory Atoms
Original Link: http://arxiv.org/abs/2606.12400v1
Publication Time: 2026-06-10

Core Insights

This paper proposes Doc2Atom, a compositional parametric memory framework. It decomposes a document into semantically typed knowledge atoms and compiles each atom into an independent micro-LoRA adapter, so that adapters can be composed dynamically for each query. The framework outperforms the Doc-to-LoRA baseline on six QA benchmarks while significantly reducing the memory cost of document internalization.

Section 02

Background: Challenges in Long-Document Processing and Limitations of Existing Methods

Computational Dilemma of Long-Document Processing

Large Language Models (LLMs) face a quadratic complexity bottleneck in their attention mechanism when processing long documents; as input sequences grow, computational and memory costs increase sharply.
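To make the scaling concrete, here is a minimal sketch that counts the entries of a single attention score matrix, which holds one entry per pair of tokens. The function name is illustrative, and the count deliberately ignores heads, layers, and kernel-level optimizations.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


for n in (1_000, 2_000, 4_000):
    # Doubling the sequence length quadruples the number of entries.
    print(f"{n:>5} tokens -> {attention_score_entries(n):>12,} score entries")
```

Going from 1,000 to 4,000 tokens multiplies the count by 16, which is the growth pattern that context distillation tries to avoid at inference time.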

Rise of Context Distillation

To address this issue, context distillation compresses document information into model parameters, so the model does not have to process the long sequence at inference time. Documents are internalized into parameters ahead of time, and only the compressed representation is loaded during inference.

Limitations of Doc-to-LoRA

Doc-to-LoRA generates a document-specific LoRA adapter in a single forward pass, but it has three major issues:

Irrelevant query interference: A single adapter mixes information from multiple topics, leading to scattered answers or hallucinations;
Limited compositional recall: It is difficult to combine information from several parts of a document to answer complex queries;
Poor scalability for long documents: As documents grow, the volume of information exceeds the capacity of a single adapter.

Section 03

Doc2Atom Framework: Knowledge Atomization and Dynamic Composition Design

Core Idea: Knowledge Atomization

Doc2Atom decomposes documents into knowledge atoms: semantically typed sub-units, each containing a coherent concept and a semantic label, which can be compiled into parameters independently and combined dynamically.

System Architecture

Document Decomposer: Segments documents into atoms based on semantics, annotates types, and optimizes boundaries;
Atom Compiler: Compiles each atom into a lightweight micro-LoRA adapter, associated with a source retrieval key;
Query Router: Analyzes queries, selects relevant atoms, and assembles a composite adapter to inject into the base model.
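The three components above can be sketched as a toy pipeline. Everything in this block is an assumption made for illustration: the `Atom` fields, the paragraph-based decomposer, the placeholder string standing in for compiled adapter weights, and the word-overlap router are not the paper's implementation, which uses learned components.

```python
import re
from dataclasses import dataclass

# Small illustrative stopword list so that function words do not drive routing.
STOPWORDS = {"a", "an", "the", "is", "that", "into", "each", "one"}


@dataclass
class Atom:
    atom_id: int
    atom_type: str  # semantic type label (hypothetical label set)
    text: str
    adapter: str    # placeholder standing in for compiled micro-LoRA weights


def tokens(text: str) -> set[str]:
    """Lowercased content words of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def decompose(document: str) -> list[tuple[str, str]]:
    """Toy decomposer: one atom per paragraph, each written as 'type: text'."""
    pairs = []
    for para in document.strip().split("\n\n"):
        atom_type, _, body = para.partition(":")
        pairs.append((atom_type.strip(), body.strip()))
    return pairs


def compile_atoms(pairs: list[tuple[str, str]]) -> list[Atom]:
    """Toy compiler: records a name where a real system would produce an adapter."""
    return [Atom(i, t, body, adapter=f"micro_lora_{i}")
            for i, (t, body) in enumerate(pairs)]


def route(query: str, atoms: list[Atom], top_k: int = 2) -> list[Atom]:
    """Toy router: keep up to top_k atoms that share words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(a.text)), a) for a in atoms]
    scored = [(s, a) for s, a in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:top_k]]


DOCUMENT = """
definition: A knowledge atom is a typed unit holding one coherent concept.

method: The compiler turns each atom into a micro-LoRA adapter.

result: The router selects atoms that match the query.
"""

atoms = compile_atoms(decompose(DOCUMENT))
selected = route("How does the compiler build an adapter?", atoms)
print([(a.atom_type, a.adapter) for a in selected])
```

Only the atom whose text overlaps the query is selected here, so a real system following this shape would load just that atom's adapter rather than one adapter covering the whole document.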

End-to-End Training

The framework is trained end to end via multi-objective distillation, with four objectives:

Atom quality: Ensure atoms accurately encode segment information;
Routing accuracy: Train the router to select relevant atoms;
Compositional ability: Handle multi-atom composition for complex queries;
Efficiency optimization: Minimize computational costs.

Training data is automatically generated, including atom-question-answer pairs, complex queries, and negative samples.
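One common way to combine several training objectives is a weighted sum of their losses. The sketch below assumes that form; the summary does not state how the paper combines its objectives, and the loss names, loss values, and weights here are all hypothetical.

```python
def total_distillation_loss(losses: dict[str, float],
                            weights: dict[str, float]) -> float:
    """Weighted sum of per-objective losses (assumed combination rule)."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical per-objective loss values for one training step.
losses = {"atom_quality": 0.80, "routing": 0.40,
          "composition": 0.60, "efficiency": 0.10}
# Hypothetical weights; in practice these would be tuned.
weights = {"atom_quality": 1.0, "routing": 0.5,
           "composition": 0.5, "efficiency": 0.1}

total = total_distillation_loss(losses, weights)
print(round(total, 3))
```

Raising an error on a missing weight avoids silently dropping an objective from the total.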

Section 04

Experimental Validation: Performance and Efficiency Advantages of Doc2Atom

Benchmark Datasets

Validated on six QA benchmarks: Natural Questions, HotpotQA, MS MARCO, NarrativeQA, QASPER, DocRED.

Key Results

Performance improvement: Doc2Atom outperforms Doc-to-LoRA on all six benchmarks, with an average gain of over 10% (e.g., HotpotQA +12.7%, NarrativeQA +15.2%);
Memory efficiency: The number of parameters needed to store the same information drops by 40-60%, and only a few micro-LoRA adapters are loaded during inference; the advantage grows with document length.
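The memory argument can be illustrated with the standard LoRA parameter count: for a weight matrix of shape d_out x d_in and rank r, the adapter stores r * (d_in + d_out) values. The dimensions, ranks, and number of selected atoms below are toy values chosen for readability, not figures from the paper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


D = 256  # toy layer width (assumption)

single = lora_param_count(D, D, rank=8)  # one monolithic adapter (assumed rank)
micro = lora_param_count(D, D, rank=1)   # one micro adapter (assumed rank)
loaded = 3 * micro                       # router selects 3 atoms (assumption)

print(f"monolithic adapter: {single} parameters")
print(f"one micro adapter:  {micro} parameters")
print(f"loaded per query:   {loaded} parameters")
```

Under these assumptions the parameters loaded for a query depend on how many atoms the router selects rather than on the total size of the document.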

Ablation Studies

Atomization alone improves performance, which suggests that decomposition reduces interference;
Dynamic routing further enhances performance;
Semantic type annotation contributes significantly (performance drops by 15% without annotation);
Micro-LoRA is more efficient than standard LoRA.

Section 05

In-depth Analysis: Sources of Doc2Atom's Effectiveness

Four Key Advantages

Information isolation: Atoms keep unrelated information in separate adapters, which limits interference from content that is irrelevant to the query;
Compositional flexibility: Dynamic routing combines atoms on demand to handle simple/complex queries;
Parameter efficiency: Micro-LoRA requires only hundreds of parameters, with total parameters far lower than a single adapter;
Interpretability: Selected atoms can be viewed to understand the basis for the model's answers.
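A minimal sketch of on-demand composition, assuming the composite adapter is the sum of the selected adapters' low-rank updates (W + sum of B_i A_i). The summary does not specify the paper's composition rule, so additive merging is an assumption here, and the tiny matrices are made up for illustration.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def compose(base: Matrix, adapters: list[tuple[Matrix, Matrix]]) -> Matrix:
    """Return base + sum of B @ A over the selected (B, A) adapter pairs."""
    weight = [row[:] for row in base]  # copy so the base weights stay untouched
    for b, a in adapters:
        weight = add(weight, matmul(b, a))
    return weight


base = [[1.0, 0.0], [0.0, 1.0]]
atom_1 = ([[1.0], [0.0]], [[0.0, 2.0]])  # B is 2x1, A is 1x2 (rank 1)
atom_2 = ([[0.0], [1.0]], [[3.0, 0.0]])

print(compose(base, [atom_1, atom_2]))
```

Because the base weights are never modified, a different set of atoms can be composed for the next query, and the list of selected atoms doubles as a record of what informed the answer.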

Section 06

Application Scenarios: Diverse Practical Domains of Doc2Atom

Core Application Scenarios

Enterprise knowledge base QA: Dynamically combine atoms for products, technologies, customer cases, etc.;
Legal document analysis: Adapt to structured atoms like contract clauses and precedents;
Academic paper assistant: Combine atoms for abstracts, methods, experiments, etc., on demand;
Multi-document reasoning: Unified indexing of cross-document atoms, supporting cross-document information combination.

Section 07

Limitations and Future Research Directions

Current Limitations

Decomposition quality: Automatic decomposition may be imprecise;
Type system: Predefined/learned type systems have limited coverage;
Routing errors: The router may select wrong atoms;
Training cost: End-to-end training requires large resources.

Future Directions

Adaptive decomposition: Learn optimal decomposition strategies;
Hierarchical atoms: Support hierarchical structures from chapters → paragraphs → sentences;
Cross-document association: Identify semantic associations between atoms from different documents;
Incremental updates: Support partial updates of documents;
Multimodal extension: Cover multimodal documents like images and tables.
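As an illustration of the incremental-update direction, one possible approach is to fingerprint each atom's text and recompile only atoms that are new or changed. This is a sketch of an idea, not something the paper describes: the path-like atom IDs, which also hint at a chapter and paragraph hierarchy, and the helper names are hypothetical.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash of an atom's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def atoms_to_recompile(stored: dict[str, str], current: dict[str, str]) -> list[str]:
    """IDs of atoms that are new or whose text no longer matches the stored hash."""
    return [atom_id for atom_id, text in current.items()
            if stored.get(atom_id) != fingerprint(text)]


# Hashes recorded when the adapters were last compiled (hypothetical atoms).
stored = {
    "ch1/p1": fingerprint("LoRA adds low-rank updates."),
    "ch1/p2": fingerprint("Atoms are typed."),
}
# The document after an edit: one paragraph changed, one paragraph added.
current = {
    "ch1/p1": "LoRA adds low-rank updates.",
    "ch1/p2": "Atoms are semantically typed.",
    "ch2/p1": "Routers select atoms.",
}

print(atoms_to_recompile(stored, current))
```

Unchanged atoms keep their existing adapters, so the cost of an update would scale with the size of the edit rather than the size of the document.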

Section 08

Conclusion: Implications of Doc2Atom for Long-Document Reasoning

Doc2Atom represents an important advancement in the field of context distillation, addressing key limitations of monolithic adapters through atomization and dynamic composition. Its "LEGO brick"-style approach to organizing information opens up new possibilities for long-document reasoning. As LLMs are applied to more knowledge-intensive tasks, approaches like Doc2Atom could become key infrastructure for making efficient use of large document collections.
