Zing Forum

Reading

Adam's Law: Text Frequency Law Reveals LLMs Prefer "Common Expressions", Rewriting Inputs Can Improve Performance

The study proposes the Text Frequency Law (TFL), finding that LLMs are more sensitive to high-frequency text expressions. A three-step framework of input rewriting, frequency distillation, and curriculum training proved effective on tasks such as mathematical reasoning and translation.

Tags: Text Frequency · Adam's Law · TFL · LLM Optimization · Prompt Engineering · Curriculum Learning · Input Rewriting · Frequency Distillation · TFPD
Published 2026-04-02 23:39 · Recent activity 2026-04-03 09:23 · Estimated read: 5 min

Section 01

Introduction: Key Points of Adam's Law and Text Frequency Law (TFL)

The study proposes the Text Frequency Law (TFL), revealing that LLMs are more sensitive to high-frequency text expressions. It builds a three-step optimization framework of input rewriting, frequency distillation, and curriculum training, which proved effective on tasks such as mathematical reasoning, machine translation, commonsense reasoning, and agent tool calling. The finding opens a new direction for LLM optimization.


Section 02

Background: The Neglected Factor of Text Frequency

LLM research typically focuses on factors such as architecture and data scale, while text frequency has long been overlooked. Psychology shows that humans read high-frequency words faster (the familiarity effect). The Adam's Law study asks whether LLMs exhibit a similar pattern, proposing TFL and building an optimization framework around it.


Section 03

Core Findings: TFL's Assertions and Frequency Estimation

The core claim of TFL: prompts and fine-tuning data for LLMs should favor high-frequency expressions, since expressions encountered more often during pre-training are understood more fully by the model. Because most training data is closed-source, the team estimated text frequency from online resources and addressed the resulting statistical challenges.
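The paper's estimator is not public, but the idea of scoring how "common" an expression is can be sketched with a smoothed unigram model over a reference corpus. Everything here (the scoring function, the toy corpus) is a hypothetical stand-in, not the authors' method:

```python
from collections import Counter
import math

def sentence_frequency_score(sentence, corpus_counts, total):
    """Proxy for how common a sentence is: mean log-probability of its
    tokens under an add-one-smoothed unigram model of a reference corpus."""
    tokens = sentence.lower().split()
    if not tokens:
        return float("-inf")
    vocab = len(corpus_counts)
    logps = [math.log((corpus_counts.get(t, 0) + 1) / (total + vocab))
             for t in tokens]
    return sum(logps) / len(logps)

# Toy reference corpus standing in for web-scale counts.
corpus = "the cat sat on the mat the dog sat on the rug".split()
counts = Counter(corpus)
total = len(corpus)

common = sentence_frequency_score("the cat sat", counts, total)
rare = sentence_frequency_score("feline perched thereupon", counts, total)
assert common > rare  # the everyday phrasing scores higher
```

In practice the reference counts would come from a large web corpus (the study mentions online resources), and the granularity could be n-grams or sentences rather than unigrams.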


Section 04

Three-Step Optimization Framework: From Theory to Practice

Three-step framework based on TFL:

  1. Input Rewriting: Convert input into semantically equivalent high-frequency expressions (no model modification needed);
  2. Frequency Distillation (TFD): Use LLM continuation to generate corpus for calibrating frequency estimation;
  3. Curriculum Training for Frequency Tuning (CTFT): Fine-tune from low to high frequency (drawing on curriculum learning).
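Step 1 can be pictured as a selection problem: among candidate paraphrases assumed to be semantically equivalent, keep the one a frequency estimator scores highest. The toy estimator and counts below are hypothetical stand-ins for the paper's components:

```python
from collections import Counter

def estimate_frequency(text, counts, total):
    """Toy unigram stand-in for the study's frequency estimator:
    average per-token corpus frequency of the text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(counts.get(t, 0) for t in tokens) / (total * len(tokens))

def rewrite_input(paraphrases, counts, total):
    """Input Rewriting sketch: pick the highest-frequency paraphrase."""
    return max(paraphrases, key=lambda p: estimate_frequency(p, counts, total))

# Toy reference counts standing in for web-scale statistics.
counts = Counter("what is the sum of the numbers what is the total".split())
total = sum(counts.values())

candidates = [
    "compute the summation of said integers",
    "what is the sum of the numbers",
]
best = rewrite_input(candidates, counts, total)
# The plainer, more common phrasing wins.
assert best == "what is the sum of the numbers"
```

Note that this requires no change to the model itself, which is what makes input rewriting the lightest-weight step of the three.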

Section 05

Experimental Validation: Multi-Task Test Results

The team constructed the TFPD dataset and validated the framework on four tasks:

  • Mathematical reasoning: High-frequency rewriting improves problem understanding;
  • Machine translation: High-frequency expressions in the target language make output more natural;
  • Commonsense reasoning: Reduced ambiguity improves accuracy;
  • Tool calling: Clearer instructions enhance reliability.

All four tasks showed significant improvements.

Section 06

Technical Details: Key Implementation Considerations

Implementation considerations:

  • Frequency estimation granularity: Sentence level (balancing semantics and sparsity);
  • Rewriting quality: Maintain semantic equivalence (using similarity models or manual review);
  • Online resources: Choose corpora close to the distribution of training data (e.g., subsets of Common Crawl);
  • Curriculum strategy: Reasonably set frequency thresholds and training phases.
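The second consideration, semantic equivalence of rewrites, can be screened automatically before any manual review. A minimal sketch, using bag-of-words cosine similarity as a cheap stand-in for the similarity models the text mentions (the threshold value is an assumption):

```python
import math
from collections import Counter

def cosine_bow(a, b):
    """Bag-of-words cosine similarity between two sentences; a cheap
    stand-in for an embedding-based similarity model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def accept_rewrite(original, rewrite, threshold=0.5):
    """Keep a rewrite only if it stays close to the original meaning."""
    return cosine_bow(original, rewrite) >= threshold

assert accept_rewrite("the cat sat on the mat", "the cat sat on a mat")
assert not accept_rewrite("the cat sat on the mat", "stock prices fell sharply")
```

A real pipeline would replace `cosine_bow` with a learned similarity model, since bag-of-words overlap misses paraphrases that share meaning but few words.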

Section 07

Application Insights: Practical Suggestions for LLM Optimization

Research insights:

  • Prompt engineering: Add the dimension of expression frequency, use common expressions to improve performance;
  • Data preprocessing: Filter/rearrange fine-tuning data by frequency (curriculum-style organization);
  • Automated input optimization: Deploy a rewriter as a general preprocessing module (applicable to scenarios such as chatbots).
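The data-preprocessing suggestion, ordering fine-tuning data by frequency, is mechanical once each example has a frequency score. A sketch assuming such scores exist (the scores and thresholds below are illustrative), following the CTFT direction of training from low to high frequency:

```python
def curriculum_order(examples, freq_score):
    """Sort fine-tuning examples from low to high estimated frequency,
    per the CTFT-style curriculum described in the framework."""
    return sorted(examples, key=freq_score)

def phase_buckets(examples, freq_score, thresholds):
    """Split examples into training phases by ascending frequency
    thresholds; one extra bucket holds everything above the last."""
    phases = [[] for _ in range(len(thresholds) + 1)]
    for ex in examples:
        i = sum(freq_score(ex) > t for t in thresholds)
        phases[i].append(ex)
    return phases

# Illustrative per-example frequency scores.
scores = {"rare phrasing": 0.1, "common phrasing": 0.9, "middling": 0.5}
ordered = curriculum_order(list(scores), scores.get)
phases = phase_buckets(list(scores), scores.get, thresholds=[0.3, 0.7])
assert ordered == ["rare phrasing", "middling", "common phrasing"]
assert phases == [["rare phrasing"], ["middling"], ["common phrasing"]]
```

Choosing the thresholds and the number of phases is exactly the "curriculum strategy" consideration flagged in the technical details above.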

Section 08

Limitations and Future Directions

Limitations and future directions:

  • Trade-off between frequency and quality: High-frequency expressions may sacrifice precision in specialized domains;
  • Cross-language applicability: Need to verify effectiveness in non-English languages;
  • Impact of model scale: Whether ultra-large models are still affected by frequency effects;
  • Technical combination: Synergistic effects with CoT, few-shot learning, etc.