As large language model technology develops rapidly, a fundamental question arises: what kind of system qualifies as a "large language model"? The question looks simple, but it is not. Architectures such as Transformer, MoE, Mamba, and RWKV have emerged one after another, each changing how the model is implemented. What, then, is the essence of an LLM?
Existing definitions often rely on vendors' marketing terms or vague technical descriptions. OpenAI describes an LLM as a text-to-text system that predicts subsequent text; Anthropic emphasizes high parameter counts and human-like text generation; Meta describes Llama as an autoregressive language model that uses an optimized Transformer architecture. These descriptions point in a useful direction, but none of them draws a strict, verifiable boundary.
This is the motivation for the Layer-0 Theorem. It attempts to establish a mathematical boundary of functional necessity for LLMs, based not on any specific architecture but on six core functional roles that any LLM must possess.