Section 01
[Introduction] Hierarchical Language Models: Provable Trade-off Between Context Length and Reasoning Ability
This study gives the first rigorous mathematical proof of the 'value of reasoning', based on a theoretical analysis of synthetic languages. The results show that traditional autoregressive models need a linear context length to sample hierarchically structured languages accurately, whereas models with reasoning capabilities need only logarithmic working memory to achieve the same result. This offers theoretical guidance for the design of next-generation LLM architectures.
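For intuition on the logarithmic side of this trade-off, the sketch below samples a string from a toy hierarchical language by expanding a binary tree depth-first with an explicit stack. It is an illustration under stated assumptions, not the study's construction: the language, the symbol count `NUM_SYMBOLS`, the 0.8 copy probability, and the helpers `expand` and `sample_with_stack` are all hypothetical. The point it shows is that a sampler which keeps only the pending part of the tree as its state emits 2**depth tokens while its stack never holds more than depth + 1 entries, i.e. memory logarithmic in the output length. It does not reproduce the study's lower bound for models without reasoning.

```python
import random
from typing import List, Tuple

# Hypothetical toy hierarchical language (an assumption, NOT the study's construction):
# every non-leaf symbol expands into two children that tend to copy the parent,
# so tokens far apart in the output are correlated through shared ancestors.
NUM_SYMBOLS = 4
COPY_PROB = 0.8  # assumed probability that a child repeats its parent's symbol


def expand(symbol: int, rng: random.Random) -> Tuple[int, int]:
    """Choose two child symbols conditioned on the parent symbol."""
    def child() -> int:
        return symbol if rng.random() < COPY_PROB else rng.randrange(NUM_SYMBOLS)
    return child(), child()


def sample_with_stack(depth: int, seed: int = 0) -> Tuple[List[int], int]:
    """Sample a string of length 2**depth by depth-first tree expansion.

    Returns the emitted tokens and the peak size of the explicit stack,
    i.e. the working memory this sampler actually needed.
    """
    rng = random.Random(seed)
    stack = [(rng.randrange(NUM_SYMBOLS), depth)]  # entries: (symbol, levels left)
    out: List[int] = []
    peak = len(stack)
    while stack:
        symbol, level = stack.pop()
        if level == 0:
            out.append(symbol)  # leaf: emit a terminal token
        else:
            left, right = expand(symbol, rng)
            # Push the right child first so the left subtree is emitted first.
            stack.append((right, level - 1))
            stack.append((left, level - 1))
        peak = max(peak, len(stack))
    return out, peak


if __name__ == "__main__":
    tokens, peak = sample_with_stack(depth=10)
    print(len(tokens), peak)  # 1024 11
```

The stack holds one pending right sibling per level on the current root-to-leaf path plus the node being expanded, which is why its peak size tracks the tree depth rather than the number of emitted tokens.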