
Know2Say: Unveiling the Gap Between What Reasoning Models 'Know' and What They 'Say'

A study revealing the 'Detection-Extraction Gap' in the reasoning process of large language models. Building on this gap, a black-box adaptive early-exit method reduces reasoning costs by 70-85% while improving accuracy.

Tags: Large Language Models · Inference Optimization · Early Exit · Chain-of-Thought (CoT) · Black-box Optimization · BAEE · Model Efficiency · AI Research
Published 2026-04-25 08:12 · Recent activity 2026-04-25 08:26 · Estimated read: 5 min

Section 01

Know2Say Research Guide: Unveiling the Detection-Extraction Gap in Reasoning Models and Optimization Solutions

The Know2Say study focuses on the 'Detection-Extraction Gap' phenomenon in the reasoning process of large language models—models internally 'know' the answer early in reasoning, but are prone to errors when forced to extract it immediately. Based on this, the study proposes the Black-box Adaptive Early Exit (BAEE) strategy, which reduces reasoning costs by 70-85% while improving accuracy, and is applicable to closed-source models like GPT-4.


Section 02

Research Background: Efficiency Dilemma of Large Language Model Reasoning

As the complex reasoning capabilities of large language models improve, Chain-of-Thought (CoT) prompting has become a standard technique, but it comes with high computational costs due to numerous intermediate steps. Core question: Does the model need to generate all steps to 'know' the answer? The Know2Say study found that models form answers internally early on, but are prone to errors when forced to answer immediately—this is defined as the 'Detection-Extraction Gap'.


Section 03

Key Findings: Definition and Modeling of the Detection-Extraction Gap

The Detection-Extraction Gap refers to the following phenomenon: in the early stages of CoT, the correct answer already appears with high probability when the model is allowed to continue freely (detection), yet accuracy is low when the model is forced to answer immediately at that same point (extraction). The researchers formalize this gap using total variation distance, a lower bound on the distance between the free-continuation distribution P_free and the forced-extraction distribution P_forced, which provides a mathematical foundation for the improvement strategy.
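The formalization above can be sketched as follows. The paper's exact definition may differ, but a standard total-variation formulation over an answer space (notation assumed here, not taken from the paper) would read:

```latex
% Total variation distance between the free-continuation answer
% distribution P_free and the forced-extraction distribution P_forced,
% over the answer space \mathcal{A} (notation assumed for illustration):
\mathrm{TV}\bigl(P_{\mathrm{free}},\, P_{\mathrm{forced}}\bigr)
  = \frac{1}{2} \sum_{a \in \mathcal{A}}
    \bigl| P_{\mathrm{free}}(a) - P_{\mathrm{forced}}(a) \bigr|
```

Under this reading, a large TV distance at an early checkpoint is precisely the detection-extraction gap: the model's free continuations concentrate on an answer that forced extraction fails to produce.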


Section 04

BAEE Strategy: Black-box Adaptive Early Exit Mechanism

The core mechanism of BAEE (Black-box Adaptive Early Exit) has three steps: 1. Pause generation at preset checkpoints in the CoT; 2. Sample several continuations and check answer consistency (the PSC metric); 3. If consistency exceeds a threshold (e.g., 0.75), exit early and return the majority answer. Because the method is black-box, it needs no access to model internals such as logits or hidden states, making it applicable to closed-source models like GPT-4 and Claude.
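The three steps above can be sketched as a generation loop. This is a minimal illustration, not the paper's implementation: the sample count, the checkpoint schedule, and the exact PSC definition (here, the fraction of sampled continuations agreeing on the majority answer) are assumptions.

```python
from collections import Counter

PSC_THRESHOLD = 0.75         # consistency threshold from the text; task-dependent in practice
SAMPLES_PER_CHECKPOINT = 8   # number of sampled continuations per checkpoint (assumed value)

def psc(answers):
    """Majority answer and the fraction of samples agreeing with it (assumed PSC definition)."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)

def baee(generate_step, sample_answer, checkpoints, max_steps):
    """Black-box adaptive early exit over a CoT generation loop.

    generate_step(i) -> produces the next reasoning step (black-box API call)
    sample_answer()  -> samples one short continuation and extracts its final answer
    """
    for step in range(1, max_steps + 1):
        generate_step(step)                      # generate the next CoT step
        if step in checkpoints:                  # 1. pause at a preset checkpoint
            samples = [sample_answer() for _ in range(SAMPLES_PER_CHECKPOINT)]
            answer, consistency = psc(samples)   # 2. measure answer consistency
            if consistency >= PSC_THRESHOLD:     # 3. exit early with the majority answer
                return answer, step
    return sample_answer(), max_steps            # fall back to full-length CoT

# Toy stand-in for a black-box model whose continuations always agree on "42":
answer, steps = baee(lambda step: None, lambda: "42", checkpoints={2, 4, 8}, max_steps=16)
```

Note that only `sample_answer` touches the model, and only through ordinary sampling calls, which is what makes the strategy viable against a closed API.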


Section 05

Experimental Results: Efficiency and Accuracy Improvements of BAEE

On benchmarks such as MATH-500 and GPQA-Diamond, BAEE achieves a 70-85% reduction in sequence generation and a 1-5 percentage-point improvement in accuracy; moreover, 52-88% of the tokens in a full CoT are generated after the model has already committed to its final answer. Overly long CoT can even cause the model to drift off the correct path, so an appropriately timed early exit keeps the reasoning on a clear main thread.
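As a back-of-the-envelope illustration with hypothetical numbers (not from the paper): even after paying the extra sampling cost at a checkpoint, exiting early can cut total token usage substantially.

```python
FULL_COT_TOKENS = 1000   # hypothetical average length of a full CoT
EXIT_POINT_TOKENS = 200  # tokens generated before the early-exit checkpoint
NUM_SAMPLES = 8          # sampled continuations at the checkpoint
SAMPLE_TOKENS = 20       # hypothetical tokens per short sampled continuation

# Total tokens with BAEE = prefix up to the checkpoint + sampling overhead.
baee_tokens = EXIT_POINT_TOKENS + NUM_SAMPLES * SAMPLE_TOKENS
saving = 1 - baee_tokens / FULL_COT_TOKENS
print(f"BAEE tokens: {baee_tokens}, saving: {saving:.0%}")  # BAEE tokens: 360, saving: 64%
```

The sampling overhead term is why the paper lists sampling cost as a limitation: with more samples or longer continuations, the net saving shrinks.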


Section 06

Research Significance: Dual Contributions to Theory and Practice

Theoretical contributions: revealing the asymmetry between internal knowledge and external expression, questioning whether longer CoT is always better, and demonstrating the feasibility of black-box optimization. Practical value: a ready-to-use optimization recipe that cuts API costs, improves latency, and enhances quality. Methodologically, it demonstrates a rigorous path from phenomenon observation to a practical solution.


Section 07

Limitations and Future Directions: Room for Improvement in Know2Say

Current limitations: the PSC threshold requires task-specific tuning, sampling adds extra overhead, and the benefits shrink on complex tasks with long-range dependencies. Future directions: adaptive threshold strategies, more efficient PSC estimation, extension to multimodal settings, and a white-box variant of the early-exit mechanism.