Zing Forum

LPSR: A New Inference-Time Error Correction Method Without Fine-Tuning

LPSR monitors phase shifts in residual streams to detect and roll back errors in real time during inference, significantly improving the performance of large language models on mathematical reasoning tasks without any fine-tuning or additional training.

Large Language Models · Inference Optimization · KV Cache · Error Correction · Residual Streams · Inference-Time Compute · Mathematical Reasoning · Phase-Shift Detection
Published 2026-04-21 01:53 · Recent activity 2026-04-21 12:18 · Estimated read 7 min

Section 01

Introduction: LPSR—A New Inference-Time Error Correction Method Without Fine-Tuning

LPSR (Latent Phase-Shift Rollback) is an inference-time error correction method that requires no fine-tuning or additional training. It detects errors in real time by monitoring phase shifts in residual streams, rolls back the KV cache, and injects guidance vectors, significantly improving the performance of large language models (LLMs) on mathematical reasoning tasks. Its core innovation is using changes in the model's internal representations to trigger targeted interventions. On the MATH-500 benchmark, the 8B model outperformed the standard 70B model, demonstrating strong parameter and computational efficiency.

Section 02

Background: The Dilemma of Error Accumulation in LLM Reasoning

Large language models (LLMs) face the problem of error accumulation when generating long-chain reasoning: errors in intermediate steps lead to subsequent generations deviating continuously from the correct direction, especially in multi-step tasks like mathematical reasoning. Traditional solutions such as prompt engineering have limited or even counterproductive effects, while increasing model size incurs high computational costs.

Section 03

Method: Core Mechanisms of LPSR

Phase Shift Detection

LPSR monitors the model's internal state through a dual gating mechanism:

  1. Cosine similarity: computes direction changes in the residual stream between adjacent tokens, catching sudden turns in the representation vectors.
  2. Entropy analysis: monitors changes in the uncertainty of the prediction distribution.

An error is flagged only when both metrics cross their thresholds.
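The dual gating above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold values, the monitored layer, and the exact entropy measure are all assumptions.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def phase_shift_detected(h_prev, h_curr, probs_prev, probs_curr,
                         cos_thresh=0.7, entropy_jump=0.5):
    """Dual gating: flag an error only when BOTH metrics trigger.

    h_prev/h_curr: residual-stream vectors at the monitored layer for
    adjacent tokens; probs_*: the corresponding next-token distributions.
    Thresholds here are illustrative placeholders.
    """
    # Gate 1: direction change between adjacent residual-stream states.
    cos = float(h_prev @ h_curr /
                (np.linalg.norm(h_prev) * np.linalg.norm(h_curr)))
    direction_break = cos < cos_thresh
    # Gate 2: sudden rise in predictive uncertainty.
    uncertainty_spike = entropy(probs_curr) - entropy(probs_prev) > entropy_jump
    return direction_break and uncertainty_spike
```

Requiring both gates keeps the false-positive rate down: a sharp representational turn alone (e.g. at a legitimate topic shift) does not trigger a rollback unless the model also becomes markedly less certain.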

Error Correction Operations

  • KV cache rollback: restores the state from before the error step, erasing the error's influence on subsequent generation.
  • Guidance vector injection: injects precomputed guidance vectors into the residual stream to steer generation back on course.

All operations happen during inference; no parameters are updated.
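A toy sketch of the two operations, assuming a KV cache laid out as per-layer (keys, values) arrays with the sequence axis first. Real caches are framework-specific, and the injection scale `alpha` is a hypothetical knob, not a parameter from the paper.

```python
import numpy as np

def rollback_kv(kv_cache, error_step):
    """Drop all cached keys/values from the error step onward.

    kv_cache: a per-layer list of (keys, values) arrays whose first
    dimension is the sequence position. Truncating restores the model's
    attention state to just before the flagged token.
    """
    return [(k[:error_step], v[:error_step]) for k, v in kv_cache]

def inject_guidance(hidden, guidance_vec, alpha=1.0):
    """Add a scaled, precomputed guidance vector to the residual stream."""
    return hidden + alpha * guidance_vec
```

After the rollback, decoding resumes from the truncated position with `inject_guidance` applied at the chosen layer, so the corrected continuation is generated under the steering signal.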

Section 04

Evidence: Performance Validation on the MATH-500 Benchmark

  1. Versus standard autoregressive (AR) decoding: standard AR (28.8%) → LPSR 8B (44.0%), a gain of 15.2 percentage points (p < 1e-15)
  2. Versus prompt-based self-correction: prompt correction (19.8%) falls below standard AR; LPSR is 24.2 percentage points higher (p ≈ 0)
  3. Versus Best-of-N: Best-of-16 (36.2%) → LPSR (44.0%), at a token cost only 1/5.4 that of Best-of-16
  4. Across scales: LPSR 8B (44.0%) outperforms the standard 70B model (35.2%) with 8.75× fewer parameters
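The reported gaps are easy to sanity-check; all figures below come directly from the list above:

```python
# Accuracies (%) on MATH-500, as reported.
lpsr_8b, std_ar, prompt_sc, best_of_16, std_70b = 44.0, 28.8, 19.8, 36.2, 35.2

assert round(lpsr_8b - std_ar, 1) == 15.2     # gain over standard AR (pp)
assert round(lpsr_8b - prompt_sc, 1) == 24.2  # gain over prompt self-correction (pp)
assert lpsr_8b > best_of_16 and lpsr_8b > std_70b
assert 70 / 8 == 8.75                         # parameter ratio vs. the 70B model
```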

Section 05

In-depth Finding: Decoupling Phenomenon Between Detection and Correction

A layer-by-layer scan of a 32-layer model revealed that the optimal layer for error detection (layer 14, AUC = 0.718) differs from the optimal layer for correction (layer 16, accuracy = 44.0%). Intervening only at the best detection layer does not yield the best task performance, a finding that should inform the design of inference-time intervention methods.
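The decoupled selection amounts to taking two independent argmaxes over the scan results. The per-layer scores below are made-up placeholders that echo the paper's finding (detection peaks at layer 14, correction at layer 16); only the two reported peak values are from the article.

```python
def select_layers(detect_auc, correct_acc):
    """Choose detection and correction layers from per-layer scan results.

    detect_auc: {layer: detection AUC on a validation set}
    correct_acc: {layer: task accuracy when intervening at that layer}
    The two argmax layers need not coincide -- that is the decoupling.
    """
    detect_layer = max(detect_auc, key=detect_auc.get)
    correct_layer = max(correct_acc, key=correct_acc.get)
    return detect_layer, correct_layer

# Illustrative scan over a few middle layers (values are hypothetical,
# except the two peaks reported in the article).
auc = {13: 0.700, 14: 0.718, 15: 0.690, 16: 0.705}
acc = {13: 0.400, 14: 0.410, 15: 0.420, 16: 0.440}
```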

Section 06

Technical Details: Key Layers and Computational Overhead

  • Key layer selection: determined per task by scanning a small validation set (layer 16 is optimal for MATH-500)
  • Guidance vectors: constructed contrastively from the representation differences between correct and incorrect reasoning paths
  • Computational overhead: comes mainly from residual-stream monitoring, KV rollback, and guidance injection; it is negligible next to the forward pass, so inference stays efficient
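Since the article notes the precomputation details are not fully disclosed, the sketch below uses a common contrastive recipe (a unit-normalized difference of mean activations) purely as a plausible stand-in for whatever LPSR actually does.

```python
import numpy as np

def guidance_vector(correct_acts, incorrect_acts):
    """Difference-of-means steering vector, unit-normalized.

    correct_acts / incorrect_acts: (n_samples, d_model) residual-stream
    activations collected at the chosen layer from correct vs. incorrect
    reasoning paths. Assumption: the paper's exact construction is not
    disclosed; this is a standard contrastive baseline.
    """
    v = correct_acts.mean(axis=0) - incorrect_acts.mean(axis=0)
    return v / np.linalg.norm(v)
```

Normalizing to unit length leaves the injection strength entirely to a separate scale factor, which keeps the vector's direction (the "correct minus incorrect" axis) independent of how many samples were collected.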

Section 07

Limitations and Future Directions

Limitations

  • Task specificity: Only validated on mathematical reasoning; effectiveness on other tasks remains to be confirmed
  • Guidance vectors: Details of the precomputation method are not fully disclosed
  • Hyperparameter sensitivity: Thresholds and key layers need task-specific tuning

Future Directions

  1. Adaptive key layer selection
  2. Cross-task transfer of guidance vectors
  3. Synergy with methods like chain-of-thought and tree search

Section 08

Practical Significance and Conclusion

LPSR provides a new path for LLM reasoning optimization: without retraining, it improves performance through monitoring internal states during inference, aligning with the "inference-time scaling" trend. For developers, it is a feasible solution to enhance reasoning quality. Its core idea provides a reference for building reliable AI systems and is expected to promote progress in the field of inference-time computational optimization.