Zing Forum

PRoSFI: A New Method to Improve the Reasoning Reliability of Large Language Models via Formal Intermediate Representations

PRoSFI enables 7B-parameter models to generate machine-verifiable reasoning chains through structured formal intermediate steps and a process reward mechanism, addressing the problem where traditional outcome rewards ignore intermediate errors.

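Tags: large language models, formal verification, process reward, reasoning reliability, reinforcement learning, automated theorem proving, structured intermediate representation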
Published 2026-03-31 17:42 | Recent activity 2026-04-01 09:17 | Estimated read 7 min

Section 01

PRoSFI: Guide to the New Method for Improving Reasoning Reliability of Large Language Models

Core Guide to PRoSFI

PRoSFI (Process Reward over Structured Formal Intermediates) is a method for improving the reasoning reliability of large language models. It combines structured formal intermediate steps with a process reward mechanism so that models at the 7B-parameter scale can generate machine-verifiable reasoning chains, addressing the problem that traditional outcome rewards ignore errors in intermediate reasoning. The method balances the reliability of formal verification against what such models can feasibly generate, offering a new path toward trustworthy reasoning models.

Section 02

Background: The Reliability Dilemma of Reasoning Models

In recent years, large language models have improved on complex multi-step reasoning tasks through reinforcement learning with outcome rewards. This approach has a fundamental problem: outcome rewards consider only whether the final answer is correct and ignore the quality of the intermediate steps. A model can therefore be rewarded for "guessing" the correct answer even when its reasoning is seriously flawed. In scenarios that demand high credibility, such as mathematical proof, legal analysis, and medical diagnosis, a correct result reached through a wrong process is a barrier to trust.
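
The blind spot described above can be made concrete with a minimal sketch. The `Attempt` type and the `outcome_reward` function are hypothetical names chosen for illustration, not anything defined by PRoSFI; the example only assumes that an outcome reward compares the final answer with a reference.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """Hypothetical container for one model response."""
    steps: list[str]    # natural-language reasoning steps
    final_answer: str


def outcome_reward(attempt: Attempt, reference_answer: str) -> float:
    """Outcome-only reward: looks at the final answer and nothing else."""
    matches = attempt.final_answer.strip() == reference_answer.strip()
    return 1.0 if matches else 0.0


# Two attempts at (2 + 3) * 2: one sound, one with two compensating errors.
sound = Attempt(steps=["2 + 3 = 5", "5 * 2 = 10"], final_answer="10")
flawed = Attempt(steps=["2 + 3 = 6", "6 * 2 = 10"], final_answer="10")

sound_score = outcome_reward(sound, "10")
flawed_score = outcome_reward(flawed, "10")
```

Both attempts receive the same score because the function never inspects `steps`, which is exactly the gap a process-level signal is meant to close.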

Section 03

Core Challenge: Limitations of Directly Generating Formal Proofs

Formal proofs are logically rigorous and can be checked by automated theorem provers, but generating a complete formal proof directly demands very high model capability. Even the most advanced models struggle to produce correct formal proofs for complex tasks, and for 7B-scale models it is nearly impossible. A pragmatic approach is therefore needed, one that keeps the advantages of formal verification while respecting the limits of model capability.

Section 04

Overview of the PRoSFI Method

PRoSFI Method Core Idea

PRoSFI does not require the model to output a complete formal proof. Instead, the model generates structured intermediate steps aligned with its natural language reasoning, and an external formal prover verifies those steps one by one. This lowers the difficulty of the task for the model, which only has to produce structured intermediate representations, while strict verification ensures that each reasoning step is logically correct. Only reasoning chains that pass verification in full receive a high reward, which guides the model toward learning reliable reasoning processes.
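
As a rough illustration of step-by-step verification, the sketch below pairs each natural-language step with a small structured form and checks every step independently. The article does not specify PRoSFI's intermediate format or which prover it uses, so the `Step` fields and the arithmetic-only `check_step` function are assumptions standing in for a real formal prover.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical structured step: a claim of the form `left op right = claimed`."""
    text: str       # the natural-language version of the step
    op: str
    left: int
    right: int
    claimed: int


# Operators the toy checker understands; anything else is rejected.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def check_step(step: Step) -> bool:
    """Toy stand-in for an external formal prover (assumption, arithmetic only)."""
    fn = OPS.get(step.op)
    return fn is not None and fn(step.left, step.right) == step.claimed


def verify_chain(steps: list[Step]) -> list[bool]:
    """Verify each step on its own and report a per-step verdict."""
    return [check_step(s) for s in steps]


chain = [
    Step("Add 2 and 3 to get 5", "+", 2, 3, 5),
    Step("Double 5 to get 11", "*", 5, 2, 11),   # deliberately wrong
]
verdicts = verify_chain(chain)
```

Returning one verdict per step, rather than a single pass or fail flag, keeps the location of the first invalid step available to whatever consumes the result.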

Section 05

Technical Implementation: Structured Intermediate Representation and Process Reward Mechanism

Technical Implementation Details

PRoSFI includes two key components:

  1. Structured Formal Intermediate Representation: Alongside its natural language reasoning, the model outputs corresponding structured steps, a formal skeleton that retains both precision and flexibility. Each step corresponds to one logical link, and together the steps form a complete reasoning chain.
  2. Process Reward Mechanism: The formal prover verifies each intermediate step, and the model receives a high score only if all steps pass verification. If any intermediate step is wrong, the reward is significantly reduced even when the final result is correct. This fine-grained reward guides the model to optimize the reasoning process rather than just the result.
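
The reward rule in the second component can be sketched as a small function. The article gives no numeric values, so the scores used here (1.0, a `partial` of 0.2, and 0.0) are illustrative assumptions, as is the choice to give no reward when the final answer is wrong; `process_reward` is a hypothetical name.

```python
def process_reward(step_ok: list[bool], answer_correct: bool,
                   partial: float = 0.2) -> float:
    """Reward a reasoning chain from per-step verification verdicts.

    step_ok        -- one verdict per intermediate step, True if the prover accepted it
    answer_correct -- whether the final answer matches the reference
    partial        -- assumed reduced reward for a correct answer with a flawed chain
    """
    if not answer_correct:
        return 0.0      # assumption: a wrong answer earns nothing
    if step_ok and all(step_ok):
        return 1.0      # every step verified and the answer is correct
    return partial      # correct answer, but at least one step failed (or no steps given)


fully_verified = process_reward([True, True, True], answer_correct=True)
lucky_guess = process_reward([True, False, True], answer_correct=True)
wrong_answer = process_reward([True, True], answer_correct=False)
```

Under these assumed values, a chain with one invalid step scores well below a fully verified chain even though both end in the correct answer, which is the behavior the mechanism describes.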

Section 06

Method Advantages: Dual Improvement of Reliability and Accuracy

Method Advantages: Balancing Reliability and Accuracy

PRoSFI addresses the dilemma faced by traditional outcome rewards: standards that are too strict cause convergence difficulties or a decline in accuracy, while standards that are too loose sacrifice reliability. Formal verification provides an objective, quantifiable measure of reliability that is unaffected by how fluent the text is. Experiments show that PRoSFI significantly improves the reliability of the reasoning process without sacrificing the accuracy of the final answer, with each reasoning step standing up to logical inspection.

Section 07

Application Prospects and Significance

PRoSFI provides a practical technical path for trustworthy reasoning models, suitable for high-reliability scenarios:

  • Mathematics education: Providing verified problem-solving steps for students;
  • Scientific research assistance: Offering logically rigorous lines of analysis;
  • Automatic theorem proving: Assisting experts and improving generation efficiency.

In addition, PRoSFI demonstrates a new paradigm for combining formal methods with LLMs, one that could be extended to fields such as program verification, logic puzzles, and legal reasoning, and that holds considerable future potential.