Zing Forum


Math-Inference: Enabling Formal Verification of Large Language Models' Math Answers

An open-source project based on Phoenix LiveView that combines LLM routing capabilities with mathematical proof engines like SymPy, Julia, Octave, and Lean4 to achieve formal verification of AI-generated math answers, providing reliability guarantees for math AI applications.

Tags: LLM · Math Verification · Formal Proof · Lean 4 · SymPy · Phoenix LiveView · Julia · Automated Theorem Proving · AI Safety · Mathematical Reasoning
Published 2026-04-15 08:11 · Recent activity 2026-04-15 08:29 · Estimated read: 5 min

Section 01

Introduction: Math-Inference—Enabling Formal Verification of LLM Math Answers

Math-Inference is an open-source project based on Phoenix LiveView. By combining LLM routing capabilities with mathematical proof engines such as SymPy, Julia, Octave, and Lean4, it implements formal verification of AI-generated math answers through a two-layer generation-verification architecture, providing reliability guarantees for math AI applications and mitigating the hallucination problem in LLM math outputs.
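The two-layer generation-verification idea can be sketched in a few lines of Python (a minimal, hypothetical illustration: the function names here are invented for this sketch, not Math-Inference's API, and the real project is written in Elixir). An LLM proposes a closed-form answer, and a verification layer spot-checks it numerically before the answer is trusted:

```python
import math

def numeric_verify(f, claimed_antiderivative, points, h=1e-6, tol=1e-4):
    """Layer 2 (verification): spot-check that F' == f at sample points
    using a central finite difference. A cheap stand-in for the project's
    SymPy/Julia engines (hypothetical helper, not the project's API)."""
    F = claimed_antiderivative
    for x in points:
        derivative = (F(x + h) - F(x - h)) / (2 * h)
        if abs(derivative - f(x)) > tol:
            return False
    return True

# Layer 1 (generation): pretend an LLM claimed that
# the antiderivative of cos(x) is sin(x).
print(numeric_verify(math.cos, math.sin, [0.0, 0.5, 1.0, 2.0]))  # True

# A wrong claim (that the antiderivative of cos is cos) is caught:
print(numeric_verify(math.cos, math.cos, [0.5, 1.0]))  # False
```

A numeric spot-check like this is cheap but only probabilistic; the article's point is that Math-Inference escalates to symbolic (SymPy) or formal (Lean4) checks when stronger guarantees are needed.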


Section 02

Background: Dilemmas and Needs of LLM Math Reasoning

LLMs have a "hallucination" problem in math reasoning, generating content that seems correct but is actually wrong. Traditional solutions (adding data, modifying architecture, fine-tuning) cannot guarantee absolute correctness, and math has zero tolerance for errors—thus the industry needs technical solutions to verify LLM math outputs.


Section 03

Project Overview and Core Technical Architecture

Math-Inference uses the Phoenix LiveView framework (developed in Elixir) to build real-time interactive applications. Its core architecture consists of three layers:

  1. LLM Routing Layer: Intelligently distributes problems—returns results directly for simple questions and triggers verification for complex ones;
  2. Multi-Engine Verification Layer: Complementary use of SymPy (symbolic computation), Julia (numerical computation), Octave (engineering computation), and Lean4 (formal proof);
  3. Coordination Mechanism: Hierarchical verification strategy to optimize resource utilization.
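The three layers above could be wired together roughly as follows (a Python sketch under assumed names: the engine labels mirror the article, but the classifier rules and escalation order are invented for illustration, and the real project is Elixir/Phoenix):

```python
# Hypothetical sketch of the routing + hierarchical verification idea.

def classify(problem: str) -> str:
    """LLM routing layer: pick a verification tier by problem type.
    A real router would use an LLM; keyword rules stand in here."""
    text = problem.lower()
    if "prove" in text or "theorem" in text:
        return "lean4"        # formal proof
    if "matrix" in text or "signal" in text:
        return "octave"       # engineering computation
    if any(op in text for op in ("integrate", "differentiate", "simplify")):
        return "sympy"        # symbolic computation
    return "direct"           # simple question: answer without verification

# Coordination layer: try cheaper engines first, escalate only as needed.
ESCALATION_ORDER = ["direct", "sympy", "octave", "julia", "lean4"]

def verification_plan(problem: str) -> list:
    tier = classify(problem)
    return ESCALATION_ORDER[: ESCALATION_ORDER.index(tier) + 1]

print(verification_plan("Integrate x^2 from 0 to 1"))
# ['direct', 'sympy']
print(verification_plan("Prove that addition is commutative"))
# ['direct', 'sympy', 'octave', 'julia', 'lean4']
```

The design point this sketch captures is that simple questions never pay the cost of a formal proof, which is how the coordination mechanism optimizes resource utilization.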

Section 04

Technical Implementation Highlights: Real-Time Interaction and Intelligent Loop

Highlights include:

  1. Real-Time Interaction: Phoenix LiveView pushes the verification process via WebSocket to enhance user trust;
  2. Feedback Loop: Verification errors are fed back to the LLM for re-generation, improving answer quality;
  3. Scalable Architecture: Modular design makes it easy to integrate new engines (e.g., Coq, Isabelle).
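The feedback loop in point 2 can be sketched as a retry cycle (hypothetical Python; `generate` and `verify` are toy stand-ins invented here, where a real system would call an LLM and one of the verification engines):

```python
# Hypothetical sketch of the feedback loop: verification failures are
# fed back to the generator, which retries with the error attached.

def generate(question, feedback=None):
    # Toy "LLM": answers 7 * 8 wrongly at first, correctly after feedback.
    return 54 if feedback is None else 56

def verify(question, answer):
    # Toy verifier: returns an error message, or None if the answer checks out.
    return None if answer == 7 * 8 else f"{answer} != 7*8"

def answer_with_feedback(question, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        candidate = generate(question, feedback)
        feedback = verify(question, candidate)
        if feedback is None:
            return candidate          # verified answer
    raise RuntimeError("no verified answer within the retry budget")

print(answer_with_feedback("What is 7 * 8?"))  # 56
```

Bounding the loop with `max_rounds` matters in practice: as the limitations section below notes, verification is expensive, so an unbounded regenerate-and-verify cycle could consume arbitrary resources.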

Section 05

Application Scenarios and Value: Multi-Domain Support

Application scenarios:

  1. Educational Assistance: Intelligent teaching assistants—verified answers are used for automatic grading;
  2. Research Verification: Quickly check intermediate steps of scientific computing;
  3. Theorem Proving: Provide proof suggestions and verification services in combination with Lean4.
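For the theorem-proving scenario, the kind of check Lean4 performs can be illustrated with a one-line example (assuming Mathlib is available, which supplies the reals and the `ring` tactic): the LLM's claimed identity is stated as a theorem, and Lean either closes the proof or rejects the file.

```lean
import Mathlib.Tactic

-- The LLM claims: x² + 2x + 1 factors as (x + 1)².
-- Lean 4 accepts this only if the proof goes through;
-- a wrong claim simply fails to compile.
theorem claimed_factorization (x : ℝ) :
    x ^ 2 + 2 * x + 1 = (x + 1) ^ 2 := by
  ring
```

Unlike the numeric and symbolic checks of the other engines, a compiled Lean proof is a machine-checked guarantee, which is why the article treats Lean4 as the strongest verification tier.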

Section 06

Limitations and Future Outlook

Limitations: verification is expensive (complex proofs can take a long time to check), and LLM-generated Lean code often contains syntax errors. Future directions: optimize verification strategies, improve the LLM's generation of formal code, and explore lightweight verification methods.


Section 07

Conclusion: AI Direction Emphasizing Both Generation and Verification

Math-Inference represents the shift of AI math applications from generation-only to emphasizing both generation and verification, providing a reference for high-correctness AI scenarios. We look forward to more reliable intelligent systems emerging.