Zing Forum

Symbolic Equivalence Partitioning: A New Code Selection Method Without Extra LLM Calls

Symbolic Equivalence Partitioning groups candidate programs by their semantic behavior via symbolic execution, significantly improving code generation accuracy without increasing LLM inference costs.

Tags: code generation, symbolic execution, Best-of-N, LLM, program analysis, SMT
Published 2026-04-08 05:37 | Recent activity 2026-04-09 09:52 | Estimated read 7 min

Section 01

Original Post: Symbolic Equivalence Partitioning, a New Code Selection Method Without Extra LLM Calls

Best-of-N sampling is widely used in code generation, but reliably selecting the correct candidate among the N samples remains a challenge. Symbolic Equivalence Partitioning addresses this by grouping candidate programs by their semantic behavior via symbolic execution and selecting a representative from the largest equivalence group. On the benchmarks reported below, it improves Pass@1 by 7.5 to 8.8 percentage points without increasing LLM inference costs.

Section 02

Background: Limitations of Existing Best-of-N Selection Methods

Traditional Best-of-N selection relies on external validators and falls into two categories:

  1. Test case execution: Simple and intuitive, but test coverage is usually incomplete. Passing the tests does not guarantee correctness on all inputs, and designing comprehensive tests is difficult;
  2. Random or heuristic validation: Results vary from run to run and lack reliability.

A common issue with both: they require extra computing resources or multiple executions, which increases inference costs.

Section 03

Core Idea: Innovative Approach to Semantic Behavior Grouping

Key insight of Symbolic Equivalence Partitioning: Functionally equivalent programs have consistent semantic behavior. Instead of verifying candidates one by one, we first group them by semantics and select a representative from the largest equivalence group. This method uses symbolic execution to analyze program behavior without actual execution or extra LLM calls.

Section 04

Technical Implementation: Workflow of Symbolic Execution + SMT Assumptions

Work Steps

  1. Symbolic Execution: Run each candidate on symbolic input values, track the path constraints, and extract a summary of its semantic behavior;
  2. Semantic Equivalence Grouping: Place programs whose outputs or control flow agree on all inputs into the same group;
  3. Representative Selection: Output a representative of the largest equivalence group, on the assumption that behavior shared by many independently sampled candidates is more likely to be correct.
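
The three steps can be sketched on a deliberately tiny example. The snippet below is a minimal illustration, not the method's actual implementation: it assumes a toy program representation (nested tuples over a single integer input x, with linear outputs and `x < k` branches) and replaces the SMT solver with interval arithmetic, which is exact for this toy language. All names (`symbolic_paths`, `equivalent`, `select`, `cand_a`, and so on) are hypothetical.

```python
from typing import List, Tuple

# Toy program language over one integer input x (an assumption of this sketch):
#   ("lin", a, b)                 -> return a*x + b
#   ("if_lt", k, then_p, else_p)  -> if x < k: then_p else: else_p
Path = Tuple[int, int, int, int]  # (lo, hi, a, b): for lo <= x <= hi the output is a*x + b


def symbolic_paths(prog, lo: int, hi: int) -> List[Path]:
    """Step 1: explore every feasible path for x in [lo, hi] without running prog."""
    if lo > hi:  # the path condition is unsatisfiable, so prune this path
        return []
    if prog[0] == "lin":
        _, a, b = prog
        return [(lo, hi, a, b)]
    _, k, then_p, else_p = prog
    # Branch on x < k: split the path condition into two sub-intervals.
    return (symbolic_paths(then_p, lo, min(hi, k - 1))
            + symbolic_paths(else_p, max(lo, k), hi))


def equivalent(p: List[Path], q: List[Path]) -> bool:
    """Step 2: two summaries agree if outputs match wherever their paths overlap."""
    for lo1, hi1, a1, b1 in p:
        for lo2, hi2, a2, b2 in q:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo > hi:
                continue  # the two path conditions cannot hold together
            # Linear outputs that agree at both ends of an interval agree on all of it.
            if a1 * lo + b1 != a2 * lo + b2 or a1 * hi + b1 != a2 * hi + b2:
                return False
    return True


def select(candidates: dict, lo: int, hi: int) -> str:
    """Step 3: group by equivalence and return a member of the largest group."""
    summaries = {name: symbolic_paths(prog, lo, hi) for name, prog in candidates.items()}
    groups: List[List[str]] = []
    for name in candidates:
        for group in groups:
            # Equivalence is transitive, so one comparison per group is enough.
            if equivalent(summaries[group[0]], summaries[name]):
                group.append(name)
                break
        else:
            groups.append([name])
    return max(groups, key=len)[0]


# Three hypothetical candidates for abs(x); the third one is buggy.
candidates = {
    "cand_a": ("if_lt", 0, ("lin", -1, 0), ("lin", 1, 0)),  # -x if x < 0 else x
    "cand_b": ("if_lt", 1, ("lin", -1, 0), ("lin", 1, 0)),  # -x if x < 1 else x
    "cand_c": ("lin", 1, 0),                                 # x
}
chosen = select(candidates, lo=-100, hi=100)
print(chosen)  # cand_a
```

Here `cand_a` and `cand_b` branch at different thresholds but compute the same function, so they form a group of size two and outvote the buggy `cand_c`. No candidate is ever executed on a concrete input.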

Role of SMT Assumptions

SMT assumptions encode domain-specific constraints (input types, preconditions, etc.). This reduces path explosion, keeps the search away from invalid inputs, and improves analysis accuracy.
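
To illustrate how assumptions prune the search, the following sketch treats an assumption as a bound on an input variable and discards any path whose branch conditions contradict it. Interval bounds stand in for a real SMT solver here, and the variable `n`, the list of paths, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]          # inclusive integer bounds (lo, hi)
Constraint = Tuple[str, str, int]   # (variable, "<" or ">=", constant)


def narrow(bounds: Dict[str, Interval], constraint: Constraint) -> Dict[str, Interval]:
    """Return a copy of bounds tightened by one branch constraint."""
    var, op, k = constraint
    lo, hi = bounds[var]
    if op == "<":
        hi = min(hi, k - 1)
    elif op == ">=":
        lo = max(lo, k)
    else:
        raise ValueError(f"unsupported operator: {op}")
    return {**bounds, var: (lo, hi)}


def feasible(assumptions: Dict[str, Interval], path: List[Constraint]) -> bool:
    """A path is feasible if some input satisfies the assumptions and every branch."""
    bounds = assumptions
    for constraint in path:
        bounds = narrow(bounds, constraint)
    return all(lo <= hi for lo, hi in bounds.values())


# Path conditions of a hypothetical candidate with nested branches on n.
paths = [
    [("n", "<", 0)],                    # negative input: error branch
    [("n", ">=", 0), ("n", "<", 2)],    # base case
    [("n", ">=", 0), ("n", ">=", 2)],   # general case
]

no_assumptions = {"n": (-2**31, 2**31 - 1)}  # any 32-bit integer
with_precondition = {"n": (0, 1000)}         # assumed precondition: 0 <= n <= 1000

print(sum(feasible(no_assumptions, p) for p in paths))     # 3
print(sum(feasible(with_precondition, p) for p in paths))  # 2
```

Without assumptions, all three paths have to be explored. With the assumed precondition, the negative-input path is infeasible and is skipped, so candidates that differ only in how they treat invalid inputs no longer end up in different groups for that reason alone.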

Section 05

Experimental Evidence: Significant Accuracy Improvement with Zero Extra LLM Cost

Validated on mainstream benchmarks:

  • HumanEval+: Pass@1 increased from 0.728 to 0.803 (+7.5 percentage points);
  • LiveCodeBench: Pass@1 increased from 0.516 to 0.604 (+8.8 percentage points).

Key advantage: All analysis and selection is done via symbolic execution, with no extra LLM inference calls.
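
As a quick check of the reported numbers, the short script below recomputes the absolute gain in percentage points and also derives the relative gain, which the post does not state. The function name `gains` is only illustrative.

```python
# Pass@1 figures quoted above: (before, after) for each benchmark.
results = {
    "HumanEval+": (0.728, 0.803),
    "LiveCodeBench": (0.516, 0.604),
}


def gains(before: float, after: float):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = (after - before) * 100
    relative = (after - before) / before * 100
    return round(absolute, 1), round(relative, 1)


for name, (before, after) in results.items():
    print(name, gains(before, after))
```

The absolute gains match the quoted 7.5 and 8.8 percentage points, which correspond to relative improvements of roughly 10.3% and 17.1% over the starting Pass@1.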

Section 06

Comparison and Application Scenarios: Advantages and Limitations of the Method

Comparison with Traditional Methods

Method                            | Extra LLM Calls       | Validation Reliability            | Computational Overhead
Test Case Execution               | None                  | Medium (depends on test coverage) | Low
LLM Reranking                     | High (multiple calls) | Medium-High                       | High
Symbolic Equivalence Partitioning | None                  | High (semantic-level validation)  | Medium

Application Scenarios

  • Code generation requiring high semantic correctness;
  • Limited LLM inference budget;
  • Problem domains with clear constraints that can be encoded.

Limitations

  • Symbolic execution struggles to analyze complex program structures (dynamic memory, complex loops);
  • Grouping works poorly for highly non-deterministic programs;
  • Higher implementation complexity than simple test execution.

Section 07

Domain Significance and Future Outlook

Significance for the Code Generation Domain

  1. Decoupling validation and generation: Achieve high-quality validation without increasing LLM costs;
  2. Revival of program analysis techniques: Collaboration between traditional techniques (symbolic execution, SMT) and LLMs;
  3. Balance between efficiency and quality: Improve quality while controlling inference costs.

Future Outlook

  • Combination with reranking methods: Coarse screening by equivalence group followed by fine selection;
  • Expansion to more programming languages (the method currently supports mainly Python);
  • Development of incremental symbolic execution techniques to handle large-scale programs.