Zing Forum

FASE: Fast Adaptive Semantic Entropy Metric for Multi-Agent Code Generation

FASE is a semantic entropy metric that needs no LLM for equivalence checking. It approximates functional correctness via the minimum spanning tree of a structure-semantic difference graph, improving the Spearman correlation with Pass@1 by 25% at about 0.3% of the computational cost of traditional LLM-based semantic entropy.

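Multi-agent systems, code generation, semantic entropy, uncertainty quantification, large language models, software engineering, HumanEval, BigCodeBench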
Published 2026-06-09 01:53 | Recent activity 2026-06-09 13:52 | Estimated read 7 min

Section 01

[Introduction] FASE: Fast Adaptive Semantic Entropy Metric for Multi-Agent Code Generation

FASE is a semantic entropy metric aimed at the reliability problems of multi-agent code generation. Traditional semantic entropy depends on LLM equivalence checking, which is expensive and can itself hallucinate. FASE replaces that check with the minimum spanning tree of a structure-semantic difference graph, used as an approximation of functional correctness. The reported result is a 25% improvement in Spearman correlation at only 0.3% of the computational cost of traditional methods. This article covers the background, methodology, experiments, applications, and future directions.

Section 02

Background: Reliability Challenges in Multi-Agent Code Generation and Limitations of Traditional Semantic Entropy

Reliability Challenges in Multi-Agent Code Generation

Multi-agent code generation simulates human collaboration to complete programming tasks, but it suffers from LLM hallucinations and cross-agent error propagation: an error made by one agent can cascade and amplify through the others and is hard to trace. Traditional code quality evaluation relies on test cases, but in multi-agent scenarios pre-existing test cases are often unavailable, so uncertainty quantification methods that need no ground truth are required.

Limitations of Traditional Semantic Entropy

Semantic entropy quantifies uncertainty from how candidate programs distribute across semantic equivalence classes: the more evenly the candidates spread over distinct behaviors, the higher the uncertainty. Existing methods use an LLM to decide equivalence, which is costly and introduces new hallucination risks.
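
As a point of reference, the quantity itself is simple once equivalence classes are known. The sketch below computes Shannon entropy over cluster labels using empirical frequencies, which is one common estimate and an assumption here. The function name `semantic_entropy` and the integer cluster labels are illustrative. In the traditional setup the expensive part is producing those labels with an LLM, which this snippet takes as given.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Shannon entropy (in nats) over semantic-equivalence clusters.

    cluster_ids[i] is the equivalence class assigned to candidate i.
    Class probabilities are estimated by empirical frequency (an assumption).
    """
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("need at least one candidate")
    counts = Counter(cluster_ids)
    # p * log(1/p), summed over classes
    return sum((c / n) * math.log(n / c) for c in counts.values())

# Five candidates: four judged equivalent, one outlier.
print(semantic_entropy([0, 0, 0, 0, 1]))
# All candidates agree, so there is no uncertainty.
print(semantic_entropy([0, 0, 0, 0]))
```

Entropy is zero when every candidate falls into one class and reaches log(k) when candidates split evenly over k classes.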

Section 03

Core Innovation of FASE: LLM-Free Structured Semantic Difference Measurement

Core Idea of FASE

FASE approximates functional correctness via the minimum spanning tree of a structure-semantic difference graph, completely avoiding LLM equivalence checking:

  1. Structural Difference: Measure structural similarity using AST (Abstract Syntax Tree) or code embeddings
  2. Semantic Difference: Measure semantic similarity using semantic embedding models (e.g., Qwen3-Embedding-8B)
  3. Graph Construction: Candidate codes as nodes, structure-semantic differences as edge weights
  4. Minimum Spanning Tree: The distribution of tree edge weights reflects uncertainty
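
To make the structural side of step 1 concrete, here is a minimal sketch for Python candidates. It assumes one possible measure, not necessarily the one FASE uses: each program is flattened into its sequence of AST node type names, and the difference is one minus the `difflib` similarity ratio of the two sequences. The helper names `ast_node_types` and `structural_difference` are hypothetical.

```python
import ast
from difflib import SequenceMatcher

def ast_node_types(source):
    """Flatten a Python snippet into the sequence of its AST node type names."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def structural_difference(code_a, code_b):
    """Value in [0, 1]: 0 for identical node-type sequences, 1 for no overlap."""
    seq_a, seq_b = ast_node_types(code_a), ast_node_types(code_b)
    return 1.0 - SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()

loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
renamed = "def g(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
builtin = "def f(xs):\n    return sum(xs)\n"

# Only node types are compared, so renaming identifiers changes nothing.
print(structural_difference(loop_sum, renamed))
# Same behavior, different shape: the structural difference is non-zero.
print(structural_difference(loop_sum, builtin))
```

The second pair shows why structure alone is not enough: two functionally equivalent programs can look different structurally, which is where the semantic embedding term comes in.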

Technical Advantages

  • Computational cost is only 0.3% of that of traditional methods
  • No hallucination risk from an LLM judge, since equivalence checking involves no LLM
  • Scalable to large-scale multi-agent systems
  • Theoretical guarantees based on graph theory

Implementation Steps

  1. Code embedding generation
  2. Structural feature extraction
  3. Difference graph construction (weight = α × structural difference + β × semantic difference)
  4. Minimum spanning tree calculation
  5. Adaptive normalization to adjust thresholds
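
The steps above can be sketched end to end in a few lines. This is an illustration under stated assumptions rather than the FASE implementation: embeddings are toy vectors standing in for a real embedding model, semantic difference is taken to be cosine distance, `alpha` and `beta` default to 0.5, and the final `uncertainty_score` (mean MST edge weight) is a hypothetical summary, since the source does not spell out how edge weights are aggregated or adaptively normalized.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def difference_graph(struct_diff, embeddings, alpha=0.5, beta=0.5):
    """Dense weight matrix: alpha * structural + beta * semantic difference."""
    n = len(embeddings)
    return [[0.0 if i == j else
             alpha * struct_diff[i][j]
             + beta * cosine_distance(embeddings[i], embeddings[j])
             for j in range(n)] for i in range(n)]

def mst_edge_weights(weights):
    """Prim's algorithm on a dense symmetric matrix; returns the n-1 tree edge weights."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each node to the tree
    best[0] = 0.0
    picked = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        picked.append(best[u])
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
    return picked[1:]  # drop the zero-cost entry for the start node

def uncertainty_score(weights):
    """Hypothetical summary: mean MST edge weight (not the paper's exact formula)."""
    edges = mst_edge_weights(weights)
    return sum(edges) / len(edges) if edges else 0.0

# Three candidates: 0 and 1 nearly agree, 2 is an outlier.
struct_diff = [[0.0, 0.1, 0.8], [0.1, 0.0, 0.7], [0.8, 0.7, 0.0]]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
graph = difference_graph(struct_diff, embeddings)
print(mst_edge_weights(graph), uncertainty_score(graph))
```

Because the tree keeps only the n-1 cheapest connecting edges, a tight cluster of agreeing candidates contributes small weights while each outlier adds one large edge.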

Section 04

Experimental Validation: Breakthroughs on HumanEval and BigCodeBench

Evaluation Benchmarks

  • HumanEval: 164 handwritten programming problems
  • BigCodeBench: Large-scale multi-scenario benchmark

Core Metrics

  • Spearman correlation coefficient: Measures the correlation between uncertainty and Pass@1 performance
  • ROC-AUC score: Ability to distinguish between correct and incorrect code
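
Both metrics can be computed from per-problem uncertainty scores and pass/fail outcomes. The dependency-free sketch below uses invented data, and its sign convention is an assumption: higher uncertainty is expected to go with failure, so Spearman against the pass indicator comes out negative and ROC-AUC is computed with "incorrect" as the positive class. In practice a library such as SciPy or scikit-learn would normally be used instead.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(scores, labels):
    """Chance that a random positive scores above a random negative (ties count 0.5)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Invented data: uncertainty per problem and whether the code passed.
uncertainty = [0.1, 0.2, 0.7, 0.9]
passed = [1, 1, 0, 0]
incorrect = [1 - p for p in passed]

print(spearman(uncertainty, passed))
print(roc_auc(uncertainty, incorrect))
```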

Experimental Results

When using Qwen3-Embedding-8B:

  • Spearman correlation coefficient increased by 25%
  • ROC-AUC score increased by 19%
  • Computational cost reduced by 99.7%

These results suggest that FASE improves both efficiency and effectiveness rather than trading one for the other.

Section 05

Practical Application Scenarios of FASE

FASE is applicable to the following scenarios:

  1. Multi-agent code review: Quickly evaluate the reliability of agent outputs to decide whether verification or re-generation is needed
  2. Real-time code suggestion filtering: Evaluate candidate suggestion quality in milliseconds, prioritizing high-confidence options
  3. Test resource optimization: Identify high-uncertainty code and allocate test resources first
  4. Human-machine collaboration decision-making: Quantify uncertainty to support decisions on whether to involve human intervention
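
One way these scenarios could consume an uncertainty score is a simple threshold router plus a priority ordering for tests. Everything in this sketch is hypothetical: the function names, the action labels, and the two thresholds, which would need tuning on real data.

```python
def route(uncertainty, accept_below=0.2, escalate_above=0.6):
    """Map an uncertainty score to an action; both thresholds are placeholders."""
    if uncertainty < accept_below:
        return "accept"
    if uncertainty > escalate_above:
        return "human_review"
    return "run_tests"

def prioritize_for_testing(scores):
    """Task ids ordered from most to least uncertain."""
    return sorted(scores, key=scores.get, reverse=True)

# Invented scores for three generated solutions.
scores = {"task_a": 0.05, "task_b": 0.72, "task_c": 0.31}
for task in prioritize_for_testing(scores):
    print(task, route(scores[task]))
```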

Section 06

Technical Insights and Future Directions

Technical Insights

  1. Avoid the circular dependency of using an LLM to evaluate LLM outputs
  2. Code functional correctness requires joint modeling of structure and semantics
  3. Graph theory tools (e.g., minimum spanning tree) provide a new perspective for uncertainty quantification

Future Directions

  • Explore more advanced embedding models to improve accuracy
  • Extend to other programming languages and domains
  • Develop hybrid metric methods combining execution traces
  • Apply to multi-modal code generation scenarios

Conclusion

FASE cuts the cost of uncertainty quantification for multi-agent code generation to about 0.3% of that of traditional methods while improving, rather than sacrificing, how well the uncertainty signal tracks correctness. That makes it a practical building block for developing multi-agent software.