# FASE: Fast Adaptive Semantic Entropy Metric for Multi-Agent Code Generation

> FASE is a semantic entropy metric that needs no LLM for equivalence checking. It approximates functional correctness via the minimum spanning tree of a structure-semantic difference graph, and reports a 25% higher Spearman correlation with Pass@1 at only 0.3% of the computational cost of traditional methods.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-06-08T17:53:05.000Z
- Last activity: 2026-06-09T05:52:06.896Z
- Popularity: 130.0
- Keywords: multi-agent systems, code generation, semantic entropy, uncertainty quantification, large language models, software engineering, HumanEval, BigCodeBench
- Page link: https://www.zingnex.cn/en/forum/thread/fase
- Canonical: https://www.zingnex.cn/forum/thread/fase

---

## [Introduction] FASE: Fast Adaptive Semantic Entropy Metric for Multi-Agent Code Generation

FASE is a semantic entropy metric designed to address reliability challenges in multi-agent code generation. Traditional semantic entropy relies on an LLM to check whether candidate programs are equivalent, which is expensive and introduces its own hallucination risk. FASE instead approximates functional correctness via the minimum spanning tree of a structure-semantic difference graph. With Qwen3-Embedding-8B it reports a 25% higher Spearman correlation and a 19% higher ROC-AUC score at only 0.3% of the computational cost of traditional methods. This article covers its background, methodology, experiments, and applications.

## Background: Reliability Challenges in Multi-Agent Code Generation and Limitations of Traditional Semantic Entropy

### Reliability Challenges in Multi-Agent Code Generation
Multi-agent code generation simulates human collaboration to complete programming tasks, but it faces LLM hallucinations and cross-agent error propagation: an error made by one agent tends to cascade and amplify through the others, and is hard to trace back. Traditional code quality evaluation relies on test cases, but in multi-agent scenarios pre-existing test cases are often unavailable, so uncertainty quantification methods that need no ground truth are required.

### Limitations of Traditional Semantic Entropy
Semantic entropy quantifies uncertainty through the distribution of semantic equivalence among candidate codes, but existing methods rely on LLM equivalence checking, which is costly and introduces new hallucination risks.
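
To make the quantity concrete, here is a minimal sketch of entropy over equivalence classes. It assumes each candidate has already been assigned a cluster id by some equivalence checker (in traditional methods, an LLM) and that all candidates are weighted equally; both the function name and the uniform weighting are illustrative assumptions, not details taken from the FASE paper.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Shannon entropy (in nats) of the empirical distribution over clusters.

    cluster_ids[i] is the equivalence class assigned to candidate i.
    Uniform weighting of candidates is an assumption of this sketch.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

# Five candidates: three judged equivalent, two that disagree with everyone.
mixed = semantic_entropy([0, 0, 0, 1, 2])
# Five candidates that all fall into one class: no disagreement.
unanimous = semantic_entropy([0, 0, 0, 0, 0])
print(mixed, unanimous)
```

The more evenly the candidates spread across classes, the higher the entropy; when every candidate lands in the same class it is zero. The costly part in traditional methods is producing `cluster_ids`, not this final sum.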

## Core Innovation of FASE: LLM-Free Structured Semantic Difference Measurement

### Core Idea of FASE
FASE approximates functional correctness via the minimum spanning tree of a structure-semantic difference graph, completely avoiding LLM equivalence checking:
1. **Structural Difference**: Measure structural similarity using AST (Abstract Syntax Tree) or code embeddings
2. **Semantic Difference**: Measure semantic similarity using semantic embedding models (e.g., Qwen3-Embedding-8B)
3. **Graph Construction**: Candidate codes as nodes, structure-semantic differences as edge weights
4. **Minimum Spanning Tree**: The distribution of tree edge weights reflects uncertainty
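
The graph and spanning-tree steps above can be sketched in plain Python. The block below runs Prim's algorithm on a dense, symmetric matrix of pairwise differences and returns the tree edges. The toy matrix and the idea of summarizing the tree by its mean edge weight are assumptions made for illustration; the source does not specify how FASE turns the edge-weight distribution into a final score.

```python
def mst_edges(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns the minimum spanning tree as a list of (i, j, weight) edges.
    """
    n = len(dist)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # best[k] = (cheapest known weight linking k to the tree, tree endpoint)
    best = [(dist[0][k], 0) for k in range(n)]
    edges = []
    for _ in range(n - 1):
        # Pick the cheapest node that is not yet in the tree.
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        w, i = best[j]
        edges.append((i, j, w))
        in_tree[j] = True
        # Newly added node j may offer cheaper links to the remaining nodes.
        for k in range(n):
            if not in_tree[k] and dist[j][k] < best[k][0]:
                best[k] = (dist[j][k], j)
    return edges

# Toy differences: candidates 0-2 are close to each other, candidate 3 is an outlier.
dist = [
    [0.00, 0.10, 0.20, 0.90],
    [0.10, 0.00, 0.15, 0.80],
    [0.20, 0.15, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
]
tree = mst_edges(dist)
weights = [w for _, _, w in tree]
mean_weight = sum(weights) / len(weights)  # one possible summary (assumption)
print(tree, mean_weight)
```

In this toy example the tree keeps two cheap edges inside the agreeing group and one expensive edge out to the outlier, so a single dissenting candidate shows up as one heavy edge rather than inflating every pairwise comparison.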

### Technical Advantages
- Computational cost is only 0.3% of traditional methods
- No hallucination risk from LLM-based equivalence checking, because no LLM is involved in that step
- Scalable to large-scale multi-agent systems
- Theoretical guarantees based on graph theory

### Implementation Steps
1. Code embedding generation
2. Structural feature extraction
3. Difference graph construction (`weight = α × structural difference + β × semantic difference`)
4. Minimum spanning tree calculation
5. Adaptive normalization to adjust thresholds
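
As a rough illustration of steps 1 to 3, the sketch below builds the weighted difference matrix for Python candidates. Everything specific in it is an assumption: the structural difference is a normalized L1 distance between AST node-type histograms, the semantic difference is cosine distance between precomputed embedding vectors (passed in as plain lists, standing in for the output of a model such as Qwen3-Embedding-8B), and `alpha = beta = 0.5` is an arbitrary default. The spanning-tree and adaptive-normalization steps are not covered here.

```python
import ast
import math
from collections import Counter

def node_type_counts(source):
    """Count AST node types in a Python snippet (a crude structural fingerprint)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

def structural_difference(src_a, src_b):
    """Normalized L1 distance between node-type histograms, in [0, 1]."""
    a, b = node_type_counts(src_a), node_type_counts(src_b)
    diff = sum(abs(a[k] - b[k]) for k in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

def difference_graph(sources, embeddings, alpha=0.5, beta=0.5):
    """Dense matrix of edge weights: alpha * structural + beta * semantic."""
    n = len(sources)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = (alpha * structural_difference(sources[i], sources[j])
                 + beta * cosine_distance(embeddings[i], embeddings[j]))
            weights[i][j] = weights[j][i] = w
    return weights

sources = [
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    return x + y\n",  # same structure, renamed variables
    "def add(a, b):\n    total = 0\n    for v in (a, b):\n        total += v\n    return total\n",
]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # toy vectors, not real model output
graph = difference_graph(sources, embeddings)
print(graph)
```

The first two candidates differ only in variable names, so their node-type histograms match and, with the identical toy embeddings, their edge weight is zero. The loop-based third candidate is farther from both on each component.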

## Experimental Validation: Breakthroughs on HumanEval and BigCodeBench

### Evaluation Benchmarks
- HumanEval: 164 handwritten programming problems
- BigCodeBench: Large-scale multi-scenario benchmark

### Core Metrics
- Spearman correlation coefficient: Measures the correlation between uncertainty and Pass@1 performance
- ROC-AUC score: Measures how well the uncertainty score separates correct from incorrect code
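
For readers who want to see how these two metrics are computed, here is a dependency-free sketch. Spearman's coefficient is the Pearson correlation of rank vectors, and ROC-AUC is obtained from the rank-sum (Mann-Whitney U) identity. The sample scores and labels are made up for illustration and are not results from the paper.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def roc_auc(scores, labels):
    """AUC via the rank-sum identity; labels are 0/1 with 1 as the positive class."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical per-problem data: uncertainty score and whether the code passed.
uncertainty = [0.1, 0.2, 0.7, 0.9]
passed = [1, 1, 0, 0]
failed = [1 - p for p in passed]

rho = spearman(uncertainty, passed)   # negative: higher uncertainty, fewer passes
auc = roc_auc(uncertainty, failed)    # how well uncertainty flags failures
print(rho, auc)
```

A useful uncertainty metric should correlate negatively with pass rate and score well above 0.5 on ROC-AUC when failures are treated as the positive class.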

### Experimental Results
When using Qwen3-Embedding-8B:
- Spearman correlation coefficient increased by 25%
- ROC-AUC score increased by 19%
- Computational cost reduced by 99.7%

These results suggest that FASE balances efficiency and effectiveness.

## Practical Application Scenarios of FASE

FASE is applicable to the following scenarios:
1. **Multi-agent code review**: Quickly evaluate the reliability of agent outputs to decide whether verification or re-generation is needed
2. **Real-time code suggestion filtering**: Evaluate candidate suggestion quality in milliseconds, prioritizing high-confidence options
3. **Test resource optimization**: Identify high-uncertainty code and allocate test resources first
4. **Human-machine collaboration decision-making**: Quantify uncertainty to support decisions on whether to involve human intervention

## Technical Insights and Future Directions

### Technical Insights
1. Avoid circular dependency of LLM evaluating LLM outputs
2. Code functional correctness requires joint modeling of structure and semantics
3. Graph theory tools (e.g., minimum spanning tree) provide a new perspective for uncertainty quantification

### Future Directions
- Explore more advanced embedding models to improve accuracy
- Extend to other programming languages and domains
- Develop hybrid metric methods combining execution traces
- Apply to multi-modal code generation scenarios

### Conclusion
FASE is an important advancement in the field of multi-agent code generation. It significantly reduces the cost of uncertainty quantification while maintaining accuracy, providing reliable support for the practical development of multi-agent software.
