# Prometheus Project: Bridging the 'Intent Gap' in Code Repair with Executable Specifications

> A study proposes Prometheus, a framework that reverse-engineers Gherkin specifications from runtime failure reports. On the Defects4J benchmark it reports a 93.97% correct repair rate and fixes 74.4% of the complex defects that blind agents could not solve. The research argues that the future of automated program repair lies not in larger models, but in the ability to align repairs with executable specifications.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-19T14:27:27.000Z
- Last activity: 2026-04-21T02:49:48.920Z
- Popularity: 109.6
- Keywords: automated program repair, APR, agentic workflow, behavior-driven development, BDD, Gherkin specifications, intent gap, software engineering, code generation, Defects4J
- Page link: https://www.zingnex.cn/en/forum/thread/prometheus
- Canonical: https://www.zingnex.cn/forum/thread/prometheus
- Markdown source: floors_fallback

---

## Introduction: Bridging the Intent Gap in Code Repair with Executable Specifications

The Prometheus project addresses the 'intent gap' problem in Automated Program Repair (APR) with a framework that reverse-engineers executable Gherkin specifications from runtime failure reports. The framework reports a 93.97% correct repair rate on Defects4J and fixes 74.4% of the complex defects that blind agents could not solve. The research concludes that progress in APR depends more on aligning repairs with executable specifications than on using larger models.

## Background: Intent Gap and Limitations of Existing APR Methods

In Automated Program Repair (APR), AI-generated patches often diverge from what the developers originally intended. This 'intent gap' leads to over-repair or to new bugs. Existing mitigation strategies lack deterministic constraints: natural language summaries rely on comments and documentation, which are often missing or outdated, and adversarial sampling cannot ensure that a patch stays consistent with the intent. Prometheus' core insight, borrowed from Behavior-Driven Development (BDD), is to infer the correct specification first rather than generate repair code directly.

## Methodology: Prometheus' Three-Stage Multi-Agent Architecture

Prometheus adopts a three-stage collaborative architecture:
1. **Failure Analysis and Specification Reverse Engineering**: Infer Gherkin specifications (Given-When-Then structure) from error messages, stack traces, and failed test cases.
2. **Requirements Quality Assurance Loop (RQA Loop)**: Validate the inferred specifications using the real code as a proxy oracle: generate candidate repairs, run the tests against them, and feed the results back to revise the specifications.
3. **Constraint-Guided Code Generation**: Generate minimal code changes with the validated specifications as constraints, which avoids over-engineering.
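
The three stages above can be sketched as a small control loop. This is only an illustration of the data flow under stated assumptions, not the Prometheus implementation: `GherkinSpec`, `repair`, and the `toy_*` stand-ins are hypothetical names, and in a real system each callable would be backed by an agent and a test runner.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class GherkinSpec:
    """One Given-When-Then scenario inferred from a failure report (hypothetical type)."""
    given: str
    when: str
    then: str

    def render(self) -> str:
        return (
            "Scenario: inferred from failing test\n"
            f"  Given {self.given}\n"
            f"  When {self.when}\n"
            f"  Then {self.then}"
        )


def repair(
    failure_report: str,
    infer_spec: Callable[[str, Optional[str]], GherkinSpec],
    propose_patch: Callable[[GherkinSpec], str],
    run_tests: Callable[[str], Optional[str]],  # returns None when all tests pass
    generate_minimal_patch: Callable[[GherkinSpec], str],
    max_rounds: int = 3,
) -> Optional[Tuple[GherkinSpec, str]]:
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        # Stage 1: infer (or revise) the specification from the failure report.
        spec = infer_spec(failure_report, feedback)
        # Stage 2: RQA loop, using a candidate patch and the tests as a proxy oracle.
        candidate = propose_patch(spec)
        feedback = run_tests(candidate)
        if feedback is None:
            # Stage 3: generate the final patch constrained by the validated spec.
            return spec, generate_minimal_patch(spec)
    return None  # no specification could be validated within the round budget


# Toy stand-ins for the agents, purely for demonstration.
def toy_infer(report: str, feedback: Optional[str]) -> GherkinSpec:
    then = "the result is 0.0" if feedback else "an exception is raised"
    return GherkinSpec("an empty list of numbers", "the average is computed", then)


def toy_patch(spec: GherkinSpec) -> str:
    return "return 0.0" if "0.0" in spec.then else "raise ValueError"


def toy_run_tests(patch: str) -> Optional[str]:
    return None if patch == "return 0.0" else "expected 0.0 but ValueError was raised"


result = repair("ZeroDivisionError in average([])",
                toy_infer, toy_patch, toy_run_tests, toy_patch)
if result is not None:
    print(result[0].render())
    print("patch:", result[1])
```

In this toy run the first inferred specification is wrong, the test feedback revises it in the second round, and only the validated specification reaches the final generation step.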

## Evidence: Repair Performance and Qualitative Analysis

On the Defects4J benchmark (680 Java defects), the study reports:
- **Correct Repair Rate**: 639/680 (93.97%), far exceeding the 20-40% level of existing methods.
- **Rescue Rate**: Prometheus fixed 119 complex defects that blind agents could not solve, a rescue rate of 74.4%.

Qualitative analysis shows that blind agents tend to over-engineer, while Prometheus repairs are precise and preserve the integrity of the code structure.
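
The reported percentages can be checked with a few lines of arithmetic. The size of the complex-defect subset is not stated in this summary; the sketch below only infers which subset size is consistent with 119 rescues and a 74.4% rate, so the value 160 is an inference and not a figure from the study.

```python
fixed, total = 639, 680
repair_rate = 100 * fixed / total
print(f"{repair_rate:.2f}%")  # 93.97%

# Assumption: rescue rate = rescued / (defects the blind agents failed on),
# rounded to one decimal place. Find every subset size that reproduces 74.4%.
rescued = 119
consistent_sizes = [
    n for n in range(rescued, total + 1)
    if round(100 * rescued / n, 1) == 74.4
]
print(consistent_sizes)  # [160]
```

Under that assumption, 119 out of 160 (74.375%) is the only split that rounds to the reported 74.4%.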

## Conclusion: Core Directions for the Future of APR

Implications from Prometheus:
- **Specification First**: Specification inference ability is more important than code generation ability, aligning with software engineering best practices.
- **Value of Executable Specifications**: Gherkin specifications are both human-readable and executable, serving as a bridge between intent and implementation.
- **Multi-Agent Collaboration**: Different agents focus on subtasks and collaborate via structured intermediate representations to improve performance.
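
To make the claim that a Gherkin specification is both readable and executable concrete, here is a minimal sketch of a scenario runner. It is not the API of Cucumber, behave, or any other BDD tool: the `step` decorator, `run_scenario`, the example scenario, and the `average` function under test are all hypothetical and exist only for illustration.

```python
import re

SCENARIO = """\
Scenario: average of an empty list
  Given the numbers []
  When the average is computed
  Then the result is 0.0
"""

STEPS = []  # list of (compiled pattern, handler) pairs


def step(pattern):
    """Register a handler for step text matching the given regex."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register


def average(numbers):
    # Code under test (assumed behavior: an empty list averages to 0.0).
    return sum(numbers) / len(numbers) if numbers else 0.0


@step(r"the numbers \[(.*)\]")
def given_numbers(ctx, raw):
    ctx["numbers"] = [float(x) for x in raw.split(",") if x.strip()]


@step(r"the average is computed")
def when_average(ctx):
    ctx["result"] = average(ctx["numbers"])


@step(r"the result is (.+)")
def then_result(ctx, expected):
    assert ctx["result"] == float(expected), ctx["result"]


def run_scenario(text):
    """Execute each Given/When/Then line against the registered steps."""
    ctx = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Given|When|Then|And)\s+(.*)", line)
        if not match:
            continue  # skip the Scenario header and blank lines
        body = match.group(2)
        for pattern, handler in STEPS:
            found = pattern.fullmatch(body)
            if found:
                handler(ctx, *found.groups())
                break
        else:
            raise LookupError(f"no step definition for: {body}")
    return ctx


ctx = run_scenario(SCENARIO)
```

The same text that a reviewer reads as a statement of intent also drives the check: if `average` raised on an empty list or returned a different value, `run_scenario` would fail at the `Then` step.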

## Limitations and Future Research Prospects

Current Limitations:
- The framework targets only Java and Defects4J-style unit test defects.
- The RQA loop depends on the quality of the test suite; incomplete tests may lead to verification errors.

Future Directions:
- Extend to other programming languages and defect types (e.g., concurrency bugs, performance issues).
- Combine static analysis to improve specification inference accuracy.
- Explore the possibility of extracting specifications from natural language requirement documents.
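
The test-suite dependence listed under the current limitations can be illustrated with a toy case. The functions and suites below are hypothetical and unrelated to Defects4J; they only show how a wrong patch can pass an incomplete suite, so that a loop which trusts the tests as an oracle would accept it.

```python
def overfitted_patch(nums):
    # Handles the crashing input, but is wrong for lists longer than one element.
    if not nums:
        return 0.0
    return nums[0]


def correct_patch(nums):
    return sum(nums) / len(nums) if nums else 0.0


# Each case is (input, expected output). The weak suite never checks
# a list with more than one element.
weak_suite = [([], 0.0), ([5.0], 5.0)]
strong_suite = weak_suite + [([2.0, 4.0], 3.0)]


def passes(fn, suite):
    return all(fn(list(args)) == expected for args, expected in suite)


print(passes(overfitted_patch, weak_suite))    # True: the wrong patch is accepted
print(passes(overfitted_patch, strong_suite))  # False: a fuller suite rejects it
print(passes(correct_patch, strong_suite))     # True
```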
