Zing Forum

Evaluation Study of Norm-Driven Workflow in Agent Code Generation

This 2026 bachelor's thesis examines how norm-driven workflows improve the quality and controllability of agent code generation, offering a new methodological perspective on AI-assisted programming.

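Tags: agent code generation, norm-driven (specification-driven) development, AI-assisted programming, software engineering, large language models, code quality, iterative workflow, automatic programming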
Published 2026-05-15 14:15 | Recent activity 2026-05-15 14:21 | Estimated read 9 min

Section 01

[Introduction] Core Overview of the Evaluation Study on Norm-Driven Workflow in Agent Code Generation

This 2026 bachelor's thesis compares several workflow modes experimentally to measure how norm-driven workflows affect the quality and controllability of agent code generation. Its key findings: norm quality determines generation quality, iterative feedback pays off most on complex tasks, task complexity should guide the choice of mode, and tighter norms trade some creativity for controllability. The sections below summarize the study as a reference for agent code generation in practice.

Section 02

Background: Rise and Challenges of Agent Code Generation and Norm-Driven Concept

Rise and Challenges of Agent Code Generation

In recent years, code generation based on large language models has developed rapidly. However, the traditional single-pass generation mode has clear limitations: unstable quality, difficulty satisfying constraints, and a lack of interpretability. The agent paradigm instead treats code generation as an iterative process of planning, execution, and reflection. This opens new possibilities but also makes the architecture and the process more complex.

Core Concept of Norm-Driven Workflow

The core of a norm-driven workflow is to state the requirement norms (specifications) first and then use them to constrain code generation. Norms play several roles:

  • Constraint Condition: Clarify functional and non-functional requirements
  • Verification Standard: Provide an executable basis for checking generated code
  • Communication Medium: Establish common understanding between humans and machines
  • Decomposition Unit: Split complex tasks into sub-norms

This method combines traditional software engineering requirement analysis with large language model generation capabilities.
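
The roles listed above can be made concrete with a small data structure. The following is a minimal sketch, not the thesis's actual representation: the `Norm` class, its fields, and the `violations` helper are all hypothetical names chosen for illustration. Each norm carries a description (constraint and communication medium), an optional executable check (verification standard), and a list of sub-norms (decomposition unit).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Norm:
    """One requirement norm. All field names are illustrative assumptions."""
    norm_id: str
    description: str
    # Executable check: receives the candidate function, returns True if satisfied.
    check: Optional[Callable[[Callable], bool]] = None
    sub_norms: List["Norm"] = field(default_factory=list)


def violations(norm: Norm, candidate: Callable) -> List[str]:
    """Return the ids of every norm (including sub-norms) that the candidate fails."""
    failed = []
    if norm.check is not None and not norm.check(candidate):
        failed.append(norm.norm_id)
    for sub in norm.sub_norms:
        failed.extend(violations(sub, candidate))
    return failed


def leaves_input_intact(f: Callable) -> bool:
    """Check that calling f does not modify the list passed to it."""
    data = [2, 1]
    f(data)
    return data == [2, 1]


# A top-level norm for a sorting task, decomposed into three checkable sub-norms.
sort_norm = Norm(
    norm_id="SORT",
    description="Return the input integers in ascending order",
    sub_norms=[
        Norm("SORT-1", "Output is ascending", lambda f: f([3, 1, 2]) == [1, 2, 3]),
        Norm("SORT-2", "Empty input yields empty output", lambda f: f([]) == []),
        Norm("SORT-3", "Input list is not mutated", leaves_input_intact),
    ],
)


def in_place_sort(xs):
    """A candidate that sorts correctly but mutates its argument."""
    xs.sort()
    return xs


print(violations(sort_norm, sorted))         # []
print(violations(sort_norm, in_place_sort))  # ['SORT-3']
```

The second candidate produces correct output yet breaks a non-functional constraint, which is the kind of failure a plain-language requirement would not catch automatically.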

Section 03

Research Methods and Experimental Design

Selection of Benchmark Tasks

Four types of representative tasks are selected:

  • Algorithm implementation (sorting, graph traversal, etc.)
  • API integration (third-party library calls)
  • System components (configuration parsing, data validation, etc.)
  • End-to-end applications (small complete applications)

Evaluation Index System

Multi-dimensional framework:

  • Functionality: Correctness, boundary handling, functional completeness
  • Quality: Code style, readability, maintainability
  • Efficiency: Generation success rate, number of iterations, resource consumption
  • Controllability: Norm compliance, predictability, interpretability
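
The efficiency dimension is the easiest of the four to compute mechanically. As a hedged sketch, assuming each experimental run is logged as one record (the `RunRecord` fields and the `summarize` function are hypothetical, and token count stands in for "resource consumption"), the indicators could be aggregated like this. The sample values in the usage lines are made up for illustration and are not results from the study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RunRecord:
    """One generation run. Field names are assumptions, not the thesis's schema."""
    task_id: str
    succeeded: bool
    iterations: int
    tokens: int


def summarize(records: List[RunRecord]) -> Dict[str, float]:
    """Aggregate success rate, mean iteration count, and mean token use."""
    if not records:
        raise ValueError("no records to summarize")
    return {
        "success_rate": sum(r.succeeded for r in records) / len(records),
        "mean_iterations": mean(r.iterations for r in records),
        "mean_tokens": mean(r.tokens for r in records),
    }


# Made-up example records, only to show the call shape.
example = [
    RunRecord("task-a", True, 1, 100),
    RunRecord("task-b", False, 3, 300),
]
print(summarize(example))
```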

Comparative Experiment Setup

Four workflow modes are compared:

  1. Direct generation mode (no norm steps)
  2. Simple norm mode (brief requirement description)
  3. Structured norm mode (detailed template norms)
  4. Iterative refinement mode (multiple rounds of norm revision and feedback)

Section 04

Key Findings and Insights

Norm Quality Determines Generation Quality

Compared with the simple norm mode, the structured norm mode significantly improves code correctness, which suggests that effort invested in defining norms up front pays off in later code quality.

Value of Iterative Feedback

The iterative refinement mode consumes more tokens and time, but it achieves the highest final success rate on complex tasks. Its benefit depends on an efficient feedback mechanism.
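
The generate, verify, and feed-back cycle described here can be sketched as a short loop. This is an illustration under stated assumptions, not the thesis's implementation: `refine`, `stub_generate`, and `verify_add` are hypothetical names, the generator is a stub standing in for a model call, and the round limit of 3 is arbitrary.

```python
from typing import Callable, List, Optional, Tuple

Generator = Callable[[str, List[str]], str]  # (norm text, feedback so far) -> candidate code
Verifier = Callable[[str], List[str]]        # candidate code -> violation messages


def refine(norm_text: str, generate: Generator, verify: Verifier,
           max_rounds: int = 3) -> Tuple[Optional[str], int]:
    """Regenerate until the verifier reports no violations or the round budget runs out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(norm_text, feedback)
        problems = verify(candidate)
        if not problems:
            return candidate, round_no
        # Pass the violations back so the next round can address them.
        feedback = feedback + problems
    return None, max_rounds


def stub_generate(norm_text: str, feedback: List[str]) -> str:
    """Stand-in for a model call: 'fixes' its bug once it has received any feedback."""
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"


def verify_add(code: str) -> List[str]:
    """Run the candidate and check one assertion taken from the norm."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable for this trusted demo only; sandbox real model output
    return [] if namespace["add"](2, 3) == 5 else ["add(2, 3) must return 5"]


code, rounds = refine("add(a, b) returns the sum of a and b", stub_generate, verify_add)
print(rounds)  # 2
```

Each extra round costs another generation call, which is where the additional tokens and time come from. The more precise the violation messages, the fewer rounds a real generator should need.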

Impact of Task Complexity

  • Simple tasks: Small differences between modes, direct generation is more efficient
  • Medium complexity: Structured norm mode has obvious advantages
  • High complexity: Iterative refinement mode has prominent value
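
The complexity findings above amount to a lookup from task complexity to workflow mode. A minimal sketch, assuming a three-level complexity label that is assigned beforehand (this summary does not say how complexity was measured); the enum and function names are hypothetical.

```python
from enum import Enum


class Complexity(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    HIGH = "high"


class WorkflowMode(Enum):
    DIRECT = "direct generation"
    SIMPLE_NORM = "simple norm"
    STRUCTURED_NORM = "structured norm"
    ITERATIVE_REFINEMENT = "iterative refinement"


# Mapping taken from the three findings listed above.
# The simple norm mode is not singled out as the best choice for any level.
MODE_BY_COMPLEXITY = {
    Complexity.SIMPLE: WorkflowMode.DIRECT,
    Complexity.MEDIUM: WorkflowMode.STRUCTURED_NORM,
    Complexity.HIGH: WorkflowMode.ITERATIVE_REFINEMENT,
}


def select_mode(complexity: Complexity) -> WorkflowMode:
    """Pick the workflow mode the findings favor for a given complexity level."""
    return MODE_BY_COMPLEXITY[complexity]


print(select_mode(Complexity.MEDIUM).value)  # structured norm
```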

Trade-off Between Controllability and Creativity

Norm-driven approaches enhance controllability, but overly strict norms may suppress creativity, so norms need to balance constraints against room for exploration.

Section 05

Practical Implications and Application Recommendations

Norm Design Principles

  1. Progressive refinement: From high-level requirements to specific constraints
  2. Testability: Include assertions and test scenarios for automatic verification
  3. Modularity: Split complex norms into sub-norms
  4. Traceability: Establish requirement-implementation mapping
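
The testability principle is easiest to see when a norm's test scenarios are turned into real test cases. The sketch below uses Python's standard `unittest` module. The scenario format, the `make_test_case` and `run_scenarios` helpers, and the `clamp` task are assumptions made for illustration, not the thesis's tooling.

```python
import unittest
from typing import Any, Callable, List, Tuple

Scenario = Tuple[str, tuple, Any]  # (scenario name, positional args, expected result)


def make_test_case(func: Callable, scenarios: List[Scenario]) -> type:
    """Build a unittest.TestCase subclass with one test method per scenario."""
    def make_method(args: tuple, expected: Any):
        def test(self):
            self.assertEqual(func(*args), expected)
        return test

    methods = {f"test_{name}": make_method(args, expected)
               for name, args, expected in scenarios}
    return type("NormScenarioTests", (unittest.TestCase,), methods)


def run_scenarios(func: Callable, scenarios: List[Scenario]) -> unittest.TestResult:
    """Run every scenario against func and collect the results."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        make_test_case(func, scenarios))
    result = unittest.TestResult()
    suite.run(result)
    return result


# Scenarios written as part of a norm for a hypothetical clamp(x, lo, hi) task.
CLAMP_SCENARIOS = [
    ("inside_range", (5, 0, 10), 5),
    ("below_range", (-1, 0, 10), 0),
    ("above_range", (11, 0, 10), 10),
]


def clamp(x, lo, hi):
    """Candidate implementation under test."""
    return max(lo, min(x, hi))


result = run_scenarios(clamp, CLAMP_SCENARIOS)
print(result.testsRun, len(result.failures))  # 3 0
```

Keeping scenarios as plain data inside the norm means the same list can drive verification for every candidate the agent produces.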

Workflow Selection Strategy

  • Rapid prototyping: Direct generation mode
  • Production code: Structured norm mode
  • Complex systems: Iterative refinement mode
  • Maintenance and refactoring: Norms as a benchmark for changes

Tool Integration Recommendations

  • Norm editor: Structured templates and syntax checks
  • Version control: Incorporate norms into version management
  • Automatic verification: Convert norms into test cases
  • Visual tracking: Norm-code mapping and coverage
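
For the norm-code mapping and coverage item, one possible reading is a trace table from norm ids to the functions that implement them, with coverage defined as the share of norms that have at least one implementation. Both that definition and the names below (`uncovered_norms`, `norm_coverage`, the `CFG-*` ids) are assumptions made for this sketch.

```python
from typing import Dict, List


def uncovered_norms(norm_ids: List[str], trace: Dict[str, List[str]]) -> List[str]:
    """Return the norms that no function is recorded as implementing."""
    return [n for n in norm_ids if not trace.get(n)]


def norm_coverage(norm_ids: List[str], trace: Dict[str, List[str]]) -> float:
    """Fraction of norms mapped to at least one implementing function."""
    if not norm_ids:
        raise ValueError("no norms defined")
    return 1 - len(uncovered_norms(norm_ids, trace)) / len(norm_ids)


# Hypothetical norms for a configuration parser and their recorded implementations.
norm_ids = ["CFG-1", "CFG-2", "CFG-3", "CFG-4"]
trace = {
    "CFG-1": ["parse_file"],
    "CFG-2": ["parse_file", "validate"],
    "CFG-3": [],  # listed in the trace table but not yet implemented
}

print(uncovered_norms(norm_ids, trace))  # ['CFG-3', 'CFG-4']
print(norm_coverage(norm_ids, trace))    # 0.5
```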

Section 06

Research Limitations and Future Directions

Research Limitations

  • Task scope: Focuses on the algorithm and component level, with limited coverage of large-scale systems
  • Domain limitations: Mainly general programming tasks; specific domains (embedded, security systems) still need to be verified
  • Model dependency: Based on specific large language models; generalization to other models remains to be tested

Future Directions

  • Automatic norm generation: Extract norms from natural language/examples
  • Norm evolution mechanism: Intelligently adjust norms and coordinate with implementation
  • Human-machine collaboration mode: Collaborative decision-making between developers and agents
  • Formal verification integration: Mathematically rigorous correctness guarantees
  • Multi-agent collaboration: Collaborative work of specialized agents

Section 07

Conclusion: Prospects and Significance of Norm-Driven Workflow

The norm-driven workflow combines the efficiency of AI generation with the quality assurance practices of traditional software engineering, providing a promising framework for agent code generation. As large language models and agent architectures mature, this approach is expected to play an important role in future software engineering. The study contributes empirical data and insights that later research and practice can build on.