Zing Forum

Learning to Commit: An Online Repository Memory Framework for AI to Submit Code Like Humans

Through an online repository memory mechanism, AI coding assistants learn project-specific coding styles, internal API usage patterns, and architectural constraints from historical commits, generating Pull Requests that better align with project practices.

Tags: Code Generation · AI Coding Assistant · Pull Request · Repository Memory · Contrastive Learning · Code Style · Open-Source Contribution
Published 2026-03-28 01:58 · Recent activity 2026-03-30 16:23 · Estimated read 7 min

Section 01

Introduction: How Learning to Commit Enables AI to Submit Code Like Project Members

This article introduces the Learning to Commit framework, whose core is the Online Repository Memory mechanism. By enabling AI to learn coding styles, internal API usage patterns, and architectural constraints from a project's historical commits, this framework addresses the problem of AI-generated code being "incompatible" with real-world projects and generates Pull Requests that better align with project practices.


Section 02

The "Incompatibility" Issue of AI Code Generation and Limitations of Snapshots

Large language models excel at code generation, but their Pull Requests are often rejected when submitted to real open-source projects. The problem is not functional correctness but a lack of "organicity": the code ignores project-specific conventions, reinvents the wheel, and violates implicit architectural constraints. Existing methods rely on code snapshots, but a snapshot shows only the final state; it carries no history explaining why the code was designed this way and cannot capture implicit conventions such as "database operations must go through the DatabaseManager class".
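A minimal sketch of the kind of convention a snapshot alone does not make explicit. The `DatabaseManager` gateway is the example named above; both functions below are functionally correct, but only one matches the project's convention (all names here are illustrative, not from the paper):

```python
import sqlite3


class DatabaseManager:
    """Hypothetical project-wide gateway: the implicit convention says
    all database operations must go through this class."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")

    def insert_user(self, name):
        self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.conn.commit()


def add_user_raw(conn, name):
    # Functionally correct, but bypasses DatabaseManager: the kind of
    # change a reviewer rejects for lacking "organicity".
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def add_user_conventional(db: DatabaseManager, name):
    # Matches the convention encoded in the project's commit history.
    db.insert_user(name)
```

Nothing in the final snapshot marks `add_user_raw` as wrong; only the commit history, where reviewers repeatedly routed such calls through `DatabaseManager`, reveals the rule.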


Section 03

Core Mechanisms of the Online Repository Memory Framework

The core of the Learning to Commit framework is the Online Repository Memory, which operates in two phases:

  1. Skill Building Phase: through supervised contrastive reflection, the AI generates solutions for historical Issues, compares them with the human-submitted diffs, extracts skills such as coding styles, internal API usage, and architectural constraints, and accumulates them into a knowledge base.
  2. Conditional Generation Phase: when a new PR task arrives, the AI retrieves relevant patterns from the skill library and generates code that aligns with the project's evolutionary history, ensuring organicity.
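The two phases can be sketched as follows, under the simplifying assumption that a "skill" is a short note extracted by diffing an AI draft against the human diff, and that retrieval is a naive keyword match standing in for a real retriever (`extract_skills` and `retrieve` are illustrative names, not the paper's API):

```python
import difflib


def extract_skills(ai_draft: str, human_diff: str) -> list[str]:
    """Skill building: lines present only in the human solution are
    treated as evidence of project conventions the draft missed."""
    skills = []
    for line in difflib.unified_diff(
        ai_draft.splitlines(), human_diff.splitlines(), lineterm=""
    ):
        if line.startswith("+") and not line.startswith("+++"):
            skills.append(f"prefer: {line[1:].strip()}")
    return skills


def retrieve(skill_library: list[str], query: str) -> list[str]:
    """Conditional generation: pull skills relevant to the new PR
    (naive keyword match in place of a learned retriever)."""
    return [s for s in skill_library if any(w in s for w in query.split())]


# Build the library from one historical issue, then condition on it.
skill_library = extract_skills(
    "cursor.execute(sql)",
    "DatabaseManager.execute(sql)",
)
print(retrieve(skill_library, "DatabaseManager query"))
```

In the real framework an LLM performs the reflection and the retrieval is semantic, but the loop shape (attempt, compare, store, retrieve) is the same.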


Section 04

Evaluation Design and Experimental Results

The evaluation uses a strict time split: the skill library is built from commits before January 1, 2024 (the training period), while PRs opened after that date, guaranteed unseen during skill building, form the test period. Metrics span several dimensions: functional correctness, code style consistency, internal API reuse rate, and the plausibility of the modified regions. Experimental results show higher internal API reuse, a higher linter pass rate, and stronger architectural compliance, without sacrificing functional correctness.
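The time split is mechanically simple; a sketch with invented commit records (the cutoff date is the one stated above, everything else is placeholder data):

```python
from datetime import date

CUTOFF = date(2024, 1, 1)  # the paper's training/testing boundary

# Invented commit records standing in for a real repository history.
commits = [
    {"sha": "a1", "date": date(2023, 6, 2)},
    {"sha": "b2", "date": date(2023, 11, 20)},
    {"sha": "c3", "date": date(2024, 3, 15)},
]

# Commits before the cutoff build the skill library; later PRs are
# unseen by construction and form the test set.
training = [c for c in commits if c["date"] < CUTOFF]
testing = [c for c in commits if c["date"] >= CUTOFF]

print(len(training), len(testing))
```

The point of the split is leakage prevention: no skill in the library can have been extracted from a commit the model is later evaluated on.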


Section 05

Reasons for the Effectiveness of Supervised Contrastive Reflection

The effectiveness of contrastive reflection comes from three points:

  1. Active Learning: The AI actively attempts to generate and compare, which leads to deeper memory than passively reading historical commits;
  2. Gap as Signal: The gap between AI-generated content and real commits reveals implicit project conventions;
  3. Continuous Accumulation: The skill library grows with the increase of historical commits, and the AI's understanding of the project gradually deepens.
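The three points above can be condensed into one online loop. In this sketch, `attempt` and `gap_to_skill` are placeholder names for the LLM's generation and reflection steps; the loop structure (actively try, treat the gap as signal, accumulate) is what matters:

```python
def attempt(issue: str, skills: list[str]) -> str:
    # Active learning: the AI drafts a solution itself rather than
    # passively reading the commit (placeholder for LLM generation).
    return f"draft for {issue} using {len(skills)} skills"


def gap_to_skill(draft: str, human_diff: str) -> str:
    # Gap as signal: in practice an LLM reflects on how the draft
    # differs from the human commit and writes down the missed
    # convention (placeholder for that reflection step).
    return f"convention missed in: {human_diff}"


# Continuous accumulation: the library grows with each processed commit.
skills: list[str] = []
history = [("issue-1", "diff-1"), ("issue-2", "diff-2")]
for issue, human_diff in history:
    draft = attempt(issue, skills)
    skills.append(gap_to_skill(draft, human_diff))

print(len(skills))
```

Because each attempt is conditioned on the skills gathered so far, later drafts start closer to the project's conventions, which is what "understanding gradually deepens" means operationally.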

Section 06

Application Scenarios and Current Limitations

Typical application scenarios: open-source contribution (helping new contributors submit acceptable PRs), enterprise internal development (complying with internal norms), and legacy project maintenance (extracting knowledge from history). Limitations: the approach relies on sufficient historical data (new projects benefit less), carries a high computational cost, and during large refactorings old and new skills may conflict, requiring versioned management of the skill library.


Section 07

Implications for AI Programming Tools

The Learning to Commit framework provides three implications for AI programming tools:

  1. Shift from general-purpose to specialized, focusing on specific codebases;
  2. Value the knowledge in code commit history;
  3. Adopt reflective learning, allowing AI to actively try and learn from mistakes.

Section 08

Conclusion: Towards an Organic AI Programming Assistant

The Learning to Commit framework solves the "incompatibility" problem of AI coding agents through the online repository memory mechanism, enabling AI to not only "write correctly" but also "write like" project members. Its significance lies in revealing that true intelligence comes not only from large-scale pre-training but also from continuous learning and adaptation to specific environments. Organicity will become a key dimension for measuring the value of future AI programming tools.