Agentic Delivery Playbook: A Specification-First Workflow for AI Programming Agents

An engineering delivery model for AI programming agents that reduces agent drift and development costs while enhancing the auditability and reliability of code delivery through specification-first practices, role separation, and model routing.

Published 2026-06-04 17:15 | Recent activity 2026-06-04 17:19 | Estimated read 9 min

Section 01

Agentic Delivery Playbook: Introduction to the Specification-First Workflow for AI Programming Agents

Keywords: AI programming agents, specification-first, model routing, software engineering, code review, agent drift, human-AI collaboration, development workflow

Source: Original author and maintainer arcayne, published on GitHub (https://github.com/arcayne/agentic-delivery-playbook), released June 4, 2026.

Core Insights: AI programming agents suffer from agent drift, cost overruns, and a lack of auditability. The playbook addresses these pain points with three systematic measures: a specification-first delivery cycle, role separation with approval checkpoints, and a model routing strategy. Together they aim to make AI-assisted development reproducible, measurable, and improvable.

Section 02

Background: Engineering Challenges of AI Programming Agents

With the rapid advancement of large language model capabilities, AI programming agents have become important auxiliary tools for software development, but they face three core challenges:

Agent Drift: When requirements are ambiguous, the implementation gradually deviates from the original intent, especially in complex changes that span multiple files;
Cost Overruns: Routing every task to the strongest available model drives up API costs;
Lack of Auditability: Without a complete record of the decision chain, issues are difficult to trace and review.

The Agentic Delivery Playbook is a systematic solution targeting these pain points.

Section 03

Core Philosophy and Role Separation

Core Philosophy

Core Idea: Use the right model for the right task at the right time. The delivery cycle consists of 8 stages: intake -> spec -> critique -> approval -> implementation -> QA -> fix/escalate -> closeout, with clear responsibility boundaries and delivery standards for each stage.
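The eight-stage cycle above can be sketched as a strictly ordered sequence that refuses to skip stages. This is a minimal illustration, not the playbook's own code: the names `STAGES` and `advance` are hypothetical, and the sketch assumes the stages run linearly in the order listed.

```python
# Stage names and order are taken from the article; everything else is assumed.
STAGES = (
    "intake",
    "spec",
    "critique",
    "approval",
    "implementation",
    "qa",
    "fix/escalate",
    "closeout",
)


def advance(current: str) -> str:
    """Return the stage that follows `current` in the delivery cycle.

    Raises ValueError for an unknown stage or when called on the final stage.
    """
    index = STAGES.index(current)  # raises ValueError if the stage is unknown
    if index == len(STAGES) - 1:
        raise ValueError("closeout is the final stage")
    return STAGES[index + 1]


if __name__ == "__main__":
    stage = STAGES[0]
    while stage != "closeout":
        print(stage, "->", advance(stage))
        stage = advance(stage)
```

Keeping the order in a single tuple means a stage can only be reached through its predecessor, which is one simple way to make a skipped critique or approval step visible.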

Role Separation

The playbook defines five key roles, each of which can be filled by a model, an agent, or a human:

Specification Writer: Translates ambiguous requirements into precise technical specifications (interfaces, data flows, boundary conditions, etc.), requiring strong reasoning capabilities;
Critic: Adversarially reviews specifications to identify missing boundaries and security vulnerabilities;
Implementer: Writes code according to the approved specification and can run on a low-cost model;
QA Auditor: Validates the implementation against the specification;
Human Approver: Makes the final decision at key checkpoints (security, privacy, etc.).

Because each role's output is recorded separately, role separation makes the process auditable and yields a complete delivery archive.
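One way to picture how role separation feeds an archive is a small sign-off log. The sketch below is illustrative only: the role-to-stage mapping in `ROLE_FOR_STAGE`, the `SignOff` and `DeliveryArchive` classes, and the actor names are all assumptions, not structures defined by the playbook.

```python
from dataclasses import dataclass, field

# Assumed mapping from delivery stage to the role that signs it off.
ROLE_FOR_STAGE = {
    "spec": "specification_writer",
    "critique": "critic",
    "approval": "human_approver",
    "implementation": "implementer",
    "qa": "qa_auditor",
}


@dataclass
class SignOff:
    stage: str
    role: str
    actor: str  # model name, agent id, or person (hypothetical values)
    note: str = ""


@dataclass
class DeliveryArchive:
    sign_offs: list = field(default_factory=list)

    def record(self, stage: str, actor: str, note: str = "") -> SignOff:
        """Append a sign-off for `stage`, tagged with the role assumed for it."""
        entry = SignOff(stage, ROLE_FOR_STAGE[stage], actor, note)
        self.sign_offs.append(entry)
        return entry

    def is_approved(self) -> bool:
        """True once an approval sign-off has been recorded."""
        return any(entry.stage == "approval" for entry in self.sign_offs)


if __name__ == "__main__":
    archive = DeliveryArchive()
    archive.record("spec", "strong-reasoning-model")
    archive.record("critique", "critic-agent", "missing rate-limit boundary")
    print(archive.is_approved())
    archive.record("approval", "alice")
    print(archive.is_approved())
```

A caller could check `is_approved()` before starting implementation, so that the archive doubles as the approval gate.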

Section 04

Model Routing Strategy

Select appropriate models based on task characteristics instead of using the strongest configuration for all tasks:

High Ambiguity Tasks (requirements analysis, architecture trade-offs, security reviews): Prioritize strong reasoning models (e.g., GPT-4, Claude 3 Opus);
Structured Implementation Tasks (coding after clear specifications): Use low-cost models (e.g., GPT-4o-mini, Claude 3 Haiku);
Quality Audit Tasks: Use models with good instruction-following ability to strictly validate against acceptance criteria.

Record the expected and actual model configuration for each run, then analyze how cost, quality, and drift correlate in order to tune the routing strategy.
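The routing rule and the expected-versus-actual record can be sketched as a lookup table plus a small log entry. The task-type keys, tier names, and the `RoutingRecord` class are hypothetical placeholders; a team would map each tier to whichever concrete models it actually uses.

```python
from dataclasses import dataclass

# Assumed tier names; the three task categories come from the article.
ROUTES = {
    "high_ambiguity": "strong-reasoning",
    "structured_implementation": "low-cost",
    "quality_audit": "instruction-following",
}


@dataclass
class RoutingRecord:
    task_type: str
    expected_tier: str
    actual_tier: str

    @property
    def deviated(self) -> bool:
        """True when the run used a different tier than the routing table expects."""
        return self.expected_tier != self.actual_tier


def route(task_type: str) -> str:
    """Return the model tier expected for a task type (KeyError if unknown)."""
    return ROUTES[task_type]


def record_run(task_type: str, actual_tier: str) -> RoutingRecord:
    """Build a log entry comparing the expected tier with the one actually used."""
    return RoutingRecord(task_type, route(task_type), actual_tier)


if __name__ == "__main__":
    run = record_run("structured_implementation", "strong-reasoning")
    print(run.deviated)  # True: a stronger tier was used than the table expects
```

Logging deviations this way gives the later cost, quality, and drift analysis a concrete field to group by.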

Section 05

Applicable Scenarios and Tool Integration

Applicable Scenarios

Recommended use cases: Cross-boundary changes, ambiguous requirements, security/privacy risks, customer feature changes, drift-prone tasks.

The full process can be skipped only when all of the following hold: the requirements are clear, the change affects 1-2 files, validation is simple, there are no risks, and nothing requires a specification-first process.
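Because every skip condition must hold at once, the check reduces to a single conjunction. The sketch below assumes a hypothetical `ChangeRequest` record whose field names are invented for illustration; only the five conditions themselves come from the article.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    # Field names are hypothetical; each mirrors one skip condition.
    requirements_clear: bool
    files_affected: int
    simple_validation: bool
    has_risk: bool
    spec_first_required: bool


def can_skip_full_process(change: ChangeRequest) -> bool:
    """Return True only if every skip condition from the article is met."""
    return all(
        [
            change.requirements_clear,
            1 <= change.files_affected <= 2,
            change.simple_validation,
            not change.has_risk,
            not change.spec_first_required,
        ]
    )


if __name__ == "__main__":
    typo_fix = ChangeRequest(True, 1, True, False, False)
    auth_change = ChangeRequest(True, 1, True, True, False)
    print(can_skip_full_process(typo_fix))     # True
    print(can_skip_full_process(auth_change))  # False: the change carries risk
```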

Tool Integration

The playbook provides Markdown specification templates, HTML visualization templates, runtime configuration JSON templates, and QA checklists;
A Pi coding agent adapter supports general model configurations;
Runtime directories follow a unified naming scheme, specs/YYYYMMDD-HHMM-feature-slug/, and each contains a complete delivery archive.
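The naming scheme specs/YYYYMMDD-HHMM-feature-slug/ can be generated from a timestamp and a feature title. This is a sketch under stated assumptions: the helper names `slugify` and `run_directory` are hypothetical, and the slug rule (lowercase, runs of non-alphanumerics collapsed to hyphens) is a guess at a reasonable convention rather than the playbook's own.

```python
import re
from datetime import datetime
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"  # assumed fallback for empty titles


def run_directory(title: str, when: datetime, root: Path = Path("specs")) -> Path:
    """Build a path of the form specs/YYYYMMDD-HHMM-feature-slug."""
    return root / f"{when:%Y%m%d-%H%M}-{slugify(title)}"


if __name__ == "__main__":
    path = run_directory("Add OAuth Login", datetime(2026, 6, 4, 17, 15))
    print(path.as_posix())  # specs/20260604-1715-add-oauth-login
```

Putting the timestamp first keeps the directories sorted chronologically in a plain listing.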

Section 06

Implications for Development Teams

Technical Leads: Control AI agent risks, balance efficiency and code quality;
Development Teams: Clarify human-AI collaboration boundaries, reduce anxiety about AI dependency;
Organizational Governance: Delivery archives provide basic materials for compliance audits, proving code has undergone review and validation.

Section 07

Summary and Outlook

The Agentic Delivery Playbook represents the evolution of AI-assisted development from "prompt engineering" to "systems engineering", establishing a checkable, auditable, and optimizable delivery system.

The current version, v0.2.0, covers evidence integrity, budget awareness, and ROI evaluation guidance. Future directions include:

Deep integration with IDEs and CI/CD systems;
Automated model routing optimization based on historical data;
Specialized templates for specific tech stacks.

The playbook gives teams that use AI programming agents at scale a starting point for balancing efficiency and quality.
