Hands-On with GitHub Agentic Workflows: Automated AWS Infrastructure Drift Detection and Attribution

Explore how to combine GitHub Agentic Workflows, Terraform, and AI agents to build an end-to-end AWS infrastructure drift detection system, enabling automatic risk classification, root cause tracing, and multi-channel notifications.

Tags: GitHub Agentic Workflows, Infrastructure Drift, Terraform, AWS, CloudTrail, DevOps, AI Agent, CI/CD
Published 2026-04-27 07:45 | Recent activity 2026-04-27 07:48 | Estimated read 8 min

Section 01

Overview: Automated AWS Infrastructure Drift Detection and Attribution

This article walks through an open-source project built on GitHub Agentic Workflows that combines Terraform with AI agents to detect AWS infrastructure drift end to end. It targets three recurring problems in cloud-native operations: locating drift quickly, assessing its risk, and responding to it. The system classifies risk automatically, traces root causes, and sends notifications through multiple channels (e.g., GitHub Issues, Telegram). Pairing a deterministic pipeline with AI agents keeps data collection stable while leaving judgment calls to the agent.


Section 02

Background: Why Drift Detection Needs Intelligent Upgrades

Infrastructure drift is a classic challenge in cloud-native operations. Traditional drift detection tools have three major limitations:

  1. One-size-fits-all risk levels: Unable to distinguish the severity of changes (e.g., deleting a VPC vs. modifying a tag), leading to alert fatigue or critical risks being ignored;
  2. Difficult root cause tracing: CloudTrail records API calls, but associating resource changes with operators and timestamps requires complex cross-service queries;
  3. Deteriorating ticket quality: GitHub Issues generated from fixed templates easily lose context, and maintaining them by hand is costly.

Agentic Workflows are introduced to handle exactly these steps, which require judgment rather than computation.

Section 03

Core Mechanisms of GitHub Agentic Workflows

GitHub Agentic Workflows (gh-aw) extends traditional GitHub Actions by letting a pipeline embed AI agents that make decisions autonomously. Given context such as Terraform plan output and CloudTrail logs, an agent can act on its own, for example by creating tickets or sending notifications. The project uses the gh-aw CLI to compile Markdown workflow definitions into lock files, which keeps execution consistent and secure. Key security attributes include:

  • safe-outputs: Restricts what the agent may emit (a single Issue and a single notification);
  • tools: Grants access to the GitHub toolset so the agent can read build artifacts;
  • network (set to defaults): Restricts outbound traffic to a whitelist of trusted domains.
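
The idea behind an output cap can be illustrated with a small conceptual sketch. This is not how gh-aw implements safe-outputs; the names `ProposedOutput`, `OUTPUT_LIMITS`, and `filter_outputs` are hypothetical and exist only to show how a per-kind limit lets one Issue and one notification through while rejecting everything else.

```python
from dataclasses import dataclass

# Hypothetical limits mirroring the article's description:
# at most one Issue and one notification per agent run.
OUTPUT_LIMITS = {"create_issue": 1, "send_notification": 1}


@dataclass(frozen=True)
class ProposedOutput:
    kind: str      # e.g. "create_issue"
    payload: dict  # content the agent wants to emit


def filter_outputs(proposed, limits=OUTPUT_LIMITS):
    """Split proposed outputs into (accepted, rejected) under per-kind caps."""
    counts = {}
    accepted, rejected = [], []
    for item in proposed:
        used = counts.get(item.kind, 0)
        # Kinds missing from `limits` have a cap of zero and are always rejected.
        if used < limits.get(item.kind, 0):
            counts[item.kind] = used + 1
            accepted.append(item)
        else:
            rejected.append(item)
    return accepted, rejected
```

With this kind of gate, a second Issue or an unlisted action injected through a malicious log line would land in the rejected list instead of being executed.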

Section 04

System Architecture: Four-Stage Pipeline

The system architecture consists of a four-stage pipeline:

  1. Terraform Drift Scanning: Configure AWS credentials via OIDC and run terraform plan -detailed-exitcode to detect differences; when the exit code is 2 (changes present), extract resource IDs/ARNs and upload them as artifacts;
  2. CloudTrail Attribution Query: Run multi-strategy queries (exact ARN match, old resource ID match, fuzzy EventName match) to build an attribution table of operators, resource ARNs, and timestamps;
  3. Agent Trigger: The deterministic pipeline triggers the Agentic Workflow via gh workflow run, separating data collection from AI reasoning to keep the pipeline stable;
  4. AI Analysis and Multi-Channel Notifications: After downloading the artifacts, the agent performs risk classification (critical/high/medium/low), root cause attribution, and remediation guidance, then creates labeled GitHub Issues and sends Telegram notifications.
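
Stage 1 can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: it assumes the plan has been rendered with `terraform show -json`, and it assumes the affected resources expose `id` and `arn` attributes, which depends on the resource type. The function names are hypothetical.

```python
import json

# Documented meanings of `terraform plan -detailed-exitcode`.
EXIT_NO_CHANGES, EXIT_ERROR, EXIT_CHANGES = 0, 1, 2


def drift_detected(exit_code: int) -> bool:
    """Return True for a non-empty diff; raise if the plan itself failed."""
    if exit_code == EXIT_CHANGES:
        return True
    if exit_code == EXIT_NO_CHANGES:
        return False
    raise RuntimeError(f"terraform plan failed with exit code {exit_code}")


def changed_resources(plan_json: str) -> list[dict]:
    """Extract address, actions, and any known id/arn from a JSON plan."""
    plan = json.loads(plan_json)
    results = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = change.get("actions", [])
        if actions in (["no-op"], ["read"]):
            continue
        # Prefer the prior state, falling back to the planned state
        # for resources that are being created.
        state = change.get("before") or change.get("after") or {}
        results.append({
            "address": rc.get("address"),
            "actions": actions,
            "id": state.get("id"),    # assumption: attribute exists for this type
            "arn": state.get("arn"),  # assumption: attribute exists for this type
        })
    return results
```

Treating exit code 1 as an error rather than "no drift" matters here: a failed plan should stop the pipeline instead of silently reporting a clean state.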
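
The multi-strategy CloudTrail lookup in stage 2 might look roughly like the sketch below. It assumes a client that behaves like `boto3.client("cloudtrail")` and uses its `lookup_events` call; the function name `attribute_change`, the 7-day window, and the row format are assumptions made for illustration. Because `lookup_events` matches EventName exactly, any fuzzy matching would have to happen by passing several candidate names, as done here, or by filtering on the client side. Pagination is omitted.

```python
from datetime import datetime, timedelta, timezone


def attribute_change(client, arn=None, resource_id=None, event_names=(), days=7):
    """Try lookup strategies in order; return rows from the first that matches.

    `client` is expected to behave like boto3.client("cloudtrail").
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    # LookupEvents accepts a single lookup attribute per call,
    # so each strategy is a separate request.
    strategies = []
    if arn:
        strategies.append(("ResourceName", arn))
    if resource_id:
        strategies.append(("ResourceName", resource_id))
    strategies.extend(("EventName", name) for name in event_names)

    for key, value in strategies:
        response = client.lookup_events(
            LookupAttributes=[{"AttributeKey": key, "AttributeValue": value}],
            StartTime=start,
            EndTime=end,
            MaxResults=50,
        )
        events = response.get("Events", [])
        if events:
            return [
                {
                    "operator": e.get("Username"),
                    "event": e.get("EventName"),
                    "time": e.get("EventTime"),
                    "resources": [r.get("ResourceName") for r in e.get("Resources", [])],
                    "matched_by": f"{key}={value}",
                }
                for e in events
            ]
    return []
```

Recording `matched_by` in each row lets a reader of the attribution table judge how reliable the match is: an exact ARN hit is stronger evidence than an event-name hit.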

Section 05

Key Engineering Practice Points

Core principles of engineering practice:

  1. Deterministic foundation + intelligent upper layer: Verifiable logic such as data collection stays in traditional Actions, while judgment steps (risk classification, report writing) are delegated to AI, balancing reliability and flexibility;
  2. Artifact-driven state transfer: Each stage passes information through build artifacts, so workflows are reentrant and debuggable;
  3. Security sandbox: Restrict agent behavior via safe-outputs and network whitelists to limit the impact of prompt injection and excessive permissions.
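
The hand-off between the deterministic stages and the agent can be sketched as below. This is an illustration under assumptions, not the project's code: the file name `drift-report.json`, the `source_run_id` input, and both function names are hypothetical, and the target workflow would need to declare a matching `workflow_dispatch` input. Only the `gh workflow run` flags `--ref` and `-f` are taken from the GitHub CLI.

```python
import json
from pathlib import Path


def write_report(directory, resources, attribution):
    """Persist stage output as a JSON file that a later step can upload as an artifact."""
    path = Path(directory) / "drift-report.json"  # hypothetical file name
    path.parent.mkdir(parents=True, exist_ok=True)
    report = {"resources": resources, "attribution": attribution}
    # default=str keeps datetime values (e.g. CloudTrail event times) serializable.
    path.write_text(json.dumps(report, indent=2, default=str))
    return path


def dispatch_command(workflow, run_id, ref="main"):
    """Build the `gh workflow run` argv that passes the source run ID to the agent workflow."""
    return [
        "gh", "workflow", "run", workflow,
        "--ref", ref,
        "-f", f"source_run_id={run_id}",  # hypothetical workflow_dispatch input
    ]
```

Because the agent receives only a run ID and reads everything else from the uploaded file, a failed analysis can be replayed against the same artifact without rerunning the Terraform scan.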

Section 06

Applicable Scenarios and Expansion Ideas

Applicable scenarios and expansions:

  • Multi-cloud environments: Configuration consistency monitoring for Azure Policy and GCP Organization Policy;
  • Kubernetes: Deviation detection between cluster state and GitOps repositories;
  • Security compliance: Intelligent grading of scan results and ticket distribution.

For Terraform teams, the marginal cost of adopting Agentic Workflows is low: the core logic reuses existing plan output, so only the agent's analysis prompts and output formats need to be defined.

Section 07

Conclusion

GitHub Agentic Workflows represent the next step in the evolution of CI/CD: from executing scripts to pursuing goals through autonomous decisions. The case in this article demonstrates the value of that paradigm: existing workflows stay stable while gaining context-aware analysis and response capabilities. As the gh-aw platform matures, the agentic model is expected to reach more DevOps scenarios.