Zing Forum

Reading

costwright: Static Budget Certificate Verification for LLM Agent Workflows

costwright is a static analysis tool that can predict the worst-case budget upper limit of LLM agent workflows before code execution. It is based on the cost correctness theorem formally verified with Lean 4, supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK, and can identify budget risks without executing code.

LLM智能体预算控制静态分析形式化验证Lean 4LangGraphCrewAI成本管理CI/CD
Published 2026-06-14 03:46Recent activity 2026-06-14 03:49Estimated read 6 min
costwright: Static Budget Certificate Verification for LLM Agent Workflows
1

Section 01

[Introduction] costwright: Static Budget Certificate Verification Tool for LLM Agent Workflows

costwright is a static analysis tool for LLM agent workflows that can predict the worst-case budget upper limit and verify budget certificates before code execution. It is based on the cost correctness theorem formally verified with Lean 4, supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK, and can identify budget risks without executing code. The original author/maintainer is hernaninverso, and the project is open-sourced on GitHub (link: https://github.com/hernaninverso/costwright). Release time: 2026-06-13T19:46:11Z.

2

Section 02

Background: Budget Risks in LLM Agent Workflows

With the application of LLMs in automated workflows, the risk of budget overruns is becoming increasingly prominent. Traditional runtime trackers can only count costs after the fact and cannot provide early warnings. Research shows: 88% of loop workflows rely on framework default limits or no limits; LangGraph's default recursion limit was only 25 before v1.0.6, and 1000 after; only 12% of LLM calls set explicit token limits. Core question: How to avoid infinite loops or high costs when agents make autonomous decisions?

3

Section 03

Core Design Philosophy: Static Verification and Formal Guarantees

As a static budget certificate verification tool, costwright differs from runtime trackers by analyzing the workflow graph structure before execution to predict the worst-case budget upper limit. Its core advantage lies in the machine-verified Lean4 cost correctness theorem, which guarantees "well-typed ⟹ expenditure ≤ declared budget". It analyzes code via pure AST, has zero execution dependencies, and does not run user code.

4

Section 04

Technical Implementation and Workflow

costwright supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK. Usage steps: After pip installation, execute costwright check .. Analysis report categories: certifiable, default_dependent, non_certifiable, runaway, parse_error. The conservative design ensures no false security guarantees are provided.

5

Section 05

Practical Application Scenarios and Value

costwright can be integrated into CI/CD pipelines (e.g., GitHub Action), with automated verification controlled via --json output and --fail-on policies. The caps command identifies LLM constructs without token limits and provides parameter recommendations (e.g., max_output_tokens for OpenAI, max_tokens for Anthropic, etc.). It can also generate diff patches (--patch) but does not directly modify user files, maintaining audit traceability.

6

Section 06

Limitations and Future Directions

costwright does not handle constructs like Send fan-out, interruptions/human-machine collaboration, CrewAI hierarchical mode, dynamic goto (marked as non_certifiable). The current boundary is the worst case (over-approximation), and tighter boundary optimization is on the roadmap. The experimental fuse feature bundles budget and risk certificates but does not provide compositional guarantees; non-interference theorems are future work.

7

Section 07

Summary and Insights

costwright is an important step for LLM agent engineering towards formal methods; verifying budget constraints in advance in complex systems has practical value. It translates theory into an engineering tool, improving agent reliability. For teams, adding CI checks without modifying the architecture can significantly enhance cost controllability, making it a low-cost, high-value budget risk assessment solution.