Reading

costwright: Static Budget Certificate Verification for LLM Agent Workflows

costwright is a static analysis tool that can predict the worst-case budget upper limit of LLM agent workflows before code execution. It is based on the cost correctness theorem formally verified with Lean 4, supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK, and can identify budget risks without executing code.

LLM智能体预算控制静态分析形式化验证Lean 4LangGraphCrewAI成本管理CI/CD

Published 2026-06-14 03:46Recent activity 2026-06-14 03:49Estimated read 6 min

costwright: Static Budget Certificate Verification for LLM Agent Workflows

Section 01

[Introduction] costwright: Static Budget Certificate Verification Tool for LLM Agent Workflows

costwright is a static analysis tool for LLM agent workflows that can predict the worst-case budget upper limit and verify budget certificates before code execution. It is based on the cost correctness theorem formally verified with Lean 4, supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK, and can identify budget risks without executing code. The original author/maintainer is hernaninverso, and the project is open-sourced on GitHub (link: https://github.com/hernaninverso/costwright). Release time: 2026-06-13T19:46:11Z.

Section 02

Background: Budget Risks in LLM Agent Workflows

With the application of LLMs in automated workflows, the risk of budget overruns is becoming increasingly prominent. Traditional runtime trackers can only count costs after the fact and cannot provide early warnings. Research shows: 88% of loop workflows rely on framework default limits or no limits; LangGraph's default recursion limit was only 25 before v1.0.6, and 1000 after; only 12% of LLM calls set explicit token limits. Core question: How to avoid infinite loops or high costs when agents make autonomous decisions?

Section 03

Core Design Philosophy: Static Verification and Formal Guarantees

As a static budget certificate verification tool, costwright differs from runtime trackers by analyzing the workflow graph structure before execution to predict the worst-case budget upper limit. Its core advantage lies in the machine-verified Lean4 cost correctness theorem, which guarantees "well-typed ⟹ expenditure ≤ declared budget". It analyzes code via pure AST, has zero execution dependencies, and does not run user code.

Section 04

Technical Implementation and Workflow

costwright supports frameworks like LangGraph, CrewAI, and OpenAI Agents SDK. Usage steps: After pip installation, execute costwright check .. Analysis report categories: certifiable, default_dependent, non_certifiable, runaway, parse_error. The conservative design ensures no false security guarantees are provided.

Section 05

Practical Application Scenarios and Value

costwright can be integrated into CI/CD pipelines (e.g., GitHub Action), with automated verification controlled via --json output and --fail-on policies. The caps command identifies LLM constructs without token limits and provides parameter recommendations (e.g., max_output_tokens for OpenAI, max_tokens for Anthropic, etc.). It can also generate diff patches (--patch) but does not directly modify user files, maintaining audit traceability.

Section 06

Limitations and Future Directions

costwright does not handle constructs like Send fan-out, interruptions/human-machine collaboration, CrewAI hierarchical mode, dynamic goto (marked as non_certifiable). The current boundary is the worst case (over-approximation), and tighter boundary optimization is on the roadmap. The experimental fuse feature bundles budget and risk certificates but does not provide compositional guarantees; non-interference theorems are future work.

Section 07

Summary and Insights

costwright is an important step for LLM agent engineering towards formal methods; verifying budget constraints in advance in complex systems has practical value. It translates theory into an engineering tool, improving agent reliability. For teams, adding CI checks without modifying the architecture can significantly enhance cost controllability, making it a low-cost, high-value budget risk assessment solution.

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

libmlxforge: An Embedded MLX LLM Inference Engine for Apple Silicon

libmlxforge is an embeddable MLX large language model (LLM) inference engine designed specifically for Apple Silicon. It provides a unified C ABI interface, supports calls from Node.js, Swift, and Rust, and features continuous batching, streaming output, JSON-constrained structured output, and embedding vector generation.

Recent activity 2026-06-09 17:23