BarrierBench: An Agent Framework for Verifying Dynamic System Safety Using Large Models

Tags: Large Language Models · Formal Verification · Dynamic Systems · Barrier Certificates · Agent Frameworks · SMT Solvers · Retrieval-Augmented Generation · Neuro-symbolic AI · Safety Verification
Published 2026-04-14 07:14 · Recent activity 2026-04-14 07:18 · Estimated read 6 min

Section 01

Introduction: Core Overview of the BarrierBench Agent Framework

BarrierBench is a benchmark dataset containing 100 dynamic system test cases, paired with a large language model (LLM)-based agent framework, for automated synthesis of barrier certificates to verify system safety. The framework combines retrieval-augmented generation (RAG), SMT formal verification, and iterative optimization, achieving a success rate of over 90% on Claude Sonnet 4.


Section 02

Background: Challenges in Dynamic System Safety Verification

In fields like autonomous driving, robot control, and industrial automation, ensuring the safety of dynamic systems is a core challenge. Traditional methods rely on experts manually designing barrier certificates, but as system complexity increases, manual design becomes difficult and error-prone. In recent years, LLMs have demonstrated strong reasoning and code generation capabilities, yet there is a lack of standardized test benchmarks to evaluate their performance in the field of formal verification.
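
For context, a barrier certificate for a system $\dot{x} = f(x)$ with initial set $X_0$ and unsafe set $X_u$ is a function $B$ satisfying (in one standard formulation, stated here for background rather than taken from the paper):

```latex
B(x) \le 0 \quad \forall x \in X_0, \qquad
B(x) > 0 \quad \forall x \in X_u, \qquad
\nabla B(x) \cdot f(x) \le 0 \quad \forall x \text{ with } B(x) = 0.
```

Together these conditions imply that no trajectory starting in $X_0$ can cross the zero level set of $B$ and reach $X_u$, so safety is certified without computing reachable sets.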


Section 03

Detailed Explanation of the BarrierBench Benchmark Dataset

BarrierBench was jointly developed by Isfahan University of Technology, the Max Planck Institute for Software Systems, and the University of Colorado Boulder, and has been accepted at the 8th Annual Learning for Dynamics and Control Conference (L4DC 2026). Key contributions include: 100 test cases covering a variety of dynamic systems, each paired with a correct polynomial barrier function and control-law expression, and an open-source dataset (available at https://hycodev.com/data/BarrierBench.json).
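
A minimal sketch of what loading such a dataset might look like. The field names below (`id`, `dynamics`, `barrier`, etc.) are illustrative assumptions, not the actual BarrierBench.json schema:

```python
import json

# Hypothetical entry mirroring an assumed BarrierBench.json layout;
# the real schema may use different field names.
sample = json.loads("""
[{"id": 1,
  "dynamics": ["-x1 + x2", "-x2 - x1**3"],
  "init_set": "x1**2 + x2**2 <= 0.25",
  "unsafe_set": "x1 >= 2",
  "barrier": "x1**2 + x2**2 - 1",
  "control_law": null}]
""")

# Each case pairs a system description with its known-correct certificate.
for case in sample:
    print(case["id"], case["barrier"])
```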


Section 04

Analysis of the Agent Framework Architecture

BarrierBench's multi-agent collaboration framework combines LLMs with formal tools:

  1. Retrieval-Augmented Generation (RAG) Module: retrieves similar solved cases from the dataset so the LLM can draw on prior solutions;
  2. Barrier Synthesis Agent: guides the LLM in exploring candidate barrier function forms and generating mathematical expressions, with support for iterative refinement;
  3. Barrier Verification Agent: formally verifies candidate certificates with an SMT solver, ensuring the safety constraints hold;
  4. Iterative Optimization Loop: when verification fails, feeds the error information back to trigger a new round of candidate generation.

Section 05

Experimental Results and Performance Comparison

The research team compared the performance of different configurations on BarrierBench:

Configuration              Claude Sonnet 4   ChatGPT-4o
Baseline (single prompt)   41%               17%
Full framework             90%               46%
Improvement                +49 pp            +29 pp

Claude Sonnet 4 achieved a success rate of 90% under the full framework, a 49-percentage-point gain over the single-prompt baseline, demonstrating the effectiveness of the architecture. This shows that with sensible task decomposition and tool integration, LLMs can handle specialized formal verification tasks.

Section 06

Technical Implementation Details

The project is implemented in Python, with dependencies including anthropic (Claude API client), sympy (symbolic mathematics), z3-solver (SMT solving), and numpy (numerical computation). The codebase is cleanly organized into agent definitions, verification logic, and dataset-loading modules; developers can supply their own API key to run the synthesis pipeline.
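
To illustrate where sympy fits in such a pipeline (a sketch under assumed dynamics, not the project's actual code): the verification side needs the Lie derivative of a candidate barrier along the system dynamics, which sympy computes symbolically before the result is handed to the SMT solver.

```python
import sympy as sp

# Example dynamics and candidate barrier (assumed for illustration).
x1, x2 = sp.symbols("x1 x2")
f = [-x1 + x2, -x2 - x1**3]   # xdot = f(x)
B = x1**2 + x2**2 - 1          # candidate barrier function

# Lie derivative of B along f: sum_i (dB/dx_i) * f_i(x)
lie = sp.expand(sum(sp.diff(B, v) * fi for v, fi in zip((x1, x2), f)))
print(lie)
```

The expanded polynomial is what the Barrier Verification Agent would then encode as an SMT query (e.g. asking z3 whether it can be positive on the barrier's zero level set).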


Section 07

Significance and Future Outlook

BarrierBench represents an important application direction of neuro-symbolic AI, combining the pattern recognition of neural networks with the rigor of symbolic reasoning, balancing automation and verifiability. It has reference value for fields such as autonomous driving safety verification, robot control, industrial control systems, and AI safety research. As LLM capabilities improve, similar agent frameworks are expected to combine human expertise with AI computing power in more scientific and engineering fields.