Quarry: Achieving Automated Theorem Proving for Rocq via Difficulty-Aware Decomposition Strategies

This article introduces Quarry, a framework that raises the level of automation in interactive theorem provers by combining the high-level planning of large language models (LLMs) with the local reasoning of automated proof tools.

Tags: automated theorem proving, Rocq, Coq, large language models, formal verification, CoqHammer, neuro-symbolic, arXiv

Published 2026-06-16 22:33 | Recent activity 2026-06-17 10:33 | Estimated read 5 min

Section 01

Quarry Framework: Enhancing Rocq's Automated Theorem Proving via LLM Planning and Symbolic Reasoning

Quarry targets the automation bottleneck that interactive theorem provers such as Rocq (formerly Coq) face in formal verification. The framework separates proof planning from proof execution: an LLM supplies the high-level plan, while an automated proof tool such as CoqHammer supplies rigorous local reasoning. This combination improves Rocq's automated proof success rate (Section 04 gives the reported gains). The core innovation is a set of difficulty-aware decomposition strategies that tackle easier subgoals first and allocate computational resources accordingly.

Section 02

Automation Dilemmas in Formal Verification and Limitations of Existing Methods

Formal verification is a key method for ensuring software correctness, but constructing machine-checkable proofs still requires significant manual effort. Each existing automation approach has its own limitation: heuristic tactics (such as Coq's auto) only solve simple goals; hammer tools (like CoqHammer) lack long-range planning; and LLM methods can propose high-level ideas but lack local rigor. How to combine the planning strength of LLMs with the rigor of symbolic tools remains an open problem in the field.

Section 03

Core Methods and Technical Implementation of the Quarry Framework

The core of Quarry is the separation of planning and execution, organized in three phases:

1. Planning phase: the LLM proposes a decomposition of the goal (sub-lemmas plus the tactics that combine them).
2. Verification phase: Rocq type-checks the decomposition to confirm it is correct, and a difficulty model estimates how likely each subgoal is to be solved by the hammer.
3. Execution phase: the sub-lemmas are proved recursively in order of difficulty, within a controlled computational budget.

On the implementation side, Quarry integrates SerAPI (for interaction with Rocq), CoqHammer (the automated proof engine), and a difficulty prediction model based on proof state features.
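The control flow of the three phases can be illustrated with a minimal Python sketch. It is not Quarry's implementation: the `Goal`, `Budget`, `call_hammer`, and `prove` names, the 0.5 threshold, the depth limit, and the choice to fail fast when a sub-lemma fails are all assumptions made for illustration. The LLM planner is replaced by a pre-baked `subgoals` list, the difficulty model by a hand-set score, the Rocq type check is assumed to have passed, and CoqHammer by a stub that succeeds whenever budget remains.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed cutoff: goals at or below this score are sent straight to the hammer.
HAMMER_THRESHOLD = 0.5


@dataclass
class Goal:
    statement: str
    difficulty: float  # stand-in for the difficulty model's prediction
    subgoals: List["Goal"] = field(default_factory=list)  # stand-in for the LLM plan


@dataclass
class Budget:
    remaining: int  # number of hammer calls still allowed

    def spend(self) -> bool:
        """Consume one hammer call; return False if the budget is exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def call_hammer(goal: Goal, budget: Budget) -> bool:
    """Toy hammer: succeeds on any goal as long as one call can be paid for."""
    return budget.spend()


def prove(goal: Goal, budget: Budget, trace: List[str],
          depth: int = 0, max_depth: int = 3) -> bool:
    trace.append(goal.statement)
    # Goals predicted to be easy go directly to the hammer.
    if goal.difficulty <= HAMMER_THRESHOLD:
        return call_hammer(goal, budget)
    # Hard goals need a decomposition; give up if none exists or we are too deep.
    if depth >= max_depth or not goal.subgoals:
        return False
    # Difficulty-aware ordering: attempt the easiest sub-lemmas first.
    for sub in sorted(goal.subgoals, key=lambda g: g.difficulty):
        if not prove(sub, budget, trace, depth + 1, max_depth):
            return False
    return True


root = Goal("main", 0.9, [
    Goal("hard_lemma", 0.8, [Goal("h1", 0.2), Goal("h2", 0.1)]),
    Goal("easy_lemma", 0.3),
])
budget = Budget(remaining=3)
trace: List[str] = []
proved = prove(root, budget, trace)
```

In this toy run the easy sub-lemma is attempted before the hard one, and the three hammer calls exactly exhaust the budget. With a budget of two calls the same goal fails, which shows how the budget bounds the total cost of a proof attempt.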

Section 04

Experimental Evaluation Results and Advantages of Quarry

In Rocq benchmark tests, Quarry raised the success rate by 7%-13% over the strongest baseline under a 10-minute budget. Compared with pure LLM methods, its cost is more predictable. It also works with both open-source and commercial LLMs, which indicates good generality.

Section 05

Technical Contributions and Application Prospects of Quarry

Technical contributions include a new paradigm of neuro-symbolic collaboration (LLMs and symbolic systems each do what they are best at), difficulty-aware resource allocation, and progressive verification strategies. Application prospects cover critical software verification (reducing manual effort), mathematical formalization (assisting the translation of theorems into formal statements), and educational tools (helping students understand proofs).

Section 06

Limitations of Quarry and Future Research Directions

Limitations: decomposition quality depends on the LLM, the difficulty model generalizes poorly, and complex proofs lead to deep recursion. Future directions: stronger planning models, online learning for the difficulty model, interactive assistants, and porting to other provers (such as Isabelle).
