Zing Forum

OCELOT: Designing an Inference Leakage Budget Mechanism for Privacy-Preserving LLM Agents

This article introduces OCELOT, a runtime mediation mechanism that uses the "Witness-Verified Decryption" technique to cap privacy leakage from LLM agents. It targets cumulative, bidirectional, and task-dependent inference leakage while preserving task utility.

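LLM agents, privacy protection, inference leakage, differential privacy, witness verification, cumulative leakage, jailbreak attacks, posterior risk control, AI safety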
Published 2026-06-11 01:13 | Recent activity 2026-06-11 11:18 | Estimated read 5 min

Section 01

OCELOT: A New Paradigm for LLM Agent Privacy Protection—The Inference Leakage Budget Mechanism

OCELOT is a runtime mediation mechanism for privacy-preserving LLM agents. It uses a technique called "Witness-Verified Decryption" to enforce an upper bound (a budget) on inference leakage, targeting the cumulative, bidirectional, and task-dependent leakage risks that agents face while preserving task utility. The underlying paper was released on arXiv on June 10, 2026, under the title "OCELOT: Inference-Leakage Budgets for Privacy-Preserving LLM Agents"; link: http://arxiv.org/abs/2606.12341v1.

Section 02

Background: Privacy Dilemmas of LLM Agents and Three Core Challenges

LLM agents are evolving into assistants that execute complex tasks, but continuous multi-step interaction introduces new privacy risks. Traditional methods struggle with three core challenges:
1. Cumulative leakage: Fragments that are harmless in any single interaction can be combined across interactions to infer a complete private profile.
2. Bidirectional leakage: Malicious external inputs (e.g., jailbreak attacks) can induce the agent to leak information.
3. Task dependency: The sensitivity of the same piece of information varies widely with the recipient, so a uniform filtering strategy is ineffective.
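
The cumulative-leakage challenge can be illustrated with a small counting exercise. The sketch below is not taken from the OCELOT paper: the population, the attribute names, and the choice to measure leakage as log2(population size / remaining candidates) are all assumptions made for illustration. It shows how three disclosures that each reveal only one bit can together single out one person.

```python
import itertools
import math

# Hypothetical population: every combination of three made-up binary attributes.
CITIES = ["Lyon", "Nice"]
JOBS = ["nurse", "clerk"]
AGE_BANDS = ["30s", "40s"]
POPULATION = [
    {"city": c, "job": j, "age_band": a}
    for c, j, a in itertools.product(CITIES, JOBS, AGE_BANDS)
]


def candidates(population, disclosed):
    """Return the people who are consistent with everything disclosed so far."""
    return [p for p in population if all(p[k] == v for k, v in disclosed.items())]


def leakage_bits(population, disclosed):
    """Bits learned about identity, assuming a uniform prior over the population."""
    remaining = candidates(population, disclosed)
    if not remaining:
        raise ValueError("disclosures are inconsistent with the population")
    return math.log2(len(population) / len(remaining))


def cumulative_trace(population, fragments):
    """Leakage after each successive fragment in a multi-step interaction."""
    disclosed = {}
    trace = []
    for key, value in fragments:
        disclosed[key] = value
        trace.append(leakage_bits(population, disclosed))
    return trace


# Three fragments, each revealed in a different (hypothetical) interaction.
TRACE = cumulative_trace(
    POPULATION,
    [("city", "Lyon"), ("job", "nurse"), ("age_band", "30s")],
)
print(TRACE)  # [1.0, 2.0, 3.0]
```

Each fragment on its own leaves four of the eight candidates, but after the third fragment only one candidate remains. A filter that judges each message in isolation would see three low-risk disclosures; a budget tracked over the whole trajectory sees the full three bits.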

Section 03

OCELOT's Core Concepts and Architecture Design

OCELOT proposes a "posterior risk control" paradigm: it assigns leakage budgets to interaction trajectories and quantifies the information an attacker gains through the agent's behavior. The core technique, "Witness-Verified Decryption", splits the work between two components:
1. Untrusted defense model: A locally fine-tuned model that outputs structured evidence (atomic information tags and proposed decryption operations).
2. Deterministic verifier: Audits that evidence, computes its minimum-entropy cost, and checks that the cost fits within the budget, which depends on the recipient's trust weight. Because the verifier is deterministic, its decisions can be audited.
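
The division of labor described above (an untrusted model proposes, a deterministic verifier decides) might look roughly like the following. This is a minimal sketch under stated assumptions, not OCELOT's actual implementation: the `Evidence` schema, the per-tag cost table, the trust weights, the additive cost model, and the rule that a recipient's budget equals a base budget multiplied by its trust weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Structured evidence from the untrusted defense model (hypothetical schema)."""
    recipient: str
    tags: tuple  # atomic information tags the model says the message reveals


# Hypothetical cost of revealing each tag, in bits, and per-recipient trust weights.
TAG_COST_BITS = {"city": 1.0, "employer": 2.0, "diagnosis": 6.0}
TRUST_WEIGHT = {"family_doctor": 1.0, "insurer": 0.5, "unknown_site": 0.1}


class Verifier:
    """Deterministic check: the same evidence and state always give the same answer."""

    def __init__(self, base_budget_bits):
        self.base_budget_bits = base_budget_bits
        self.spent = {}  # bits already released, per recipient

    def budget_for(self, recipient):
        # Assumption: budget scales linearly with trust; unknown recipients get zero.
        return self.base_budget_bits * TRUST_WEIGHT.get(recipient, 0.0)

    def check(self, evidence):
        # Tags outside the cost table are rejected rather than guessed at.
        if any(tag not in TAG_COST_BITS for tag in evidence.tags):
            return False
        # Assumption: costs add up; a real cost model would account for correlation.
        cost = sum(TAG_COST_BITS[tag] for tag in sorted(set(evidence.tags)))
        already_spent = self.spent.get(evidence.recipient, 0.0)
        if already_spent + cost > self.budget_for(evidence.recipient):
            return False
        self.spent[evidence.recipient] = already_spent + cost
        return True
```

With `Verifier(base_budget_bits=8.0)`, a proposal to send `("city", "employer")` to `"insurer"` costs 3 bits against a 4-bit budget and is approved, while a later proposal to send `("employer",)` to the same recipient would exceed the budget and is refused. The point of the split is that the model never makes the final decision: it only supplies evidence that a small, inspectable piece of code can check.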

Section 04

Budget Management and OCELOT's Technical Advantages

For budget management, every decryption decision is recorded in a tamper-proof ledger, and budget allocation takes the recipient's trust level into account. The reported experimental results indicate three advantages:
1. Lower leakage with higher utility: Precise budget allocation improves both privacy and task quality.
2. Resistance to multiple attacks: Adaptive injection, jailbreak, cumulative inference, and recipient collusion.
3. Moderate performance overhead: The computational cost of verification stays manageable, which makes practical deployment feasible.
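
The summary does not say how the ledger is built. One common way to make a decision log resistant to silent edits is a hash chain, sketched below as an assumption rather than as OCELOT's design. Strictly speaking this makes tampering detectable rather than impossible. The record fields and the `Ledger` class are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only log in which each entry commits to everything before it."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else GENESIS
        self.entries.append((record, entry_hash(prev, record)))

    def verify(self):
        """Recompute the chain; any edited record breaks every later hash."""
        prev = GENESIS
        for record, stored_hash in self.entries:
            if entry_hash(prev, record) != stored_hash:
                return False
            prev = stored_hash
        return True


ledger = Ledger()
ledger.append({"recipient": "insurer", "cost_bits": 3.0, "approved": True})
ledger.append({"recipient": "insurer", "cost_bits": 2.0, "approved": False})
print(ledger.verify())  # True
```

If someone later flips the second record's `approved` field to `True`, `ledger.verify()` returns `False`, so an auditor replaying the log can tell that the recorded decisions no longer match what the verifier originally wrote.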

Section 05

Industry Significance and Future Outlook of OCELOT

OCELOT marks a shift in LLM agent privacy protection from "post-hoc remediation" to "pre-emptive prevention". It provides a quantifiable, auditable framework that can help enterprises meet regulatory requirements such as GDPR. Similar budget control mechanisms may become standard for AI systems in sensitive domains (healthcare, finance, law), and open-source code and evaluation frameworks are expected to encourage further community exploration.