Zing Forum

SharedRequest: A Batch-Level Privacy-Preserving Inference Framework with 5x Cost Reduction

SharedRequest cuts query costs by up to 5x while protecting the privacy of user prompts. It does so through batch-level privacy preservation and semantic instruction grouping, and it requires no changes to model architectures.

Privacy protection, model inference, differential privacy, batch processing, LLM security
Published 2026-06-03 23:23 | Recent activity 2026-06-04 13:20 | Estimated read 7 min

Section 01

SharedRequest: Core Guide to the Batch-Level Privacy-Preserving Inference Framework

SharedRequest is a batch-level privacy-preserving inference framework. It cuts query costs by up to 5x while protecting the privacy of user prompts, using semantic instruction grouping and batch-level privacy mechanisms, and it does not modify model architectures. Its core idea is to move privacy protection from the level of a single prompt to the level of a batch, balancing privacy, utility, and efficiency. It applies to a range of LLMs, including closed-source APIs and open-source models.

Section 02

Problem Background: The Dilemma of Privacy-Preserving Inference

As public LLMs such as ChatGPT see widespread use, the risk of leaking private information through user prompts has become increasingly prominent. Existing privacy-preserving inference methods each have drawbacks: differential privacy adds noise and sacrifices output utility; homomorphic encryption and secure multi-party computation carry heavy computational overhead; model-specific solutions require architecture modifications and lack generality. The open challenge is to protect privacy while staying efficient and general and without degrading output quality.

Section 03

Core Methods of SharedRequest: Batch-Level Privacy Preservation and Semantic Grouping

Core Idea

Shift privacy protection from the single-prompt level to the batch level, amortize costs by grouping semantically equivalent instructions, and obfuscate sensitive information by mixing in noise variants.

Technical Mechanisms

  1. Semantic Instruction Grouping: Identify semantically similar queries and group them together, sharing instruction templates;
  2. Noise Mixing Obfuscation: Generate multiple noise variants and mix them with the original prompt to protect the real content;
  3. Batch Amortized Inference: Process the shared instruction part in batches, efficiently deliver personalized content, and linearly amortize costs.
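
The three mechanisms above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's actual implementation: the function names, the decoy pool, and the string-normalization stand-in for semantic matching are all hypothetical, and a real system would presumably group instructions with a semantic similarity model instead.

```python
import random
import string
from collections import defaultdict


def normalize_instruction(instruction):
    """Crude stand-in for semantic matching (an assumption, not the source's method):
    lowercase the text, drop punctuation, and collapse whitespace."""
    cleaned = instruction.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def group_by_instruction(requests):
    """Group (user_id, instruction, payload) tuples by normalized instruction."""
    groups = defaultdict(list)
    for user_id, instruction, payload in requests:
        groups[normalize_instruction(instruction)].append((user_id, payload))
    return dict(groups)


def mix_with_decoys(payload, decoy_pool, k, rng):
    """Shuffle the real payload among k decoys; return the texts and the real index."""
    tagged = [(decoy, False) for decoy in rng.sample(decoy_pool, k)]
    tagged.append((payload, True))
    rng.shuffle(tagged)
    real_index = next(i for i, (_, is_real) in enumerate(tagged) if is_real)
    return [text for text, _ in tagged], real_index


def build_batch(instruction, members, decoy_pool, k, rng):
    """Build one batch: a single shared instruction plus every member's mixed inputs.
    real_slots maps user_id to the position of the real payload and stays client-side."""
    inputs = []
    real_slots = {}
    for user_id, payload in members:
        mixed, real_index = mix_with_decoys(payload, decoy_pool, k, rng)
        real_slots[user_id] = len(inputs) + real_index
        inputs.extend(mixed)
    return {"instruction": instruction, "inputs": inputs}, real_slots


requests = [
    ("u1", "Summarize the text.", "Quarterly report draft"),
    ("u2", "summarize the text", "Meeting notes"),
    ("u3", "Translate to French:", "Good morning"),
]
groups = group_by_instruction(requests)
rng = random.Random(0)
batch, real_slots = build_batch(
    "summarize the text",
    groups["summarize the text"],
    ["decoy one", "decoy two", "decoy three"],
    2,
    rng,
)
```

Here the two summarization requests share one instruction, and each real payload is hidden among two decoys, so the batch carries six inputs while only the client knows which slots are real.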

Model Agnosticism

SharedRequest needs no access to model parameters and no architecture changes. It runs as a black-box API wrapper layer that can be integrated into existing workflows, and it applies to closed-source APIs, open-source hosting services, and privately deployed models.
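
As a rough illustration of the black-box wrapper idea, the sketch below wraps any backend behind a single callable. The class name, the `send_batch` signature, and the `fake_backend` stub are assumptions made for this example; the source does not describe the framework's real interface.

```python
from typing import Callable, Dict, List


class BatchPrivacyWrapper:
    """Hypothetical wrapper layer. `send_batch` is any callable that takes a shared
    instruction plus a list of inputs and returns one output per input, so the
    wrapper never needs model parameters or architecture access."""

    def __init__(self, send_batch: Callable[[str, List[str]], List[str]]):
        self._send_batch = send_batch

    def run(self, instruction: str, inputs: List[str],
            real_slots: Dict[str, int]) -> Dict[str, str]:
        outputs = self._send_batch(instruction, inputs)
        if len(outputs) != len(inputs):
            raise ValueError("backend must return one output per input")
        # Outputs for decoy inputs are discarded; each user gets only their real slot.
        return {user_id: outputs[slot] for user_id, slot in real_slots.items()}


def fake_backend(instruction: str, inputs: List[str]) -> List[str]:
    """Stub standing in for a closed-source API or a self-hosted model."""
    return [f"{instruction} -> {text.upper()}" for text in inputs]


wrapper = BatchPrivacyWrapper(fake_backend)
result = wrapper.run("echo", ["decoy", "hello", "noise"], {"alice": 1})
```

Swapping `fake_backend` for a function that calls a hosted API or a local model would leave the wrapper unchanged, which is the sense in which such a layer is model-agnostic.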

Section 04

Experimental Results: Win-Win Verification of Privacy and Efficiency

Utility Improvement

Compared to traditional differential privacy baselines, output quality is improved by over 20%, and semantic coherence is close to the unprotected baseline.

Cost Reduction

Query costs fall by up to 5x, with the largest savings in large-batch scenarios. Latency improves because fewer network round trips are needed, and throughput increases.
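
To see why savings grow with batch size, consider a toy token-cost model. Everything here is an assumption for illustration: cost is taken to be proportional to token count, the baseline is each user sending the instruction together with their own noise variants, and the numbers are invented and are not the reported measurements.

```python
def batch_cost_ratio(instruction_tokens, payload_tokens, batch_size, decoys_per_prompt):
    """Ratio of separate-request cost to shared-batch cost under a toy token model.

    Separate: every user sends the instruction plus all of their payload variants.
    Shared:   the instruction is sent once, followed by every user's variants.
    """
    variants = decoys_per_prompt + 1  # the real payload plus its decoys
    separate = batch_size * (instruction_tokens + variants * payload_tokens)
    shared = instruction_tokens + batch_size * variants * payload_tokens
    return separate / shared


# 16 users, a 400-token instruction, 50-token payloads, one decoy each:
# separate = 16 * (400 + 100) = 8000, shared = 400 + 16 * 100 = 2000
ratio = batch_cost_ratio(400, 50, 16, 1)  # 4.0
```

In this model the ratio rises with batch size and with the length of the shared instruction relative to the payloads, which is consistent with the observation that savings are most significant for large batches.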

Privacy Strength

The noise mixing mechanism effectively defends against external eavesdroppers, supports privacy-utility trade-offs, and conforms to the differential privacy theoretical framework.

Section 05

Application Scenarios and Deployment Considerations

Applicable Scenarios

  1. Enterprise-level API proxies (privacy-protected access for internal employees);
  2. Privacy-sensitive industries such as healthcare, finance, and law;
  3. High-concurrency public interfaces;
  4. General solutions for multi-cloud deployment.

Deployment Recommendations

  • Tune batch size (balance latency and cost);
  • Optimize domain-specific semantic grouping strategies;
  • Calibrate noise intensity (match privacy requirements);
  • Establish privacy protection effect monitoring and auditing mechanisms.
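
For the batch-size recommendation, one simple way to reason about the latency side of the trade-off is sketched below. It assumes a steady arrival rate and a batch that is sent only once it is full, so the first request in a batch waits for the remaining ones to arrive; the function name, parameters, and default cap are hypothetical.

```python
import math


def max_batch_size(arrival_rate_per_s, latency_budget_s, inference_latency_s, hard_cap=64):
    """Largest batch size whose worst-case latency fits the budget.

    Assumes steady arrivals: the first request waits (size - 1) / rate seconds
    for the batch to fill, then the whole batch takes inference_latency_s.
    """
    wait_budget_s = latency_budget_s - inference_latency_s
    if wait_budget_s <= 0:
        return 1  # no slack for waiting, so fall back to unbatched requests
    size = math.floor(wait_budget_s * arrival_rate_per_s) + 1
    return max(1, min(size, hard_cap))


# 10 requests per second, a 2 s budget, and 1 s of inference leave 1 s of waiting,
# which is enough time for 10 further requests to arrive.
size = max_batch_size(10, 2.0, 1.0)  # 11
```

A larger batch lowers the per-request cost but lengthens the wait, so a bound like this gives an upper limit that cost tuning can then work within.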

Section 06

Limitations and Future Research Directions

Current Limitations

  • Batch processing may introduce latency that affects real-time applications;
  • The accuracy of semantic grouping for complex queries needs improvement;
  • Need to defend against advanced adversarial attacks targeting specific patterns.

Future Directions

  • Adaptive batch strategy (dynamically adjust size);
  • Hierarchical privacy protection (differentiated processing of sensitive content);
  • Integration with federated learning;
  • Hardware acceleration to improve batch processing efficiency.

Section 07

Conclusion: The Value and Significance of SharedRequest

SharedRequest is a notable step forward in privacy-preserving LLM inference: by protecting privacy at the batch level, it balances privacy, utility, and efficiency. Because it is model-agnostic, efficient, and practical, it offers a technical path worth considering for organizations that need to deploy LLMs at scale while meeting privacy compliance requirements.