Zing Forum


Federated Retrieval-Augmented Generation: Privacy-Preserving Large Model Inference in Trusted Execution Environments

This article introduces a secure federated RAG architecture that combines the Flower framework with Trusted Execution Environments (TEEs) to enable knowledge retrieval and aggregation across data silos while protecting data privacy. The study proposes a cascaded reasoning mechanism that uses third-party models to strengthen inference without compromising confidentiality, offering a new approach for privacy-preserving AI in sensitive fields such as healthcare and finance.

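Tags: Federated learning, RAG, Trusted execution environment, Privacy protection, Large language model, Flower framework, Confidential computing, Medical AI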
Published 2026-03-26 20:23 | Recent activity 2026-03-28 06:51 | Estimated read 6 min

Section 01

[Main Post / Introduction] Federated Retrieval-Augmented Generation: An Innovative Architecture for Privacy-Preserving Large Model Inference

This article introduces a secure federated RAG architecture that combines the Flower framework with Trusted Execution Environments (TEEs) to enable knowledge retrieval and aggregation across data silos without exposing private data. Its key innovations are local retrieval with confidential aggregation, a cascaded reasoning mechanism, and confidential remote inference on Flower CRC. Together they offer a new approach for privacy-preserving AI in sensitive fields such as healthcare and finance.


Section 02

Research Background and Existing Challenges

Traditional RAG assumes that all documents are centrally accessible. In practice, however, data is often scattered across silos that cannot be pooled because of regulatory or privacy constraints. Federated RAG addresses this dispersion, but existing solutions have a security gap: retrieval results are sent to the server in plaintext, so a compromised or curious server can read them.


Section 03

Key Innovations and Contributions

The study makes three core contributions:
1. Local retrieval with confidential aggregation: clients run retrieval locally and send only their results to the server, which aggregates them inside a TEE.
2. Cascaded reasoning: a non-confidential third-party model generates a preliminary answer, which is supplied as extra context to strengthen the main model's inference.
3. Confidential remote inference on Flower CRC: large models run in a hardened TEE, so prompts and context stay protected.
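
The following minimal Python sketch makes contribution 1 concrete by showing the client side of local retrieval. The function names (`tokenize`, `local_top_k`) and the toy term-overlap scorer are illustrative assumptions, not the paper's retriever. A real client would more likely use BM25 or dense embeddings. The property being illustrated is that the corpus never leaves the silo: only a short ranked list of document IDs and scores does.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def local_top_k(query: str, corpus: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k best-matching local documents as (doc_id, score) pairs.

    Hypothetical toy scorer: query/document term overlap, normalized by the
    square root of document length. Runs entirely on the client.
    """
    query_terms = Counter(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        doc_tokens = tokenize(text)
        # Size of the multiset intersection between query and document terms.
        overlap = sum((query_terms & Counter(doc_tokens)).values())
        if overlap > 0:
            scored.append((doc_id, overlap / math.sqrt(len(doc_tokens))))
    # Highest score first; Python's sort is stable, so ties keep corpus order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = {
        "doc-1": "aspirin reduces the risk of heart attack",
        "doc-2": "statins lower cholesterol",
        "doc-3": "aspirin for heart disease",
    }
    # Only these (doc_id, score) pairs would be sent to the TEE-hosted server.
    print(local_top_k("aspirin heart", corpus))
```

Swapping in a stronger retriever changes only the scoring loop. The interface (query in, short ranked list out) is what matters for the federated design.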


Section 04

Technical Architecture and Workflow

Workflow:
1. The server, running inside a TEE, broadcasts the query to the clients.
2. Each client retrieves its top-k documents locally and returns the results.
3. The server merges the per-client rankings with the reciprocal rank fusion (RRF) algorithm.
4. The server builds an augmented context from the fused results.
5. The answer is generated according to the selected inference mode.
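
Step 3 can be sketched with the standard RRF formula, in which each document scores the sum of 1 / (k + rank) over every list it appears in. This is a minimal Python illustration, not the paper's code. The function name is hypothetical, and k = 60 is the commonly used default rather than a value reported in the paper. Because RRF uses only ranks, it does not require clients' retrieval scores to be comparable.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60, top_n: int = 5
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k = 60 is an assumed (common) default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; stable sort keeps first-seen order on ties.
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]


if __name__ == "__main__":
    # Hypothetical top-k results returned by three clients (disjoint silos).
    client_results = [
        ["A:12", "A:7"],
        ["B:3"],
        ["C:9", "C:2"],
    ]
    for doc_id, score in reciprocal_rank_fusion(client_results):
        print(f"{doc_id}\t{score:.5f}")
```

When silos hold disjoint documents, as in this example, RRF effectively interleaves the clients' lists by rank. If the same document ID appears in several lists, its reciprocal ranks add up and it rises in the fused ranking.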

Three inference modes:
1. Independent inference: the LLM runs entirely inside the server's TEE.
2. Cascaded inference: a third-party model contributes an auxiliary answer that is added to the context.
3. Confidential inference: a large-scale LLM runs in Flower CRC.
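
The three modes differ mainly in which model receives which prompt. The Python sketch below shows one way to dispatch them. It makes two assumptions that are not taken from the paper. First, models are treated as plain prompt-to-text callables. Second, in cascaded mode the third-party model receives only the question and never the retrieved documents. All names here (`Mode`, `build_prompt`, `generate`) are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional

# Hypothetical model interface: any callable that maps a prompt to a completion.
LLM = Callable[[str], str]


class Mode(Enum):
    INDEPENDENT = "independent"    # small LLM inside the server's TEE
    CASCADED = "cascaded"          # TEE LLM plus a third-party auxiliary answer
    CONFIDENTIAL = "confidential"  # large LLM in a remote confidential environment


def build_prompt(query: str, docs: list[str], hint: Optional[str] = None) -> str:
    """Assemble the augmented context from fused documents and an optional hint."""
    lines = ["Context:"]
    lines += [f"[{i}] {doc}" for i, doc in enumerate(docs, start=1)]
    if hint is not None:
        lines.append(f"Preliminary answer from an auxiliary model: {hint}")
    lines += [f"Question: {query}", "Answer:"]
    return "\n".join(lines)


def generate(query: str, docs: list[str], mode: Mode,
             local_llm: LLM, third_party_llm: LLM, remote_llm: LLM) -> str:
    """Route the request to the model(s) used by the selected inference mode."""
    if mode is Mode.INDEPENDENT:
        return local_llm(build_prompt(query, docs))
    if mode is Mode.CASCADED:
        # Assumption: the third-party model sees only the question, never the
        # retrieved documents, so private context stays inside the TEE.
        hint = third_party_llm(f"Question: {query}\nAnswer briefly:")
        return local_llm(build_prompt(query, docs, hint))
    if mode is Mode.CONFIDENTIAL:
        return remote_llm(build_prompt(query, docs))
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch the question text itself still reaches the third-party model. A deployment where queries are sensitive would have to account for that.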


Section 05

Experimental Evaluation Results

Experimental setup: a healthcare scenario with SmolLM 1.7B (on CPU) as the server model, AWS Nova Micro as the third-party model for cascaded inference, and Qwen3 235B (on H100 hardware in Flower CRC) as the confidential inference model. Four clients each hold a document collection (PubMed, StatPearls, etc.), and evaluation uses the MIRAGE benchmark.

Key findings: cascaded inference improved SmolLM's performance by 40% on PubMedQA and 46% on MedQA. Confidential inference achieved the best performance of the three modes. Independent inference had higher latency but remained secure.


Section 06

Security Threat Model and Assumptions

Trusted components: the hardware TEE and its remote attestation mechanism; data-silo clients, which never share their raw documents.

Untrusted components: honest-but-curious or compromised server operators; network attackers, who are assumed unable to break standard encryption.

Scope: the TEE implementation is assumed to be correct; side-channel attacks are out of scope.
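
The threat model rests on remote attestation. Before a client releases its retrieval results, it should verify that the server is the expected code running inside a genuine TEE. The Python sketch below shows that gate under heavy simplification. Real attestation, such as Intel TDX or AMD SEV-SNP quotes, uses asymmetric signatures chained to the hardware vendor's certificates. Here a shared-key HMAC stands in for that signature purely for illustration. Every name (`AttestationReport`, `sign_report`, `verify_report`, `release_results`) is hypothetical and not part of Flower or any TEE SDK.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class AttestationReport:
    measurement: bytes  # hash of the code/config loaded into the enclave
    nonce: bytes        # client-chosen challenge that proves freshness
    signature: bytes    # toy HMAC standing in for the vendor's signature


def sign_report(vendor_key: bytes, measurement: bytes, nonce: bytes) -> AttestationReport:
    """Simulate the hardware producing a signed report (toy HMAC, not real attestation)."""
    sig = hmac.new(vendor_key, measurement + nonce, hashlib.sha256).digest()
    return AttestationReport(measurement, nonce, sig)


def verify_report(report: AttestationReport, vendor_key: bytes,
                  expected_measurement: bytes, expected_nonce: bytes) -> bool:
    """Check signature, expected enclave measurement, and nonce freshness."""
    expected_sig = hmac.new(vendor_key, report.measurement + report.nonce,
                            hashlib.sha256).digest()
    return (hmac.compare_digest(report.signature, expected_sig)
            and hmac.compare_digest(report.measurement, expected_measurement)
            and hmac.compare_digest(report.nonce, expected_nonce))


def release_results(report: AttestationReport, vendor_key: bytes,
                    expected_measurement: bytes, nonce: bytes,
                    results: list[str]) -> list[str]:
    """Hand over retrieval results only to a server that passes attestation."""
    if not verify_report(report, vendor_key, expected_measurement, nonce):
        raise PermissionError("attestation failed; refusing to send retrieval results")
    return results
```

The fresh nonce prevents a compromised server from replaying an old valid report. The measurement check rejects a server that runs modified code, even if it sits inside real TEE hardware.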


Section 07

Application Prospects and Significance

This architecture opens new possibilities in sensitive fields: healthcare (joint question answering across multiple hospitals without sharing patient data), finance (cross-institution market analysis and risk assessment), and enterprise knowledge management (a unified retrieval system spanning departments).


Section 08

Limitations and Future Directions

Limitations: the evaluation focuses on healthcare, so other fields still need verification; third-party model selection needs optimization; and system scalability needs improvement.

Future directions: expanding to other application fields, exploring optimal combinations of third-party models, and improving system scalability.