# Federated Retrieval-Augmented Generation: Privacy-Preserving Large Model Inference in Trusted Execution Environments

> This article introduces a secure federated RAG architecture that combines the Flower framework with Trusted Execution Environments (TEEs) to retrieve and aggregate knowledge across data silos while protecting data privacy. The study also proposes a cascaded reasoning mechanism that uses third-party models to strengthen inference without compromising confidentiality, offering a new approach for privacy-preserving AI applications in sensitive fields such as healthcare and finance.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-03-26T12:23:53.000Z
- Last activity: 2026-03-27T22:51:50.411Z
- Popularity: 125.5
- Keywords: federated learning, RAG, trusted execution environment, privacy protection, large language models, Flower framework, confidential computing, medical AI
- Page link: https://www.zingnex.cn/en/forum/thread/llm-arxiv-2603-25374v1
- Canonical: https://www.zingnex.cn/forum/thread/llm-arxiv-2603-25374v1
- Markdown source: floors_fallback

---

## [Original Post / Introduction] Federated Retrieval-Augmented Generation: An Innovative Architecture for Privacy-Preserving Large Model Inference

This article introduces a secure federated RAG architecture that combines the Flower framework with Trusted Execution Environments (TEEs) to retrieve and aggregate knowledge across data silos while protecting data privacy. Its key innovations are local retrieval with confidential aggregation, a cascaded reasoning mechanism, and confidential remote inference based on Flower CRC. Together they offer a new approach for privacy-preserving AI applications in sensitive fields such as healthcare and finance.

## Research Background and Existing Challenges

Traditional RAG assumes that all documents are centrally accessible. In practice, data is scattered across silos and cannot be pooled because of regulatory, privacy, and similar constraints. Federated RAG addresses this dispersion, but existing solutions have a security gap: retrieval results are transmitted in plaintext, so they can leak through the server.

## Key Innovations and Contributions

The study makes three core contributions:

1. **Local retrieval with confidential aggregation**: Clients retrieve locally and send only the results to a server running inside a TEE, where aggregation takes place.
2. **Cascaded reasoning**: Non-confidential third-party models generate preliminary answers, which serve as additional context for the main model's inference.
3. **Confidential remote inference based on Flower CRC**: Large models run in a hardened TEE that protects prompts and context.

## Technical Architecture and Workflow

**Workflow**:

1. The server (inside the TEE) broadcasts the query.
2. Each client retrieves its top-k documents locally and returns the results.
3. The server aggregates the results using the reciprocal rank fusion algorithm.
4. The server constructs the enhanced context.
5. The answer is generated according to the selected inference mode.
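The aggregation step can be illustrated with a minimal sketch of reciprocal rank fusion (RRF). It assumes each client returns an ordered list of document IDs and uses the conventional smoothing constant `k = 60`; the function name, the client names, and the document IDs are hypothetical and are not taken from the paper or from Flower.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical top-k results returned by three clients for one query.
client_results = {
    "client_a": ["doc3", "doc1", "doc7"],
    "client_b": ["doc1", "doc9", "doc3"],
    "client_c": ["doc1", "doc3", "doc4"],
}

fused = reciprocal_rank_fusion(client_results.values(), top_n=3)
top_ids = [doc_id for doc_id, _ in fused]
print(top_ids)  # ['doc1', 'doc3', 'doc9']
```

RRF needs only rank positions, not raw similarity scores, which suits a federated setting where each client's retriever may score documents on a different scale.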

**Three Inference Modes**:

- **Independent inference**: the LLM runs inside the server's TEE.
- **Cascaded inference**: a third-party model supplies an auxiliary answer that is added to the context.
- **Confidential inference**: a large-scale LLM runs in Flower CRC.
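A rough sketch of the cascaded mode follows. It assumes that the third-party model sees only the question and never the retrieved documents, and that both models can be wrapped as plain prompt-in, text-out callables. That interface, the prompt layout, and all names are illustrative assumptions, not the paper's implementation.

```python
from typing import Callable, List, Optional

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def build_prompt(question: str, documents: List[str], draft: Optional[str] = None) -> str:
    """Assemble the main model's prompt from retrieved context and an optional draft."""
    parts = ["Context documents:"]
    parts += [f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)]
    if draft is not None:
        parts.append(f"Preliminary answer from an auxiliary model (may be wrong): {draft}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


def cascaded_answer(question: str, documents: List[str], main_llm: LLM, third_party_llm: LLM) -> str:
    # Assumption: only the bare question leaves the TEE, not the documents.
    draft = third_party_llm(question)
    # The main model, running inside the TEE, sees the documents and the draft.
    return main_llm(build_prompt(question, documents, draft))


# Stub models so the sketch runs without any external service.
seen_by_third_party = []


def fake_third_party(prompt: str) -> str:
    seen_by_third_party.append(prompt)
    return "Likely yes."


def fake_main(prompt: str) -> str:
    return prompt  # echo the prompt so its contents can be inspected


answer = cascaded_answer(
    "Does drug X reduce risk Y?",
    ["Trial A reported a reduction."],
    fake_main,
    fake_third_party,
)
```

Passing `draft=None` to `build_prompt` gives the prompt for the independent mode, so the two modes differ only in whether the auxiliary call is made.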

## Experimental Evaluation Results

**Experimental Setup**:

- Scenario: healthcare
- Server model: SmolLM 1.7B (CPU)
- Cascaded third-party model: AWS Nova Micro
- Confidential inference model: Qwen3 235B (on Flower CRC's H100)
- Clients: 4 document libraries (PubMed, StatPearls, etc.)
- Evaluation benchmark: MIRAGE

**Key Findings**: Cascaded inference improved SmolLM's performance by 40% on PubMedQA and 46% on MedQA; confidential inference achieved the best performance; independent inference had higher latency but was secure.

## Security Threat Model and Assumptions

**Trusted Components**: Hardware TEE and remote attestation, data silo clients (do not share original documents).

**Untrusted Components**: Honest-but-curious or compromised server operators, network attackers (cannot break standard encryption).

**Scope**: The TEE implementation is assumed to be correct; side-channel attacks are out of scope.
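Because the model trusts remote attestation rather than the server operator, a client would plausibly release its retrieval results only after checking the server's attested measurement. The sketch below is heavily simplified and rests on assumptions: it compares a reported code measurement against an expected value and omits the verification of the hardware-signed attestation quote that a real deployment requires. All names are hypothetical and do not correspond to any Flower or TEE vendor API.

```python
import hashlib
import hmac
from typing import List


def measurement_matches(reported_hex: str, expected_hex: str) -> bool:
    """Compare two hex-encoded measurements in constant time."""
    return hmac.compare_digest(bytes.fromhex(reported_hex), bytes.fromhex(expected_hex))


def release_if_attested(results: List[str], reported_hex: str, expected_hex: str) -> List[str]:
    """Return the results for sending only if the server's measurement is the expected one."""
    if not measurement_matches(reported_hex, expected_hex):
        raise PermissionError("server measurement does not match the expected TEE image")
    return results


# Hypothetical measurements: a hash of the approved image and of a tampered one.
expected = hashlib.sha256(b"approved-server-image").hexdigest()
tampered = hashlib.sha256(b"modified-server-image").hexdigest()

released = release_if_attested(["doc1", "doc3"], expected, expected)

try:
    release_if_attested(["doc1", "doc3"], tampered, expected)
    blocked = False
except PermissionError:
    blocked = True
```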

## Application Prospects and Significance

This architecture opens new possibilities for sensitive fields: healthcare (joint Q&A across multiple hospitals without sharing patient data), finance (cross-institution market analysis and risk assessment), and enterprise knowledge management (a unified retrieval system across departments).

## Limitations and Future Directions

Limitations: The evaluation covers only the healthcare field, so other fields still need verification; the choice of third-party model has not been optimized; and system scalability needs improvement. Future directions: expand to other application fields, explore optimal combinations of third-party models, and enhance system scalability.