# Harpyx: Enterprise-Grade Private RAG Document Intelligence Platform

> Harpyx is a multi-tenant, self-hostable Retrieval-Augmented Generation (RAG) platform designed specifically for enterprise private document libraries. It supports virus scanning, multi-provider LLM integration, project-level conversation isolation, as well as complete audit logs and security controls.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-22T00:11:51.000Z
- Last activity: 2026-04-22T03:59:06.518Z
- Popularity: 158.2
- Keywords: RAG, document intelligence, private deployment, multi-tenant, enterprise-grade, vector retrieval, LLM, OCR, open source
- Page link: https://www.zingnex.cn/en/forum/thread/harpyx-rag
- Canonical: https://www.zingnex.cn/forum/thread/harpyx-rag

---

## Background and Pain Points

Current RAG (Retrieval-Augmented Generation) products tend to share three core problems. First, they are locked into a specific large-model provider, which limits flexibility. Second, they require uploading documents to third-party cloud services, which makes data privacy hard to guarantee. Third, most are single-tenant toy projects that cannot model the organizational structure and permission management of a real enterprise. Harpyx was built to address these problems.

## Project Overview

Harpyx is a multi-tenant, self-hostable document intelligence platform built on .NET 10. It uses a dual-service architecture (a Web frontend and a Worker backend) and is deployed with Docker Compose. Organizations can upload private documents, index them as searchable vector embeddings, and chat over them using a large language model of their choice.

## Document Ingestion and Processing

Harpyx's document ingestion pipeline covers the full path from upload to embedding: uploaded files are first scanned for viruses, then written to object storage and queued for asynchronous parsing. The system unpacks several container formats (ZIP, RAR, 7z, tar.gz, MSG, EML) and extracts, chunks, and embeds PDFs, Office documents, RTF, EPUB, HTML, plain text, images (via OCR), and structured files (CSV, JSON, XML, YAML).
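To make the routing and chunking steps concrete, here is a minimal Python sketch. It is not Harpyx's code (the platform itself is built on .NET). The function names, the image extensions, and the fixed-size overlapping character windows are all assumptions chosen for illustration, and the source does not say which chunking strategy Harpyx actually uses.

```python
# Illustrative sketch only; names and the chunking strategy are hypothetical.

# Container formats listed in the post.
CONTAINER_SUFFIXES = (".zip", ".rar", ".7z", ".tar.gz", ".msg", ".eml")
# Assumed image extensions; the post only says "images".
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tiff")


def route(filename: str) -> str:
    """Return the stage a (already virus-scanned) file would be sent to."""
    name = filename.lower()
    if name.endswith(CONTAINER_SUFFIXES):
        return "unpack"
    if name.endswith(IMAGE_SUFFIXES):
        return "ocr"
    return "extract_text"


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


if __name__ == "__main__":
    print(route("archive.tar.gz"))
    print(chunk_text("abcdefghij", size=4, overlap=1))
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of embedding some text twice.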

## Multi-Provider RAG Architecture

The platform lets users configure their own API keys and is compatible with OpenAI, Anthropic Claude, and Google Gemini. All keys are encrypted at rest with AES-256-GCM. Chat, embedding, and OCR models can each be overridden independently at the workspace or project level, so each workspace or project can use its own model choices.
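One way to picture the per-level overrides is a simple fallback chain. The Python sketch below is an assumption-laden illustration, not Harpyx's implementation: the `ModelSettings` type, the `resolve` function, the placeholder model names, and the precedence order (project over workspace over instance default) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSettings:
    """Model choices at one level; None means "inherit from the level above"."""
    chat: Optional[str] = None
    embedding: Optional[str] = None
    ocr: Optional[str] = None


def resolve(default: ModelSettings,
            workspace: ModelSettings,
            project: ModelSettings) -> ModelSettings:
    """Resolve each model kind independently, most specific level first."""
    def pick(kind: str) -> Optional[str]:
        # Assumed precedence: project, then workspace, then instance default.
        for level in (project, workspace, default):
            value = getattr(level, kind)
            if value is not None:
                return value
        return None

    return ModelSettings(chat=pick("chat"),
                         embedding=pick("embedding"),
                         ocr=pick("ocr"))


if __name__ == "__main__":
    # Placeholder model names, not real provider identifiers.
    default = ModelSettings(chat="chat-a", embedding="embed-a", ocr="ocr-a")
    workspace = ModelSettings(embedding="embed-b")
    project = ModelSettings(chat="chat-c")
    print(resolve(default, workspace, project))
```

Resolving each kind separately is what allows a project to swap only its chat model while still inheriting the workspace's embedding model.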

## Project-Level Conversation Isolation

Harpyx uses a four-layer organizational hierarchy: platform-level tenants group users into workspaces, each workspace contains multiple projects, and each project holds its own documents, prompts, and chat sessions, with chats grounded in that project's documents. This design ensures strict data isolation while supporting cross-project collaboration.
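A minimal Python sketch of project-scoped retrieval follows. It assumes every indexed chunk carries its tenant, workspace, and project identifiers, and it uses a naive keyword match as a stand-in for vector search. The `Chunk` type and `search` function are hypothetical and only illustrate the isolation idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """An indexed piece of a document, tagged with its full scope."""
    tenant_id: str
    workspace_id: str
    project_id: str
    text: str


def search(chunks: list[Chunk], tenant_id: str, workspace_id: str,
           project_id: str, term: str) -> list[str]:
    """Return matching chunk texts from exactly one project.

    The scope filter is applied before any matching, so chunks from other
    projects can never reach the chat context.
    """
    scope = (tenant_id, workspace_id, project_id)
    in_scope = [c for c in chunks
                if (c.tenant_id, c.workspace_id, c.project_id) == scope]
    # Keyword matching stands in for embedding similarity in this sketch.
    return [c.text for c in in_scope if term.lower() in c.text.lower()]


if __name__ == "__main__":
    index = [
        Chunk("t1", "w1", "p1", "Invoice policy for 2025"),
        Chunk("t1", "w1", "p2", "Invoice template draft"),
    ]
    print(search(index, "t1", "w1", "p1", "invoice"))
```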

## Multi-Tenancy and Role System

Platform-level roles include Admin, Operator, Reviewer, and ReadOnly; tenant members have a separate, independent role model. User access is controlled through an allowlist, and all operations are auditable.
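The combination of allowlist, role check, and audit trail might look roughly like the Python sketch below. The four role names come from the post. The permission sets, the `AccessControl` class, and the shape of the audit records are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class PlatformRole(Enum):
    ADMIN = "Admin"
    OPERATOR = "Operator"
    REVIEWER = "Reviewer"
    READ_ONLY = "ReadOnly"


# Hypothetical permission sets; the post does not define what each role may do.
PERMISSIONS = {
    PlatformRole.ADMIN: {"manage_tenants", "manage_users", "view_audit", "view"},
    PlatformRole.OPERATOR: {"manage_users", "view"},
    PlatformRole.REVIEWER: {"view_audit", "view"},
    PlatformRole.READ_ONLY: {"view"},
}


@dataclass
class AccessControl:
    allowlist: set[str]  # lowercase e-mail addresses permitted to sign in
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, email: str, role: PlatformRole, action: str) -> bool:
        """Check allowlist and role, recording every decision in the audit log."""
        allowed = email.lower() in self.allowlist and action in PERMISSIONS[role]
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "email": email,
            "role": role.value,
            "action": action,
            "allowed": allowed,
        })
        return allowed


if __name__ == "__main__":
    acl = AccessControl(allowlist={"alice@example.com"})
    print(acl.authorize("alice@example.com", PlatformRole.REVIEWER, "view_audit"))
    print(acl.authorize("bob@example.com", PlatformRole.ADMIN, "view"))
    print(len(acl.audit_log))
```

Denied attempts are logged as well as granted ones, since an audit trail that only records successes hides probing.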

## Self-Hosted Usage Quotas

Harpyx provides instance-level quota management that can cap the number of tenants, workspaces, projects, and documents, as well as storage, API, OCR, and RAG usage. Operators can therefore manage resources without depending on a commercial tier.
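As a rough illustration of quota enforcement, the Python sketch below checks a limit before recording usage. The `QuotaTracker` class, the resource names, and the numbers are hypothetical. How Harpyx stores, scopes, or resets its counters is not described in the post.

```python
from dataclasses import dataclass, field
from typing import Optional


class QuotaExceeded(Exception):
    """Raised when a request would push usage past the configured limit."""


@dataclass
class QuotaTracker:
    limits: dict[str, int]  # resource name -> maximum allowed usage
    usage: dict[str, int] = field(default_factory=dict)

    def consume(self, resource: str, amount: int = 1) -> Optional[int]:
        """Record usage and return what remains, or None if unlimited."""
        limit = self.limits.get(resource)
        used = self.usage.get(resource, 0)
        # Check before recording, so a rejected request leaves usage unchanged.
        if limit is not None and used + amount > limit:
            raise QuotaExceeded(
                f"{resource}: {used} used + {amount} requested exceeds limit {limit}")
        self.usage[resource] = used + amount
        return None if limit is None else limit - used - amount


if __name__ == "__main__":
    quotas = QuotaTracker(limits={"ocr_pages": 100, "documents": 2})
    print(quotas.consume("ocr_pages", 40))
    print(quotas.consume("documents"))
    print(quotas.consume("documents"))
    try:
        quotas.consume("documents")
    except QuotaExceeded as exc:
        print("rejected:", exc)
```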
