# Tri-Tier Private AI Architecture: Enabling Secure Integration of Local and Cloud Intelligence with Zero Public Network Exposure

> tri-tier-private-ai is a self-hosted privacy-first AI stack that uses a keyword routing mechanism to direct sensitive prompts to local models and complex reasoning tasks to the cloud, while ensuring zero public network exposure. This project provides enterprise-grade privacy protection solutions for individuals and small teams at a cost of approximately $8-12 per month.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-18T04:08:22.000Z
- Last activity: 2026-04-18T04:23:25.038Z
- Popularity: 163.8
- Keywords: privacy protection, local AI, cloud routing, keyword filtering, zero data retention, Tailscale, Ollama, LiteLLM, self-hosting, tiered architecture
- Page link: https://www.zingnex.cn/en/forum/thread/ai-ba500924
- Canonical: https://www.zingnex.cn/forum/thread/ai-ba500924
- Markdown source: floors_fallback

---

tri-tier-private-ai is a self-hosted privacy-first AI stack that uses a keyword routing mechanism to direct sensitive prompts to local models and complex reasoning tasks to the cloud, while ensuring zero public network exposure. This project provides enterprise-grade privacy protection solutions for individuals and small teams at a cost of approximately $8-12 per month, resolving the dilemma between the privacy of local models and the intelligence of cloud models.

## Background: The Dilemma Between Privacy and Intelligence

In large language model applications, users face a fundamental dilemma: local models ensure privacy but sacrifice intelligence, while cloud APIs provide powerful reasoning but require entrusting sensitive data to a third party. tri-tier-private-ai proposes a tri-tier intelligent routing architecture that lets users balance local privacy and cloud intelligence within the same workflow, with self-hosting costs kept to roughly $8-12 per month.

## Methodology: Tri-Tier Architecture and Core Components

Core insight of the project: different prompts require different processing levels. Sensitive content is handled locally, while complex tasks are routed to the cloud. The architecture consists of four layers:
- **Control Layer (OpenClaw)**: orchestrator and UI, responsible for task distribution; reached only over the Tailscale private network, with zero public network exposure.
- **Routing Layer (LiteLLM)**: open-source model routing proxy that decides each prompt's processing path based on keyword rules; a key component at zero software cost.
- **Private Layer (Ollama + Gemma4 E4B)**: runs a local model of roughly 4 billion parameters (about 3.8GB at 4-bit quantization) to handle daily conversations and sensitive data; this data never leaves the VPS.
- **Intelligence Layer (Together AI Qwen-2.5-72B)**: a 72-billion-parameter model with 128K context that supports Zero Data Retention (ZDR) and handles non-sensitive complex tasks.

## Methodology: Keyword Interception Logic for Sensitive Content

The core of the system's privacy protection is the keyword interception logic defined in router_hook.py. The default keywords cover categories such as finance/tax, identity/PII, documents, credentials, medical, legal, and privacy markers (e.g., tax, ssn, password, medical). When a prompt is submitted, LiteLLM scans its content: if it contains a sensitive keyword, the prompt is redirected to the local Ollama; otherwise it is sent to Together AI, so sensitive data is hard-blocked from ever reaching the cloud.
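The interception logic can be sketched as a simple keyword scan. The keyword list, function name, and model identifiers below are illustrative assumptions, not the project's actual router_hook.py:

```python
# Illustrative sketch of keyword-based routing; all names here are
# assumptions, not the project's actual router_hook.py contents.

PRIVATE_KEYWORDS = {
    "tax", "ssn", "password", "passport", "medical",
    "diagnosis", "salary", "contract", "api key",
}

LOCAL_MODEL = "ollama/gemma4-e4b"          # assumed local model id
CLOUD_MODEL = "together_ai/Qwen2.5-72B"    # assumed cloud model id


def route_prompt(prompt: str) -> str:
    """Return the model a prompt should be routed to.

    Any prompt containing a sensitive keyword is hard-routed to the
    local model; everything else goes to the cloud tier.
    """
    text = prompt.lower()
    if any(keyword in text for keyword in PRIVATE_KEYWORDS):
        return LOCAL_MODEL
    return CLOUD_MODEL


if __name__ == "__main__":
    print(route_prompt("my tax file is private"))         # local tier
    print(route_prompt("explain transformer attention"))  # cloud tier
```

Because the check is a plain substring scan, extending it (as the customization section below notes) is just a matter of editing the keyword set and restarting the proxy.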

## Security Measures: Multi-Layer Network Isolation and Zero Data Retention

The project adopts multi-layer security strategies:
- **Firewall**: UFW defaults to denying inbound traffic, only allowing SSH and Tailscale traffic.
- **Container Isolation**: Ollama and LiteLLM are bound to 127.0.0.1, listening only on the local loopback.
- **Tailscale Private Network**: All access is via an encrypted mesh network, with the internal IP as the only entry point.
- **Zero Data Retention**: Together AI account-level ZDR settings disable prompt storage and training; the system reinforces this protection via the X-Together-No-Store request header.
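The firewall and loopback-binding measures above might be applied with commands like these; the LiteLLM port, container images, and interface name are assumptions about this deployment, not verified against the project:

```shell
# Hypothetical hardening commands matching the measures above.

# Firewall: deny all inbound by default, allow only SSH and Tailscale.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow in on tailscale0
sudo ufw enable

# Container isolation: publish ports on the loopback interface only, so
# Ollama (11434) and LiteLLM (port assumed to be 4000) are unreachable
# from outside the host except through the Tailscale mesh.
docker run -d -p 127.0.0.1:11434:11434 ollama/ollama
docker run -d -p 127.0.0.1:4000:4000 ghcr.io/berriai/litellm:main-latest
```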

## Deployment Process and Cost Analysis

**Deployment Process**: provision an Ubuntu 22.04 VPS with at least 4GB RAM; install Docker and Tailscale; configure the .env file (LiteLLM master key, Together AI API key); start the services and pull the Gemma4 E4B model; then point OpenClaw at the LiteLLM endpoint and enable Together AI ZDR.
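The .env file mentioned in the deployment steps might look like the following; the variable names are assumptions based on common LiteLLM and Together AI conventions, not taken from the project:

```shell
# Hypothetical .env for the stack; variable names are assumptions.
LITELLM_MASTER_KEY=sk-replace-with-a-long-random-secret
TOGETHER_API_KEY=replace-with-your-together-ai-key
OLLAMA_BASE_URL=http://127.0.0.1:11434
```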
**Cost**: Hetzner CX21 VPS is approximately $10/month; Together AI charges $0.9 per million tokens for both input and output; open-source components are zero-cost. For moderate usage (500,000 tokens/month), the total cost is approximately $10-12/month.
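As a sanity check on the figures above, the monthly total can be computed directly; prices come from the post, while the assumption is that all 500,000 tokens are billed at the cloud rate (local-tier tokens are free):

```python
# Monthly cost estimate using the post's pricing figures.
VPS_MONTHLY_USD = 10.0            # Hetzner CX21, approx.
TOGETHER_USD_PER_MTOKEN = 0.9     # same rate for input and output tokens


def monthly_cost(cloud_tokens: int) -> float:
    """Total monthly cost in USD for a given cloud token volume."""
    api_cost = cloud_tokens / 1_000_000 * TOGETHER_USD_PER_MTOKEN
    return VPS_MONTHLY_USD + api_cost


print(monthly_cost(500_000))  # moderate usage: 10.45
```

Even at several million cloud tokens per month, the API bill stays small next to the fixed VPS cost, which is why the total lands in the $10-12 range.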

## Testing & Validation and Extension Customization

**Testing & Validation**:
- Sensitive content test: A curl request containing "my tax file is private" should show sensitive keyword detection in the logs.
- Non-sensitive test: A request to explain the transformer mechanism should be routed to Together AI.
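The two checks above might look like the following curl calls; the endpoint path is LiteLLM's OpenAI-compatible API, while the port, auth variable, and model name are assumptions about this deployment:

```shell
# Sensitive prompt: expect a keyword hit in the logs and routing to Ollama.
curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "my tax file is private"}]}'

# Non-sensitive prompt: expect routing to Together AI.
curl http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "Explain the transformer attention mechanism"}]}'
```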
**Extension Customization**:
- Custom keywords: Edit PRIVATE_KEYWORDS in router_hook.py and restart LiteLLM.
- Model replacement: Ollama supports multiple local models; LiteLLM supports over 100 cloud providers.
- Custom routing logic: Modify router_hook.py to implement complex strategies (e.g., user identity, request frequency).

## Limitations and Future Improvement Directions

**Limitations**:
- Keyword routing is imperfect: it may miss sensitive content or be bypassed by paraphrasing, so high-security scenarios need more sophisticated detection.
- Local models have limited capability; on complex tasks they still lag behind large cloud models.
**Future Directions**: Introduce intelligent content classification models, automatic selection of multiple local models, audit logs and compliance reports, and a more user-friendly management interface.
