# Server Nexe: A Complete Solution for Localized AI Servers

> Server Nexe is a fully locally-run AI server with persistent memory, RAG retrieval, and multi-backend inference capabilities, ensuring users' conversations, documents, and model weights remain entirely on local devices.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-17T00:39:38.000Z
- Last activity: 2026-04-17T00:50:32.294Z
- Popularity: 157.8
- Keywords: local AI, privacy protection, RAG, MLX, Ollama, vector database, open-source project
- Page URL: https://www.zingnex.cn/en/forum/thread/server-nexe-ai
- Canonical: https://www.zingnex.cn/forum/thread/server-nexe-ai
- Markdown source: floors_fallback

---

## Introduction / Main Post

Server Nexe is a fully locally-run AI server with persistent memory, RAG retrieval, and multi-backend inference capabilities, ensuring users' conversations, documents, and model weights remain entirely on local devices.

## Project Origin and Philosophy

Server Nexe started with a simple yet profound question: "What does it take to have a local AI with persistent memory?" Since the author didn't plan to build an LLM from scratch, they began collecting various components to assemble a tool useful for their daily work.

What makes the project unusual is its development approach: the entire project (code, tests, audits, documentation) is produced by a single person orchestrating different AI models, both local (MLX, Ollama) and cloud-based (Claude, GPT, Gemini, DeepSeek, Qwen, Grok). The human decides what to build, designs the architecture, reviews code, and runs tests, while the AI models write, audit, and stress-test under that guidance.

From an initial experimental prototype, the project gradually evolved into a genuinely useful product: 4842 tests (about 85% coverage), security audits, encryption at rest, a macOS installer with hardware detection, and a plugin system.

## 1. Zero Data Leakage

This is the most prominent feature of Server Nexe. All conversations, documents, embedding vectors, and model weights remain on the user's machine. There is no telemetry, no external calls, no cloud dependency, not even a monitoring server.

## 2. Persistent Memory System

Server Nexe uses Qdrant vector search, combined with 768-dimensional embedding vectors, to store memory in three dedicated collections. The system can:

- Automatically extract facts from conversations (names, jobs, preferences, projects)
- Store information to memory within the same LLM call, with zero additional latency
- Detect intent in three languages (Catalan/Spanish/English)
- Deduplicate memories semantically and delete them by voice command ("Forget that...")
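
The semantic deduplication step can be sketched as a cosine-similarity check against already-stored embedding vectors. This is a minimal illustration, not Server Nexe's actual code; the function names and the 0.92 threshold are assumptions, and the real system compares 768-dimensional vectors via Qdrant rather than in pure Python.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_duplicate(new_vec, stored_vecs, threshold=0.92):
    # A new memory counts as a semantic duplicate if it is close
    # enough to any vector already stored in the collection.
    return any(cosine_similarity(new_vec, v) >= threshold for v in stored_vecs)
```

A near-identical vector would be skipped rather than stored twice, while an unrelated one passes through and becomes a new memory.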

## 3. Multi-Backend Inference Support

Users can freely switch between three inference backends by simply modifying the configuration file:

| Backend | Platform | Best Use Case |
|---------|----------|---------------|
| MLX | macOS (Apple Silicon) | Recommended for Mac— native Metal GPU acceleration, fastest on M-series chips |
| llama.cpp | macOS / Linux | General purpose— GGUF format, supports Metal on Mac, CPU/CUDA on Linux |
| Ollama | macOS / Linux | Bridge existing Ollama installations, simplest model management |

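Switching backends via a config file suggests a simple dispatch on the backend name. The sketch below is hypothetical; the actual Server Nexe config keys and backend loader are not documented here, so the `backend`/`model` keys and the returned strings are placeholders.

```python
# Hypothetical config-driven backend selection; the real Server Nexe
# configuration schema may use different keys.
BACKENDS = {
    "mlx": lambda cfg: f"MLX backend loading {cfg['model']} with Metal acceleration",
    "llama.cpp": lambda cfg: f"llama.cpp backend loading GGUF model {cfg['model']}",
    "ollama": lambda cfg: f"Ollama bridge using installed model {cfg['model']}",
}

def load_backend(config):
    # Pick the inference backend named in the configuration file.
    name = config["backend"]
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    return BACKENDS[name](config)
```

Changing a single key in the config file is then enough to move the same model workflow between MLX, llama.cpp, and Ollama.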
## 4. Intelligent Model Recommendation

The installer automatically organizes 16 catalog models into 4 tiers based on the machine's available RAM:

- **8 GB Tier**: Gemma 3 4B, Qwen3.5 4B, Qwen3 4B
- **16 GB Tier**: Gemma4 E4B, Salamandra 7B, Qwen3.5 9B, Gemma3 12B
- **24 GB Tier**: Gemma4 31B, Qwen3 14B, GPT-OSS 20B
- **32 GB Tier**: Qwen3.5 27B, Gemma3 27B, DeepSeek R1 32B, Qwen3.5 35B-A3B, ALIA-40B

Additionally, users can use any Ollama model by name, or any GGUF model from Hugging Face.
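
The tier selection described above can be sketched as mapping detected RAM to the highest tier that fits. The tier boundaries come from the list above; the selection logic itself is an assumption about how the installer behaves.

```python
def recommend_tier(ram_gb):
    # Map available RAM to the highest catalog tier that fits,
    # mirroring the 8/16/24/32 GB tiers listed above.
    tiers = [(32, "32 GB"), (24, "24 GB"), (16, "16 GB"), (8, "8 GB")]
    for minimum, label in tiers:
        if ram_gb >= minimum:
            return label
    return None  # below 8 GB: no catalog tier to recommend
```

A 16 GB machine lands in the 16 GB tier, while a 30 GB machine falls back to the 24 GB tier because the 32 GB tier does not fit.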

## 5. Modular Plugin System

Server Nexe uses an auto-discovery plugin architecture in which everything is a plugin: security, the Web UI, RAG, the backends. Through the NexeModule protocol and duck typing (no inheritance required), users can add features without touching the core code.
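
Duck-typed protocols of this kind map naturally onto Python's `typing.Protocol`. The sketch below assumes a minimal shape for NexeModule (a `name` attribute and a `setup` method); the real protocol's members are not documented here.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class NexeModule(Protocol):
    # Hypothetical plugin contract: any class providing these members
    # qualifies structurally; no inheritance from a base class is needed.
    name: str

    def setup(self) -> None: ...

class RagPlugin:
    # Satisfies the protocol by shape alone, without subclassing it.
    name = "rag"

    def setup(self) -> None:
        pass  # index documents, register routes, etc.
```

An auto-discovery loader can then scan a plugins directory and accept any object that passes an `isinstance(obj, NexeModule)` check.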

## 6. RAG Document Processing

Users can upload .txt, .md, or .pdf files, and the system will automatically index them for RAG. Each document is only visible in the session it was uploaded to— no cross-contamination between sessions.
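
Session isolation like this typically works by tagging every indexed chunk with its session and filtering on that tag at retrieval time. The sketch below illustrates the idea with a plain keyword match over an in-memory list; the real system retrieves by vector similarity from Qdrant, and all names here are assumptions.

```python
def index_document(store, session_id, doc_id, chunks):
    # Tag every chunk with its owning session so retrieval can filter on it.
    for i, text in enumerate(chunks):
        store.append({"session": session_id, "doc": doc_id, "chunk": i, "text": text})

def retrieve(store, session_id, query_terms):
    # Only chunks uploaded in the caller's session are candidates,
    # which is what prevents cross-session contamination.
    return [
        entry for entry in store
        if entry["session"] == session_id
        and any(term in entry["text"].lower() for term in query_terms)
    ]
```

Two sessions can upload the same document and still only ever see their own copy at query time.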
