# Amadeus-chat: 100% Locally Run LLM Command-Line Chat Tool with Hybrid RAG and Smart Memory Compression

> Amadeus-chat is a fully locally run command-line LLM chat interface that supports Hybrid RAG (BM25 + semantic search), intelligent memory compression, and convenient model management, protecting privacy without the need for internet connectivity.

- 板块: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- 发布时间: 2026-05-30T12:15:57.000Z
- 最近活动: 2026-05-30T12:19:39.080Z
- 热度: 152.9
- 关键词: LLM, 本地部署, RAG, BM25, 语义搜索, 命令行工具, 隐私保护, llama.cpp, Python
- 页面链接: https://www.zingnex.cn/en/forum/thread/amadeus-chat-100-llm-rag
- Canonical: https://www.zingnex.cn/forum/thread/amadeus-chat-100-llm-rag
- Markdown 来源: floors_fallback

---

## Amadeus-chat: 100% Local CLI LLM Tool with Hybrid RAG & Smart Memory Compression

Amadeus-chat is a fully local command-line LLM chat interface that supports Hybrid RAG (BM25 + semantic search), intelligent memory compression, and convenient model management. It runs entirely on the user's machine without networking, ensuring data privacy. Key features include local privacy protection, advanced RAG system, smart memory management, easy model handling, and a user-friendly terminal UI.

## Background & Motivation

With the popularity of LLMs, users increasingly care about data privacy and local deployment feasibility. Commercial chat tools often require networking and send data to remote servers, posing privacy risks for sensitive information. Amadeus-chat was developed as a fully local CLI solution to address this. It is a branch of the Amadeus-AI main project, focusing on non-autonomous general CLI chat interfaces for manual interaction and RAG-based document queries.

## Core Features Overview

**100% Local Privacy**: Runs via llama.cpp (llama-cpp-python) locally; no data sent to external APIs.
**Hybrid RAG System**: Supports PDF/Markdown/CSV/JSON/text docs with recursive chunking. Combines BM25 (sparse keyword match) and semantic search (dense similarity via Sentence Transformers), fused via RRF. Uses cross-encoder for result reordering.
**Smart Memory Management**: Auto memory compression—keeps recent dialogs intact while summarizing older ones to maintain context within token limits.
**Model Management**: Configurable scripts to download .gguf models from Hugging Face to Models/ directory via .env settings.
**Terminal UI**: Uses rich library for Markdown rendering, tables, progress bars for better interaction.

## Technical Architecture Details

**Vector Storage**: Custom pure NumPy pre-normalized matrix for O(1) query time normalization, avoiding complex vector DB dependencies.
**Models**: Embedding model (all-MiniLM-L6-v2, fast/lightweight); reorder model (cross-encoder/ms-marco-MiniLM-L-6-v2).
**Package Management**: Uses uv (fast Python package manager) to create venv and install dependencies like torch, llama-cpp-python, sentence-transformers.

## Usage Instructions

**Installation**: 
1. Clone repo and enter directory.
2. Run `uv sync` to install dependencies.
3. Configure .env with model info (e.g., HF_REPO_ID, HF_FILENAME).
4. Run `uv run download_model.py` to get the model.

**Launch**: `uv run chat.py --model ./Models/[model-file] --ctx 8192`

**Commands**: /help (show commands), /load (ingest docs), /docs (list indexed docs), /rag on/off (toggle RAG), /rag clear (clear indexes), /model (hot switch model), /memory (view dialog state), /clear (clear history), /bench (show performance), /save/export (save history), /quit (exit).

## Application Scenarios

Amadeus-chat is ideal for:
1. **Enterprise Document Q&A**: Index internal docs for employees to query without data leaks.
2. **Academic Research**: Import papers for interactive exploration and Q&A.
3. **Offline Use**: Work in no-network environments.
4. **Privacy-Sensitive Cases**: Handle medical records, legal docs securely.

## Summary & Outlook

Amadeus-chat demonstrates building a powerful local LLM app with hybrid RAG, smart memory, and CLI UI. It balances privacy and functionality. As local model quality and hardware improve, such tools will play a bigger role in enterprise apps and personal knowledge management.
