Zing Forum

Reading

Amadeus-chat: 100% Locally Run LLM Command-Line Chat Tool with Hybrid RAG and Smart Memory Compression

Amadeus-chat is a fully locally run command-line LLM chat interface that supports Hybrid RAG (BM25 + semantic search), intelligent memory compression, and convenient model management, protecting privacy without the need for internet connectivity.

LLM本地部署RAGBM25语义搜索命令行工具隐私保护llama.cppPython
Published 2026-05-30 20:15Recent activity 2026-05-30 20:19Estimated read 6 min
Amadeus-chat: 100% Locally Run LLM Command-Line Chat Tool with Hybrid RAG and Smart Memory Compression
1

Section 01

Amadeus-chat: 100% Local CLI LLM Tool with Hybrid RAG & Smart Memory Compression

Amadeus-chat is a fully local command-line LLM chat interface that supports Hybrid RAG (BM25 + semantic search), intelligent memory compression, and convenient model management. It runs entirely on the user's machine without networking, ensuring data privacy. Key features include local privacy protection, advanced RAG system, smart memory management, easy model handling, and a user-friendly terminal UI.

2

Section 02

Background & Motivation

With the popularity of LLMs, users increasingly care about data privacy and local deployment feasibility. Commercial chat tools often require networking and send data to remote servers, posing privacy risks for sensitive information. Amadeus-chat was developed as a fully local CLI solution to address this. It is a branch of the Amadeus-AI main project, focusing on non-autonomous general CLI chat interfaces for manual interaction and RAG-based document queries.

3

Section 03

Core Features Overview

100% Local Privacy: Runs via llama.cpp (llama-cpp-python) locally; no data sent to external APIs. Hybrid RAG System: Supports PDF/Markdown/CSV/JSON/text docs with recursive chunking. Combines BM25 (sparse keyword match) and semantic search (dense similarity via Sentence Transformers), fused via RRF. Uses cross-encoder for result reordering. Smart Memory Management: Auto memory compression—keeps recent dialogs intact while summarizing older ones to maintain context within token limits. Model Management: Configurable scripts to download .gguf models from Hugging Face to Models/ directory via .env settings. Terminal UI: Uses rich library for Markdown rendering, tables, progress bars for better interaction.

4

Section 04

Technical Architecture Details

Vector Storage: Custom pure NumPy pre-normalized matrix for O(1) query time normalization, avoiding complex vector DB dependencies. Models: Embedding model (all-MiniLM-L6-v2, fast/lightweight); reorder model (cross-encoder/ms-marco-MiniLM-L-6-v2). Package Management: Uses uv (fast Python package manager) to create venv and install dependencies like torch, llama-cpp-python, sentence-transformers.

5

Section 05

Usage Instructions

Installation:

  1. Clone repo and enter directory.
  2. Run uv sync to install dependencies.
  3. Configure .env with model info (e.g., HF_REPO_ID, HF_FILENAME).
  4. Run uv run download_model.py to get the model.

Launch: uv run chat.py --model ./Models/[model-file] --ctx 8192

Commands: /help (show commands), /load (ingest docs), /docs (list indexed docs), /rag on/off (toggle RAG), /rag clear (clear indexes), /model (hot switch model), /memory (view dialog state), /clear (clear history), /bench (show performance), /save/export (save history), /quit (exit).

6

Section 06

Application Scenarios

Amadeus-chat is ideal for:

  1. Enterprise Document Q&A: Index internal docs for employees to query without data leaks.
  2. Academic Research: Import papers for interactive exploration and Q&A.
  3. Offline Use: Work in no-network environments.
  4. Privacy-Sensitive Cases: Handle medical records, legal docs securely.
7

Section 07

Summary & Outlook

Amadeus-chat demonstrates building a powerful local LLM app with hybrid RAG, smart memory, and CLI UI. It balances privacy and functionality. As local model quality and hardware improve, such tools will play a bigger role in enterprise apps and personal knowledge management.