
CivicBot: An End-to-End Implementation of a Localized AI Voice Companion System

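A high-performance bidirectional AI voice and vision pipeline that enables real-time interaction between an Android endpoint and a local GPU-accelerated PC, integrating STT, LLM, and TTS into a complete local AI companion solution.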

AI voice assistant, local deployment, voice interaction, Whisper, large language model, TTS, Android, edge computing

Published 2026/05/11 03:44 | Last activity 2026/05/11 03:48 | Estimated reading time: 5 minutes

Section 01

CivicBot: Localized AI Voice Companion System Overview

CivicBot is a fully local, end-to-end AI voice companion system developed by Mouhamed and Nader from Tunisia's Bizerte Higher Institute. It avoids cloud dependency by combining STT (Faster-Whisper), an LLM served through Ollama (for example Phi-3 or Llama 3.2), and TTS (Kokoro-82M) in a device-edge architecture: an Android client paired with a local GPU PC. Key use cases include infrastructure repair reporting, tourism guidance, and elderly assistance, reflecting a tech-for-good ethos.

Section 02

Project Background & Design Philosophy

Most existing voice assistants rely on cloud services, which raises privacy risks and makes them dependent on network connectivity. CivicBot addresses both with a local-first approach. As a civic-tech solution, it targets concrete social problems (infrastructure repair reporting, tourism, elderly support), pairing cutting-edge technology with everyday public needs in the spirit of tech for good.

Section 03

System Architecture & Tech Stack

Device-edge collaboration: An Android phone (portability) and a local GPU server (computing power) are connected by a bidirectional pipeline.

Voice pipeline:

STT: Faster-Whisper (built on CTranslate2 for speed and accuracy).
LLM: Ollama serving lightweight models (Phi-3 or Llama 3.2 1B) to balance response quality against resource use.
TTS: Kokoro-82M (small footprint, high-quality 24 kHz voice output).

Visual and mobile: CameraX (Android) captures visual data, a web dashboard D-pad handles navigation, and a 0.8 s silence threshold keeps the dialogue flowing smoothly.
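The 0.8 s silence threshold can be illustrated with a small energy-based end-of-utterance detector. This is a minimal sketch, not CivicBot's actual code: the 20 ms frame size, the RMS floor of 500, and the names `frame_rms` and `SilenceDetector` are assumptions made for illustration, and the input is assumed to be mono 16-bit PCM at 16 kHz.

```python
import math
import struct

SAMPLE_RATE = 16_000     # Hz, mono 16-bit PCM (assumed capture format)
FRAME_MS = 20            # hypothetical frame size
SILENCE_MS = 800         # the 0.8 s threshold quoted in the article
RMS_THRESHOLD = 500.0    # hypothetical energy floor; tune per microphone


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SilenceDetector:
    """Signals end of utterance after SILENCE_MS of quiet that follows speech."""

    def __init__(self) -> None:
        self.frames_needed = math.ceil(SILENCE_MS / FRAME_MS)
        self.quiet_frames = 0
        self.heard_speech = False

    def feed(self, frame: bytes) -> bool:
        """Feed one frame; return True when the utterance is considered finished."""
        if frame_rms(frame) >= RMS_THRESHOLD:
            self.heard_speech = True
            self.quiet_frames = 0
            return False
        if not self.heard_speech:
            return False  # leading silence: nothing to hand to STT yet
        self.quiet_frames += 1
        if self.quiet_frames >= self.frames_needed:
            self.heard_speech = False
            self.quiet_frames = 0
            return True
        return False
```

In a pipeline like the one described, a `True` result would be the point at which buffered audio is handed to the STT stage. A fixed energy floor is the simplest option; a trained voice-activity detector would be more robust in noisy environments.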

Section 04

Technical Implementation Details

Android client: Jetpack Compose (UI), CameraX (image capture), WebSocket (real-time communication), 16 kHz PCM audio, R8/ProGuard optimization.

PC backend: asyncio and websockets (high concurrency), CTranslate2 (GPU acceleration for Whisper), polyphase resampling (audio sample-rate conversion), a thread pool (non-blocking LLM execution).

Network and security: Tailscale integration (end-to-end encryption, zero-config networking, private IPs).
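The idea of running the LLM on a thread pool so that it does not block the async backend can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the helper names `query_ollama`, `run_blocking`, and `demo`, the pool size, and the model tag `llama3.2:1b` are chosen here for the example. The HTTP call targets Ollama's default local `/api/generate` endpoint.

```python
import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
_executor = ThreadPoolExecutor(max_workers=2)        # hypothetical pool size


def query_ollama(prompt: str, model: str = "llama3.2:1b") -> str:
    """Blocking HTTP call to a local Ollama server; returns the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]


async def run_blocking(func, *args):
    """Run a blocking function on the pool so the event loop keeps serving sockets."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_executor, func, *args)


async def demo(blocking_llm=query_ollama):
    """Await one LLM call while a heartbeat task keeps ticking concurrently."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    try:
        reply = await run_blocking(blocking_llm, "Hello")
    finally:
        beat.cancel()
    return reply, ticks
```

Calling `asyncio.run(demo())` requires a running Ollama server with the model pulled. The heartbeat counter stands in for WebSocket traffic: it keeps advancing while the LLM call is in flight, which is the property the thread pool is meant to provide.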

Section 05

Hardware Requirements & Deployment

Hardware and software: Windows or Linux, an NVIDIA RTX 3050 or better (6 GB VRAM, CUDA 12.x), Python 3.9+, Android Studio.

Deployment steps: clone the repository → create a virtual environment → install dependencies → start Ollama → build the Android app → configure Tailscale.
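Before working through the deployment steps, a quick pre-flight check on the PC can confirm that the stated prerequisites are present. The script below is a hypothetical helper, not part of the CivicBot repository. It assumes the `ollama`, `nvidia-smi`, and `tailscale` executables are expected on `PATH`, and it does not verify the GPU model or CUDA version.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum version quoted in the article
REQUIRED_TOOLS = ("ollama", "nvidia-smi", "tailscale")  # assumed to be on PATH


def preflight(tools=REQUIRED_TOOLS, which=shutil.which) -> dict:
    """Return a name -> bool report for the Python version and each required tool."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for tool in tools:
        report[tool] = which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:8}{name}")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use `preflight()` is called with no arguments.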

Section 06

Application Scenarios & Social Value

Infrastructure repair: Citizens submit fault reports by voice, which lowers the barrier to participation.

Tourism: Tourists get a localized service that stays stable even on a poor network connection.

Elderly assistance: Help with crossing roads and making emergency calls, with loud audio output and simple interaction for users unfamiliar with technology.

Section 07

Technical Highlights & Future Outlook

Highlights: local-first operation (privacy and availability), low latency (streaming, quantization, asynchronous processing, hardware acceleration), and a modular design that makes components easy to replace.

Future: As edge models improve and hardware costs fall, local AI is expected to expand into smart homes, industrial inspection, education, and healthcare. Decentralized AI deployment may become a key direction.