Zing Forum

CivicBot: End-to-End Implementation of a Localized AI Voice Companion System

A high-performance, bidirectional AI voice and vision pipeline for real-time interaction between Android devices and a local GPU-accelerated PC. It is a fully local AI companion that combines STT, LLM, and TTS.

Tags: AI voice assistant, local deployment, voice interaction, Whisper, large language model, TTS, Android, edge computing
Published 2026-05-11 03:44 | Recent activity 2026-05-11 03:48 | Estimated read 5 min

Section 01

CivicBot: Localized AI Voice Companion System Overview

CivicBot is a fully local, end-to-end AI voice companion system developed by Mouhamed and Nader from Tunisia's Bizerte Higher Institute. It avoids cloud dependency by combining STT (Faster-Whisper), an LLM served through Ollama (such as Phi-3 or Llama 3.2), and TTS (Kokoro-82M) in an end-edge architecture: an Android device paired with a local GPU PC. Key use cases include infrastructure repair reporting, tourism guidance, and elderly assistance, reflecting tech-for-good values.

Section 02

Project Background & Design Philosophy

Most existing voice assistants rely on cloud services, which raises privacy risks and makes them dependent on network connectivity. CivicBot addresses both with a local-first approach. As a civic-tech project, it targets concrete social needs (infrastructure repair requests, tourism, elderly support), pairing current AI techniques with everyday public-service problems in the spirit of tech for good.

Section 03

System Architecture & Tech Stack

End-Edge Collaboration: An Android phone (portability) and a local GPU server (computing power) are connected by a bidirectional pipeline.

Voice Pipeline:

  • STT: Faster-Whisper (built on CTranslate2 for speed and accuracy).
  • LLM: Ollama serving lightweight models (Phi-3 / Llama 3.2 1B) to balance quality and resource use.
  • TTS: Kokoro-82M (small footprint, high-quality 24 kHz voice output).

Visual & Mobile: CameraX (Android) captures visual data, a D-pad on the web dashboard handles navigation, and a 0.8 s silence threshold keeps turn-taking in the dialogue smooth.
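To make the 0.8 s silence threshold concrete, here is a minimal sketch of an energy-based end-of-utterance detector for 16 kHz, 16-bit mono PCM. It assumes the threshold is used to decide when the speaker has finished; the article does not say how CivicBot detects silence, so the 20 ms frame size, the RMS floor of 500, and the names `frame_rms` and `EndOfUtteranceDetector` are all illustrative assumptions rather than the project's code.

```python
import math
import struct

SAMPLE_RATE = 16_000     # Hz, the PCM rate the Android client is described as sending
FRAME_MS = 20            # assumed frame length; not stated in the article
SILENCE_SECONDS = 0.8    # silence threshold quoted in the article
RMS_THRESHOLD = 500.0    # assumed energy floor for int16 audio; tune per microphone


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EndOfUtteranceDetector:
    """Reports True once speech has been followed by enough continuous silence."""

    def __init__(self, silence_seconds: float = SILENCE_SECONDS,
                 frame_ms: int = FRAME_MS,
                 rms_threshold: float = RMS_THRESHOLD) -> None:
        # Number of consecutive quiet frames that add up to the silence window.
        self.frames_needed = max(1, round(silence_seconds * 1000 / frame_ms))
        self.rms_threshold = rms_threshold
        self._silent_run = 0
        self._heard_speech = False

    def feed(self, frame: bytes) -> bool:
        """Process one frame; return True when the utterance is judged complete."""
        if frame_rms(frame) >= self.rms_threshold:
            # Loud frame: we are inside speech, so restart the silence counter.
            self._heard_speech = True
            self._silent_run = 0
            return False
        if not self._heard_speech:
            # Leading silence before any speech never ends an utterance.
            return False
        self._silent_run += 1
        if self._silent_run >= self.frames_needed:
            # Reset so the detector can be reused for the next utterance.
            self._heard_speech = False
            self._silent_run = 0
            return True
        return False
```

With 20 ms frames, 0.8 s corresponds to 40 consecutive quiet frames. A shorter window makes the assistant respond sooner but risks cutting speakers off mid-pause; a longer one does the opposite.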

Section 04

Technical Implementation Details

Android End: Jetpack Compose (UI), CameraX (image capture), WebSocket (real-time communication), 16 kHz PCM audio, and R8/ProGuard optimization.

PC Backend: asyncio with websockets (high concurrency), CTranslate2 (GPU acceleration for Whisper), polyphase resampling (audio sample-rate conversion), and a thread pool (so blocking LLM calls do not stall the event loop).

Network & Security: Tailscale integration (end-to-end encryption, zero-config networking, private IPs).
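The combination of an asyncio event loop with a thread pool for LLM execution can be sketched as follows. This is an illustration of the general pattern, not CivicBot's code: `blocking_llm_call` is a hypothetical stand-in for a synchronous request to the model, and `heartbeat` stands in for other work, such as WebSocket traffic, that has to keep running while the model generates.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous LLM request (for example to Ollama)."""
    time.sleep(0.2)  # simulate inference latency
    return f"reply to: {prompt}"


async def generate_reply(executor: ThreadPoolExecutor, prompt: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_llm_call, prompt)


async def heartbeat(ticks: list) -> None:
    """Simulated concurrent work that must not be stalled by the LLM call."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def demo() -> tuple:
    ticks: list = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both coroutines make progress concurrently on one event loop.
        reply, _ = await asyncio.gather(
            generate_reply(executor, "hello"),
            heartbeat(ticks),
        )
    return reply, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If `blocking_llm_call` were invoked directly inside a coroutine, the heartbeat would pause for the whole inference time; handing it to the executor is what keeps audio and socket handling responsive.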

Section 05

Hardware Requirements & Deployment

Hardware: Windows or Linux, an NVIDIA RTX 3050 or better (6 GB VRAM, CUDA 12.x), Python 3.9+, and Android Studio.

Deployment Steps: Clone the repo → create a virtual environment → install dependencies → start Ollama → build the Android app → configure Tailscale.
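Before working through the deployment steps, it can help to confirm that the prerequisites are present. The following preflight check is a hedged sketch and not part of the CivicBot repository: it assumes the relevant command-line tools are installed under their usual names (`ollama`, `nvidia-smi`, `tailscale`) and only checks that they are on the PATH, not that they are correctly configured.

```python
import shutil
import sys

MIN_PYTHON = (3, 9)  # minimum Python version listed in the requirements
# Assumed executable names for Ollama, the NVIDIA driver tools, and Tailscale.
REQUIRED_TOOLS = ("ollama", "nvidia-smi", "tailscale")


def preflight() -> dict:
    """Return a mapping of prerequisite name to whether it appears to be available."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not found on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:12s} {'ok' if ok else 'MISSING'}")
```

A missing `nvidia-smi` usually means the NVIDIA driver is not installed, in which case GPU acceleration for Whisper would not be available.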

Section 06

Application Scenarios & Social Value

  • Infrastructure Repair: Citizens submit reports by voice, which lowers the barrier to participation.
  • Tourism: Localized service for tourists that stays stable even on a poor network.
  • Elderly Assistance: Help with crossing roads and placing emergency calls, with loud audio output and simple interaction for users unfamiliar with technology.

Section 07

Technical Highlights & Future Outlook

Highlights: Local-first design (privacy and availability), low latency (streaming, quantization, async execution, hardware acceleration), and a modular structure that makes components easy to replace.

Future: As edge models improve and hardware costs drop, local AI is expected to expand into smart homes, industrial inspection, education, and healthcare. Decentralized AI deployment may become a key direction.