Zing Forum

Ollive Inference Chatbot: An LLM Chat System with Inference Logging

Ollive is a full-stack LLM chatbot that includes a lightweight inference logging SDK, a near-real-time ingestion API, and PostgreSQL storage. It supports multiple providers (Gemini, OpenAI, Anthropic), streaming responses, and a real-time metrics dashboard.

Tags: LLM, Chatbot, Inference Logging, Monitoring, PostgreSQL, Multi-provider, Streaming Responses, PII Redaction
Published 2026-05-23 17:12 | Recent activity 2026-05-23 17:23 | Estimated read 3 min

Section 01

Introduction: Ollive Inference Chatbot, an LLM Chat System with Inference Logging

Ollive is a full-stack LLM chatbot that includes a lightweight inference logging SDK, a near-real-time ingestion API, and PostgreSQL storage. It supports multiple providers (Gemini, OpenAI, Anthropic), streaming responses, and a real-time metrics dashboard.


Section 02

Original Author and Source

  • Original Author/Maintainer: Nightstorm26
  • Source Platform: GitHub
  • Original Title: ChatBot (Ollive Inference Chatbot)
  • Original Link: https://github.com/Nightstorm26/ChatBot
  • Publication Time: May 23, 2026

Section 03

Project Overview

Ollive Inference Chatbot is a full-stack LLM chat application that includes three core components: a lightweight inference logging SDK, a near-real-time ingestion API, and a PostgreSQL database for storing messages and inference metadata.

This project addresses a key need in LLM applications: how to reliably record and monitor inference calls while maintaining low latency and a good developer experience.
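
One way to keep logging off the critical path is to buffer records in memory and hand them to the ingestion side in batches. The sketch below illustrates that idea only; it is not the project's SDK. The class name `InferenceLogger`, the `track` context manager, the record fields, and the `sink` callable (standing in for whatever sends a batch to the ingestion API) are all assumptions made for illustration.

```python
import time
from contextlib import contextmanager
from typing import Callable


class InferenceLogger:
    """Hypothetical logger: buffers inference records, flushes them in batches."""

    def __init__(self, sink: Callable[[list[dict]], None], batch_size: int = 10):
        self.sink = sink              # assumed: e.g. a function that posts a batch to the ingestion API
        self.batch_size = batch_size
        self._buffer: list[dict] = []

    @contextmanager
    def track(self, provider: str, model: str):
        """Time one inference call and record its outcome, even if it fails."""
        record = {"provider": provider, "model": model, "status": "ok", "error": None}
        start = time.perf_counter()
        try:
            yield record              # the caller may attach extra fields, e.g. token counts
        except Exception as exc:
            record["status"] = "error"
            record["error"] = type(exc).__name__
            raise                     # logging must not swallow the caller's error
        finally:
            record["latency_ms"] = (time.perf_counter() - start) * 1000
            self._buffer.append(record)
            if len(self._buffer) >= self.batch_size:
                self.flush()

    def flush(self) -> None:
        """Send any buffered records to the sink and clear the buffer."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.sink(batch)
```

Wrapping a model call in `with logger.track("gemini", "some-model"):` would record latency and status without changing the call itself. A production version would likely flush from a background thread or task so the sink never blocks a request.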


Section 04

Multi-turn Conversation Support

The system keeps the conversation history, capped at the latest 20 messages, and sends it to the model with each request. The history is a plain message list truncated by message count rather than by a token budget, so there is no token-aware context management.
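
A count-based history window like this can be sketched with a bounded deque. This is a minimal illustration, not the project's code: the `Conversation` class and the `role`/`content` message shape are assumptions, and only the limit of 20 comes from the description above.

```python
from collections import deque

MAX_HISTORY = 20  # window size stated in the project description


class Conversation:
    """Hypothetical history holder that keeps only the newest messages."""

    def __init__(self, max_messages: int = MAX_HISTORY):
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> list[dict]:
        """Return the messages to send to the model, oldest first."""
        return list(self._messages)
```

The trade-off of counting messages instead of tokens is simplicity: a few very long messages can still exceed a model's context limit, which this approach does not guard against.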


Section 05

Multi-provider Support

  • Google Gemini (default)
  • OpenAI
  • Anthropic

Users can switch between different providers and models during a conversation.
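
Switching providers mid-conversation is straightforward if every provider sits behind the same call shape and the history is provider-neutral. The sketch below assumes exactly that. The registry, the `send` function, and the stub providers are hypothetical; a real implementation would call each vendor's SDK where the stub builds a string.

```python
from typing import Callable

# A provider is modeled as a function: (model, messages) -> reply text.
Provider = Callable[[str, list[dict]], str]


def _stub(name: str) -> Provider:
    """Build a placeholder provider; a real one would call the vendor SDK."""
    def call(model: str, messages: list[dict]) -> str:
        return f"[{name}/{model}] reply to: {messages[-1]['content']}"
    return call


PROVIDERS: dict[str, Provider] = {
    "gemini": _stub("gemini"),
    "openai": _stub("openai"),
    "anthropic": _stub("anthropic"),
}
DEFAULT_PROVIDER = "gemini"  # Gemini is the default per the list above


def send(messages: list[dict], model: str, provider: str = DEFAULT_PROVIDER) -> str:
    """Route one request to the chosen provider, sharing the same history."""
    try:
        call = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return call(model, messages)
```

Because the provider is chosen per request, the same message list can go to Gemini on one turn and to another provider on the next.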


Section 06

Streaming Responses

Responses are streamed token by token over SSE (Server-Sent Events), so users see output as it is generated instead of waiting for the full completion.
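
On the wire, each SSE event is one or more `data:` lines followed by a blank line. The sketch below shows that framing for a token stream. The JSON payload shape and the closing `done` event are assumptions for illustration, not the project's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Frame each token as a Server-Sent Event.

    Tokens are JSON-encoded so that a newline inside a token cannot
    break the event framing.
    """
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Assumed convention: a named event tells the client the stream is complete.
    yield "event: done\ndata: {}\n\n"
```

A server would write these strings to a response with the `text/event-stream` content type, and a browser client could consume them with `EventSource` or a streaming `fetch`.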


Section 07

Inference Metrics Dashboard

A real-time panel covering the last 24 hours displays:

  • Latency statistics
  • Throughput
  • Error distribution
  • Statistics per provider
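
The panel's figures can all be derived from the logged inference records. The sketch below computes them in memory to show the aggregation; the record fields (`ts`, `provider`, `latency_ms`, `error`) and the `summarize` function are assumptions, and the project itself may well compute these with SQL over PostgreSQL instead.

```python
from collections import Counter, defaultdict
from statistics import median


def summarize(records: list[dict], now: float, window_seconds: float = 24 * 3600) -> dict:
    """Aggregate records from the trailing window, overall and per provider."""
    recent = [r for r in records if now - r["ts"] <= window_seconds]

    def stats(rows: list[dict]) -> dict:
        latencies = [r["latency_ms"] for r in rows]
        return {
            "requests": len(rows),
            "median_latency_ms": median(latencies) if latencies else None,
            "max_latency_ms": max(latencies, default=None),
            # Throughput averaged over the whole window.
            "requests_per_minute": len(rows) / (window_seconds / 60),
            # Error distribution: count of each error type.
            "errors": dict(Counter(r["error"] for r in rows if r["error"])),
        }

    by_provider = defaultdict(list)
    for r in recent:
        by_provider[r["provider"]].append(r)

    return {
        "overall": stats(recent),
        "by_provider": {name: stats(rows) for name, rows in by_provider.items()},
    }
```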

Section 08

PII Redaction

Sensitive information is redacted in log previews:

  • Email addresses
  • Phone numbers
  • Social Security numbers (SSNs)
  • Bank card numbers
  • API keys
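
Redaction of this kind is commonly done with regular expressions applied before a preview is stored. The sketch below shows the general approach for the five categories listed. The patterns, the placeholder labels, and the `redact` function are illustrative assumptions, not the project's rules. In particular, the API key pattern only recognizes an `sk-` prefix, and the phone pattern targets US-style numbers.

```python
import re

# Order matters: longer digit runs (cards) are handled before phone numbers
# so a card number is not partially matched as a phone number.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("API_KEY", re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b")),     # assumed key format
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),         # 13 to 19 digits
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(
        r"(?<!\d)(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"
    )),
]


def redact(text: str) -> str:
    """Replace each matched span with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text
```

Pattern-based redaction is best-effort: it can miss unusual formats and can over-match other long digit strings, so it reduces exposure in previews rather than guaranteeing removal.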