Ollive: A Full-Stack LLM Application Platform Integrating Multi-Turn Dialogue and Inference Observability

Ollive is an open-source, full-stack LLM chat application that combines streaming multi-turn dialogue with a complete inference observability infrastructure. An SDK automatically captures metadata for each model call and writes it asynchronously to PostgreSQL through Redis Streams, and a real-time monitoring dashboard shows developers key metrics such as latency, throughput, error rate, and token consumption.

Tags: LLM, observability, chatbot, Redis Streams, PostgreSQL, TypeScript, React, Docker, Gemini, Claude

Published 2026-05-24 20:44 | Recent activity 2026-05-24 20:49 | Estimated read 7 min

Section 01

Introduction

Ollive is an open-source, full-stack LLM chat application. Its core feature is the tight integration of streaming multi-turn dialogue with a complete inference observability infrastructure: an SDK automatically captures model call metadata, writes it asynchronously to PostgreSQL through Redis Streams, and gives developers real-time monitoring of key metrics such as latency, throughput, error rate, and token consumption. The system has a modular architecture, starts with a single Docker Compose command, and balances end-user experience with developers' observability needs.

Section 02

Project Background and Overview

Original Author/Maintainer: ankan17
Source Platform: GitHub
Release Date: 2026-05-24
Original Link: https://github.com/ankan17/chatbot-ollive

Ollive is a full-stack AI chat application whose core design principle is to decouple product functionality (streaming dialogue) from platform observability. Users get smooth multi-turn dialogue, while developers receive inference telemetry through the built-in SDK and can see how the application is behaving. The system consists of a PostgreSQL database, Redis event streams, an Express API, a data ingestion process, and a React frontend. Apart from needing a model API key, it runs entirely locally via docker compose up.

Section 03

Architecture Design and Tech Stack

Ollive's architecture separates transactional state from telemetry data:

Chat messages: Written synchronously to PostgreSQL to ensure strong consistency;
Inference logs: Written on an asynchronous path (SDK capture → Redis Streams → ingestion process → PostgreSQL) so that logging does not affect the main dialogue path.
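
The split between the two paths can be illustrated with a minimal sketch. It is not Ollive's code: the `InferenceEvent` fields, the function names `instrumentedCall` and `ingestBatch`, and the in-memory arrays standing in for the Redis Stream and the PostgreSQL log table are all assumptions made for illustration, and the model call is synchronous only to keep the example short.

```typescript
// Hypothetical telemetry record; field names are assumptions, not Ollive's schema.
interface InferenceEvent {
  requestId: string;
  model: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  status: "ok" | "error";
}

interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// In-memory stand-ins: `stream` plays the role of a Redis Stream,
// `inferenceLogs` the role of the PostgreSQL inference log table.
const stream: InferenceEvent[] = [];
const inferenceLogs: InferenceEvent[] = [];

// SDK side: run the model call, time it, and append one event to the stream.
// The request path never writes to the log table directly.
function instrumentedCall(
  requestId: string,
  model: string,
  call: () => ModelResult,
  now: () => number = Date.now
): ModelResult {
  const startedAt = now();
  try {
    const result = call();
    stream.push({
      requestId,
      model,
      latencyMs: now() - startedAt,
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
      status: "ok",
    });
    return result;
  } catch (err) {
    // Failed calls are recorded too, so the error rate can be computed later.
    stream.push({
      requestId,
      model,
      latencyMs: now() - startedAt,
      inputTokens: 0,
      outputTokens: 0,
      status: "error",
    });
    throw err; // the caller still sees the original failure
  }
}

// Worker side: move up to `batchSize` events from the stream into the log table.
function ingestBatch(batchSize: number): number {
  const batch = stream.splice(0, batchSize);
  for (let i = 0; i < batch.length; i++) {
    inferenceLogs.push(batch[i]);
  }
  return batch.length;
}
```

After `instrumentedCall` returns, the event sits in the stream but not yet in the log table; it only becomes visible to queries once a separate `ingestBatch` run has drained it. That delay is the price of keeping log writes off the dialogue path.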

The codebase is organized as a pnpm monorepo:

Application layer: web (Vite+React), api (Express), ingestion-worker (Redis consumer);
Shared packages: llm-sdk (multi-model adapter), db (Drizzle ORM), shared (type definitions);
Infrastructure: Docker Compose for one-click environment startup.

Section 04

Detailed Explanation of Key Design Decisions

Database Schema:
- The message table uses a sequence field to guarantee ordering and idempotency;
- The inference log table uses a foreign key with ON DELETE SET NULL, so log rows are retained as audit data when the referenced row is deleted;
- Metadata storage mixes strongly typed columns with JSONB, balancing performance and extensibility.
Multi-provider Abstraction: Gemini and Claude are both supported through the LLMProvider interface; adding a new model only requires implementing an adapter.
Request Cancellation: Streaming dialogue can be interrupted; the partial response is saved and the cancellation event is recorded.
Guest Mode: Anonymous conversations are stored in the browser and imported to the server after login; trial limits are enforced with Redis counters.
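
The provider abstraction and request cancellation can be sketched together. The interface below is a hypothetical shape, not the actual LLMProvider definition from the repository: `streamChat`, `EchoProvider`, `registerProvider`, and `runCompletion` are names invented for this example. Streaming is modeled with a synchronous callback to keep the sketch self-contained. A real adapter would stream asynchronously and would more likely take an AbortSignal than a polling function.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Outcome of one streamed completion; `cancelled` marks an interrupted stream.
interface StreamOutcome {
  text: string;
  cancelled: boolean;
}

// Hypothetical provider contract; the real LLMProvider interface may differ.
interface LLMProvider {
  readonly name: string;
  // Emits chunks in order and stops as soon as `onChunk` returns false.
  streamChat(messages: ChatMessage[], onChunk: (chunk: string) => boolean): void;
}

// Toy adapter standing in for a real Gemini or Claude adapter:
// it echoes the last message back one word at a time.
class EchoProvider implements LLMProvider {
  readonly name = "echo";
  streamChat(messages: ChatMessage[], onChunk: (chunk: string) => boolean): void {
    const last = messages[messages.length - 1];
    const words = last ? last.content.split(" ") : [];
    for (let i = 0; i < words.length; i++) {
      if (!onChunk(words[i])) return; // consumer asked to stop
    }
  }
}

// Adapters are looked up by name, so adding a model means registering one more object.
const providers: Record<string, LLMProvider> = {};

function registerProvider(provider: LLMProvider): void {
  providers[provider.name] = provider;
}

// Runs one completion, checking for cancellation before each chunk and
// returning whatever text was produced before the stop.
function runCompletion(
  providerName: string,
  messages: ChatMessage[],
  isCancelled: () => boolean
): StreamOutcome {
  const provider = providers[providerName];
  if (!provider) throw new Error("unknown provider: " + providerName);
  const parts: string[] = [];
  let cancelled = false;
  provider.streamChat(messages, (chunk) => {
    if (isCancelled()) {
      cancelled = true;
      return false;
    }
    parts.push(chunk);
    return true;
  });
  return { text: parts.join(" "), cancelled };
}

registerProvider(new EchoProvider());
```

The caller receives both the partial text and the `cancelled` flag, so it can persist the truncated reply and log the cancellation as separate steps.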

Section 05

Technical Trade-offs and Production Considerations

Pragmatic Choices:

Redis Streams instead of Kafka, to reduce operational cost;
Two processes (API + ingestion) instead of microservices, to avoid over-splitting;
Vercel AI SDK rather than LangChain, as a better fit for the project's current needs.

Production Limitations:

Database migrations run automatically on startup (Drizzle locks ensure safety);
The Compose setup serves plain HTTP and relies on upstream TLS termination;
Real-time dialogue requires a valid model API key; E2E tests cover the remaining functionality.

Section 06

Future Development Directions

Planned improvements:

Code Quality: Add manual review to improve AI-generated code;
Resumable Streams: Recover a stream after a connection interruption to improve user experience;
Distributed Rate Limiting: Migrate to Redis-backed rate limiters to support horizontal scaling;
Dashboard Expansion: Adopt time partitioning and precomputed summaries to handle log growth;
Other features: Message editing, session branching, offline caching, RAG integration, etc.
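
As an illustration of the precomputed summaries mentioned under Dashboard Expansion, the sketch below folds raw log rows into one-minute buckets and derives dashboard metrics from them. It shows one possible approach, not the project's planned design: the `LogRow` and `BucketSummary` shapes, the bucket size, and the function names are all assumptions.

```typescript
// Hypothetical raw log row; field names are assumptions.
interface LogRow {
  timestampMs: number;
  latencyMs: number;
  totalTokens: number;
  ok: boolean;
}

// One precomputed row per time bucket. Sums are stored instead of averages
// so that adjacent buckets can later be merged by simple addition.
interface BucketSummary {
  bucketStartMs: number;
  requests: number;
  errors: number;
  totalLatencyMs: number;
  totalTokens: number;
}

const BUCKET_MS = 60000; // one-minute buckets

// Fold raw rows into per-bucket summaries, sorted by bucket start time.
function summarize(rows: LogRow[]): BucketSummary[] {
  const byBucket: Record<string, BucketSummary> = {};
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    const bucketStartMs = Math.floor(row.timestampMs / BUCKET_MS) * BUCKET_MS;
    const key = String(bucketStartMs);
    let summary = byBucket[key];
    if (!summary) {
      summary = { bucketStartMs, requests: 0, errors: 0, totalLatencyMs: 0, totalTokens: 0 };
      byBucket[key] = summary;
    }
    summary.requests += 1;
    if (!row.ok) summary.errors += 1;
    summary.totalLatencyMs += row.latencyMs;
    summary.totalTokens += row.totalTokens;
  }
  return Object.keys(byBucket)
    .map((key) => byBucket[key])
    .sort((a, b) => a.bucketStartMs - b.bucketStartMs);
}

// Dashboard-facing metrics derived from a single summary row.
function toMetrics(summary: BucketSummary) {
  return {
    avgLatencyMs: summary.totalLatencyMs / summary.requests,
    errorRate: summary.errors / summary.requests,
    requestsPerSecond: summary.requests / (BUCKET_MS / 1000),
    totalTokens: summary.totalTokens,
  };
}
```

With summaries like these, a dashboard query reads one row per minute instead of scanning every log row. Latency percentiles cannot be recovered from sums, so they would need a separate structure such as a histogram.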

Section 07

Summary and Value

Ollive is not just a chat application but a complete observability platform for LLM applications. Its clean separation of product functionality from infrastructure offers a reference architecture for LLM application development. For developers, its SDK design, event-driven pipeline, and database schema provide practical, reusable patterns for building inference monitoring.