Zing Forum

Ollive: A Full-Stack LLM Application Platform Integrating Multi-Turn Dialogue and Inference Observability

Ollive is an open-source full-stack LLM chat application that provides streaming multi-turn dialogue and also incorporates a complete inference observability infrastructure. It automatically captures metadata for each model call via an SDK, writes it asynchronously to PostgreSQL through Redis Streams, and offers developers a real-time monitoring dashboard with key metrics such as latency, throughput, error rate, and token consumption.

Tags: LLM, observability, chatbot, Redis Streams, PostgreSQL, TypeScript, React, Docker, Gemini, Claude
Published 2026-05-24 20:44 | Recent activity 2026-05-24 20:49 | Estimated read 7 min

Section 01

[Introduction] Ollive: A Full-Stack LLM Application Platform Integrating Multi-Turn Dialogue and Inference Observability

Ollive is an open-source full-stack LLM chat application that combines streaming multi-turn dialogue with a complete inference observability infrastructure. An SDK automatically captures metadata for each model call and writes it asynchronously to PostgreSQL through Redis Streams, giving developers real-time monitoring of key metrics such as latency, throughput, error rate, and token consumption. The system uses a modular architecture, starts with a single Docker Compose command, and serves both end users and developers who need observability.
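
One way to picture the SDK's capture step is a wrapper around each model call. The sketch below is illustrative only: `withTelemetry`, `InferenceRecord`, `ModelResult`, and every field name are assumptions made for this example, not Ollive's actual API. The call is synchronous to keep the example short, whereas a real model call would be asynchronous.

```typescript
// Hypothetical shape of one telemetry record; field names are assumptions,
// not Ollive's actual schema.
interface InferenceRecord {
  provider: string;
  model: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  status: "ok" | "error";
  errorMessage?: string;
}

// Result a model call is assumed to return in this sketch.
interface ModelResult {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

type Clock = () => number;

// Wraps a model call so that exactly one record is emitted,
// whether the call succeeds or throws.
function withTelemetry(
  meta: { provider: string; model: string },
  call: () => ModelResult,
  emit: (record: InferenceRecord) => void,
  now: Clock = Date.now,
): ModelResult {
  const startedAt = now();
  try {
    const result = call();
    emit({
      ...meta,
      latencyMs: now() - startedAt,
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
      status: "ok",
    });
    return result;
  } catch (err) {
    emit({
      ...meta,
      latencyMs: now() - startedAt,
      inputTokens: 0,
      outputTokens: 0,
      status: "error",
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err; // telemetry must not swallow the original failure
  }
}
```

Passing the clock in as a parameter is a choice made here so that latency can be checked deterministically in tests. Error rate and throughput would then be derived downstream from the emitted records.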

Section 02

Project Background and Overview

Original Author/Maintainer: ankan17
Source Platform: GitHub
Release Date: 2026-05-24
Original Link: https://github.com/ankan17/chatbot-ollive

As a full-stack AI chat application, Ollive's core design principle is to decouple product functionality (streaming dialogue) from platform observability. Users get smooth multi-turn dialogue, while developers obtain inference telemetry through the built-in SDK and can see how the application is behaving. The system consists of a PostgreSQL database, Redis Streams, an Express API, a data ingestion process, and a React frontend. Apart from the model API key, which must be supplied, everything runs locally via docker compose up.

Section 03

Architecture Design and Tech Stack

Ollive's architecture separates transactional state from telemetry data:

  • Chat messages: Written synchronously to PostgreSQL to ensure strong consistency;
  • Inference logs: Written along an asynchronous path (SDK capture → Redis Streams → ingestion process → PostgreSQL) that stays off the main dialogue path.
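
The two write paths above can be sketched with in-memory stand-ins. Everything named here (`createPipeline`, `InMemoryStream`, `recordTurn`, `ingestOnce`, and the record shapes) is hypothetical. The arrays stand in for PostgreSQL tables, and the queue stands in for a Redis Stream, where a real implementation would append with XADD and read through a consumer group.

```typescript
interface ChatMessage {
  conversationId: string;
  role: "user" | "assistant";
  content: string;
}

interface InferenceEvent {
  conversationId: string;
  latencyMs: number;
  totalTokens: number;
}

// In-memory stand-in for a Redis Stream.
class InMemoryStream<T> {
  private entries: T[] = [];

  // Analogous to XADD: append and return immediately.
  append(entry: T): void {
    this.entries.push(entry);
  }

  // Analogous to reading and acknowledging a batch: removes up to `max` entries.
  takeBatch(max: number): T[] {
    return this.entries.splice(0, max);
  }

  get length(): number {
    return this.entries.length;
  }
}

function createPipeline() {
  const messages: ChatMessage[] = []; // transactional state
  const inferenceLogs: InferenceEvent[] = []; // telemetry table
  const stream = new InMemoryStream<InferenceEvent>();

  return {
    messages,
    inferenceLogs,
    stream,

    // Main dialogue path: the message write completes before returning,
    // while telemetry is only enqueued.
    recordTurn(message: ChatMessage, event: InferenceEvent): void {
      messages.push(message);
      try {
        stream.append(event);
      } catch {
        // With a real network client, a telemetry failure would be dropped
        // here so that it cannot fail the chat request.
      }
    },

    // Ingestion worker: runs separately and drains the stream in batches.
    ingestOnce(batchSize: number): number {
      const batch = stream.takeBatch(batchSize);
      inferenceLogs.push(...batch);
      return batch.length;
    },
  };
}
```

In this sketch a chat turn is visible in `messages` immediately, while its telemetry only reaches `inferenceLogs` once the worker runs, which is the trade the asynchronous path makes.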

The tech stack is managed using a pnpm monorepo:

  • Application layer: web (Vite+React), api (Express), ingestion-worker (Redis consumer);
  • Shared packages: llm-sdk (multi-model adapter), db (Drizzle ORM), shared (type definitions);
  • Infrastructure: Docker Compose for one-click environment startup.

Section 04

Detailed Explanation of Key Design Decisions

  1. Database Schema:

    • The message table uses a sequence field to ensure order and idempotency;
    • The inference log table uses a foreign key with ON DELETE SET NULL to retain audit data;
    • The metadata storage mixes strongly typed columns with JSONB, balancing query performance and extensibility.
  2. Multi-provider Abstraction: Compatible with Gemini/Claude via the LLMProvider interface; adding a new model only requires implementing an adapter.

  3. Request Cancellation: Supports streaming dialogue interruption, saves partial responses, and records cancellation events.

  4. Guest Mode: Anonymous dialogues are stored in the browser and imported to the server after login; trial limits are enforced via Redis counters.
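
The multi-provider abstraction and request cancellation described above can be sketched together. The source names the `LLMProvider` interface but not its members, so the `stream` signature, `makeFakeProvider`, `ProviderRegistry`, and `runChat` are all assumptions for illustration. The fake adapters call no real API and run synchronously, while a real adapter would stream asynchronously from the vendor.

```typescript
type StreamStatus = "completed" | "cancelled";

// Hypothetical interface members; only the name LLMProvider comes from the source.
interface LLMProvider {
  readonly name: string;
  // Delivers output chunk by chunk and stops early once isCancelled() is true.
  stream(
    prompt: string,
    onChunk: (chunk: string) => void,
    isCancelled: () => boolean,
  ): StreamStatus;
}

// Builds a fake adapter that replays fixed chunks instead of calling a vendor API.
function makeFakeProvider(name: string, chunks: string[]): LLMProvider {
  return {
    name,
    stream(
      _prompt: string,
      onChunk: (chunk: string) => void,
      isCancelled: () => boolean,
    ): StreamStatus {
      for (const chunk of chunks) {
        if (isCancelled()) return "cancelled";
        onChunk(chunk);
      }
      return "completed";
    },
  };
}

// Callers look providers up by name, so adding a model means registering one adapter.
class ProviderRegistry {
  private providers: Record<string, LLMProvider> = {};

  register(provider: LLMProvider): void {
    this.providers[provider.name] = provider;
  }

  get(name: string): LLMProvider {
    const provider = this.providers[name];
    if (!provider) throw new Error(`Unknown provider: ${name}`);
    return provider;
  }
}

interface ChatOutcome {
  text: string; // full response, or the partial text received before cancellation
  status: StreamStatus;
}

function runChat(
  provider: LLMProvider,
  prompt: string,
  isCancelled: () => boolean = () => false,
): ChatOutcome {
  let text = "";
  const status = provider.stream(prompt, (chunk) => { text += chunk; }, isCancelled);
  return { text, status };
}
```

Because `runChat` returns whatever text arrived together with the status, a caller could persist the partial response and log a cancellation event from the same result.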

Section 05

Technical Trade-offs and Production Considerations

Pragmatic Choices:

  • Use Redis Streams instead of Kafka to reduce operational costs;
  • Adopt dual processes (API + ingestion) instead of microservices to avoid over-splitting;
  • Choose the Vercel AI SDK over LangChain to fit the needs of the current use case.

Production Limitations:

  • Database migrations run automatically on startup (Drizzle locks ensure safety);
  • TLS termination is left to an upstream proxy; the Compose setup serves plain HTTP;
  • Live dialogue requires a valid model API key; E2E tests cover the remaining functionality.

Section 06

Future Development Directions

Planned improvements for the project:

  1. Code Quality: Add manual reviews to optimize AI-generated code;
  2. Resumable Streams: Implement stream recovery after connection interruption to enhance user experience;
  3. Distributed Rate Limiting: Migrate to Redis-backed rate limiters to support horizontal scaling;
  4. Dashboard Expansion: Adopt time partitioning and precomputed summaries to handle log growth;
  5. Other features: Message editing, session branching, offline caching, RAG integration, etc.
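
As an illustration of the precomputed summaries mentioned in item 4, the sketch below rolls raw log rows up into per-minute buckets. The project lists this only as a future direction, so the row shape, the one-minute granularity, and the names `rollUp`, `errorRate`, and `averageLatencyMs` are all assumptions.

```typescript
// Hypothetical raw log row; field names are assumptions.
interface LogRow {
  timestampMs: number;
  latencyMs: number;
  totalTokens: number;
  isError: boolean;
}

// One precomputed summary per time bucket.
interface MinuteSummary {
  bucketStartMs: number;
  requests: number;
  errors: number;
  totalTokens: number;
  latencySumMs: number; // a sum, not an average, so buckets can be merged later
}

const MINUTE_MS = 60000;

// Groups rows into one-minute buckets and returns them in time order.
function rollUp(rows: LogRow[]): MinuteSummary[] {
  const byBucket: Record<number, MinuteSummary> = {};
  for (const row of rows) {
    const bucketStartMs = Math.floor(row.timestampMs / MINUTE_MS) * MINUTE_MS;
    let summary = byBucket[bucketStartMs];
    if (!summary) {
      summary = { bucketStartMs, requests: 0, errors: 0, totalTokens: 0, latencySumMs: 0 };
      byBucket[bucketStartMs] = summary;
    }
    summary.requests += 1;
    summary.errors += row.isError ? 1 : 0;
    summary.totalTokens += row.totalTokens;
    summary.latencySumMs += row.latencyMs;
  }
  return Object.keys(byBucket)
    .map((key) => byBucket[Number(key)])
    .sort((a, b) => a.bucketStartMs - b.bucketStartMs);
}

// Derived metrics are computed from the stored sums at read time.
function errorRate(summary: MinuteSummary): number {
  return summary.requests === 0 ? 0 : summary.errors / summary.requests;
}

function averageLatencyMs(summary: MinuteSummary): number {
  return summary.requests === 0 ? 0 : summary.latencySumMs / summary.requests;
}
```

A dashboard reading these summaries would scan one row per minute instead of every raw log row, which is the point of precomputing them as the log table grows.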

Section 07

Summary and Value

Ollive is not just a chat application but a complete LLM application observability platform. Its clear separation of product functionality from infrastructure provides a reference architecture for LLM application development. For developers building inference monitoring, its SDK design, event-driven pipeline, and database schema all offer practical, reusable patterns.