Zing Forum


InSeeDent: An AI Root Cause Analysis Platform Based on Multi-Agent Workflow

InSeeDent is an AI-driven fault-analysis platform for DevOps and SRE teams. It automates root cause analysis (RCA) and real-time diagnosis of production faults through multi-agent workflows, RAG technology, and fusion of multi-source telemetry data.

Tags: AIOps · DevOps · SRE · Root Cause Analysis · Multi-Agent · LangGraph · RAG · Fault Diagnosis · Observability · Microservices
Published 2026-05-16 16:15 · Recent activity 2026-05-16 16:18 · Estimated read 8 min

Section 01

InSeeDent: Guide to the AI Root Cause Analysis Platform

InSeeDent is an open-source, AI-driven fault-analysis platform for DevOps and SRE teams. Its core functions are automated root cause analysis (RCA) and real-time diagnosis of production faults through multi-agent workflows, RAG technology, and fusion of multi-source telemetry data. Its design philosophy emphasizes "offline-first": it supports flexible LLM integration strategies (mock, local Ollama, OpenAI API) and provides a dark-themed operations dashboard as well as interactive chat-based investigation, aiming to solve the pain points of fault troubleshooting in production environments.


Section 02

Pain Points and Background of Fault Diagnosis in Production Environments

In modern microservice architectures, fault troubleshooting requires manually correlating multiple data sources such as logs, metrics, and traces; this can take hours or even days and seriously inflates MTTR (Mean Time to Repair). Traditional monitoring tools can raise alerts but lack intelligent analysis capabilities, while existing AIOps solutions tend to be expensive, complex to deploy, and strongly dependent on cloud APIs. The industry urgently needs a lightweight, offline-capable intelligent root cause analysis solution.


Section 03

InSeeDent Project Overview and System Architecture

Project Overview: Designed for DevOps/SRE teams, InSeeDent collects signals from observability tools (logs, metrics, traces, etc.), runs LangGraph multi-agent RCA workflows, displays analysis results on an operations dashboard, and supports chat-based investigation. It prioritizes offline use by default, using a rule-based simulation engine to generate root cause results, making it suitable for demonstration and offline scenarios.

System Architecture:

  • Frontend Layer: React + Tailwind + TypeScript, including modules such as the fault dashboard, telemetry ingestion, timeline, investigation chat, and service dependency graph.
  • Backend Layer: Java 17 + Spring Boot 3.2, providing RESTful APIs with JWT authentication, JPA persistence, and Flyway migrations; compatible with both H2 and PostgreSQL.
  • AI Service Layer: Python 3.11 + FastAPI + LangGraph, including agents for log analysis, metrics analysis, trace analysis, deployment correlation, correlation analysis, and summarization.
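
To make the layering concrete, here is a minimal sketch of the data shapes the backend might exchange with the AI service layer over REST. The field names (`incident_id`, `signals`, `remediation`, etc.) are hypothetical illustrations, not the project's actual API schema.

```python
from typing import List, TypedDict

# Hypothetical request/response shapes for an RCA endpoint on the
# AI service layer; names are illustrative, not InSeeDent's real schema.

class TelemetrySignal(TypedDict):
    source: str      # "logs" | "metrics" | "traces"
    service: str     # emitting microservice
    payload: str     # raw signal content

class RcaRequest(TypedDict):
    incident_id: str
    signals: List[TelemetrySignal]

class RcaResponse(TypedDict):
    incident_id: str
    root_cause: str
    confidence: float        # 0.0 - 1.0
    remediation: List[str]   # suggested repair steps

def build_request(incident_id: str, signals: List[TelemetrySignal]) -> RcaRequest:
    """Assemble the JSON body the backend would POST to the AI service."""
    return {"incident_id": incident_id, "signals": signals}
```

Typed request/response shapes like these keep the Java backend and the Python AI service honest about the same contract, whichever serialization layer sits between them.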

Section 04

Detailed Explanation of Multi-Agent Workflow and RAG Features

Multi-Agent Workflow: When RCA is triggered, the agents execute in sequence: log_agent → metrics_agent → trace_agent → deployment_agent → correlation_agent → summarizer_agent. Responsibilities of each agent:

  • Log agent: extracts anomalies from log streams.
  • Metrics agent: analyzes time-series data.
  • Trace agent: locates faults in the call chain.
  • Deployment agent: correlates faults with recent changes.
  • Correlation agent: constructs fault propagation paths.
  • Summarizer agent: generates RCA reports with confidence levels and repair suggestions.
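
The sequential flow above can be approximated in plain Python. The real project wires these steps as a LangGraph graph; the agent bodies below are placeholder heuristics (toy thresholds, invented keys like "spans" and "deploys"), not InSeeDent's implementation.

```python
from typing import Callable, Dict, List

State = Dict[str, object]  # shared state passed agent to agent

def log_agent(state: State) -> State:
    # Toy heuristic: treat lines containing "ERROR" as anomalies.
    state["log_anomalies"] = [l for l in state.get("logs", []) if "ERROR" in l]
    return state

def metrics_agent(state: State) -> State:
    # Flag metrics whose latest normalized value exceeds a toy threshold.
    metrics = state.get("metrics", {})
    state["metric_anomalies"] = [k for k, v in metrics.items() if v > 0.9]
    return state

def trace_agent(state: State) -> State:
    # Mark call-chain spans slower than 500 ms.
    state["slow_spans"] = [s for s in state.get("spans", []) if s.get("ms", 0) > 500]
    return state

def deployment_agent(state: State) -> State:
    # Keep only the most recent deployment as a correlation candidate.
    state["recent_deploys"] = state.get("deploys", [])[-1:]
    return state

def correlation_agent(state: State) -> State:
    # Build a crude propagation path from what earlier agents found.
    path = [f"deploy:{d}" for d in state["recent_deploys"]]
    path += [f"metric:{m}" for m in state["metric_anomalies"]]
    path += [f"log:{l}" for l in state["log_anomalies"]]
    state["propagation_path"] = path
    return state

def summarizer_agent(state: State) -> State:
    path = state["propagation_path"]
    state["report"] = {
        "root_cause": path[0] if path else "unknown",
        "confidence": 0.8 if path else 0.1,
    }
    return state

PIPELINE: List[Callable[[State], State]] = [
    log_agent, metrics_agent, trace_agent,
    deployment_agent, correlation_agent, summarizer_agent,
]

def run_rca(state: State) -> State:
    for agent in PIPELINE:
        state = agent(state)
    return state
```

Each agent reads and enriches a shared state dict, which is essentially what a LangGraph `StateGraph` formalizes: nodes are agent functions, edges fix the execution order.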

RAG Features: InSeeDent integrates Retrieval-Augmented Generation (RAG), vectorizing and storing historical fault cases and operations manuals. When a new fault occurs, it automatically matches similar historical cases, which is useful for recurring faults, onboarding new team members, and accumulating an organizational knowledge base.
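
The case-matching step can be sketched as follows. A real RAG pipeline uses an embedding model and a vector store; here a bag-of-words cosine similarity stands in for embeddings, and the knowledge-base entries are invented examples.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_cases(new_fault: str, kb: Dict[str, str], top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank historical cases by similarity to the new fault description."""
    q = embed(new_fault)
    scored = [(case_id, cosine(q, embed(text))) for case_id, text in kb.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

# Invented historical cases standing in for a vectorized fault knowledge base.
kb = {
    "case-101": "database connection pool exhausted timeout errors",
    "case-102": "disk full on logging node",
    "case-103": "connection timeout after deploy of payment service",
}
top = match_cases("payment service timeout errors after deploy", kb)
```

Swapping `embed` for a sentence-embedding model and the dict for a vector store gives the production shape of the same retrieval step.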


Section 05

Flexible LLM Integration Strategies and Deployment Methods

LLM Integration Modes:

Mode            Description                                         Network Dependency
mock (default)  Deterministic multi-agent logic + heuristic rules   None
ollama          Local offline LLM (e.g., llama3.2, mistral)         None after model download
openai          OpenAI API (e.g., gpt-4o-mini)                      Required
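
A sketch of how such mode switching could be wired as a simple provider registry. The environment variable name (`LLM_MODE`) and function signatures are assumptions for illustration, not InSeeDent's actual configuration keys; the ollama/openai providers are stubbed to keep the example offline and self-contained.

```python
import os
from typing import Callable, Dict, Optional

def mock_complete(prompt: str) -> str:
    # Deterministic stub: no network, suitable for demos and tests.
    return f"[mock RCA for: {prompt[:40]}]"

def ollama_complete(prompt: str) -> str:
    # Would POST to a local Ollama server (default http://localhost:11434);
    # stubbed here so the sketch runs without one.
    raise NotImplementedError("requires a running Ollama instance")

def openai_complete(prompt: str) -> str:
    # Would call the OpenAI API (e.g., gpt-4o-mini); needs network + API key.
    raise NotImplementedError("requires an OpenAI API key")

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "mock": mock_complete,
    "ollama": ollama_complete,
    "openai": openai_complete,
}

def get_llm(mode: Optional[str] = None) -> Callable[[str], str]:
    """Pick a completion function; unknown or unset modes fall back to mock."""
    mode = mode or os.environ.get("LLM_MODE", "mock")
    return PROVIDERS.get(mode, mock_complete)
```

Falling back to `mock` on an unknown mode matches the document's offline-first philosophy: a misconfigured deployment degrades to the deterministic engine instead of failing.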

Deployment Methods:

  • Local Development: Start the backend (mvn spring-boot:run), AI service (run.sh), and frontend (npm run dev), then access http://localhost:5173. The default account is admin/admin123.
  • Docker Compose: Copy .env.example to .env, then execute docker compose up --build to start the full-stack service (including PostgreSQL and optional Kafka).

Section 06

Practical Value and Future Outlook

Practical Value: Reduces MTTR (from hours to minutes), lowers the manual analysis burden on engineers, accumulates fault handling knowledge, supports offline scenarios, and has an extensible architecture (integrates with existing toolchains).

Summary and Outlook: InSeeDent represents a practical direction in the AIOps field, combining LLM reasoning with engineering practice, and is well suited to small and medium-sized teams and data-sensitive enterprises. Planned enterprise-level features include integration with additional data sources, predictive alerting, and deeper LLM integration; it is worth DevOps teams' attention and evaluation.