# TingIS: Enterprise-Grade Real-Time Risk Event Discovery System, Using Large Models to Extract Key Signals from Massive Noise

> The Alibaba Cloud team open-sourced the TingIS system. By combining a multi-stage event linking engine with large language models, it extracts actionable risk events from over 2000 user feedback entries per minute, achieving a 95% high-priority event discovery rate and a P90 latency of 3.5 minutes.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-04-23T17:40:45.000Z
- Last activity: 2026-04-24T05:19:53.649Z
- Heat score: 130.3
- Keywords: intelligent O&M, AIOps, large language models, event discovery, real-time systems, noise filtering, cloud-native, fault detection
- Page link: https://www.zingnex.cn/en/forum/thread/tingis
- Canonical: https://www.zingnex.cn/forum/thread/tingis
- Markdown source: floors_fallback

---

## Introduction: TingIS, an Enterprise-Grade Real-Time Risk Event Discovery System

The Alibaba Cloud team has open-sourced TingIS. By combining a multi-stage event linking engine with large language models, the system extracts actionable risk events from more than 2,000 user feedback entries per minute, achieving a 95% high-priority event discovery rate and a P90 latency of 3.5 minutes, and helping enterprises tackle operations and maintenance (O&M) challenges in the cloud-native era.

## Background: Cloud-Native O&M Dilemmas and the Value of User Feedback

In the cloud-native era, system complexity grows exponentially, and traditional monitoring systems have blind spots. User feedback contains semantic information that system monitoring cannot capture, but converting it into risk signals faces challenges such as high noise ratio, complex semantics, high real-time requirements, and difficulty in event aggregation.

## TingIS System Architecture: Three-Core Design Layers and Key Mechanisms

1. Multi-stage event linking engine: efficient index recall of candidates → LLM-based association judgment → incremental event maintenance.
2. Cascaded business routing mechanism: coarse-grained classification → fine-grained attribution → dynamic load balancing.
3. Multi-dimensional noise reduction pipeline: domain knowledge filtering → statistical pattern recognition → behavioral feature filtering → LLM semantic verification.
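The staged linking flow in item 1 can be sketched as follows. The post does not publish TingIS's actual interfaces, so every name here (`Event`, `index_recall`, `llm_judge`, `link_entry`) is hypothetical, and a trivial keyword-overlap heuristic stands in for the real LLM association call:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    event_id: int
    keywords: set
    feedback: list = field(default_factory=list)

def index_recall(entry_keywords, events, min_overlap=1):
    # Stage 1: cheap index-based recall — only events sharing at least
    # `min_overlap` keywords with the entry become candidates.
    return [e for e in events if len(e.keywords & entry_keywords) >= min_overlap]

def llm_judge(entry, candidate):
    # Stage 2: stand-in for the LLM association judgment. A real system
    # would prompt a model with the entry text and the candidate event's
    # context; here a keyword-overlap ratio plays that role.
    overlap = len(candidate.keywords & entry["keywords"])
    return overlap / max(len(candidate.keywords), 1) >= 0.5

def link_entry(entry, events, next_id):
    # Stage 3: incremental event maintenance — attach the entry to the
    # first accepted candidate, or open a new event when nothing matches.
    for cand in index_recall(entry["keywords"], events):
        if llm_judge(entry, cand):
            cand.feedback.append(entry["text"])
            cand.keywords |= entry["keywords"]
            return cand
    new_event = Event(next_id, set(entry["keywords"]), [entry["text"]])
    events.append(new_event)
    return new_event
```

The split keeps the expensive model call off the hot path: the index prunes most of the stream, and the LLM only adjudicates a short candidate list per entry.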

## Production Environment Performance: Data Validation of System Efficacy

- Throughput: peak of over 2,000 entries per minute, roughly 300,000 entries per day on average
- Latency: P90 of 3.5 minutes
- Coverage: 95% high-priority event discovery rate
- Comparative tests show routing accuracy, clustering quality, and signal-to-noise ratio all exceed baseline methods
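For readers unfamiliar with the metric, the P90 latency reported above is a standard percentile: the value below which 90% of samples fall. A minimal nearest-rank computation (the latency samples below are invented for illustration, not TingIS data):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are less than or equal to it.
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Hypothetical per-event end-to-end latencies in minutes.
latencies = [0.8, 1.5, 2.1, 2.6, 2.9, 3.1, 3.3, 3.5, 4.2, 6.0]
p90 = percentile(latencies, 90)  # the ninth of the ten sorted samples
```

Percentiles are preferred over averages for latency reporting because a single slow outlier can dominate a mean while the tail percentiles stay representative of user experience.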

## Technical Highlights and Industry Insights

Technical highlights:

- Deep integration of engineering and algorithms
- Pragmatic use of LLMs: the model is applied only at key links, with traditional methods used elsewhere
- Interpretability and controllability

Industry insights:

- User feedback is an important data dimension for O&M
- Deep LLM application in vertical scenarios delivers real value
- A layered architecture balances real-time performance and quality

## Limitations and Future Optimization Directions

Current limitations:

- Long cold-start cycle
- Insufficient multi-language support
- Limited root cause localization
- No predictive capability

Future directions:

- Shorten the cold-start cycle
- Add multi-language support
- Integrate root cause analysis
- Implement predictive alerting
