# FAME: A Fine-Grained Log Anomaly Detection Framework Based on Mixture of Experts

> This article introduces the FAME framework, which achieves efficient message-level log anomaly detection and significantly reduces annotation requirements through LLM-assisted failure domain partitioning and a lightweight mixture-of-experts architecture.

- Board: [Openclaw Llm](https://www.zingnex.cn/en/forum/board/openclaw-llm)
- Published: 2026-05-21T17:34:53.000Z
- Last activity: 2026-05-22T05:24:42.802Z
- Popularity: 137.2
- Keywords: log anomaly detection, mixture-of-experts architecture, message-level detection, failure domain partitioning, LLM-assisted, AIOps, fine-grained detection
- Page link: https://www.zingnex.cn/en/forum/thread/fame
- Canonical: https://www.zingnex.cn/forum/thread/fame

---

## FAME Framework Overview: An Innovative Solution for Fine-Grained Log Anomaly Detection

FAME (Failure-Aware Mixture-of-Experts) is a framework for fine-grained log anomaly detection built on a mixture-of-experts design. It has two main ingredients. First, an LLM helps partition log templates into failure domains, which makes message-level detection feasible. Second, a lightweight mixture-of-experts architecture handles online detection, which reduces both annotation requirements and computational cost. Together they give operations and maintenance teams a more efficient tool for fault localization.

## Background: Practical Challenges in Log Anomaly Detection

Modern production systems generate massive volumes of logs, but most existing methods detect anomalies only at the coarse session or window level. An operator who receives such an alert still has to search the flagged window for the offending lines, so anomaly localization stays slow. Fine-grained, message-level detection faces four major challenges: template polysemy (the same template can be normal or abnormal depending on context), fault heterogeneity (failure patterns differ widely across heterogeneous subsystems), the annotation bottleneck (labelling logs line by line is impractical), and computational cost (real-time LLM inference is expensive).

## Core Design and Architecture of the FAME Framework

FAME follows three core principles: efficient annotation (at most K annotated samples per template), lightweight computation (only a lightweight router and small experts run online), and failure awareness (templates are partitioned by failure domain). The architecture consists of three phases: offline preprocessing (an LLM partitions templates into failure domains, and the partition is then validated), lightweight annotation (representative samples are selected within each domain for labelling), and online detection (the router sends each log message to an expert, which outputs the result).
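
The per-template annotation cap can be sketched as follows. This is a minimal illustration, not the framework's implementation: the function name `sample_for_annotation`, the `(template_id, message)` input shape, and the use of uniform random sampling as a stand-in for "representative" sample selection are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_for_annotation(logs, k, seed=0):
    """Pick at most k messages per template for manual labelling.

    logs: iterable of (template_id, message) pairs (hypothetical input shape).
    Returns a dict mapping template_id -> list of sampled messages.
    Uniform random sampling is an assumption; the source only says that
    representative samples are selected.
    """
    by_template = defaultdict(list)
    for template_id, message in logs:
        by_template[template_id].append(message)

    rng = random.Random(seed)
    selected = {}
    for template_id, messages in by_template.items():
        if len(messages) <= k:
            # Rare templates are labelled in full.
            selected[template_id] = list(messages)
        else:
            # Frequent templates are capped at k samples.
            selected[template_id] = rng.sample(messages, k)
    return selected


# Made-up toy data: one frequent template and one rare template.
logs = [("E1", f"disk error on node {i}") for i in range(500)]
logs += [("E2", "link up"), ("E2", "link up")]

picked = sample_for_annotation(logs, k=100)
annotation_budget = sum(len(v) for v in picked.values())
```

With this cap, the labelling effort grows with the number of distinct templates rather than with the raw log volume, which is the property the efficient-annotation principle relies on.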

## Detailed Explanation of Key Technologies: Failure Domain Partitioning and Expert Network

1. Failure Domain Partitioning: An LLM performs semantic classification of templates into failure domains. The partition is then validated for intra-domain consistency, inter-domain differentiation, and complete coverage to ensure quality.
2. Lightweight Expert Network: A router classifies each log message to a domain-specific expert (a small neural network), which outputs the anomaly result together with a failure domain label.
3. Efficient Annotation: Sampling is done at the template level (at most K samples per template), with selection of representative samples and support for active learning.
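
The router-and-experts flow described above might look roughly like the sketch below. Everything here is illustrative: the class names, the lookup-table router, and the keyword-weighted logistic scorer are assumptions that stand in for the learned router and the small neural-network experts mentioned in the source.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    is_anomaly: bool
    score: float   # anomaly confidence in [0, 1]
    domain: str    # failure domain label of the expert that scored the message


class KeywordExpert:
    """Stand-in for a small per-domain model: logistic score over token weights."""

    def __init__(self, weights, bias=0.0, threshold=0.5):
        self.weights = weights
        self.bias = bias
        self.threshold = threshold

    def score(self, message):
        tokens = message.lower().split()
        z = self.bias + sum(self.weights.get(t, 0.0) for t in tokens)
        return 1.0 / (1.0 + math.exp(-z))


class Router:
    """Hypothetical router: a template -> domain table built offline."""

    def __init__(self, template_to_domain, default_domain):
        self.template_to_domain = template_to_domain
        self.default_domain = default_domain

    def route(self, template_id):
        return self.template_to_domain.get(template_id, self.default_domain)


def detect(template_id, message, router, experts):
    """Route one message to its domain expert and return a labelled result."""
    domain = router.route(template_id)
    expert = experts[domain]
    score = expert.score(message)
    return Detection(score >= expert.threshold, score, domain)


# Made-up domains, templates, and weights for illustration only.
router = Router({"E1": "storage", "E2": "network"}, default_domain="other")
experts = {
    "storage": KeywordExpert({"failed": 3.0, "error": 2.0}, bias=-2.0),
    "network": KeywordExpert({"timeout": 3.0}, bias=-2.0),
    "other": KeywordExpert({}, bias=-2.0),
}

flagged = detect("E1", "write failed on /dev/sda", router, experts)
normal = detect("E1", "write completed on /dev/sda", router, experts)
```

Both example messages share the template `E1`, yet only one is flagged. That is the template polysemy case: the decision is made per message inside the domain, not per template.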

## Experimental Evidence: Performance Evaluation and Comparative Analysis

FAME is evaluated on the BGL and Thunderbird datasets. On BGL, it reaches an F1 of 98.16 with K=100, cuts annotation effort by 76x, and shows 86.3% generalization to anomalies with unseen EventIDs. On Thunderbird, it reaches an F1 of 99.95 with perfect recall. Compared with traditional ML, end-to-end deep learning, and direct LLM inference, FAME is reported to be superior in both accuracy and cost. Ablation experiments confirm that the failure domain partitioning step and the validation step are both necessary.
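
For readers who want to reproduce this kind of metric on their own data, the sketch below shows how F1 and an annotation reduction factor are commonly computed. The reduction formula (total messages divided by labelled messages) is an assumption about how such a factor is defined, and the counts are made-up toy values, not figures from the FAME evaluation.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def annotation_reduction(total_messages, labelled_messages):
    """How many times fewer labels are needed than full line-by-line labelling.

    Assumed definition: total messages divided by labelled messages.
    """
    return total_messages / labelled_messages


# Toy counts for illustration only.
example_f1 = f1_score(tp=90, fp=10, fn=10)
example_reduction = annotation_reduction(total_messages=10_000, labelled_messages=250)

# "Perfect recall" corresponds to zero false negatives.
perfect_recall_f1 = f1_score(tp=50, fp=1, fn=0)
```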

## Practical Deployment Considerations: Efficiency, Interpretability, and Continuous Learning

- Computational Efficiency: The lightweight model processes logs in real time on CPU, hundreds of times faster than direct LLM inference.
- Interpretability: Each result includes an anomaly confidence, a failure domain label, and the basis for the routing decision.
- Continuous Learning: New templates can be assigned to a domain and experts can be updated incrementally, so the detector adapts as the system evolves.
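
The source does not say how a new template is assigned to a domain. One simple possibility, shown purely as an assumption, is to compare the new template against known templates by token overlap (Jaccard similarity) and defer to offline review when nothing is close enough. The function names, the threshold, and the example templates are all hypothetical.

```python
def tokens(template):
    """Lower-cased whitespace tokens of a template string."""
    return set(template.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_domain(new_template, domain_templates, min_similarity=0.2):
    """Return (domain, similarity) of the closest known template.

    Returns (None, best_similarity) when no known template is similar enough,
    signalling that the template should go back to offline review.
    """
    new_tokens = tokens(new_template)
    best_domain, best_sim = None, 0.0
    for domain, templates in domain_templates.items():
        for known in templates:
            sim = jaccard(new_tokens, tokens(known))
            if sim > best_sim:
                best_domain, best_sim = domain, sim
    if best_sim < min_similarity:
        return None, best_sim
    return best_domain, best_sim


# Made-up domains and templates for illustration only.
domain_templates = {
    "storage": ["disk <*> read error", "raid array <*> degraded"],
    "network": ["link <*> down", "packet loss on <*>"],
}

domain, sim = assign_domain("disk <*> write error", domain_templates)
unknown_domain, _ = assign_domain("fan speed nominal", domain_templates)
```

Returning `None` for low-similarity templates keeps the online path cheap while leaving ambiguous cases to the heavier offline step, in line with the "heavy offline, light online" split.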

## Research Significance and Future Outlook

Contributions: FAME advances the shift in log anomaly detection from coarse-grained to fine-grained, and demonstrates a "heavy offline, light online" paradigm for applying LLMs. Future directions include multimodal log analysis, enhanced causal reasoning, adaptive thresholds, and extension to federated learning.