LLMLogAnalyzer: Research on Large Language Model-based Log Anomaly Detection Using Prompt Engineering

This article introduces a Java Spring Boot project that uses large language models (LLMs) and prompt engineering techniques for system log anomaly detection, comparing the effects of three prompt strategies: zero-shot, rule-driven, and template-aware.

Tags: large language models, prompt engineering, log anomaly detection, LLM, BGL dataset, system O&M, Java, Spring Boot, Qwen2.5
Published 2026-06-14 01:14 | Recent activity 2026-06-14 01:22 | Estimated read 6 min

Section 01

[Introduction] Core Overview of the LLMLogAnalyzer Project

LLMLogAnalyzer is a master's research project built on Java Spring Boot that explores how prompt engineering can improve the performance of large language models (LLMs) in system log anomaly detection. Using the BGL supercomputer log dataset, it compares three prompt strategies (zero-shot, rule-driven, and template-aware) to provide a reference for applying LLMs in operation and maintenance (O&M) scenarios.

Section 02

Project Background and Dataset Introduction

System log anomaly detection is a core challenge in O&M. Traditional methods rely on hand-written rules or on supervised learning that requires extensive manual feature engineering. LLMs, by contrast, can interpret the semantics of a log message and judge whether it is anomalous from its likely impact on the system. The project uses the BGL dataset (logs from the IBM Blue Gene/L supercomputer, labeled normal or abnormal) and requires the model to return its classification as JSON: 0 for normal, 1 for abnormal.
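
To make the labeling concrete, here is a minimal sketch of turning one raw BGL line into a 0/1 ground-truth label. It assumes the field layout of the publicly distributed BGL file (alert tag, epoch, date, node, timestamp, node, type, component, level, message) and that an alert tag of "-" marks a normal line. The names BglEntry and BglLineSketch are hypothetical and are not the project's BglParser.

```java
import java.util.Optional;

/** Hypothetical parsed view of one BGL line; not the project's own type. */
record BglEntry(int label, String alertTag, String level, String message) {}

final class BglLineSketch {
    // Assumed layout: the free-text message starts at the tenth
    // whitespace-separated field (index 9); the level is at index 8.
    private static final int MESSAGE_FIELD = 9;

    /** Returns the parsed entry, or empty if the line has too few fields. */
    static Optional<BglEntry> parse(String line) {
        if (line == null || line.isBlank()) {
            return Optional.empty();
        }
        // The limit keeps the whole message together in the last element.
        String[] f = line.trim().split("\\s+", MESSAGE_FIELD + 1);
        if (f.length <= MESSAGE_FIELD) {
            return Optional.empty();
        }
        // Assumption: "-" in the first field means normal, anything else abnormal.
        int label = "-".equals(f[0]) ? 0 : 1;
        return Optional.of(new BglEntry(label, f[0], f[8], f[MESSAGE_FIELD]));
    }
}
```

Only the message (and possibly the level) would be shown to the model; the label is held back as ground truth for evaluation.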

Section 03

Comparison of Three Prompt Engineering Strategies

The project compares three prompt strategies:

  1. Zero-shot prompt: uses no BGL-specific knowledge and classifies on general anomaly indicators, while avoiding over-reliance on keywords such as ERROR;
  2. Rule-driven prompt: follows a structured decision process (check anomaly and normal indicators first, then fall back to rules based on system impact), injecting domain knowledge to reduce false positives;
  3. Template-aware prompt: provides examples of anomalous and normal BGL log patterns; it injects the most domain knowledge and is therefore expected, in theory, to perform best.
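
The three strategies differ only in how much domain knowledge is placed before the log line, which can be sketched as an enum. All prompt wording below is hypothetical, the JSON key "label" is an assumption, and the pattern placeholders are left unfilled on purpose; the project's actual PromptGenerator templates are not shown in this article.

```java
/** Three illustrative prompt variants; all wording here is hypothetical. */
enum PromptStrategy {
    // No domain knowledge beyond the generic task description.
    ZERO_SHOT(""),
    // Ordered decision rules with a system-impact fallback.
    RULE_DRIVEN("""
        Decide in this order:
        1. If the line matches an anomaly indicator, answer 1.
        2. If the line matches a normal indicator, answer 0.
        3. Otherwise answer 1 only if the event impairs system operation.
        """),
    // Labeled example patterns; placeholders stand in for real BGL patterns.
    TEMPLATE_AWARE("""
        Known anomalous patterns: <fill in from labeled BGL data>
        Known normal patterns: <fill in from labeled BGL data>
        """);

    private static final String TASK = """
        You classify one system log line as normal or anomalous.
        Judge by meaning and system impact, not by keywords such as ERROR alone.
        """;
    private static final String OUTPUT_RULE =
        "Reply with JSON only: {\"label\": 0} for normal, {\"label\": 1} for anomalous.\n";

    private final String domainKnowledge;

    PromptStrategy(String domainKnowledge) {
        this.domainKnowledge = domainKnowledge;
    }

    /** Assembles task, strategy-specific knowledge, the log line, and the output rule. */
    String build(String logLine) {
        return TASK + domainKnowledge + "Log line: " + logLine + "\n" + OUTPUT_RULE;
    }
}
```

Keeping the task description and output rule identical across variants means any difference in results can be attributed to the injected knowledge rather than to incidental wording changes.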

Section 04

Technical Architecture and Model Deployment Plan

The project is built on the Java Spring Boot framework. Its core components are BglParser (log parsing), PromptGenerator (prompt templates), CallModelAi (LLM API calls), and EvaluationMetricsService (metric calculation). The tech stack comprises Java 17, MongoDB, Ollama (for running the LLM locally), and the Qwen2.5 7B model. Local deployment offers data privacy, cost control, low latency, and customizability.
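
As a rough illustration of the local model call, the sketch below posts a prompt to Ollama's `/api/generate` endpoint using the JDK's built-in HTTP client. It assumes Ollama is listening on its default port 11434 and that the model was pulled under the tag `qwen2.5:7b`; the class name OllamaCallSketch is hypothetical, and this is not the project's CallModelAi implementation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class OllamaCallSketch {
    // Assumptions: default local Ollama address and a locally pulled model tag.
    static final String ENDPOINT = "http://localhost:11434/api/generate";
    static final String MODEL = "qwen2.5:7b";

    /** Encodes a Java string as a JSON string literal (quotes included). */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds a non-streaming request that asks Ollama for JSON-formatted output. */
    static String requestBody(String prompt) {
        return "{\"model\":" + jsonString(MODEL)
            + ",\"prompt\":" + jsonString(prompt)
            + ",\"stream\":false,\"format\":\"json\"}";
    }

    /** Sends the prompt and returns the raw response envelope as text. */
    static String generate(String prompt) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(ENDPOINT))
            .timeout(Duration.ofSeconds(60))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(requestBody(prompt)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Ollama returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

The returned body is Ollama's JSON envelope; the model's text sits in its `response` field, which a JSON library would normally extract. Timing the `generate` call is one way to obtain a per-request response time.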

5

Section 05

Multi-dimensional Evaluation Metric System

The project uses comprehensive metrics to evaluate the effectiveness of the strategies:

  • Basic classification metrics: accuracy, precision, recall, F1 score;
  • Confusion matrix metrics: TP (true positive), TN (true negative), FP (false positive), FN (false negative);
  • Additional metrics: invalid response rate (proportion of non-JSON outputs), average response time.

Together these cover classification performance, output quality, and inference efficiency.
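
The classification metrics follow directly from the four confusion matrix counts, as the sketch below shows. It assumes that invalid (non-JSON) responses are kept out of the confusion matrix and counted separately, and that a ratio with a zero denominator is reported as 0; the record name Metrics is hypothetical and is not the project's EvaluationMetricsService.

```java
/** Metrics derived from confusion matrix counts plus a count of unparseable replies. */
record Metrics(long tp, long tn, long fp, long fn, long invalid) {

    // Assumption: an undefined ratio (zero denominator) is reported as 0.
    private static double ratio(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    /** Share of valid responses that were classified correctly. */
    double accuracy() {
        return ratio(tp + tn, tp + tn + fp + fn);
    }

    /** Of the lines flagged abnormal, the share that really were abnormal. */
    double precision() {
        return ratio(tp, tp + fp);
    }

    /** Of the truly abnormal lines, the share that were flagged. */
    double recall() {
        return ratio(tp, tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    /** Share of all requests whose reply could not be parsed into a label. */
    double invalidResponseRate() {
        return ratio(invalid, tp + tn + fp + fn + invalid);
    }
}
```

With purely illustrative counts, `new Metrics(6, 10, 2, 2, 5)` gives precision, recall, and F1 of 0.75, accuracy of 0.8, and an invalid response rate of 0.2.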

Section 06

Key Findings and Practical Insights of the Project

Key insights:

  1. The quality of prompt engineering may be more important than model selection;
  2. Domain knowledge can be injected incrementally from general to structured to specific;
  3. In O&M scenarios, precision (reducing false positives) is more important than recall;
  4. Requiring the model to output JSON facilitates automation, but parsing robustness needs to be considered.
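
On the parsing robustness point, one defensive approach is to tolerate extra text around the JSON object and treat anything ambiguous as an invalid response. The sketch below is a regex-based fallback rather than a full JSON parser; the key name "label" and the class name LabelExtractorSketch are assumptions, not taken from the project.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LabelExtractorSketch {
    // Matches "label": 0 or "label": 1, also tolerating a quoted digit.
    // The trailing [,}] rejects values such as 10 or 1.5.
    private static final Pattern LABEL =
        Pattern.compile("\"label\"\\s*:\\s*\"?([01])\"?\\s*[,}]");

    /** Returns 0 or 1 if exactly one label is found; otherwise empty (invalid). */
    static OptionalInt extract(String reply) {
        if (reply == null) {
            return OptionalInt.empty();
        }
        // Ignore chatter or code fences outside the outermost braces.
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end <= start) {
            return OptionalInt.empty();
        }
        Matcher m = LABEL.matcher(reply.substring(start, end + 1));
        if (!m.find()) {
            return OptionalInt.empty();
        }
        int label = m.group(1).charAt(0) - '0';
        // A second label makes the reply ambiguous, so count it as invalid.
        return m.find() ? OptionalInt.empty() : OptionalInt.of(label);
    }
}
```

Every empty result would be counted toward the invalid response rate instead of being silently mapped to a class, so parsing failures do not distort precision or recall.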

Section 07

Application Scenarios and Future Expansion Directions

Current application scenarios: supercomputer log monitoring, distributed system anomaly detection, and security auditing. Expansion directions: validation on multiple datasets, comparison across multiple models, online learning for prompt updates, extension to multi-class classification, and root cause analysis.

Section 08

Project Summary and Outlook

LLMLogAnalyzer is a rigorously designed academic project that provides a reproducible experimental framework and evaluation method. The takeaways for engineers: prompt design should incorporate domain knowledge, evaluation needs metrics along several dimensions, and local deployment protects data privacy. As LLM capabilities improve, combining them with prompt engineering is likely to play a growing role in O&M automation.