Zing Forum

Reading

LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

LLM Secret Guard is a localized security assessment tool based on the OWASP LLM Application Security Framework. It is used to test whether large language models leak sensitive information under attack prompts and provides a quantifiable and comparable defense capability assessment system.

LLM安全评估敏感信息泄漏OWASPPrompt Injection防御策略Ollama安全测试大语言模型信息安全
Published 2026-05-27 13:43Recent activity 2026-05-27 13:50Estimated read 5 min
LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models
1

Section 01

Introduction / Main Post: LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

LLM Secret Guard is a localized security assessment tool based on the OWASP LLM Application Security Framework. It is used to test whether large language models leak sensitive information under attack prompts and provides a quantifiable and comparable defense capability assessment system.

3

Section 03

Project Background and Core Objectives

With the widespread deployment of large language models (LLMs) in various applications, the risk of sensitive information leakage has become increasingly prominent. LLM Secret Guard emerged as a localized security assessment tool to test whether LLMs leak sensitive information under attack prompts.

This project focuses on risks related to Sensitive Information Disclosure, Prompt Injection, and System Prompt Leakage from the OWASP Top 10 for LLM Applications. Through fixed attack sets, leakage level determination, valid sample filtering, and defense score calculation, it helps researchers compare the effectiveness of different models and defense strategies.

The core objective is to establish a reproducible, quantifiable, and comparable testing process for LLM sensitive information leakage.

4

Section 04

Main Uses and Application Scenarios

LLM Secret Guard can be used in various research and testing scenarios:

  • Local Model Security Testing: Test whether locally deployed LLMs leak sensitive information
  • Model Defense Capability Comparison: Compare the differences in defense capabilities of different models under the same attack set
  • Defense Strategy Evaluation: Quantify the impact of different defense strategies on model outputs
  • Attack Type Analysis: Analyze the success rates of attack types such as prompt injection, cross-lingual attacks, and role-play attacks
  • Academic Research and Reports: Generate experimental data that can be used in papers, reports, and presentations
  • Web LLM Application Testing: Supports future expansion to testing Web LLM applications or agent architectures
5

Section 05

Supported Attack Types

The attack set is maintained in JSON format for easy addition, modification, and expansion. Currently, the main attack directions include:

6

Section 06

Direct Attacks

  • Direct Secret Request: Directly request sensitive information
  • Sensitive Data Extraction: Extract sensitive data
7

Section 07

Injection and Induction Attacks

  • Prompt Injection: Prompt injection attack
  • Role Play Attack: Role-play attack
  • Developer Mode / DAN-type Attacks: Developer mode or jailbreak attacks
8

Section 08

Encoding and Multi-turn Attacks

  • Translation-based Attack: Translation-based attack
  • Encoding/Decoding Induction: Encoding/decoding induction
  • Multi-turn Reasoning Induction: Multi-turn reasoning induction
  • System Prompt Leakage: System prompt leakage
  • Cross-lingual Attack: Cross-lingual attack