Reading

LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

LLM Secret Guard is a localized security assessment tool based on the OWASP LLM Application Security Framework. It is used to test whether large language models leak sensitive information under attack prompts and provides a quantifiable and comparable defense capability assessment system.

LLM安全评估敏感信息泄漏OWASPPrompt Injection防御策略Ollama安全测试大语言模型信息安全

Published 2026-05-27 13:43Recent activity 2026-05-27 13:50Estimated read 5 min

Section 01

Introduction / Main Post: LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

Section 02

Original Author and Source

Original Author/Maintainer: Bryan-9603012
Source Platform: GitHub
Original Title: LLM-Secret-Guard
Original Link: https://github.com/Bryan-9603012/LLM-Secret-Guard
Publication Date: May 27, 2026

Section 03

Project Background and Core Objectives

With the widespread deployment of large language models (LLMs) in various applications, the risk of sensitive information leakage has become increasingly prominent. LLM Secret Guard emerged as a localized security assessment tool to test whether LLMs leak sensitive information under attack prompts.

This project focuses on risks related to Sensitive Information Disclosure, Prompt Injection, and System Prompt Leakage from the OWASP Top 10 for LLM Applications. Through fixed attack sets, leakage level determination, valid sample filtering, and defense score calculation, it helps researchers compare the effectiveness of different models and defense strategies.

The core objective is to establish a reproducible, quantifiable, and comparable testing process for LLM sensitive information leakage.

Section 04

Main Uses and Application Scenarios

LLM Secret Guard can be used in various research and testing scenarios:

Local Model Security Testing: Test whether locally deployed LLMs leak sensitive information
Model Defense Capability Comparison: Compare the differences in defense capabilities of different models under the same attack set
Defense Strategy Evaluation: Quantify the impact of different defense strategies on model outputs
Attack Type Analysis: Analyze the success rates of attack types such as prompt injection, cross-lingual attacks, and role-play attacks
Academic Research and Reports: Generate experimental data that can be used in papers, reports, and presentations
Web LLM Application Testing: Supports future expansion to testing Web LLM applications or agent architectures

Section 05

Supported Attack Types

The attack set is maintained in JSON format for easy addition, modification, and expansion. Currently, the main attack directions include:

Section 06

Direct Attacks

Direct Secret Request: Directly request sensitive information
Sensitive Data Extraction: Extract sensitive data

Section 07

Injection and Induction Attacks

Prompt Injection: Prompt injection attack
Role Play Attack: Role-play attack
Developer Mode / DAN-type Attacks: Developer mode or jailbreak attacks

Section 08

Encoding and Multi-turn Attacks

Translation-based Attack: Translation-based attack
Encoding/Decoding Induction: Encoding/decoding induction
Multi-turn Reasoning Induction: Multi-turn reasoning induction
System Prompt Leakage: System prompt leakage
Cross-lingual Attack: Cross-lingual attack

LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

Introduction / Main Post: LLM Secret Guard: A Sensitive Information Leakage Assessment Framework for Large Language Models

Original Author and Source

Project Background and Core Objectives

Main Uses and Application Scenarios

Supported Attack Types

Direct Attacks

Injection and Induction Attacks

Encoding and Multi-turn Attacks

Continue Reading

SignalCut: An Intelligent Tool for Turning AI Search Visibility Gaps into Video Marketing Campaigns

ExoVision: AI-Driven Exoplanet Detection and Habitability Assessment Platform

Building an Enterprise-Grade Real-Time MLOps Platform: A Complete Practice from Automated Training to Continuous Deployment

The 'Eureka' Phenomenon in Neural Networks: A Deep Analysis and Visual Exploration of Grokking