Zing Forum


Reproduction of the MIN-K% Prob Method: Detecting Membership Inference Attacks in Large Language Model Pre-training Data

Complete reproduction and extended analysis of the ICLR 2024 paper, verifying the effectiveness of MIN-K% Prob in membership inference attacks and finding that model size and text length have significant impacts on detection quality.

Tags: Membership Inference Attacks · Large Language Models · MIN-K% Prob · Data Privacy · WikiMIA · Pre-training Data Detection · Model Security · ICLR 2024
Published 2026-04-22 05:43 · Recent activity 2026-04-22 05:50 · Estimated read 7 min

Section 01

[Introduction] Reproduction of the MIN-K% Prob Method: Detecting Membership Inference Attacks in Large Language Model Pre-training Data

This article presents a complete reproduction and extended analysis of the ICLR 2024 'MIN-K% Prob' paper, verifying the method's effectiveness for membership inference attacks and finding that model size and text length have significant impacts on detection quality. Focusing on the privacy and security of large language model pre-training data, it determines membership through black-box analysis of the model's probability distribution, providing a practical tool for model auditing and privacy protection.


Section 02

Background: Research Significance of Membership Inference Attacks and Limitations of Traditional Methods

With the widespread deployment of large language models, Membership Inference Attacks (MIA) have become a key privacy concern: given a text and black-box access to a model, determine whether the text was part of the model's pre-training data. Traditional MIA methods require a reference model or access to the training corpus, limiting their practical applicability. The MIN-K% Prob method proposed in the ICLR 2024 paper removes this requirement by determining membership solely from the model's probabilities on "difficult tokens".


Section 03

Core Mechanism of the MIN-K% Prob Method

MIN-K% Prob is based on the observation that texts seen during pre-training (members) and unseen texts (non-members) produce different probability distributions: non-member texts tend to contain "outlier tokens" with abnormally low probabilities, while member texts do not. The algorithm proceeds in three steps:

1. Obtain the conditional log probability of each token in the text.
2. Select the k% of tokens with the lowest probabilities.
3. Average the log-likelihoods of these tokens as the detection score: scores closer to 0 indicate a likely member, while more negative scores indicate a likely non-member.
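The scoring step can be sketched in plain Python, assuming the per-token log-probabilities have already been extracted from the model; the helper name and toy values below are illustrative, not taken from the paper's code:

```python
def min_k_prob_score(token_logprobs, k=0.2):
    """MIN-K% Prob: average log-probability of the k% lowest-probability tokens.

    token_logprobs: per-token conditional log-probabilities from the model.
    Scores nearer 0 suggest a member; more negative suggests a non-member.
    """
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the k% "difficult" tokens
    return sum(lowest) / n

# Toy illustration: a "member"-like text with no outlier tokens vs. a
# "non-member"-like text containing a few very low-probability tokens.
member_lps = [-0.5, -0.8, -1.0, -0.6, -0.9, -0.7, -0.4, -0.8, -1.1, -0.6]
nonmember_lps = [-0.5, -0.8, -9.0, -0.6, -7.5, -0.7, -0.4, -8.2, -1.1, -0.6]

print(min_k_prob_score(member_lps))     # closer to 0
print(min_k_prob_score(nonmember_lps))  # much more negative
```

With k=20%, the two lowest of ten tokens are averaged, so the outlier tokens in the non-member text dominate its score.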


Section 04

Experimental Design and Reproduction Environment

The reproduction uses the free T4 GPU environment of Google Colab to complete five progressive experiments. The models come from EleutherAI's Pythia family (70M to 2.8B parameters), and the evaluation benchmark is WikiMIA: member data is drawn from Wikipedia articles written before 2017 (seen by Pythia during pre-training), while non-member data comes from event articles written after 2023 (unseen by the model); this temporal split keeps the evaluation fair and reliable.


Section 05

Key Experimental Findings: Effectiveness and Comparison with Baselines

  1. A fine-tuning comparison verifies the basic effectiveness of MIN-K%: the model can distinguish articles it was fine-tuned on from unseen texts.
  2. Against the baseline methods (perplexity and zlib entropy), MIN-K% consistently outperforms on the hand-constructed datasets.
  3. On the WikiMIA benchmark, Pythia-2.8B achieves an AUC of 0.5956, lower than the paper's 0.67 but consistent in trend; the gap is attributable to the small sample size and limited compute, which still supports reproducibility.
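For context, the two baselines and the AUC metric can be sketched with the standard library alone; the helper names are my own, and the score values at the bottom are hypothetical, not the reproduction's numbers:

```python
import math
import zlib

def perplexity_score(token_logprobs):
    """PPL baseline: negated perplexity, so higher = more member-like."""
    return -math.exp(-sum(token_logprobs) / len(token_logprobs))

def zlib_score(text, token_logprobs):
    """Zlib baseline: model log-likelihood normalized by compressed length."""
    return sum(token_logprobs) / len(zlib.compress(text.encode("utf-8")))

def auc(member_scores, nonmember_scores):
    """AUC via pairwise comparison: P(member score > non-member score)."""
    wins = sum(
        1.0 if m > n else 0.5 if m == n else 0.0
        for m in member_scores
        for n in nonmember_scores
    )
    return wins / (len(member_scores) * len(nonmember_scores))

# Hypothetical detection scores for illustration (not real WikiMIA numbers).
members = [-1.0, -1.2, -0.9, -1.4]
nonmembers = [-2.1, -1.1, -2.5, -1.8]
print(auc(members, nonmembers))
```

The pairwise formulation of AUC avoids any dependency on scikit-learn and makes the metric's meaning explicit: the probability that a random member scores higher than a random non-member.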

Section 06

Hyperparameter Tuning and Analysis of Model Size Effects

Hyperparameter scanning found that the optimal k for the smaller models is 10% (versus 20% for the large models in the paper), so the choice of k is sensitive to model size. Multi-model experiments further confirm that detection quality is positively correlated with both model size and text length: larger models memorize more of their training data, and longer texts provide more signal, strengthening detection.
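The k-scan described above can be sketched as follows; the candidate k values, toy log-probabilities, and helper names are illustrative assumptions, not the experiment's actual data:

```python
def min_k_score(logprobs, k):
    """Average log-probability of the k% lowest-probability tokens."""
    n = max(1, int(len(logprobs) * k))
    return sum(sorted(logprobs)[:n]) / n

def auc(ms, ns):
    """Pairwise AUC: P(member score > non-member score)."""
    wins = sum(1.0 if m > n else 0.5 if m == n else 0.0 for m in ms for n in ns)
    return wins / (len(ms) * len(ns))

# Toy per-text log-probabilities: non-members carry one outlier token but are
# otherwise high-probability, so averaging over ALL tokens (k=1.0) dilutes
# the signal that a small k isolates.
members = [[-1.5, -1.4, -1.6, -1.5, -1.4], [-1.3, -1.5, -1.4, -1.6, -1.2]]
nonmembers = [[-0.3, -0.4, -6.0, -0.3, -0.4], [-0.4, -0.2, -7.0, -0.3, -0.5]]

# Sweep k on a validation split and keep the value maximizing AUC.
best_k, best_auc = max(
    ((k, auc([min_k_score(t, k) for t in members],
             [min_k_score(t, k) for t in nonmembers]))
     for k in (0.1, 0.2, 0.5, 1.0)),
    key=lambda kv: kv[1],
)
print(best_k, best_auc)
```

On this toy data a small k separates the two groups perfectly, while k=1.0 (plain average log-likelihood) does not, which mirrors why the outlier-focused score beats the perplexity baseline.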


Section 07

Advantages and Limitations of the MIN-K% Prob Method

Advantages: black-box friendly. No reference model or access to the training data is needed; the method works from the model's probability outputs alone, giving it high practical value. Limitations: 1. detection accuracy still has room for improvement (AUC around 0.6); 2. the optimal k depends on model size, adding deployment complexity; 3. applicability to instruction-tuned or RLHF models needs further verification.


Section 08

Privacy and Security Implications and Open Source Contributions

Privacy implications: the method gives developers an auditing tool for detecting sensitive data in training corpora, while also exposing how readily models memorize their training data. On the compliance side, membership inference detection tools are likely to become part of model compliance workflows, requiring a balance between performance and privacy (e.g., differentially private training, data deduplication). Open source contributions: the project is released under the MIT license with complete notebook implementations, code, visualizations, and documentation, facilitating academic reproduction and community improvement and serving as a starting point for MIA researchers.