Zing Forum

SocketAI Reproduction: Detecting Malicious npm Packages Using LLM

An open-source reproduction of the ICSE 2025 paper SocketAI: an npm malicious-code detection tool built on a three-stage LLM analysis workflow, with optional CodeQL pre-screening and full export of experimental data.

Tags: npm, security, LLM, malware detection, CodeQL, static analysis, supply chain security
Published 2026-04-08 16:14 | Recent activity 2026-04-08 16:21 | Estimated read 6 min

Section 01

SocketAI Reproduction: Guide to the LLM-Powered Malicious npm Package Detection Tool

This article introduces the open-source reproduction of SocketAI, a paper published at ICSE 2025. The tool detects malicious code in npm packages using a three-stage LLM analysis workflow, with optional CodeQL pre-screening and full export of experimental data. It targets a weakness in the npm ecosystem's existing defenses: traditional static analysis struggles to catch new types of malicious attacks.

Section 02

Research Background and Motivation

As the world's largest software package repository (over 2 million packages), npm brings convenience but also the risk of malicious code injection, for example install scripts that execute malicious commands, tampered dependencies, or code hidden through obfuscation. Traditional detection methods have clear limitations: signature-based approaches require frequent rule updates, and behavior-based approaches suffer from high false-positive rates. The semantic understanding of LLMs can make up for these shortcomings by distinguishing code that is syntactically similar but differs in intent (e.g., cleaning up temporary files vs. deleting system files).

Section 03

SocketAI's Core Methodology

SocketAI uses a three-stage, progressive analysis strategy:
1. Initial malicious assessment: the LLM quickly screens each file and estimates how likely it is to be malicious, considering signals such as obfuscation, network requests, and access to sensitive paths.
2. Self-review and correction: the model re-examines its initial judgment to reduce misjudgments caused by insufficient context or superficial similarity to malicious code.
3. Final file-level determination: the model integrates the earlier findings into a clear malicious score together with its reasoning, which supports manual review.
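To make the three stages concrete, here is a minimal Python sketch of how a per-file pipeline could chain them. It is not the reproduction's code: the `llm` callable, the prompt wording, the 0 to 1 score scale, and the JSON reply format (`{"score": ..., "reasoning": ...}`) are all assumptions, and any chat-completion client would need to be wrapped to fit that callable.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that sends a prompt to a model and returns its text reply.
LLM = Callable[[str], str]


@dataclass
class StageResult:
    stage: str
    score: float  # assumed scale: 0.0 = benign, 1.0 = malicious
    reasoning: str


def ask(llm: LLM, stage: str, prompt: str) -> StageResult:
    """Send one prompt and parse an (assumed) JSON reply with "score" and "reasoning" keys."""
    reply = json.loads(llm(prompt))
    score = min(max(float(reply["score"]), 0.0), 1.0)  # clamp out-of-range scores to [0, 1]
    return StageResult(stage, score, str(reply["reasoning"]))


def analyze_file(llm: LLM, path: str, source: str) -> List[StageResult]:
    """Run the three stages on one file and return every stage's result."""
    # Stage 1: initial assessment of the raw file.
    initial = ask(llm, "initial",
                  f"Assess whether the file {path} is malicious. Consider obfuscation, "
                  f"network requests and sensitive paths. Reply as JSON.\n\n{source}")
    # Stage 2: self-review; the model sees its own first verdict and may correct it.
    review = ask(llm, "self_review",
                 f"Your earlier verdict on {path} was score={initial.score} because "
                 f"{initial.reasoning!r}. Re-check the code and correct any misjudgment. "
                 f"Reply as JSON.\n\n{source}")
    # Stage 3: final file-level determination that combines both earlier analyses.
    final = ask(llm, "final",
                f"Give the final malicious score for {path}, taking into account: "
                f"{initial.reasoning!r} and {review.reasoning!r}. Reply as JSON.\n\n{source}")
    return [initial, review, final]
```

The last element carries the file-level score; returning all three results, rather than only the final one, is what makes per-step inspection and export possible.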

Section 04

Highlights of Engineering Implementation

The reproduction balances academic rigor with engineering practicality:
1. Optional CodeQL pre-screening: CodeQL quickly flags risky code patterns, reducing how much code has to go through LLM analysis.
2. Observability: the data from every analysis step is exported, including the prompt, the model response, token consumption, and time cost.
3. Flexible input: accepts local directories as well as tgz, tar, and zip archives.
4. Batch processing: detection runs over sample lists in JSONL or CSV format, and an error in one sample does not interrupt the rest of the batch.
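Two of these features lend themselves to a short sketch. The Python below shows how CodeQL findings could narrow the set of files sent to the LLM, by reading file URIs out of a SARIF log, and how a batch loop can isolate per-sample failures. The function names, the `input` field of each sample, and the `analyze` callback are illustrative assumptions, not the reproduction's actual interface.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, Iterator, List, Set


def flagged_files(sarif: dict) -> Set[str]:
    """Return the URIs of files with at least one CodeQL result in a SARIF log."""
    files: Set[str] = set()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                uri = (location.get("physicalLocation", {})
                       .get("artifactLocation", {})
                       .get("uri"))
                if uri:
                    files.add(uri)
    return files


def read_samples(manifest: Path) -> Iterator[Dict]:
    """Yield one sample per row of a JSONL or CSV list, chosen by file extension."""
    if manifest.suffix == ".jsonl":
        with manifest.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)
    elif manifest.suffix == ".csv":
        with manifest.open(newline="", encoding="utf-8") as f:
            yield from csv.DictReader(f)
    else:
        raise ValueError(f"unsupported list format: {manifest.suffix}")


def run_batch(samples: Iterable[Dict], analyze: Callable[[str], Dict]) -> List[Dict]:
    """Analyze each sample; a failure is recorded for that sample and the loop moves on."""
    results = []
    for sample in samples:
        try:
            outcome = analyze(sample["input"])
            results.append({"input": sample["input"], "status": "ok", **outcome})
        except Exception as exc:  # deliberately broad: one bad sample must not stop the batch
            results.append({"input": sample.get("input"), "status": "error",
                            "error": f"{type(exc).__name__}: {exc}"})
    return results
```

A SARIF log of this shape can be produced with the CodeQL CLI, for instance `codeql database create <db> --language=javascript --source-root=<package dir>` followed by `codeql database analyze <db> <queries> --format=sarif-latest --output=results.sarif`; how the reproduction itself invokes CodeQL is not covered here. A batch run would then look like `run_batch(read_samples(Path("samples.jsonl")), analyze)`.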

Section 05

Usage and Output Structure

Usage workflow: the tool is written in Python and uses uv for dependency management. The project gives example commands both for basic detection (without CodeQL) and for detection with CodeQL pre-screening enabled. The key parameters are input (the input path), model (the LLM to use), use-codeql/no-codeql (whether to run CodeQL pre-screening), threshold (the score cutoff for the malicious determination), and temperature (the model's sampling temperature, i.e., how much randomness its output has). The output is clearly structured: run metadata, a package-level summary, the list of analyzed files, performance metrics, detailed results for each stage, and the exported data in CSV format.
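The threshold parameter and the CSV export can be illustrated together. The sketch below assumes one plausible aggregation rule, flagging a package when any file's final score reaches the threshold, which may differ from the rule the reproduction actually uses; the `StageRecord` fields mirror the exported quantities mentioned earlier (score, token consumption, time cost), but their names are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Dict, List


@dataclass
class StageRecord:
    file: str
    stage: str
    score: float
    prompt_tokens: int
    completion_tokens: int
    seconds: float


def package_verdict(final_scores: Dict[str, float], threshold: float) -> Dict:
    """Package-level summary: malicious if any file's final score >= threshold (assumed rule)."""
    flagged = sorted(name for name, score in final_scores.items() if score >= threshold)
    return {
        "malicious": bool(flagged),
        "flagged_files": flagged,
        "max_score": max(final_scores.values(), default=0.0),
    }


def export_csv(records: List[StageRecord]) -> str:
    """Serialize per-stage records to CSV text with one row per (file, stage)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(StageRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Temperature, by contrast, would simply be passed through to the model client; a low value makes repeated runs more consistent, which matters when exported results are compared across experiments.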

Section 06

Practical Significance and Outlook

For security teams: the tool complements existing detection systems, with traditional rules catching known threats and the LLM surfacing new attacks. For researchers: the full data export makes it easy to test new ideas, such as swapping models or adjusting the analysis strategy. The tool offers a practical research platform for npm security detection and is an example of turning academic results into engineering practice.

Section 07

Conclusion

Software supply chain attacks are becoming increasingly frequent, which makes security detection for npm packages crucial. The SocketAI reproduction demonstrates the potential of LLMs in the security field. It is not a panacea, but in scenarios that require semantic understanding it offers capabilities that traditional methods struggle to provide. The open-source reproduction not only verifies the original paper's method but also gives the community a runnable baseline implementation that can be improved upon.