Zing Forum


Janus: A Study on Side-Channel Attacks Against Sparse Attention LLM Inference

The Janus project reveals a new type of security vulnerability introduced by sparse attention mechanisms in large language model (LLM) inference. By analyzing Sparse Induced Memory Access (SIMA) traces, attackers can infer sensitive attributes of user queries and recover the response content generated by the model without accessing model parameters or API outputs.

Tags: Sparse Attention · Side-Channel Attacks · LLM Security · Privacy Protection · Memory Access · SIMA · Inference Security · Model Privacy
Published 2026-04-24 00:40 · Recent activity 2026-04-24 00:51 · Estimated read 6 min
1

Section 01

Janus: A Study on Side-Channel Attacks Against Sparse Attention LLM Inference (Introduction)

As summarized above, the Janus project reveals a new class of security vulnerability introduced by sparse attention mechanisms in large language model (LLM) inference: by analyzing Sparse Induced Memory Access (SIMA) traces, an attacker can infer sensitive attributes of user queries and recover the model's generated responses without access to model parameters or API outputs. This thread introduces the research background, attack methods, implementation details, security impacts, and defense suggestions in separate floors.

2

Section 02

Research Background

The inference efficiency of large language models is a central concern in both academia and industry. Sparse attention mechanisms improve efficiency by skipping attention computations judged unimportant, but this data-dependent skipping may introduce new security vulnerabilities. The Janus project studies this issue systematically and shows that sparse attention mechanisms can be maliciously exploited for side-channel attacks.

3

Section 03

Attack Principles and Types

The core of the Janus attack leverages the Sparse Induced Memory Access (SIMA) traces generated by sparse attention: during inference, the model dynamically selects which tokens participate in computation, leaving characteristic memory access traces. An attacker who monitors these traces can reconstruct the sparse patterns and, from them, infer input features. The attack requires no model parameters, activation values, or API outputs; hardware-level traces alone suffice.

Main attack types:

1. Query Attribute Inference (QAI), prefill phase: analyze prefill SIMA traces and use a pre-trained MLP classifier to infer sensitive attributes (e.g., disease categories in medical queries).
2. Autoregressive Token Recovery (ATR), decoding phase: monitor decoding-phase traces to recover, token by token, the response generated by the model.
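To make the QAI idea concrete, here is a minimal sketch, assuming a SIMA trace can be modeled as a binary mask over KV-cache blocks touched during prefill, and that the attacker has a small pre-trained MLP mapping traces to attribute classes. All names, sizes, and the random weights are illustrative assumptions, not the Janus artifact's actual predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class TraceMLP:
    """Toy attribute predictor. In a real attack the weights would be
    trained offline on traces with known labels (an assumption here:
    random weights stand in for a trained model)."""
    def __init__(self, n_blocks=64, hidden=32, n_classes=4):
        self.W1 = rng.normal(0, 0.1, (n_blocks, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, n_classes))

    def predict(self, trace):
        h = np.maximum(trace @ self.W1, 0.0)   # ReLU hidden layer
        return softmax(h @ self.W2)            # attribute-class probabilities

# A captured prefill trace: which of 64 KV blocks the sparse kernel fetched.
trace = rng.integers(0, 2, 64).astype(float)
probs = TraceMLP().predict(trace)
print(probs.argmax())   # index of the inferred sensitive attribute class
```

The key point the sketch illustrates is that the classifier's input is purely the observable access pattern; nothing from the model's weights or outputs is needed.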

4

Section 04

Technical Implementation and Evidence

Janus provides complete attack code in two modules:

1. Prefill attribute inference module: sparse patterns for 20 verification queries, a pre-trained attribute predictor, inference scripts, and result files.
2. Decoding token recovery module: sparse patterns from the decoding phase, a pre-trained token predictor, inference scripts, and recovery results.

The decoding-phase attack was verified on 20 queries (each generating 300 tokens over a vocabulary of 1,758) and achieved token-by-token recovery.
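The decoding-phase recovery can be sketched as follows, under a simplifying assumption: if the attacker has profiled the sparse access pattern that each vocabulary token induces, recovery reduces to a nearest-pattern lookup per decode step. The sizes mirror the reported setup (vocabulary 1,758; 300 tokens per response), but the pattern dictionary and Hamming-distance matching are illustrative assumptions, not the artifact's actual predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, BLOCKS, STEPS = 1758, 64, 300

# Profiled dictionary (assumption): one reference sparse pattern per token.
profile = rng.integers(0, 2, (VOCAB, BLOCKS)).astype(float)

def recover(observed):
    """Map each observed per-step pattern to the closest profiled token
    by Hamming distance; return the recovered token-id sequence."""
    return [int(np.abs(profile - step).sum(axis=1).argmin())
            for step in observed]

# Simulate a generated response: ground-truth tokens and their
# (noise-free, idealized) per-step traces.
truth = rng.integers(0, VOCAB, STEPS)
observed = profile[truth]
recovered = recover(observed)
accuracy = sum(r == t for r, t in zip(recovered, truth)) / STEPS
print(accuracy)
```

In the idealized noise-free case the lookup recovers essentially every token; real traces would be noisy, which is why the actual module uses a trained token predictor rather than exact matching.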

5

Section 05

Security Impacts

The Janus attack poses serious privacy risks:

1. Query content inference: sensitive user attributes (e.g., medical conditions) can be obtained.
2. Response content recovery: the model's complete response can be reconstructed.
3. Multi-tenant risk: on shared hardware, co-located users can snoop on each other's inference processes.

Attack scenarios include cloud inference services, edge devices, and Model-as-a-Service (MaaS).

6

Section 06

Defense Suggestions

The Janus attack can be defended against at multiple levels:

- Hardware level: memory access isolation, cache partitioning, constant-time sparse attention algorithms.
- Software level: sparse pattern obfuscation, memory access randomization, security auditing.
- Architecture level: Trusted Execution Environments (TEEs), homomorphic encryption inference, federated inference.
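One of the software-level ideas, sparse pattern obfuscation, can be sketched as padding each step's real block accesses with random decoy fetches so that the observable trace always has the same size and leaks less about the true pattern. The function and parameter names below are assumptions for illustration (and the sketch assumes the decoy budget exceeds the number of real accesses); it is not a drop-in defense.

```python
import numpy as np

rng = np.random.default_rng(2)

def obfuscate(real_blocks, n_blocks=64, budget=32):
    """Return a constant-size access set: the blocks sparse attention
    actually needs, plus random decoys up to `budget` total.
    Assumes budget >= len(real_blocks)."""
    accessed = set(real_blocks)
    # Decoys drawn uniformly from the blocks not already accessed.
    decoys = [b for b in rng.permutation(n_blocks) if b not in accessed]
    accessed.update(decoys[:budget - len(accessed)])
    return sorted(accessed)

real = [3, 17, 42]        # blocks the sparse kernel truly needs this step
trace = obfuscate(real)
print(len(trace))         # constant-size observable trace
```

The trade-off is direct: every decoy fetch costs memory bandwidth, which is exactly the efficiency that sparse attention was meant to save, so the budget parameter tunes the security/performance balance.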

7

Section 07

Research Summary

The Janus project exposes the security risks latent in sparse attention optimization, a reminder that security must be weighed alongside inference efficiency. As LLMs are increasingly deployed in sensitive fields, understanding and mitigating such side-channel attacks is crucial. A balance between security and efficiency must be found to preserve the privacy guarantees of LLM services.