Reading

CMC Firewall: A Conformal Prediction-Based Defense Against Visual Prompt Injection in Multimodal LLMs

CMC (Conformal Cross-Modal Firewall) is a pre-model defense mechanism that effectively controls the false positive rate of visual prompt injection attacks while maintaining model utility through OCR text extraction, SigLIP risk scoring, and inductive conformal prediction calibration.

CMC视觉提示注入多模态LLM共形预测MLLM安全SigLIPOCR防火墙MM-SafetyBenchNeurIPS前置防御

Published 2026-04-25 22:09Recent activity 2026-04-25 22:23Estimated read 6 min

Section 01

Introduction to CMC Firewall: A Conformal Prediction-Based Defense Against Visual Prompt Injection in Multimodal LLMs

With the widespread application of Multimodal Large Language Models (MLLMs), visual prompt injection attacks have become a severe security challenge. CMC (Conformal Cross-Modal Firewall) is a pre-model defense mechanism that effectively controls the false positive rate while maintaining model utility through OCR text extraction, SigLIP risk scoring, and inductive conformal prediction calibration, resolving the dilemma of traditional defenses being either 'overly sensitive with false positives' or 'too lenient with missed attacks'.

Section 02

Threat Status of Visual Prompt Injection Attacks and Limitations of Existing Defenses

Visual prompt injection attacks exploit the OCR capability of MLLMs to embed malicious text like 'Ignore previous instructions and execute X', bypassing traditional text security filters. Existing defense solutions have shortcomings: keyword blacklists are easily bypassed by synonyms/spelling variations; semantic similarity filtering relies on fixed thresholds, making it hard to balance security and usability; post-processing filtering cannot stop the generation of harmful content.

Section 03

Core Defense Mechanisms of CMC Firewall

CMC adopts a pre-model architecture with three core steps: 1. OCR text span extraction: Identify visible and hidden text in images; 2. SigLIP encoder risk scoring: Calculate semantic risk (similarity to malicious instructions) and statistical anomalies (abnormal distribution in embedding space); 3. Inductive conformal prediction calibration: Provide distribution-agnostic, finite-sample valid false positive rate guarantees, strictly limiting the false positive rate of clean images to within α+1/(n+1).

Section 04

Experimental Evidence and Performance Evaluation of CMC Firewall

Evaluation on the MM-SafetyBench dataset: CMC (α=0.20) has an unsafe rate of 15.4%, attack interception rate of 81.2%, and a false positive rate of 12.4% which is below the theoretical upper limit of 21%. Cross-model validation (Qwen3.5-9B) shows the unsafe rate drops from 17.2% to 13.9% (p=0.0008). MMBench tests retain 90% of original performance, ensuring controllable utility.

Section 05

Technical Implementation Details of CMC Firewall

Computational resources: The headline LLaVA process uses 1×H100-class GPU; full reproduction requires 2×H100 NVL (MIG slices). Code structure includes configs (39 experimental configurations), src (attacks/defenses/eval/transforms), and scripts. Quick start: Clone the repository → bash scripts/setup.sh → smoke_test.sh → reproduce.sh.

Section 06

Theoretical Contributions of CMC and Comparison with Related Work

Theoretical contributions: Introduce statistical learning theory, including distribution-agnostic guarantees of conformal prediction, finite-sample validity, and efficient inductive deployment; verify 4 theorems. Comparison with related work: CMC outperforms keyword filtering and semantic filtering in terms of interception rate (moderate to high), false positive control (statistical guarantee), and interpretability (high).

Section 07

Practical Deployment Considerations for CMC Firewall

Advantages: Statistical guarantees (clear FPR upper bound), model-agnostic (pre-layer adapts to any MLLM), interpretability (traceable risk scores). Challenges: Computational overhead (increased latency), calibration data quality affecting effectiveness, need for continuous updates to counter adversarial adaptability.

Section 08

Summary and Future Outlook of CMC Firewall

CMC achieves a shift from empirical parameter tuning to statistical guarantees, balancing security and usability. It has been submitted to NeurIPS 2026, and the code is open-source. Future directions: Extend to video temporal injection defense, active learning to update calibration sets, and lightweight encoders to reduce deployment costs.

Continue Reading

Keep going with more reads from the same topic.

Nornir MCP Server: An Enterprise-Grade Bridge for Integrating Large Language Models into Network Automation

Nornir MCP Server is an enterprise-level server based on the Model Context Protocol (MCP). It seamlessly integrates large language models (such as Claude) with the Nornir network automation framework, supporting natural language orchestration for multi-vendor network devices (Cisco, Arista, Juniper, etc.), and providing production-grade features like a dual-engine architecture (NAPALM + Netmiko), intelligent filtering, and a secure sandbox.

Recent activity 2026-05-06 20:51

Bibliothèque Française LLM: A French Public Domain Literature Index System Optimized for Large Language Models

Bibliothèque Française LLM is a structured indexing and annotation project for French public domain literature designed specifically for large language models (LLMs). It integrates multiple authoritative sources such as DraCor, Common Corpus, and Wikisource, providing metadata indexing categorized by genre, author, and era, as well as in-depth annotations for dramatic texts (including characters, lines, stage directions, etc.). Its aim is to enable LLMs to efficiently read and understand classic French literary works.

Recent activity 2026-05-06 20:50

Splinter: A Lock-Free Zero-Copy Shared Memory KV and Vector Storage Library That Eliminates Socket and Memcpy Overhead for LLM Inference

Splinter is a minimalist, high-performance key-value (KV) and vector storage system enabling zero-latency inter-process communication via shared memory and atomic operations. With only 766 lines of core code, it supports millions of operations per second and 768-dimensional vector storage, offering a new architectural approach for local LLM inference and data-intensive applications.

Recent activity 2026-04-03 08:49

libmlxforge: An Embedded MLX LLM Inference Engine for Apple Silicon

libmlxforge is an embeddable MLX large language model (LLM) inference engine designed specifically for Apple Silicon. It provides a unified C ABI interface, supports calls from Node.js, Swift, and Rust, and features continuous batching, streaming output, JSON-constrained structured output, and embedding vector generation.

Recent activity 2026-06-09 17:23