
Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

This article explores the fundamental trade-off between adversarial robustness and probabilistic calibration in deep neural networks, analyzing the impact of FGSM adversarial training on model accuracy and confidence calibration through experiments on the CIFAR-10 dataset.

Adversarial Robustness · Probabilistic Calibration · Deep Learning · Adversarial Training · FGSM · PGD · CIFAR-10 · ResNet-18

Published 2026-06-11 22:42 · Recent activity 2026-06-11 22:52 · Estimated read 11 min

Section 01

[Introduction] Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

Title: Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

Abstract: This study explores the fundamental trade-off between adversarial robustness and probabilistic calibration in deep neural networks, analyzing the impact of FGSM adversarial training on model accuracy and confidence calibration through experiments on the CIFAR-10 dataset.

Original Authors & Source: Marina Juzgado Gómez-Menor et al. (UC3M Neural Networks Course Project), published in June 2026 in the GitHub project Adversarial-robustness-and-probabilistic-calibration.

Core Insights: This research reveals the complex relationship between adversarial robustness and probabilistic calibration in deep learning models. FGSM adversarial training can simultaneously improve a model's adversarial robustness and its probabilistic calibration under attack, but at the cost of reduced accuracy on clean data (the "robustness tax").

Section 02

Background: Adversarial Examples and Core Concept Explanations

Introduction: Deep learning models achieve excellent performance in image classification and other tasks, yet adversarial examples (inputs altered by small, often imperceptible perturbations) can cause them to make incorrect predictions. Prior work has observed a subtle tension between adversarial robustness and probabilistic calibration: a model that performs well on clean data can lose both its accuracy and its calibration under adversarial attack.

Core Concept Explanations:

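Adversarial Robustness: The ability of a model to keep making correct predictions under adversarial attack. Common attacks include FGSM (Fast Gradient Sign Method, a single-step attack: x_adv = x + ε·sign(∇_x J(x, y)), where J is the training loss and ε bounds the per-pixel perturbation) and PGD (Projected Gradient Descent, an iterative attack that takes several smaller signed-gradient steps and projects back into the ε-ball after each one; PGD-10 uses 10 steps).
Probabilistic Calibration: Whether the model's predicted confidence matches its actual accuracy (for example, among predictions made with 80% confidence, about 80% should be correct). Key metrics include Expected Calibration Error (ECE, the weighted average gap between confidence and accuracy across confidence bins), Negative Log-Likelihood (NLL), and reliability diagrams (accuracy plotted against confidence, where the diagonal represents perfect calibration).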
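To make the two attack definitions concrete, here is a minimal NumPy sketch of FGSM and PGD against a toy linear softmax classifier. The linear model is an assumption chosen because its input gradient has a closed form, ∇_x J = Wᵀ(softmax(Wx + b) − onehot(y)); the project attacks CNNs, where this gradient would come from automatic differentiation instead. The function names, the [0, 1] pixel range, the PGD step size α = ε/4, and the absence of a random start are illustrative assumptions, not the project's code.

```python
import numpy as np


def softmax(z):
    # Subtract the max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def input_grad(W, b, x, y):
    """Gradient of cross-entropy J(x, y) w.r.t. x for the toy model logits = W @ x + b."""
    p = softmax(W @ x + b)
    p[y] -= 1.0  # softmax(z) - onehot(y)
    return W.T @ p


def fgsm(W, b, x, y, eps):
    # Single step: x_adv = x + eps * sign(grad_x J(x, y)), clipped to the valid pixel range.
    return np.clip(x + eps * np.sign(input_grad(W, b, x, y)), 0.0, 1.0)


def pgd(W, b, x, y, eps, steps=10, alpha=None):
    # Iterative attack (PGD-10 by default, no random start): small signed steps,
    # each followed by projection back into the L-infinity ball of radius eps around x.
    alpha = eps / 4 if alpha is None else alpha  # assumed step size
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv


# Toy usage: 3 classes, 8 "pixels", random weights.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
x, y = rng.uniform(size=8), 1
eps = 8 / 255
x_fgsm, x_pgd = fgsm(W, b, x, y, eps), pgd(W, b, x, y, eps)
print(np.abs(x_fgsm - x).max(), np.abs(x_pgd - x).max())
```

Both perturbations stay within ε per pixel; PGD spends its budget over several steps instead of one, which is why it is usually the stronger of the two.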

Section 03

Experimental Design and Methodology

Experimental Design:

Dataset: CIFAR-10 (50k training / 10k test images, 10 categories, 32×32 pixels, no data augmentation).
Model Architectures:
- Small CNN: 4 convolutional layers + 2 fully connected layers, ~2.4 million parameters (lightweight).
- ResNet-18: Residual network adapted for CIFAR-10, ~11.2 million parameters (deep architecture).
Training Strategies:
1. Standard Training: Adam optimizer, learning rate 1e-3, cross-entropy loss, 15 epochs.
2. Adversarial Attack Testing: Apply FGSM/PGD attacks to the baseline model.
3. FGSM Adversarial Training: Inject FGSM adversarial examples during training (Small CNN: learning rate 1e-3; ResNet-18: 1e-4).
4. Trade-off Analysis: Evaluate the adversarially trained model under PGD attacks and track how its accuracy and confidence change as ε increases.
Attack Strength: ε values {0, 4/255, 8/255, 12/255, 16/255}
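Strategies 3 and 4 can be sketched on a toy problem. The NumPy example below trains a linear softmax classifier with full-batch gradient descent on a weighted mix of clean and FGSM examples crafted against the current weights, then reports FGSM accuracy over the ε grid above. Everything specific here is an assumption for illustration: the synthetic 3-class data standing in for CIFAR-10, the 50/50 clean/adversarial weighting, the training ε of 8/255, plain gradient descent instead of the Adam optimizer the project used, and all function names. The source does not show the project's actual training loop.

```python
import numpy as np

# Attack strengths used in the study.
EPS_GRID = [0, 4 / 255, 8 / 255, 12 / 255, 16 / 255]


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def ce_grad_logits(W, b, X, y):
    # d(cross-entropy)/d(logits) per row: softmax(X W^T + b) - onehot(y)
    P = softmax(X @ W.T + b)
    P[np.arange(len(y)), y] -= 1.0
    return P


def fgsm_batch(W, b, X, y, eps):
    # Per-example input gradient is (softmax - onehot) @ W; FGSM keeps only its sign.
    return np.clip(X + eps * np.sign(ce_grad_logits(W, b, X, y) @ W), 0.0, 1.0)


def train(X, y, n_classes, eps=8 / 255, adv_weight=0.5, lr=0.5, epochs=300, seed=0):
    """Gradient descent on (1 - adv_weight) * clean loss + adv_weight * FGSM loss."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.normal(size=(n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        # Inject FGSM examples crafted against the current weights.
        X_adv = fgsm_batch(W, b, X, y, eps)
        gW, gb = np.zeros_like(W), np.zeros_like(b)
        for Xb, weight in ((X, 1.0 - adv_weight), (X_adv, adv_weight)):
            G = ce_grad_logits(W, b, Xb, y)
            gW += weight * G.T @ Xb / len(y)
            gb += weight * G.mean(axis=0)
        W -= lr * gW
        b -= lr * gb
    return W, b


def accuracy(W, b, X, y):
    return float((np.argmax(X @ W.T + b, axis=1) == y).mean())


# Synthetic 3-class data in [0, 1] (a stand-in for CIFAR-10).
rng = np.random.default_rng(1)
centers = rng.uniform(0.3, 0.7, size=(3, 20))
y = rng.integers(0, 3, size=600)
X = np.clip(centers[y] + 0.05 * rng.normal(size=(600, 20)), 0.0, 1.0)
X_tr, y_tr, X_te, y_te = X[:500], y[:500], X[500:], y[500:]

results = {}
for name, w_adv in (("standard", 0.0), ("FGSM-trained", 0.5)):
    W, b = train(X_tr, y_tr, n_classes=3, adv_weight=w_adv)
    # eps = 0 gives clean accuracy; larger eps attacks the trained model with FGSM.
    results[name] = [accuracy(W, b, fgsm_batch(W, b, X_te, y_te, e), y_te) for e in EPS_GRID]
    print(name, [f"{a:.2f}" for a in results[name]])
```

Setting `adv_weight=0.0` recovers standard training, so the same function produces both rows of the comparison; a real CIFAR-10 run would swap in a CNN and an autodiff framework.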

Section 04

Experimental Results and Key Findings

Quantitative Results Summary:

Architecture	Training Method	Clean Accuracy	FGSM Accuracy	PGD Accuracy	Clean ECE	FGSM ECE
Small CNN	Standard Training	74.96%	6.20%	0.29%	0.1591	0.8835
Small CNN	FGSM Adversarial Training	62.00%	35.25%	29.13%	0.1084	0.1152
ResNet-18	Standard Training	81.55%	1.73%	0.00%	0.1278	0.9477
ResNet-18	FGSM Adversarial Training	65.42%	31.52%	25.15%	0.1174	0.4420
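The headline numbers in the key findings are differences between rows of this table. As a quick check, the Python snippet below recomputes them from the table values; the dictionary layout and metric names are just an illustrative choice.

```python
# Values copied from the results table (accuracy in %, ECE as a fraction).
table = {
    ("Small CNN", "standard"): {"clean": 74.96, "fgsm": 6.20, "pgd": 0.29, "ece_fgsm": 0.8835},
    ("Small CNN", "fgsm_at"): {"clean": 62.00, "fgsm": 35.25, "pgd": 29.13, "ece_fgsm": 0.1152},
    ("ResNet-18", "standard"): {"clean": 81.55, "fgsm": 1.73, "pgd": 0.00, "ece_fgsm": 0.9477},
    ("ResNet-18", "fgsm_at"): {"clean": 65.42, "fgsm": 31.52, "pgd": 25.15, "ece_fgsm": 0.4420},
}

summary = {}
for arch in ("Small CNN", "ResNet-18"):
    std, adv = table[(arch, "standard")], table[(arch, "fgsm_at")]
    summary[arch] = {
        # Clean-accuracy cost of adversarial training, in percentage points.
        "robustness_tax_pp": round(std["clean"] - adv["clean"], 2),
        # Accuracy gained under each attack, in percentage points.
        "fgsm_gain_pp": round(adv["fgsm"] - std["fgsm"], 2),
        "pgd_gain_pp": round(adv["pgd"] - std["pgd"], 2),
        # Reduction in ECE under FGSM attack.
        "fgsm_ece_drop": round(std["ece_fgsm"] - adv["ece_fgsm"], 4),
    }
    print(arch, summary[arch])
```

This gives a clean-accuracy cost of 12.96 points for the Small CNN and 16.13 for ResNet-18, with FGSM accuracy gains of 29.05 and 29.79 points respectively.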

Key Findings:

Vulnerability of Standard Models: Standard-trained models experience a sharp drop in accuracy under FGSM/PGD attacks (e.g., ResNet-18's PGD accuracy is 0%).
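Adversarial Training Improves Robustness: After FGSM training, Small CNN's FGSM accuracy rose from 6.20% to 35.25% and its PGD accuracy from 0.29% to 29.13%; ResNet-18 improved similarly (FGSM: 1.73% to 31.52%, PGD: 0.00% to 25.15%).
Existence of a Robustness Tax: Adversarial training lowers clean accuracy, by about 13 percentage points for the Small CNN (74.96% to 62.00%) and about 16 for ResNet-18 (81.55% to 65.42%).
Improved Calibration: Under attack, the ECE of standard models soars (ResNet-18's FGSM ECE reaches 0.9477), whereas adversarial training reduces it substantially (Small CNN: 0.8835 to 0.1152; ResNet-18: 0.9477 to 0.4420). Clean ECE also decreases for both architectures (0.1591 to 0.1084 and 0.1278 to 0.1174).
Model Capacity Does Not Alleviate the Trade-off: The larger ResNet-18 does not escape the robustness-accuracy trade-off. Its FGSM accuracy gain from adversarial training (about 30 points) is similar to the Small CNN's (about 29), while its clean-accuracy cost is larger (16 vs. 13 points) and its FGSM ECE after adversarial training remains higher (0.4420 vs. 0.1152).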
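ECE is the metric that separates the standard and adversarially trained models in these findings. Below is a minimal NumPy sketch of the common equal-width binned ECE estimator and of NLL. The 15-bin setting, the function names, and the toy inputs are assumptions; the source does not state how many bins the project used.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)


def negative_log_likelihood(probs, labels):
    # Mean of -log p(true class); clipped to avoid log(0).
    p_true = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(p_true, 1e-12, 1.0)).mean())


# Toy overconfident model: four predictions made with 0.95 confidence, only one correct.
probs = np.array([[0.95, 0.05]] * 4)
labels = np.array([0, 1, 1, 1])
ece = expected_calibration_error(probs, labels)
nll = negative_log_likelihood(probs, labels)
print(ece, nll)
```

All four predictions fall in the top bin, so ECE is |0.25 − 0.95| = 0.7 (up to float rounding). This is the pattern behind a standard model's FGSM ECE near 0.9: high confidence, mostly wrong answers.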

Section 05

Research Conclusions

This study highlights the tension among adversarial robustness, probabilistic calibration, and clean accuracy in deep learning. FGSM adversarial training improves not only a model's adversarial robustness but also its probabilistic calibration under attack, at the cost of clean-data accuracy. For AI system development, this argues for making an explicit, informed choice between safety and performance; understanding this trade-off is central to responsible AI development.

Section 06

Practical Significance and Application Insights

Industry Insights:

Safety-critical systems (autonomous driving, medical diagnosis) should consider adversarial training: although clean accuracy decreases, reliability under malicious inputs improves.
Downstream decision systems can rely more on confidence thresholds: adversarially trained models give more honest confidence estimates under attack, which makes them better suited to uncertainty-aware decisions.
Model selection should match the deployment scenario: prefer adversarially trained models where adversarial risk is high and standard-trained models where it is low.
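One way to act on calibrated confidence downstream is selective prediction: accept a prediction only when its confidence clears a threshold and defer the rest to a human or a fallback system. The NumPy sketch below, with hypothetical function names and toy inputs, reports coverage (the fraction of predictions accepted) and accuracy on the accepted subset for a given threshold.

```python
import numpy as np


def selective_predict(probs, threshold):
    """Return predicted labels and a mask of predictions confident enough to act on."""
    conf = probs.max(axis=1)
    return probs.argmax(axis=1), conf >= threshold


def coverage_and_selective_accuracy(probs, labels, threshold):
    preds, accept = selective_predict(probs, threshold)
    coverage = float(accept.mean())
    # Accuracy over accepted predictions only; nan if nothing is accepted.
    sel_acc = float((preds[accept] == labels[accept]).mean()) if accept.any() else float("nan")
    return coverage, sel_acc


probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]])
labels = np.array([0, 1, 1, 0])
for t in (0.5, 0.7):
    print(t, coverage_and_selective_accuracy(probs, labels, t))
```

Raising the threshold from 0.5 to 0.7 here halves coverage and lifts accepted accuracy from 0.75 to 1.0. That trade is only predictable when confidence is calibrated, which is why the calibration results matter for threshold-based systems.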

Researcher Insights:

Calibration and robustness need not conflict: in these experiments both improved under adversarial training, which offers a useful starting point for reliable system design.
Explore post-processing methods like temperature scaling to further improve calibration on clean data.
Research robustness-oriented architecture designs (e.g., attention mechanisms, improved normalization layers).
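Temperature scaling divides the logits by a single scalar T > 0 fitted on held-out data to minimize NLL. It leaves the argmax, and therefore accuracy, unchanged and only reshapes confidence. The minimal NumPy sketch below uses a grid search over T as a simple stand-in for a gradient-based optimizer; the grid range, the function names, and the synthetic "4× overconfident" logits are assumptions for illustration.

```python
import numpy as np


def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def nll_at_temperature(logits, labels, T):
    return float(-log_softmax(logits / T)[np.arange(len(labels)), labels].mean())


def fit_temperature(val_logits, val_labels, grid=np.linspace(0.25, 5.0, 96)):
    # Pick the T that minimizes validation NLL; T > 1 softens overconfident outputs.
    losses = [nll_at_temperature(val_logits, val_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])


# Synthetic overconfident model: its logits are 4x the "true" logits that generated the labels.
rng = np.random.default_rng(0)
true_logits = rng.normal(size=(2000, 10))
p = np.exp(log_softmax(true_logits))
labels = np.array([rng.choice(10, p=row) for row in p])
val_logits = 4.0 * true_logits
T = fit_temperature(val_logits, labels)
print(T)
```

The fitted temperature should land near 4, undoing the artificial overconfidence. Because a single T cannot change which class wins, this is a cheap complement to adversarial training for clean-data calibration rather than a substitute for it.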

Section 07

Limitations and Future Research Directions

Current Limitations:

The experiments use only CIFAR-10; the trade-off may differ on more complex datasets such as ImageNet.
Only FGSM and PGD attacks were evaluated; stronger attacks such as AutoAttack or the Carlini-Wagner (CW) attack may reveal additional vulnerabilities.

Future Directions:

Explore stronger adversarial training methods (e.g., PGD training, TRADES).
Study the calibration of certified defenses (e.g., randomized smoothing).
Treat robustness, calibration, and accuracy as a multi-objective optimization problem and search for Pareto-optimal solutions.
Study the deployment efficiency (inference speed, memory usage) of adversarially trained models and their compatibility with compression and quantization techniques.
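A first step toward the multi-objective view is a Pareto filter: keep only configurations that no other configuration beats on every objective at once. The Python sketch below applies a simple O(n²) dominance check to the four configurations in the results table. It maximizes clean and PGD accuracy and minimizes FGSM ECE; that choice of three objectives, and the function names, are illustrative assumptions.

```python
# (clean accuracy %, PGD accuracy %, FGSM ECE) from the results table.
configs = {
    "Small CNN / standard": (74.96, 0.29, 0.8835),
    "Small CNN / FGSM-AT": (62.00, 29.13, 0.1152),
    "ResNet-18 / standard": (81.55, 0.00, 0.9477),
    "ResNet-18 / FGSM-AT": (65.42, 25.15, 0.4420),
}


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    oriented_a = (a[0], a[1], -a[2])  # negate ECE so that larger is better everywhere
    oriented_b = (b[0], b[1], -b[2])
    at_least = all(x >= y for x, y in zip(oriented_a, oriented_b))
    strictly = any(x > y for x, y in zip(oriented_a, oriented_b))
    return at_least and strictly


def pareto_front(points):
    # A point is on the front if no other point dominates it.
    return [name for name, p in points.items()
            if not any(dominates(q, p) for other, q in points.items() if other != name)]


print(pareto_front(configs))
```

With these objectives no configuration dominates another, so all four lie on the front: every gain in robustness or calibration is paid for on at least one other axis. Richer searches (more training settings, TRADES-style loss weights) would add points to compare in the same way.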
