Zing Forum


Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

This thread explores the fundamental trade-off between adversarial robustness and probabilistic calibration in deep neural networks, analyzing the impact of FGSM adversarial training on model accuracy and confidence calibration through experiments on the CIFAR-10 dataset.

Tags: Adversarial Robustness · Probabilistic Calibration · Deep Learning · Adversarial Training · FGSM · PGD · CIFAR-10 · ResNet-18
Published 2026-06-11 22:42 · Recent activity 2026-06-11 22:52 · Estimated read 11 min

Section 01

[Introduction] Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

Original Authors & Source: Marina Juzgado Gómez-Menor et al. (UC3M Neural Networks Course Project), published in June 2026 in the GitHub project Adversarial-robustness-and-probabilistic-calibration.

Core Insights: The research reveals a complex relationship between adversarial robustness and probabilistic calibration in deep learning models. FGSM adversarial training can improve both a model's adversarial robustness and its probabilistic calibration under attack, but at the cost of reduced accuracy on clean data (the "robustness tax").

Section 02

Background: Adversarial Examples and Core Concept Explanations

Introduction: Deep learning models perform very well in image classification and other fields, but adversarial examples (inputs altered by small, often imperceptible perturbations) can cause them to make incorrect predictions. Studies have found a subtle tension between a model's adversarial robustness and its probabilistic calibration: models that perform well on clean data may lose both accuracy and calibration under adversarial attack.

Core Concept Explanations:

  • Adversarial Robustness: The ability of a model to maintain correct predictions when subjected to adversarial attacks. Common attacks include FGSM (single-step attack: x_adv = x + ε·sign(∇_x J(x,y))) and PGD (iterative attack, e.g., PGD-10).
  • Probabilistic Calibration: Whether the confidence output by the model truly reflects the accuracy of the prediction. Key metrics include Expected Calibration Error (ECE), Negative Log-Likelihood (NLL), and Reliability Diagrams (the diagonal line represents perfect calibration).
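As a rough illustration of the two scalar calibration metrics named above, here is a minimal sketch in plain Python. The function names, the choice of 10 equal-width bins, and the toy data are assumptions made for illustration; they are not taken from the original project.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: group predictions into equal-width confidence bins, then average
    |accuracy - mean confidence| over bins, weighted by bin size."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece

def negative_log_likelihood(true_class_probs):
    """Mean NLL, given the probability the model assigned to each true class."""
    floor = 1e-12  # avoid log(0)
    return -sum(math.log(max(p, floor)) for p in true_class_probs) / len(true_class_probs)

# Toy (made-up) predictions: an overconfident model that is right half the time.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 0, 0, 1, 0]
print("ECE:", expected_calibration_error(confidences, correct))
print("NLL:", negative_log_likelihood([0.95, 0.95, 0.05, 0.05, 0.55, 0.45]))
```

A reliability diagram plots the same per-bin accuracy against per-bin mean confidence, so a perfectly calibrated model would have every bin on the diagonal and an ECE of zero.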

Section 03

Experimental Design and Methodology

Experimental Design:

  • Dataset: CIFAR-10 (50k training / 10k test images, 10 categories, 32×32 pixels, no data augmentation).
  • Model Architectures:
    • Small CNN: 4 convolutional layers + 2 fully connected layers, ~2.4 million parameters (lightweight).
    • ResNet-18: Residual network adapted for CIFAR-10, ~11.2 million parameters (deep architecture).
  • Training Strategies:
    1. Standard Training: Adam optimizer, learning rate 1e-3, cross-entropy loss, 15 epochs.
    2. Adversarial Attack Testing: Apply FGSM/PGD attacks to the baseline model.
    3. FGSM Adversarial Training: Inject FGSM adversarial examples during training (Small CNN: learning rate 1e-3; ResNet-18: 1e-4).
    4. Trade-off Analysis: Evaluate the defense model's performance under PGD attacks and changes in confidence as ε increases.
  • Attack Strength: ε ∈ {0, 4/255, 8/255, 12/255, 16/255}
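To make the attack step concrete, the following sketch applies FGSM and PGD over the ε grid listed above. It uses a tiny logistic-regression "model" in plain Python so the input gradient can be written by hand. The weights, the input, the clipping to [0, 1], the PGD step size of ε/4, and the absence of a random start are all illustrative assumptions, not details from the original experiments (which used a Small CNN and ResNet-18).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_positive(x, w, b):
    """Model output p = sigmoid(w . x + b) for a toy binary classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of binary cross-entropy J(x, y) w.r.t. x, which is (p - y) * w."""
    p = prob_positive(x, w, b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(x, y, w, b, eps):
    """Single step: x_adv = x + eps * sign(grad_x J), clipped to the pixel range."""
    g = loss_grad_wrt_input(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(x, y, w, b, eps, step, n_steps=10):
    """Iterative attack: repeated signed-gradient steps, each followed by a
    projection back into the L-infinity ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = loss_grad_wrt_input(x_adv, y, w, b)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xi - eps, xi + eps), 0.0, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, 0.2, 0.7], 1

for eps in [0, 4 / 255, 8 / 255, 12 / 255, 16 / 255]:
    p_fgsm = prob_positive(fgsm(x, y, w, b, eps), w, b)
    p_pgd = prob_positive(pgd(x, y, w, b, eps, step=eps / 4), w, b)
    print(f"eps={eps:.4f}  p(true | FGSM)={p_fgsm:.4f}  p(true | PGD-10)={p_pgd:.4f}")
```

For this linear toy model the two attacks end up at the same point, because the gradient sign never changes. On a deep network the loss surface is not linear, which is why an iterative attack such as PGD is generally stronger than single-step FGSM.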

Section 04

Experimental Results and Key Findings

Quantitative Results Summary:

Architecture | Training Method           | Clean Accuracy | FGSM Accuracy | PGD Accuracy | Clean ECE | FGSM ECE
-------------|---------------------------|----------------|---------------|--------------|-----------|---------
Small CNN    | Standard Training         | 74.96%         | 6.20%         | 0.29%        | 0.1591    | 0.8835
Small CNN    | FGSM Adversarial Training | 62.00%         | 35.25%        | 29.13%       | 0.1084    | 0.1152
ResNet-18    | Standard Training         | 81.55%         | 1.73%         | 0.00%        | 0.1278    | 0.9477
ResNet-18    | FGSM Adversarial Training | 65.42%         | 31.52%        | 25.15%       | 0.1174    | 0.4420

Key Findings:

  1. Vulnerability of Standard Models: Standard-trained models experience a sharp drop in accuracy under FGSM/PGD attacks (e.g., ResNet-18's PGD accuracy is 0%).
  2. Adversarial Training Improves Robustness: After FGSM training, Small CNN's FGSM accuracy increased from 6.20% to 35.25%, and PGD accuracy from 0.29% to 29.13%.
  3. Existence of Robustness Tax: Adversarial training lowers clean-data accuracy. The Small CNN drops by about 13 percentage points (74.96% to 62.00%) and ResNet-18 by about 16 (81.55% to 65.42%).
  4. Improved Calibration: Standard models see a surge in ECE under attack (e.g., ResNet-18's FGSM ECE reaches 0.9477), while adversarial training reduces it substantially (Small CNN's FGSM ECE drops from 0.8835 to 0.1152, ResNet-18's from 0.9477 to 0.4420).
  5. Model Capacity Does Not Alleviate the Trade-off: The larger ResNet-18 does not resolve the robustness-accuracy trade-off. It pays a similar clean-accuracy cost for similar robustness gains (FGSM accuracy of 31.52% vs. 35.25% and PGD accuracy of 25.15% vs. 29.13% for the Small CNN).
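The percentage-point figures in these findings can be recomputed directly from the results table. The short sketch below does that arithmetic; the dictionary layout and key names are illustrative choices, while the numbers are copied from the table.

```python
# Values copied from the quantitative results table (accuracies in %, ECE unitless).
results = {
    ("Small CNN", "standard"): {"clean": 74.96, "fgsm": 6.20, "pgd": 0.29, "fgsm_ece": 0.8835},
    ("Small CNN", "fgsm_at"):  {"clean": 62.00, "fgsm": 35.25, "pgd": 29.13, "fgsm_ece": 0.1152},
    ("ResNet-18", "standard"): {"clean": 81.55, "fgsm": 1.73, "pgd": 0.00, "fgsm_ece": 0.9477},
    ("ResNet-18", "fgsm_at"):  {"clean": 65.42, "fgsm": 31.52, "pgd": 25.15, "fgsm_ece": 0.4420},
}

def tradeoff(arch):
    """Differences between FGSM adversarial training and standard training."""
    std = results[(arch, "standard")]
    adv = results[(arch, "fgsm_at")]
    return {
        "clean_cost_pts": std["clean"] - adv["clean"],     # robustness tax
        "fgsm_gain_pts": adv["fgsm"] - std["fgsm"],
        "pgd_gain_pts": adv["pgd"] - std["pgd"],
        "fgsm_ece_drop": std["fgsm_ece"] - adv["fgsm_ece"],
    }

for arch in ("Small CNN", "ResNet-18"):
    summary = {k: round(v, 4) for k, v in tradeoff(arch).items()}
    print(arch, summary)
```

Read this way, the robustness tax is 12.96 points for the Small CNN and 16.13 points for ResNet-18, while the reduction in FGSM ECE is larger for the Small CNN (0.7683) than for ResNet-18 (0.5057).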

Section 05

Research Conclusions

This study examines the tension between adversarial robustness and probabilistic calibration in deep learning. In these experiments, FGSM adversarial training improves both a model's adversarial robustness and its probabilistic calibration under attack, but it sacrifices clean-data accuracy. For AI system development, the result frames an explicit choice between safety and performance, and understanding that trade-off is central to responsible AI development.

Section 06

Practical Significance and Application Insights

Industry Insights:

  1. Safety-critical systems (autonomous driving, medical diagnosis) should consider adversarial training: clean accuracy decreases, but reliability under malicious inputs improves.
  2. Downstream decision systems should adjust confidence thresholds dynamically. Under attack, adversarially trained models report confidence that tracks their accuracy more closely, which suits uncertainty-aware decisions.
  3. Model selection should match the scenario: prefer adversarially trained models where adversarial risk is high, and standard-trained models where it is low.

Researcher Insights:

  1. Calibration and robustness can be achieved together; adversarial training provides new ideas for reliable system design.
  2. Explore post-processing methods like temperature scaling to further improve calibration on clean data.
  3. Research robustness-oriented architecture designs (e.g., attention mechanisms, improved normalization layers).
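For readers unfamiliar with the temperature scaling mentioned in the list above, here is a minimal sketch in plain Python. It divides logits by a single scalar T chosen to minimize NLL on held-out data. The grid search (rather than gradient-based fitting), the candidate range, and the validation logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logits_batch, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(logits, temperature)[y])
              for logits, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logits_batch, labels, candidates=None):
    """Pick the temperature with the lowest held-out NLL from a simple grid."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(candidates, key=lambda t: mean_nll(logits_batch, labels, t))

# Hypothetical held-out logits from an overconfident model (third sample is wrong).
val_logits = [[6.0, 0.0, 0.0], [0.0, 5.0, 0.0], [4.0, 0.0, 0.0], [0.0, 0.0, 6.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", round(T, 2))
print("NLL before:", mean_nll(val_logits, val_labels, 1.0))
print("NLL after: ", mean_nll(val_logits, val_labels, T))
```

Because every logit is divided by the same positive T, the predicted class never changes. Temperature scaling therefore adjusts confidence without affecting accuracy, which is why it is a natural candidate for improving clean-data calibration after training.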

Section 07

Limitations and Future Research Directions

Current Limitations:

  1. The experiments use only the CIFAR-10 dataset; the trade-off may differ on more complex datasets (e.g., ImageNet).
  2. Only FGSM and PGD attacks were tested; stronger attacks (AutoAttack, C&W) may reveal new vulnerabilities.

Future Directions:

  1. Explore stronger adversarial training methods (e.g., PGD training, TRADES).
  2. Study the calibration of certified defenses (e.g., randomized smoothing).
  3. Treat robustness, calibration, and accuracy as a multi-objective optimization problem to find Pareto optimal solutions.
  4. Research the deployment efficiency (inference speed, memory usage) of adversarially trained models and their compatibility with compression/quantization techniques.
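The multi-objective framing in direction 3 can be sketched with a simple Pareto-dominance check over the four configurations reported in the results table. The choice of three objectives (maximize clean accuracy, maximize PGD accuracy, minimize FGSM ECE) and the helper names are assumptions for illustration; a real study would search over many more candidate models.

```python
# (clean accuracy %, PGD accuracy %, FGSM ECE), copied from the results table.
candidates = {
    "Small CNN / standard": (74.96, 0.29, 0.8835),
    "Small CNN / FGSM-AT":  (62.00, 29.13, 0.1152),
    "ResNet-18 / standard": (81.55, 0.00, 0.9477),
    "ResNet-18 / FGSM-AT":  (65.42, 25.15, 0.4420),
}

def as_maximization(point):
    """Flip the sign of ECE so that larger is better for every objective."""
    clean_acc, pgd_acc, fgsm_ece = point
    return (clean_acc, pgd_acc, -fgsm_ece)

def dominates(a, b):
    """a dominates b if it is at least as good on every objective
    and strictly better on at least one."""
    a_s, b_s = as_maximization(a), as_maximization(b)
    return (all(x >= y for x, y in zip(a_s, b_s))
            and any(x > y for x, y in zip(a_s, b_s)))

def pareto_front(points):
    """Names of all points that no other point dominates."""
    return [name for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)]

print(pareto_front(candidates))
```

On these four configurations no model dominates another, so all four sit on the front: each one is best on at least one objective or trades one objective for another. Which point to deploy then depends on how a given application weighs clean accuracy against robustness and calibration under attack.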