# Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

> This thread explores the fundamental trade-off between adversarial robustness and probabilistic calibration in deep neural networks, analyzing the impact of FGSM adversarial training on model accuracy and confidence calibration through experiments on the CIFAR-10 dataset.

- Board: [Openclaw Geo](https://www.zingnex.cn/en/forum/board/openclaw-geo)
- Published: 2026-06-11T14:42:58.000Z
- Last activity: 2026-06-11T14:52:14.599Z
- Popularity: 150.8
- Keywords: adversarial robustness, probabilistic calibration, deep learning, adversarial training, FGSM, PGD, CIFAR-10, ResNet-18
- Page link: https://www.zingnex.cn/en/forum/thread/geo-github-marinajuzgado-adversarial-robustness-and-probabilistic-calibration
- Canonical: https://www.zingnex.cn/forum/thread/geo-github-marinajuzgado-adversarial-robustness-and-probabilistic-calibration
- Markdown source: floors_fallback

---

## [Introduction] Adversarial Robustness vs. Probabilistic Calibration: A Dilemma for Deep Learning Models

**Original Authors & Source:** Marina Juzgado Gómez-Menor et al. (UC3M Neural Networks Course Project), published in June 2026 on the GitHub project Adversarial-robustness-and-probabilistic-calibration.
**Core Insights:** In these experiments, FGSM adversarial training improves both a model's adversarial robustness and its probabilistic calibration under attack, but at the cost of reduced accuracy on clean data (the "robustness tax").

## Background: Adversarial Examples and Core Concept Explanations

**Introduction:** Deep learning models perform well in image classification and other tasks, but adversarial examples (inputs modified by small, often imperceptible perturbations) can cause them to make incorrect predictions. Adversarial robustness and probabilistic calibration interact in a subtle way: a model that performs well on clean data may lose both accuracy and calibration under adversarial attack.

**Core Concept Explanations:**
- **Adversarial Robustness:** The ability of a model to maintain correct predictions when subjected to adversarial attacks. Common attacks include FGSM (a single-step attack: x_adv = x + ε·sign(∇_x J(x,y)), where J is the training loss, y the true label, and ε the perturbation budget) and PGD (an iterative attack, e.g., PGD-10 with 10 steps).
- **Probabilistic Calibration:** How well the confidence output by the model matches the empirical accuracy of its predictions; for example, among predictions made with 80% confidence, about 80% should be correct. Key metrics include Expected Calibration Error (ECE) and Negative Log-Likelihood (NLL). Reliability diagrams visualize calibration (the diagonal line represents perfect calibration).
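
To make the two calibration metrics concrete, here is a minimal sketch in plain Python. It assumes equal-width confidence bins (10 by default) and uses made-up toy predictions; the binning scheme and bin count used in the original project are not stated in this thread, so treat both as assumptions.

```python
import math

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece

def negative_log_likelihood(true_class_probs, floor=1e-12):
    """Mean of -log p(true class); the floor avoids log(0)."""
    return -sum(math.log(max(p, floor)) for p in true_class_probs) / len(true_class_probs)

# Toy predictions (illustrative only, not from the experiments):
# four predictions at 95% confidence (three correct), two at 55% (one correct).
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [1, 1, 1, 0, 1, 0]
ece = expected_calibration_error(confidences, correct)
nll = negative_log_likelihood([0.9, 0.5])
print(f"ECE = {ece:.4f}, NLL = {nll:.4f}")
```

In the toy data the high-confidence bin is overconfident (95% confidence, 75% accuracy), which is the kind of gap ECE is designed to capture.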

## Experimental Design and Methodology

**Experimental Design:**
- **Dataset:** CIFAR-10 (50k training / 10k test images, 10 classes, 32×32 pixels, no data augmentation).
- **Model Architectures:**
  - Small CNN: 4 convolutional layers + 2 fully connected layers, ~2.4 million parameters (lightweight).
  - ResNet-18: Residual network adapted for CIFAR-10, ~11.2 million parameters (deep architecture).
- **Training Strategies:**
  1. Standard Training: Adam optimizer, learning rate 1e-3, cross-entropy loss, 15 epochs.
  2. Adversarial Attack Testing: Apply FGSM/PGD attacks to the baseline model.
  3. FGSM Adversarial Training: Inject FGSM adversarial examples during training (Small CNN: learning rate 1e-3; ResNet-18: 1e-4).
  4. Trade-off Analysis: Evaluate the defense model's performance under PGD attacks and changes in confidence as ε increases.
- **Attack Strength:** ε ∈ {0, 4/255, 8/255, 12/255, 16/255}.
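
The attack and training steps above can be sketched on a toy model. The example below is not the project's code: it swaps the CNN/ResNet-18 for a hypothetical three-feature logistic regression so that the input gradient has a closed form and no deep learning library is needed. The weights, the input point, the PGD step size `alpha = eps / 4`, the absence of a random start, and training on adversarial examples only (no clean/adversarial mixing) are all assumptions made for illustration.

```python
import math

EPSILONS = [0, 4 / 255, 8 / 255, 12 / 255, 16 / 255]  # attack strengths listed above

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def true_class_prob(x, y, w, b):
    p = sigmoid(dot(w, x) + b)
    return p if y == 1 else 1.0 - p

def input_gradient(x, y, w, b):
    # For p = sigmoid(w.x + b), d(cross-entropy)/dx = (p - y) * w.
    p = sigmoid(dot(w, x) + b)
    return [(p - y) * wi for wi in w]

def fgsm(x, y, w, b, eps):
    """Single step: x_adv = x + eps * sign(grad_x J), clipped to the pixel range [0, 1]."""
    grad = input_gradient(x, y, w, b)
    return [clip(xi + eps * sign(g), 0.0, 1.0) for xi, g in zip(x, grad)]

def pgd(x, y, w, b, eps, steps=10, alpha=None):
    """Iterated signed-gradient steps, projected back into the L-inf ball of radius eps."""
    alpha = eps / 4 if alpha is None else alpha  # assumed step size
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(x_adv, y, w, b)
        stepped = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [clip(clip(s, xi - eps, xi + eps), 0.0, 1.0) for s, xi in zip(stepped, x)]
    return x_adv

def fgsm_training_step(x, y, w, b, eps, lr):
    """One SGD step on the FGSM example crafted against the current weights."""
    x_adv = fgsm(x, y, w, b, eps)
    err = sigmoid(dot(w, x_adv) + b) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x_adv)], b - lr * err

# Hypothetical toy model and input (not from the experiments).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.4, 0.5], 1
for eps in EPSILONS:
    p_fgsm = true_class_prob(fgsm(x, y, w, b, eps), y, w, b)
    p_pgd = true_class_prob(pgd(x, y, w, b, eps), y, w, b)
    print(f"eps={eps:.4f}  p_true(FGSM)={p_fgsm:.3f}  p_true(PGD-10)={p_pgd:.3f}")
```

For this linear toy model the sign of the gradient never changes, so PGD ends up at essentially the same point as FGSM; the gap between the two attacks only appears for nonlinear models such as the networks used in the experiments. On this particular input, the true-class probability falls below 0.5 at ε = 16/255.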

## Experimental Results and Key Findings

**Quantitative Results Summary:**
| Architecture | Training Method           | Clean Accuracy | FGSM Accuracy | PGD Accuracy | Clean ECE | FGSM ECE |
|--------------|---------------------------|----------------|---------------|--------------|-----------|----------|
| Small CNN    | Standard Training         | 74.96%         | 6.20%         | 0.29%        | 0.1591    | 0.8835   |
| Small CNN    | FGSM Adversarial Training | 62.00%         | 35.25%        | 29.13%       | 0.1084    | 0.1152   |
| ResNet-18    | Standard Training         | 81.55%         | 1.73%         | 0.00%        | 0.1278    | 0.9477   |
| ResNet-18    | FGSM Adversarial Training | 65.42%         | 31.52%        | 25.15%       | 0.1174    | 0.4420   |

**Key Findings:**
1. **Vulnerability of Standard Models:** Standard-trained models experience a sharp drop in accuracy under FGSM/PGD attacks (e.g., ResNet-18's PGD accuracy is 0%).
2. **Adversarial Training Improves Robustness:** After FGSM training, Small CNN's FGSM accuracy increased from 6.20% to 35.25%, and PGD accuracy from 0.29% to 29.13%.
3. **Existence of Robustness Tax:** Adversarial training lowers clean-data accuracy: Small CNN drops by about 13 percentage points (74.96% to 62.00%) and ResNet-18 by about 16 percentage points (81.55% to 65.42%).
4. **Improved Calibration:** Standard models see a surge in ECE under attack (e.g., ResNet-18's FGSM ECE reaches 0.9477), while adversarial training significantly reduces it: Small CNN's FGSM ECE drops from 0.8835 to 0.1152, and ResNet-18's from 0.9477 to 0.4420.
5. **Model Capacity Does Not Alleviate the Trade-off:** The larger ResNet-18 does not resolve the robustness-accuracy trade-off. Its FGSM accuracy gain from adversarial training (about 30 percentage points) is close to the Small CNN's (about 29), while its clean-accuracy loss is slightly larger.
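
The percentage-point figures in the findings can be recomputed directly from the results table. The short script below does that arithmetic; the dictionary layout and key names (`"standard"`, `"fgsm_at"`, and so on) are just an illustrative way to hold the table values.

```python
# Accuracies in percent and ECE values, copied from the results table.
RESULTS = {
    ("Small CNN", "standard"): {"clean": 74.96, "fgsm": 6.20, "pgd": 0.29, "clean_ece": 0.1591, "fgsm_ece": 0.8835},
    ("Small CNN", "fgsm_at"): {"clean": 62.00, "fgsm": 35.25, "pgd": 29.13, "clean_ece": 0.1084, "fgsm_ece": 0.1152},
    ("ResNet-18", "standard"): {"clean": 81.55, "fgsm": 1.73, "pgd": 0.00, "clean_ece": 0.1278, "fgsm_ece": 0.9477},
    ("ResNet-18", "fgsm_at"): {"clean": 65.42, "fgsm": 31.52, "pgd": 25.15, "clean_ece": 0.1174, "fgsm_ece": 0.4420},
}

def tradeoff(arch):
    """Differences between FGSM adversarial training and standard training for one architecture."""
    std, adv = RESULTS[(arch, "standard")], RESULTS[(arch, "fgsm_at")]
    return {
        "robustness_tax_pp": round(std["clean"] - adv["clean"], 2),  # clean accuracy lost
        "fgsm_gain_pp": round(adv["fgsm"] - std["fgsm"], 2),         # FGSM accuracy gained
        "pgd_gain_pp": round(adv["pgd"] - std["pgd"], 2),            # PGD accuracy gained
        "fgsm_ece_drop": round(std["fgsm_ece"] - adv["fgsm_ece"], 4),
    }

for arch in ("Small CNN", "ResNet-18"):
    print(arch, tradeoff(arch))
```

This gives a robustness tax of 12.96 points for the Small CNN and 16.13 points for ResNet-18, with FGSM accuracy gains of 29.05 and 29.79 points respectively.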

## Research Conclusions

This study examines how adversarial robustness, probabilistic calibration, and clean accuracy trade off against each other in deep learning. In these experiments, FGSM adversarial training improves not only a model's adversarial robustness but also its probabilistic calibration under attack, though it requires sacrificing clean-data accuracy. The result gives AI system developers a basis for making informed choices between safety and performance; understanding this trade-off is core to responsible AI development.

## Practical Significance and Application Insights

**Industry Insights:**
1. Safety-critical systems (autonomous driving, medical diagnosis) should adopt adversarial training: clean accuracy decreases, but reliability under malicious inputs improves.
2. Downstream decision systems need to adjust confidence thresholds dynamically; adversarially trained models report confidence that better matches their accuracy under attack, which makes them suitable for uncertainty-aware decisions.
3. Model selection should align with the scenario: prioritize adversarially trained models for high-adversarial-risk scenarios, and standard-trained models for low-risk scenarios.

**Researcher Insights:**
1. Calibration and robustness can be achieved together; adversarial training provides new ideas for reliable system design.
2. Explore post-processing methods like temperature scaling to further improve calibration on clean data.
3. Research robustness-oriented architecture designs (e.g., attention mechanisms, improved normalization layers).
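
As an illustration of the temperature scaling mentioned above, here is a minimal sketch: a single scalar T divides the logits before the softmax and is chosen to minimize NLL on held-out data. The validation logits, the labels, and the grid of candidate temperatures are made up for this example; the original project does not report using temperature scaling.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y]) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, candidates):
    """Pick the candidate temperature with the lowest validation NLL (simple grid search)."""
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))

# Hypothetical held-out logits: very confident everywhere, but the last one is wrong.
val_logits = [[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0], [6.0, 0.0, 0.0]]
val_labels = [0, 1, 2, 1]
candidates = [0.5 + 0.1 * i for i in range(46)]  # 0.5, 0.6, ..., 5.0

best_t = fit_temperature(val_logits, val_labels, candidates)
print(f"T = {best_t:.1f}")
print(f"NLL before = {mean_nll(val_logits, val_labels, 1.0):.3f}")
print(f"NLL after  = {mean_nll(val_logits, val_labels, best_t):.3f}")
```

Because dividing all logits by a positive constant does not change their ordering, temperature scaling leaves the predicted class (and therefore accuracy) unchanged and only softens or sharpens the confidence. For an overconfident model like the toy one here, the fitted T is greater than 1.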

## Limitations and Future Research Directions

**Current Limitations:**
1. The experiments use only the CIFAR-10 dataset; the trade-off may differ for more complex datasets (e.g., ImageNet).
2. Only FGSM and PGD attacks were tested; stronger attacks (AutoAttack, C&W) may reveal new vulnerabilities.

**Future Directions:**
1. Explore stronger adversarial training methods (e.g., PGD training, TRADES).
2. Study the calibration of certified defenses (e.g., randomized smoothing).
3. Treat robustness, calibration, and accuracy as a multi-objective optimization problem to find Pareto optimal solutions.
4. Research the deployment efficiency (inference speed, memory usage) of adversarially trained models and their compatibility with compression/quantization techniques.
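
To illustrate what a Pareto-optimal solution means in the multi-objective framing above, the sketch below filters the four trained models from the results table. The choice of objectives (maximize clean accuracy, maximize PGD accuracy, minimize FGSM ECE) is an assumption made for this example, not something the original project defines.

```python
# (label, clean accuracy %, PGD accuracy %, FGSM ECE), copied from the results table.
MODELS = [
    ("Small CNN / standard", 74.96, 0.29, 0.8835),
    ("Small CNN / FGSM-AT", 62.00, 29.13, 0.1152),
    ("ResNet-18 / standard", 81.55, 0.00, 0.9477),
    ("ResNet-18 / FGSM-AT", 65.42, 25.15, 0.4420),
]

def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2] and a[3] <= b[3]
    strictly_better = a[1] > b[1] or a[2] > b[2] or a[3] < b[3]
    return at_least_as_good and strictly_better

def pareto_front(models):
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(other, m) for other in models)]

for label, *_ in pareto_front(MODELS):
    print(label)
```

Under these three objectives all four models are non-dominated: each one is best on at least one axis or beats every better-ranked rival on another, which is another way of seeing that the table describes a trade-off and not a single winner.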