Source: people.csail.mit.edu/tsipras/bugs_poster.pdf (2019-10-28)
Adversarial Examples Are Not Bugs, They Are Features
Andrew Ilyas*, Shibani Santurkar*, Dimitris Tsipras*, Logan Engstrom*, Brandon Tran, and Aleksander Madry
Massachusetts Institute of Technology · madry-lab.ml
Adversarial Examples: A Challenge for ML Systems
Why are ML models so sensitive to small perturbations?
Prevailing theme: They stem from bugs/aberrations
A Simple Experiment
1. For every training image, construct an adversarial example toward the other class.
2. Relabel each perturbed image as the target class.
3. Train on this new dataset, but test on the original test set.
[Figure: a "dog" training image is adversarially perturbed toward "cat" and relabeled "cat"; the resulting new training set (cat, dog, cat, ...) is paired with the unchanged test set (dog, cat, car, ship, ...).]
So: we train on a totally "mislabeled" dataset but evaluate performance on a "correctly labeled" test set.
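Steps 1-3 can be sketched minimally. This is my own illustrative toy, not the authors' code: a linear two-class model on 2-D inputs, with a targeted signed-gradient (FGSM-style) perturbation standing in for the paper's PGD attack; `predict_logit` and `targeted_perturb` are hypothetical names.

```python
# Step 1 of the experiment: push an input toward the *other* class,
# then (step 2) relabel it as that class.

def predict_logit(w, b, x):
    """Linear logit: positive => class 1, negative => class 0."""
    return w[0] * x[0] + w[1] * x[1] + b

def targeted_perturb(w, b, x, target, eps=0.5, steps=10):
    """Move x toward `target` class under an L-inf budget of eps."""
    step = eps / steps
    x = list(x)
    for _ in range(steps):
        # For a linear model, the gradient of the logit w.r.t. x is w;
        # move the logit up for target=1, down for target=0.
        sign = 1.0 if target == 1 else -1.0
        x[0] += sign * step * (1.0 if w[0] > 0 else -1.0)
        x[1] += sign * step * (1.0 if w[1] > 0 else -1.0)
    return x

w, b = [2.0, -1.0], 0.0
x = [-1.0, 0.5]                      # logit = -2.5, i.e. class 0
x_adv = targeted_perturb(w, b, x, target=1)
# Step 2: (x_adv, label=1) replaces (x, label=0) in the new training set;
# step 3 trains on such pairs and tests on the original data.
print(predict_logit(w, b, x_adv))    # logit moved toward class 1
```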
Test accuracy (%) on the original test set:

Training Dataset       | CIFAR-10 | ImageNetR
"Standard" Dataset     | 95.3     | 96.6
"Mislabeled" Dataset   | 43.7     | 64.4
Result: nontrivial accuracy on the original task
The Robust Features Model
Split features into robust features (RFs), which stay predictive under small perturbations, and non-robust features (NRFs), which do not.
From the "maximize accuracy" view, all predictive features are good. If NRFs are (often) predictive, models want to use them.
Thus: models use NRFs → adversarial examples.
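A toy numerical illustration of why accuracy maximization prefers NRFs (my construction, not from the poster): one feature is robust but only 70% predictive; the other is perfectly predictive but smaller in scale than the adversary's budget.

```python
# Two synthetic features per point: a large but noisy "robust" feature
# and a tiny but perfectly correlated "fragile" (non-robust) feature.
import random
random.seed(0)

def make_point(y):
    # Robust feature: correct sign only 70% of the time, but large scale.
    f_robust = (1 if random.random() < 0.7 else -1) * y * 2.0
    # Fragile feature: always the correct sign, but tiny scale, so a
    # small perturbation can flip it.
    f_fragile = y * 0.1
    return (f_robust, f_fragile), y

data = [make_point(random.choice([-1, 1])) for _ in range(1000)]

def accuracy(weight_r, weight_f, perturb=0.0):
    correct = 0
    for (fr, ff), y in data:
        # The adversary pushes the fragile feature against the label.
        score = weight_r * fr + weight_f * (ff - perturb * y)
        correct += (score > 0) == (y > 0)
    return correct / len(data)

print(accuracy(0.0, 1.0))               # fragile-only: perfect clean accuracy
print(accuracy(0.0, 1.0, perturb=0.2))  # ...but it collapses under perturbation
print(accuracy(1.0, 0.0))               # robust-only: ~0.7 with or without attack
```

An accuracy-maximizing learner picks the fragile feature (100% clean accuracy beats 70%), which is exactly the NRF reliance that produces adversarial examples.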
The Simple Experiment: A Second Look
[Figure: the original training image has robust features: dog and non-robust features: dog; after the adversarial perturbation toward "cat", it has robust features: dog but non-robust features: cat, and enters the new training set labeled "cat".]
RFs are misleading in the new training set, but NRFs suffice for generalization.
Directly Manipulating Features
"Robust" Data: standard training → robust models
Robust Optimization: makes NRFs useless for learning
→ Need more data to learn from RFs alone (cf. [Schmidt et al., 2018])
→ Trade-off between robustness and accuracy (cf. [Tsipras et al., 2019])
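The robust-optimization claim can be seen in the same kind of toy (again mine, not the authors' setup): hinge-loss training of a linear model, where the inner maximization under an ℓ∞ budget has a closed form. With a budget larger than the fragile feature's scale, robust training stops relying on that feature.

```python
# Robust optimization on the robust-vs-fragile toy: standard training
# uses the fragile (non-robust) feature; adversarial training does not.
import random
random.seed(1)

EPS = 0.2  # adversary's L-inf budget, larger than the fragile feature's scale
data = []
for _ in range(500):
    y = random.choice([-1, 1])
    f_robust = (1 if random.random() < 0.7 else -1) * y * 2.0
    f_fragile = y * 0.1
    data.append(((f_robust, f_fragile), y))

def train(adversarial, lr=0.01, epochs=200):
    w = [0.0, 0.0]
    for _ in range(epochs):
        for (x0, x1), y in data:
            if adversarial:
                # Closed-form worst case for a linear model: shift each
                # coordinate by EPS against the margin.
                x0 -= EPS * y * (1 if w[0] >= 0 else -1)
                x1 -= EPS * y * (1 if w[1] >= 0 else -1)
            margin = y * (w[0] * x0 + w[1] * x1)
            if margin < 1:  # hinge-loss subgradient step
                w[0] += lr * y * x0
                w[1] += lr * y * x1
    return w

w_std = train(adversarial=False)
w_rob = train(adversarial=True)
print(w_std, w_rob)
# Expect: standard training puts substantial weight on the fragile
# feature; robust training keeps that weight near zero and relies on
# the robust feature instead.
```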
Implications
ML models do not work the way we expect them to
Adversarial examples: A "human-based" phenomenon?
Transfer Attacks: Models rely on similar NRFs
[Figure: transfer success rate (%) vs. test accuracy (%) for architectures (VGG-16, Inception-v3, ResNet-18, DenseNet, ResNet-50) trained on the relabeled dataset; transfer success rates span roughly 60-100%.]
Interpretability: May need to be enforced at training time
A Theoretical Framework
→ We consider (robust) MLE classification between Gaussians
→ Vulnerability is misalignment between the data geometry and the adversary's (ℓ2) geometry
→ Shows that robust optimization better aligns these geometries
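The objective behind this framework can be sketched as a saddle-point problem (notation mine where the poster is terse; ℓ denotes the Gaussian negative log-likelihood):

```latex
% Robust maximum likelihood estimation of a Gaussian under an l2 adversary:
\min_{\mu,\,\Sigma}\ \mathbb{E}_{x \sim \mathcal{N}(\mu^*,\,\Sigma^*)}
\left[\,\max_{\|\delta\|_2 \le \varepsilon}\ \ell(x + \delta;\ \mu, \Sigma)\,\right]
% Vulnerability corresponds to the mismatch between the adversary's
% \ell_2 unit ball and the data's Mahalanobis unit ball
% \{x : x^\top \Sigma^{-1} x \le 1\}; increasing \varepsilon pushes the
% learned \Sigma toward the \ell_2 geometry.
```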
[Figure: four panels plotting feature x1 vs. feature x2. (1) Maximum likelihood estimate, showing its ℓ2 unit ball, the Σ⁻¹-induced metric unit ball, and samples from N(0, Σ). (2) True parameters (ε = 0), with samples from N(μ, Σ) and N(-μ, Σ). (3) Robust parameters, ε = 1.0. (4) Robust parameters, ε = 10.0. As ε grows, the learned metric's unit ball aligns with the ℓ2 ball.]
Moving Forward
→ Do we want our models to rely on NRFs?
→ How should we think of interpretability?
Robustness as a goal beyond security/reliability?
Paper: arXiv:1905.02175
Blog: gradsci.org/adv
Python Library: MadryLab/robustness