
Adversarial Examples Are Not Bugs, They Are Features
Andrew Ilyas*, Shibani Santurkar*, Dimitris Tsipras*, Logan Engstrom*, Brandon Tran and Aleksander Madry

Massachusetts Institute of Technology madry-lab.ml

Adversarial Examples: A Challenge for ML Systems

Why are ML models so sensitive to small perturbations?

Prevailing theme: They stem from bugs/aberrations

A Simple Experiment

1. Make an adversarial example towards the other class (e.g., perturb a "dog" image towards "cat")
2. Relabel the image as the target class ("cat")
3. Train with the new dataset, but test on the original test set (a code sketch of steps 1-3 appears below)

[Figure: a "dog" training image is adversarially perturbed towards "cat" and relabeled "cat" to form the new training set; the test set (dog, cat, car, ship, ...) is left unchanged.]
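A minimal sketch of the construction, assuming a standard pretrained PyTorch classifier `model` and image batches in [0, 1]; the helper names (`pgd_towards`, `build_mislabeled_dataset`), the L2 budget, and the deterministic choice of target class (y + 1) are illustrative, not the authors' exact implementation:

    import torch
    import torch.nn.functional as F

    def pgd_towards(model, x, target, eps=0.5, step=0.1, iters=100):
        """Targeted L2 PGD: perturb x within an L2 ball of radius eps so that
        `model` classifies the result as `target`."""
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(iters):
            loss = F.cross_entropy(model(x + delta), target)
            grad, = torch.autograd.grad(loss, delta)
            # Descend on the targeted loss, with a per-example normalized step.
            g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta = delta - step * grad / g_norm
            # Project back onto the L2 ball of radius eps.
            d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            delta = (delta * (eps / d_norm).clamp(max=1.0)).detach().requires_grad_(True)
        return torch.clamp(x + delta, 0, 1).detach()

    def build_mislabeled_dataset(model, loader, num_classes=10):
        """Steps 1-3: perturb each image towards another class, then relabel it."""
        xs, ys = [], []
        for x, y in loader:
            t = (y + 1) % num_classes          # pick the "other" (target) class
            xs.append(pgd_towards(model, x, t))
            ys.append(t)                        # relabel as the target class
        return torch.cat(xs), torch.cat(ys)    # train on this, test on the original set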

So: we train on a totally "mislabeled" dataset but measure performance on the "correct" (original) test set

Test accuracy (%) on the original test set:

Training Dataset         CIFAR-10    Restricted ImageNet
Standard Dataset         95.3%       96.6%
"Mislabeled" Dataset     43.7%       64.4%

Result: nontrivial accuracy on the original task

The Robust Features Model

Robust features (RFs) stay predictive of the label under adversarial perturbation; non-robust features (NRFs) are predictive but brittle.

From the "maximize accuracy" view: all predictive features are good features.
If NRFs are (often) predictive: models want to use them.

Thus: Models use NRFs → adversarial examples
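The paper (arXiv:1905.02175) makes this precise; in slightly simplified LaTeX notation, for a feature f mapping inputs to the reals, labels y in {-1, +1}, data distribution D, and allowed perturbations Delta(x):

    % f is rho-useful: it is correlated with the true label.
    \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, y \cdot f(x) \,\big] \;\ge\; \rho

    % f is gamma-robustly useful: it stays correlated under any allowed perturbation.
    \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\, \inf_{\delta \in \Delta(x)} y \cdot f(x+\delta) \,\Big] \;\ge\; \gamma

    % A useful non-robust feature (NRF) is rho-useful for some rho > 0,
    % but not gamma-robustly useful for any gamma >= 0.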

The Simple Experiment: A Second Look

[Figure: the original "dog" training image has robust features: dog and non-robust features: dog. After the adversarial perturbation towards "cat" and relabeling, the new training image has robust features: dog but non-robust features: cat.]

In the new training set, RFs are misleading but NRFs suffice for generalization

Directly Manipulating Features

"Robust" Data: Standard training → robust models

Robust Optimization: Makes NRFs useless for learning
→ Need more data to learn from only RFs (cf. [Schmidt et al., 2018])
→ Trade-off between robustness and accuracy (cf. [Tsipras et al., 2019])
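The "robust" dataset referenced above is built by inverting a robustly trained model's learned representation. A minimal sketch, assuming `robust_feats` returns the robust model's penultimate-layer features and `x_seed` is a randomly chosen starting image; the optimizer, step count, and learning rate here are illustrative choices, not the authors' exact recipe:

    import torch

    def robustify(robust_feats, x_target, x_seed, steps=1000, lr=0.1):
        """Gradient descent in input space: find an image whose robust-model
        features match those of x_target, starting from x_seed."""
        with torch.no_grad():
            target_rep = robust_feats(x_target)
        x = x_seed.clone().requires_grad_(True)
        opt = torch.optim.SGD([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = (robust_feats(x) - target_rep).pow(2).sum(dim=1).mean()
            loss.backward()
            opt.step()
            with torch.no_grad():
                x.clamp_(0, 1)    # keep a valid image
        return x.detach()

    # Each robustified image keeps the original label; standard training on the
    # resulting "robust" dataset already yields nontrivially robust models.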

Implications

ML models do not work the way we expect them to

Adversarial examples: A "human-based" phenomenon?

Transfer Attacks: Models rely on similar NRFs

[Figure: transfer success rate (%) vs. test accuracy (%) when trained on D_{y+1}, for VGG-16, Inception-v3, ResNet-18, DenseNet, and ResNet-50.]
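One way to read the plot: craft adversarial examples against a single source model and check how often the same inputs fool each target architecture. A minimal sketch of that measurement; `pgd_attack` is a hypothetical untargeted attack analogous to the targeted one sketched earlier, and `targets` is an assumed dict of pretrained models:

    import torch

    @torch.no_grad()
    def transfer_success_rate(adv_x, y_true, target_model):
        """Fraction of source-model adversarial examples that also flip
        the prediction of `target_model` (untargeted transfer)."""
        preds = target_model(adv_x).argmax(dim=1)
        return (preds != y_true).float().mean().item()

    # Usage sketch:
    # adv = pgd_attack(source_model, x, y)      # hypothetical attack helper
    # for name, m in targets.items():           # e.g. VGG-16, ResNet-50, ...
    #     print(name, transfer_success_rate(adv, y, m))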

Interpretability: May need to be enforced at training time

A Theoretical Framework
→ We consider (robust) MLE classification between Gaussians
→ Vulnerability is misalignment between the data geometry and the adversary's (ℓ2) geometry
→ Shows that robust optimization better aligns these geometries
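A sketch of the setup in LaTeX notation, following the description above: samples are drawn from a two-class Gaussian model, ℓ is the Gaussian negative log-likelihood, and ε is the adversary's ℓ2 budget (the exact constants and regularity conditions are in the paper):

    % Data model: y uniform on {-1, +1}, x ~ N(y * mu_star, Sigma_star).

    % Standard maximum likelihood estimate:
    \Theta \;=\; \arg\min_{\mu,\Sigma} \; \mathbb{E}_{x,y}\big[\, \ell(x;\, y\cdot\mu,\ \Sigma) \,\big]

    % Robust counterpart (adversary with an l2 budget epsilon):
    \Theta_r \;=\; \arg\min_{\mu,\Sigma} \; \mathbb{E}_{x,y}\Big[\, \max_{\|\delta\|_2 \le \varepsilon} \ell(x+\delta;\, y\cdot\mu,\ \Sigma) \,\Big]

    % Vulnerability: the Sigma^{-1}-induced (Mahalanobis) metric is misaligned
    % with the adversary's l2 metric; larger epsilon pushes the learned Sigma
    % toward alignment with the l2 ball (cf. the plots below).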

[Figure: four panels plotting Feature x2 vs. Feature x1. (a) Maximum likelihood estimate, showing the ℓ2 unit ball, the Σ^{-1}-induced metric unit ball, and samples from N(0, Σ). (b) True parameters (ε = 0), with samples from the two classes N(μ*, Σ*) and N(-μ*, Σ*). (c) Robust parameters, ε = 1.0. (d) Robust parameters, ε = 10.0.]

Moving Forward
→ Do we want our models to rely on NRFs?
→ How should we think of interpretability?

Robustness as a goal beyond security/reliability?

Paper: arXiv:1905.02175

Blog: gradsci.org/adv

Python Library: MadryLab/robustness