Cleverhans : Adversarial Images - classification accuracy is too high

Question

What is going wrong with this code? I generated adversarial images with the cleverhans API (the generate_np method) and classify them with the default cleverhans CNN classifier. When I evaluate with the model right after generating the images, the test accuracy is very low, as expected. But if I save and reload the model, the accuracy is too high. Please check the code here:

https://github.com/csesivakumar/Adversarial_Defense/blob/master/Cleverhans_generatenp.ipynb

Python: 3.6

score 0 · Answer 1 · answered Jun 09 '19 at 00:22

Pasting my answer from the GitHub issue tracker in case others are facing the same issue:

From your code, it looks like you initialize the model's weights, define the tf session, etc. after having trained the model using Keras. My guess is that the adv_x array does not contain images that are adversarial for the trained model. This would explain why the accuracy output by [22] is close to random: the model weights are random at that point. When you restore the model, its weights are set back to the values learned during training, so the accuracy is restored (because the images are not adversarial).
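
To make the suspected order of operations concrete, here is a minimal NumPy sketch. It does not use cleverhans, Keras, or TensorFlow, and it is not the notebook's code: the toy logistic-regression model, the hand-written `fgsm` helper, the synthetic data, and the values of `DIM` and `EPS` are all assumptions made for illustration. It trains a model, overwrites the weights with random ones (standing in for an initializer run after training), crafts perturbed inputs against those random weights, and then evaluates the perturbed inputs with both the random and the restored weights.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, EPS = 50, 0.7  # illustrative values, not taken from the notebook


def make_data(n):
    # Two Gaussian blobs centred at -0.5 and +0.5 in every coordinate.
    y = rng.integers(0, 2, size=n)
    x = (2 * y[:, None] - 1) * 0.5 + rng.normal(size=(n, DIM))
    return x, y


def predict_proba(w, b, x):
    # Logistic regression; logits are clipped to keep exp() well-behaved.
    return 1.0 / (1.0 + np.exp(-np.clip(x @ w + b, -30, 30)))


def accuracy(w, b, x, y):
    return float(np.mean((predict_proba(w, b, x) > 0.5) == y))


def fgsm(w, b, x, y, eps):
    # Hand-written FGSM for this toy model: the gradient of the
    # cross-entropy loss with respect to the input is (p - y) * w.
    grad = (predict_proba(w, b, x) - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad)


x_train, y_train = make_data(2000)
x_test, y_test = make_data(1000)

# 1. Train with plain full-batch gradient descent.
w, b = np.zeros(DIM), 0.0
for _ in range(200):
    err = predict_proba(w, b, x_train) - y_train
    w -= 0.1 * x_train.T @ err / len(y_train)
    b -= 0.1 * err.mean()
w_trained, b_trained = w.copy(), b

# 2. Stand-in for re-initializing variables after training:
#    the model that crafts the attack now has random weights.
w_random, b_random = rng.normal(scale=0.1, size=DIM), 0.0
adv_x = fgsm(w_random, b_random, x_test, y_test, EPS)

# 3. Evaluate the same adv_x with random and with restored weights.
acc_random = accuracy(w_random, b_random, adv_x, y_test)
acc_restored = accuracy(w_trained, b_trained, adv_x, y_test)

# 4. For comparison: an attack crafted against the trained weights.
true_adv_x = fgsm(w_trained, b_trained, x_test, y_test, EPS)
acc_true_adv = accuracy(w_trained, b_trained, true_adv_x, y_test)

print("random weights on adv_x:   ", acc_random)
print("restored weights on adv_x: ", acc_restored)
print("restored weights on true adversarial inputs:", acc_true_adv)
```

In this toy setting, the low accuracy measured with random weights says nothing about the trained model, and adv_x barely affects the restored weights because it was crafted against a different model. If the same thing is happening in the notebook, the fix would be to generate the adversarial images only after the trained weights are loaded into the session that the attack uses.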
