I have a Keras model (a CNN with a final softmax) that is an RGB image classifier. The model outputs one of 5 possible categories for input images (one-hot encoded). I'm trying to generate adversarial images for my Keras model with CleverHans (the TensorFlow library).
A simplified version of my code which generates one adversarial image is the following:
import tensorflow as tf
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils_keras import KerasModelWrapper

# model is the trained Keras CNN; session is the tf.Session used by Keras
wrap = KerasModelWrapper(model)
fgsm = FastGradientMethod(wrap, sess=session)
fgsm_params = {'eps': 16. / 256,
               'clip_min': 0.,
               'clip_max': 1.}

# symbolic input and the symbolic adversarial example built from it
x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols, nchannels))
adv_x = fgsm.generate(x, **fgsm_params)

# original_image is a numpy array holding one RGB image, shape (1, 48, 48, 3)
adv_image = adv_x.eval(session=session, feed_dict={x: original_image})
Chapter 1: eps
From my understanding, the 'eps' FGM param is the input variation step, i.e. the amount by which each pixel value is perturbed in a single step.
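To make that concrete, here is the one-step update I believe FGSM performs (a minimal numpy sketch of the textbook formulation, not CleverHans's actual code; fgsm_step and grad_wrt_input are my own illustrative names, with grad_wrt_input standing for the gradient of the loss with respect to the input image):

import numpy as np

def fgsm_step(image, grad_wrt_input, eps, clip_min=0., clip_max=1.):
    # every pixel moves by exactly +/- eps in the direction that
    # increases the loss, then the result is clipped back into range
    perturbed = image + eps * np.sign(grad_wrt_input)
    return np.clip(perturbed, clip_min, clip_max)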
I have observed that the final outcome is highly affected by eps: sometimes I need a high eps to obtain an effective adversarial image, i.e. one whose predicted category actually differs from that of the original image.
With a low eps, FGM sometimes fails to produce a working adversarial image: given an image O with label lO, it produces an image O' whose label lO' is still equal to lO, e.g. for lO = [0,0,1,0,0] I still obtain lO' = [0,0,1,0,0].
Questions (I'm sorry, but the problem requires a set of them):
- Does FGM always find a working adversarial image? I.e., is it normal for FGM to fail?
- Is there a way to estimate the quality of a generated adversarial image (without running it through the model)?
- Why is the value of the eps step so important?
- Most important: is there a way to tell FGM to try harder to find an adversarial image (e.g., more steps)? See the sketch after this list for what I mean.
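By "more steps" I mean something like the iterative variant I have seen in CleverHans, BasicIterativeMethod, which as far as I understand applies the eps-sign step repeatedly with a smaller per-step size. A minimal sketch of what I have in mind, reusing wrap, x, session and original_image from the snippet above (the eps_iter and nb_iter values are arbitrary guesses on my part):

from cleverhans.attacks import BasicIterativeMethod

bim = BasicIterativeMethod(wrap, sess=session)
bim_params = {'eps': 16. / 256,      # total perturbation budget
              'eps_iter': 2. / 256,  # size of each individual step
              'nb_iter': 10,         # number of steps
              'clip_min': 0.,
              'clip_max': 1.}
adv_x_iter = bim.generate(x, **bim_params)
adv_image_iter = adv_x_iter.eval(session=session,
                                 feed_dict={x: original_image})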
Chapter 2: y, y_target
I have also experimented with the y and y_target params.
Can you also explain what the 'y' and 'y_target' params are?
I thought 'y_target' indicates that we want to generate an adversarial image targeting a specific category. For example, I thought that passing y_target = [[0,1,0,0,0]] in the feed_dict should force the generation of an adversarial image which the model classifies as the 2nd class.
- Am I right? ...or
- am I missing something?
P.S.: my problem is that setting y_target fails to produce adversarial images.
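For reference, this is roughly how I pass y_target (a sketch reusing x, fgsm, session and original_image from above; as far as I understand, y_target has to be a placeholder that is later fed with the one-hot target, here aiming at the 2nd class):

import numpy as np

# one-hot target labels, shape (batch, nb_classes) with nb_classes = 5
y_target_ph = tf.placeholder(tf.float32, shape=(None, 5))

fgsm_targeted_params = {'eps': 16. / 256,
                        'y_target': y_target_ph,
                        'clip_min': 0.,
                        'clip_max': 1.}
adv_x_targeted = fgsm.generate(x, **fgsm_targeted_params)

target_label = np.array([[0., 1., 0., 0., 0.]], dtype=np.float32)
adv_image_targeted = adv_x_targeted.eval(
    session=session,
    feed_dict={x: original_image, y_target_ph: target_label})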
Please give me a few tips. ;-) Regards