I am reading about adversarial images and breaking neural networks. I am trying to work through the article step by step, but due to my inexperience I am having a hard time understanding the following instructions.
At the moment, I have a logistic regression model for the MNIST data set. Given an image, it will predict the digit it most likely is...
saver.restore(sess, "/tmp/model.ckpt")
# take the first test image (a 7) and add a batch dimension
x_in = np.expand_dims(mnist.test.images[0], axis=0)
# run the classifier and print the predicted digit
classification = sess.run(tf.argmax(pred, 1), feed_dict={x: x_in})
print(classification)
Now, the article states that in order to break this image, the first thing we need to do is get the gradient of the neural network with respect to the input image. In other words, this will tell me the direction needed to make the image look more like a 2 or a 3, even though it is a 7.
The article states that this is relatively simple to do using backpropagation. So you may define a function...

compute_gradient(image, intended_label)

...and this basically tells us what kind of shape the neural network is looking for at that point.
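My rough guess is that, in TensorFlow, this would look something like the sketch below (this is my own attempt, assuming pred in my code above holds the model's logits and y_target is a new placeholder for the intended one-hot label), but I am not sure whether this is the right idea:

y_target = tf.placeholder(tf.float32, shape=[None, 10])
# loss between the model's output and the label I *want* it to predict
loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_target, logits=pred)
# gradient of that loss with respect to the input image, not the weights
grad = tf.gradients(loss, x)[0]

def compute_gradient(image, intended_label):
    return sess.run(grad, feed_dict={x: image, y_target: intended_label})

As far as I understand, I would then subtract a small multiple of this gradient from the image to make it look more like the intended label, but I am not certain that this is what the article means.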
This may seem easy to implement to those with more experience, but the logic evades me.
From the parameters of the function compute_gradient, I can see that you feed it an image and an array of labels where the value of the intended label is set to 1. But I do not see how this is supposed to return the shape that the neural network is looking for.
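For example, if I wanted my 7 to be classified as a 2, I assume intended_label would be a one-hot array like this (using 10 classes for MNIST; again, just my guess):

intended_label = np.expand_dims(np.eye(10)[2], axis=0)
# array([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]])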
Anyway, I want to understand how I should implement this backpropagation step to return the gradient of the neural network with respect to the input image. If the answer is not very straightforward, I would like some step-by-step instructions on how to get my backpropagation to work as the article suggests it should.
In other words, I do not just want code that I can copy; I want to understand how to implement it as well.