I'm trying to do gradient descent on a randomly initialized input vector (a NumPy array), given a simple, fixed model.
This is the summary of my model (summary image omitted; the model takes a (512,)-shaped input and produces a 10-dimensional output).

The algorithm I'm trying to implement (from the paper linked below) simply does gradient descent on a random input (with shape (512,) in this case), minimizing a custom loss, which is the square of a certain output neuron.
The idea is quite simple, but I'm having a hard time implementing it.
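To make the goal concrete, here is a minimal sketch of what I understand the algorithm should do (descend_on_input and target_idx are names I made up for illustration; I'm assuming a Keras model whose input is (512,)-shaped):

import numpy as np
import tensorflow as tf

def descend_on_input(model, target_idx, steps=100, learning_rate=0.1):
    # Keras layers expect a batch dimension, hence shape (1, 512)
    x = tf.Variable(np.random.random((1, 512)), dtype=tf.float32)
    for step in range(steps):
        with tf.GradientTape() as tape:
            # custom loss: the square of one output neuron
            loss = tf.square(model(x)[0, target_idx])
        grad = tape.gradient(loss, x)  # gradient w.r.t. the input, not the weights
        x.assign_sub(learning_rate * grad)
    return x.numpy().reshape(512), float(loss)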
The following are functions I looked up and tried, but the output turns out to be a list of (512, 512) arrays instead of a single (512,) array. Also, tf.reduce_mean() is not what I intended to use, but that's a relatively small problem here.
import tensorflow as tf

def loss_fn(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error))

def gradients(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(model, inputs, targets)
    # differentiates the loss w.r.t. the model's weights
    return tape.gradient(loss_value, model.trainable_variables), loss_value

for e in range(epochs):
    gradient, loss = gradients(model, x, y)
    x = x - learning_rate * gradient
    print('epoch', e, 'loss', loss)
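For example, when the call does go through (x has to be batched to shape (1, 512) or the model call fails; see the edit below), the shapes look like this (a quick inspection I added, assuming the first trainable variable is a 512x512 kernel, which matches what I'm seeing):

grads, loss = gradients(model, x, y)
print(type(grads))     # <class 'list'>, one entry per trainable variable
print(grads[0].shape)  # (512, 512): the shape of a weight matrix, not of my input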
Can anyone point out which part I'm doing wrong?
I assume the shapes of the tensors are all messed up here, but I really have no clue where or how to start fixing it.
Sorry for the naive question; I hope I described it well enough. Thanks in advance.
Paper: Trojaning Attack on Neural Networks
Edit: Apparently I did not explain it well enough. The problem is this call:

gradient, loss = gradients(model, x, y)

gradients() isn't giving the expected results.

Expected: a scalar loss and an np.array of shape (512,) (the gradient), given the parameters model, np.array(shape=(512,)), np.array(shape=(10,)).
What I got:
ValueError: Input 0 of layer dense_5 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (512,)
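If it matters, I suspect this particular error is just about the missing batch dimension (Dense layers want ndim=2, i.e. (batch, features)), so a reshape along these lines makes the forward call work, though it doesn't fix the gradient shapes:

x_batched = x.reshape(1, 512)  # add a batch dimension: (512,) -> (1, 512)
output = model(x_batched)      # now shape (1, 10)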