I have the following question: I wanted to test the tf.gradients function, so I built a network with just a single softmax layer, defined as
W = tf.Variable(tf.zeros([10, 784]), dtype=tf.float32)   # weights, one row per class
b = tf.Variable(tf.zeros([10, 1]), dtype=tf.float32)     # biases, one per class
y_ = tf.nn.softmax(tf.matmul(W, X_) + b)                 # predictions; inputs X_ are laid out one example per column
cost = -tf.reduce_mean(Y * tf.log(y_) + (1 - Y) * tf.log(1 - y_))
# gradients of the cost with respect to W and b
grad_W, grad_b = tf.gradients(ys=cost, xs=[W, b])
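As a quick sanity check, the gradient tensors that tf.gradients returns have the same shapes as the corresponding variables:
print(grad_W.get_shape())  # (10, 784), matches W
print(grad_b.get_shape())  # (10, 1), matches b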
Then I update the weights manually with
# one gradient-descent step done by hand
new_W = W.assign(W - learning_rate_ * grad_W)
new_b = b.assign(b - learning_rate_ * grad_b)
Now I wanted to compare this with the standard optimizer:
optimizer = tf.train.GradientDescentOptimizer(learning_rate_).minimize(cost)
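For context, this is roughly how I run the training loop and record the cost for each variant, one variant at a time with the variables re-initialized in between (just a sketch: the random batch below stands in for my actual MNIST batch, laid out one example per column, and learning_rate_ is assumed to be a plain Python float):

import numpy as np
# stand-in batch: 128 random examples (one column each) with random one-hot labels
batch_x = np.random.rand(784, 128).astype(np.float32)
batch_y = np.eye(10, dtype=np.float32)[np.random.randint(10, size=128)].T
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(50):
        # manual variant: run the assign ops built from tf.gradients
        _, _, c = sess.run([new_W, new_b, cost], feed_dict={X_: batch_x, Y: batch_y})
        # optimizer variant (run instead, in a fresh session):
        # _, c = sess.run([optimizer, cost], feed_dict={X_: batch_x, Y: batch_y})
        print(epoch, c)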
The funny thing is that the cost function (the same in both cases) comes out almost the same, but not quite, and I was wondering why. Is it rounding errors? Or is tf.train.GradientDescentOptimizer doing something slightly different? To give you an idea, here are pairs of values (manual gradients vs. optimizer):
0.993107 - 0.993085
0.953979 - 0.953984
Can anyone explain this?
EDIT: Interestingly enough, at the beginning (epoch 0) the two values are identical. It seems like an error that accumulates in the cost function with each epoch...
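One extra check I could do (hypothetical, not something I have shown above) is to save the final W from each run with sess.run(W) and look at how far apart the two weight matrices drift:

# W_manual: W saved from the tf.gradients run, W_opt: W saved from the optimizer run
print(np.max(np.abs(W_manual - W_opt)))  # largest element-wise difference between the two weight matrices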
Thanks, Umberto