I have the following question. I wanted to test the tf.gradients function, so I built a small network with just a single softmax layer:

import tensorflow as tf

# X_ and Y: assumed placeholders for the input batch and the one-hot labels
# (not shown in the snippet; shapes inferred from W and b)
X_ = tf.placeholder(tf.float32, shape=[784, None])
Y = tf.placeholder(tf.float32, shape=[10, None])

W = tf.Variable(tf.zeros([10, 784]), dtype=tf.float32)
b = tf.Variable(tf.zeros([10, 1]), dtype=tf.float32)

y_ = tf.nn.softmax(tf.matmul(W, X_) + b)
cost = -tf.reduce_mean(Y * tf.log(y_) + (1 - Y) * tf.log(1 - y_))

grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)

and then I update the weights with

new_W = W.assign(W - learning_rate_ * (grad_W))
new_b = b.assign(b - learning_rate_ * (grad_b))
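
Just for context, this is roughly how I run the manual version each epoch (a minimal sketch with a random dummy batch; the real runs feed actual data batches, and learning_rate_ is assumed to be a plain Python float defined before the assign ops):

import numpy as np

# dummy batch just for the sketch; the real runs feed real image/label batches
xs = np.random.rand(784, 100).astype(np.float32)
ys = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 100)].T

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        # running new_W / new_b applies one gradient-descent step;
        # cost is fetched in the same run call
        _, _, c = sess.run([new_W, new_b, cost], feed_dict={X_: xs, Y: ys})
        print(epoch, c)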

Then I wanted to compare it with the standard optimizer:

optimizer = tf.train.GradientDescentOptimizer(learning_rate_).minimize(cost)
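
As far as I understand, minimize() is just compute_gradients() followed by apply_gradients(), so in principle the built-in update should do the same thing as my assign ops. Written out explicitly (just a sketch to make the comparison clear, not my actual code):

# minimize() split into its two steps, for comparison with the manual update
opt = tf.train.GradientDescentOptimizer(learning_rate_)
grads_and_vars = opt.compute_gradients(cost, var_list=[W, b])
optimizer = opt.apply_gradients(grads_and_vars)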

The funny thing is that the cost function (the same in both cases) comes out almost the same, but not quite, and I am wondering why. Are these rounding errors, or is tf.train.GradientDescentOptimizer doing something slightly different? To give you an idea, here are pairs of values (my gradients on the left, the optimizer on the right):

0.993107 - 0.993085
0.953979 - 0.953984

Can anyone explain it?

EDIT: Interestingly enough, at the beginning (epoch = 0) the two values are the same. It seems like an error that gets added to the cost function each epoch...
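
One thing I plan to try, to test the rounding hypothesis, is to rebuild the same graph in tf.float64 and check whether the gap between the two cost curves shrinks (just a sketch of the idea; the names with the 64 suffix are mine):

# same model in float64; if the two versions agree much more closely here,
# accumulated float32 rounding is the likely culprit
X64 = tf.placeholder(tf.float64, shape=[784, None])
Y64 = tf.placeholder(tf.float64, shape=[10, None])
W64 = tf.Variable(tf.zeros([10, 784], dtype=tf.float64))
b64 = tf.Variable(tf.zeros([10, 1], dtype=tf.float64))
y64 = tf.nn.softmax(tf.matmul(W64, X64) + b64)
cost64 = -tf.reduce_mean(Y64 * tf.log(y64) + (1 - Y64) * tf.log(1 - y64))
grad_W64, grad_b64 = tf.gradients(xs=[W64, b64], ys=cost64)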

Thanks, Umberto

    Are you using CPU or GPU? According to [this question](https://stackoverflow.com/q/47178371/712995), floating point precision is the only difference. – Maxim Mar 06 '18 at 16:39
  • CPU. I am also looking into floating precision... Will post my findings... – Umberto Mar 06 '18 at 16:46
  • Apparently the value is the same at the beginning and then starts deviating. There must be slight differences in how tensorflow implements the optimizer compared to my naive way of doing it. The difference is small but clearly there. I assume it is floating point errors accumulating, although it seems to me that the difference is too big to be simply rounding errors... Oh well... – Umberto Mar 06 '18 at 17:01
  • Just a naive question: is `X_` the same in the two cases? (And what are those four values you gave us, losses or gradients? Maybe it has little importance...) – LI Xuhong Mar 07 '18 at 14:33
  • Yes, X_ is the same. The four values are the cost function: on the left with the gradients, on the right with the out-of-the-box optimizer... – Umberto Mar 08 '18 at 07:48

0 Answers