I am reading the tutorial linked from the home page of the Theano documentation, and I am not sure about the code given in the gradient descent section.
In particular, I have doubts about the for loop.
The 'param_update' shared variable is initialized to zero:
param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
and then its value is updated in the remaining two lines:
updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
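For context, here is a minimal sketch of how I read the full loop; the function name gradient_updates_momentum and the argument order are my reconstruction from memory, not a verbatim copy of the tutorial:

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    # One pair of updates per parameter: the parameter itself and its velocity.
    updates = []
    for param in params:
        # Velocity shared variable, zero-initialized with the same shape and broadcast pattern as param.
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Step the parameter in the direction of the accumulated (smoothed) gradient.
        updates.append((param, param - learning_rate*param_update))
        # Update the velocity as a moving average of the current gradient.
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates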
Why do we need param_update at all?
I guess I am getting something wrong here. Can you help me?