
I am reading this tutorial, provided on the home page of the Theano documentation.

I am not sure about the code given under the gradient descent section.

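The screenshot shows, roughly, the following function from the tutorial (reconstructed from the lines quoted below and the linked notebook; the imports are added so the snippet stands alone):

import theano
import theano.tensor as T

def gradient_updates_momentum(cost, params, learning_rate, momentum):
    updates = []
    for param in params:
        # One shared variable per parameter, holding the current update direction;
        # created with the same shape as the parameter, filled with zeros.
        param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)
        # Move the parameter along the stored direction...
        updates.append((param, param - learning_rate*param_update))
        # ...and refresh the stored direction with the momentum rule.
        updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))
    return updates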

I have doubts about the for loop.

If you initialize the 'param_update' variable to zero,

param_update = theano.shared(param.get_value()*0., broadcastable=param.broadcastable)

and then you update its value in the remaining two lines,

updates.append((param, param - learning_rate*param_update))
updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))

why do we need it?

I guess I am getting something wrong here. Can you help me?

Cosmo Harrigan
Abhishek
  • What does 'and you **dun** update its value in remaining two line.' mean? – Martin Thoma Aug 18 '14 at 15:45
  • Could you please add code and not a screenshot? – Martin Thoma Aug 18 '14 at 15:46
  • It's under the gradient descent section here: http://nbviewer.ipython.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb I meant that you initialize param_update in the first code line I provided, and you don't update it in the remaining two code lines given above. I will try to add the code next time! – Abhishek Aug 18 '14 at 16:03

1 Answer


The initialization of param_update using theano.shared(...) only tells Theano to reserve a variable that will be used by Theano functions. This initialization code is executed only once, and is not used later on to reset the value of param_update to 0.
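A minimal sketch of this behaviour, using a toy accumulator rather than the tutorial's code (the variable names here are made up for illustration): the value passed to theano.shared is used only when the variable is created, and every later change comes from the updates of a compiled function.

import numpy as np
import theano
import theano.tensor as T

# The initial value is used exactly once, when the shared variable is created.
state = theano.shared(np.float64(0.), name='state')
inc = T.dscalar('inc')

# Each call returns the old value of state and then replaces it with state + inc.
accumulate = theano.function([inc], state, updates=[(state, state + inc)])

accumulate(1.)                # state is now 1.0
accumulate(2.)                # state is now 3.0 -- the initial 0 is never restored
print(state.get_value())      # prints 3.0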

The actual value of param_update will be updated according to the last line

updates.append((param_update, momentum*param_update + (1. - momentum)*T.grad(cost, param)))

whenever the train function is called. train was constructed with this list of updates passed as the updates argument ([23] in the tutorial):

train = theano.function([mlp_input, mlp_target], cost,
                        updates=gradient_updates_momentum(cost, mlp.params, learning_rate, momentum))

Each time train is called, Theano computes the gradient of the cost w.r.t. param and updates param_update to a new update direction according to the momentum rule. Then, param is updated by following the update direction saved in param_update, scaled by the learning_rate.
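For intuition, here is a plain Python sketch of the same two update rules (the toy cost, its gradient, and the constants are made up for illustration; only the two update equations mirror the ones above):

def grad(param):
    # Gradient of a toy cost 0.5 * param**2, whose minimum is at 0.
    return param

param = 5.0
param_update = 0.0        # plays the role of the shared variable, initialised once
learning_rate = 0.1
momentum = 0.9

for step in range(500):
    # Both new values are computed from the values at the start of the step,
    # mirroring how Theano applies the pairs in the updates list together.
    new_param = param - learning_rate * param_update
    new_param_update = momentum * param_update + (1. - momentum) * grad(param)
    param, param_update = new_param, new_param_update

print(param)   # very close to 0 after enough steps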

Czechnology
Kyunghyun Cho