
When training a model using OpenNMT-py, we get a dict as output containing the weights and biases of the network. However, these tensors have requires_grad = False, so they do not have a gradient. For example, with one layer, we might have the following tensors, denoting embeddings as well as weights and biases in the encoder and decoder. None of them has a gradient attribute.

encoder.embeddings.emb_luts.0.weight
decoder.embeddings.emb_luts.0.weight
encoder.rnn.weight_ih_l0
encoder.rnn.weight_hh_l0
encoder.rnn.bias_ih_l0
encoder.rnn.bias_hh_l0
decoder.rnn.layers.0.weight_ih
decoder.rnn.layers.0.weight_hh
decoder.rnn.layers.0.bias_ih
decoder.rnn.layers.0.bias_hh

Can OpenNMT-py be made to set requires_grad = True with some option I have not found, or is there some other way to obtain the gradients of these tensors?
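For illustration, here is a minimal sketch of what I see when inspecting a loaded checkpoint (the file name is a placeholder, and the "model" key assumes the usual OpenNMT-py checkpoint layout). Flipping requires_grad on the saved tensors does not help, since that only affects future autograd graphs:

    import torch

    # Minimal sketch; the checkpoint path is a placeholder and the
    # "model" key assumes the usual OpenNMT-py checkpoint layout.
    checkpoint = torch.load("model_step_1000.pt", map_location="cpu")
    state_dict = checkpoint["model"]

    for name, tensor in state_dict.items():
        # requires_grad only affects *future* autograd graphs; it cannot
        # recover gradients from training steps that already happened.
        tensor.requires_grad_(True)
        print(name, tensor.requires_grad, tensor.grad)  # .grad is still None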

thaumoctopus
  • Gradients with respect to what? The gradients are only defined with respect to a particular training batch, so they are discarded after training. Moreover, during training, the gradients get zeroed after each step. – Jindřich Jun 05 '19 at 09:43
  • With respect to each training batch, yes. I guess that this information is not retained, then. Do you have experience with OpenNMT-py? Where in that code, for example, would I add a tensorboardX writer to track things like the gradient? – thaumoctopus Jun 05 '19 at 11:05

1 Answer


The gradients are accessible only inside the training loop, where optim.step() is called. If you want to log the gradients (or gradient norms, or anything similar) to TensorBoard, the best place to capture them is just before the optimizer step is called. That happens in the _gradient_accumulation method of the Trainer object.
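For example, a minimal sketch of such a logging hook, assuming a tensorboardX writer and a plain PyTorch module (the function name and log directory are illustrative):

    from tensorboardX import SummaryWriter

    writer = SummaryWriter("runs/grad_logging")  # illustrative log directory

    def log_gradient_norms(model, step):
        # Call this just before optim.step() (e.g. inside
        # Trainer._gradient_accumulation), while .grad is still
        # populated for the current batch.
        for name, param in model.named_parameters():
            if param.grad is not None:
                writer.add_scalar("grad_norm/" + name,
                                  param.grad.norm().item(), step)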

Be aware that there are two places where optim.step() is called. Which one is used depends on whether you do the update after every batch or accumulate gradients from multiple batches and do the update afterward.
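As a self-contained illustration of the accumulation variant (a simplified toy model, not OpenNMT-py's actual trainer code; accum_count stands in for the trainer's accumulation setting):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)
    optim = torch.optim.SGD(model.parameters(), lr=0.1)
    batches = [torch.randn(8, 4) for _ in range(6)]
    accum_count = 3  # stand-in for the trainer's accumulation setting

    for step, batch in enumerate(batches):
        loss = model(batch).sum()
        loss.backward()  # gradients accumulate in .grad until zeroed
        if (step + 1) % accum_count == 0:
            # .grad now holds the sum over accum_count batches; this is
            # the moment to log it, before it is consumed and zeroed.
            total_norm = sum(p.grad.norm().item() for p in model.parameters())
            print(f"step {step}: grad norm {total_norm:.4f}")
            optim.step()
            optim.zero_grad()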

Jindřich
  • Thank you so much. I am sorry for bothering you further, but would you happen to know where in the code to look for the hidden states of the RNN? – thaumoctopus Jun 06 '19 at 11:03