I’m currently trying to implement DeepMask (link to FAIR's paper) in PyTorch. So far I have defined the joint loss function, the model’s learnable parameters, and the forward pass.
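For reference, the loss follows Eq. (1) of the paper: a per-pixel logistic loss on the segmentation branch, active only for positive examples, plus a λ-weighted logistic loss on the scoring branch (the paper uses λ = 1/32). A simplified per-sample sketch of what I mean (function and argument names here are my own placeholders, not the paper's):

```python
import torch.nn.functional as F

def joint_loss(seg_logits, score_logit, mask_target, label, lam=1.0 / 32):
    """Per-sample DeepMask-style joint loss, following Eq. (1) of the paper.

    seg_logits:  (h, w) raw outputs of the segmentation branch
    score_logit: scalar raw output of the scoring branch
    mask_target: (h, w) ground-truth mask with entries in {-1, +1}
    label:       +1 for a positive example, -1 for a negative one
    """
    # log(1 + exp(-z)) is softplus(-z); averaging over pixels gives the
    # 1/(w*h) normalization from the paper
    mask_term = F.softplus(-mask_target * seg_logits).mean()
    # the (1 + y_k)/2 factor switches the mask term off for negatives
    mask_term = mask_term * (1.0 + label) / 2.0
    # lambda-weighted logistic loss on the objectness score
    score_term = lam * F.softplus(-label * score_logit)
    return mask_term + score_term
```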
I was working on the training phase. Since the paper says training should alternate back-propagation between the two branches (segmentation and scoring), I have written the code for that as well.
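Roughly, my loop alternates which branch's loss is back-propagated on each mini-batch, along these lines (`model` and `loader` are placeholders for my actual network and data pipeline; `model(images)` is assumed to return `(seg_logits, score_logits)` of shapes `(N, h, w)` and `(N,)`):

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for step, (images, masks, labels) in enumerate(loader):
    seg_logits, score_logits = model(images)
    if step % 2 == 0:
        # even mini-batch: back-propagate only the segmentation loss,
        # computed on positive examples only
        pos = labels > 0
        if not pos.any():
            continue  # .mean() over an empty selection would itself be NaN
        loss = F.softplus(-masks[pos] * seg_logits[pos]).mean()
    else:
        # odd mini-batch: back-propagate only the scoring loss
        loss = (1.0 / 32) * F.softplus(-labels * score_logits).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```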
But something goes wrong during training: when I train the model on a fake dataset (randomly generated data), the loss is fine for the first mini-batch but becomes NaN for every mini-batch after it.
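I assume a first step would be something like the following to localize where the NaN first appears (generic PyTorch debugging, nothing DeepMask-specific; `training_step` is a hypothetical stand-in for the forward pass plus loss):

```python
import torch

# anomaly detection makes backward() raise at the operation whose gradient
# first produces a NaN, instead of failing silently
torch.autograd.set_detect_anomaly(True)

for step, batch in enumerate(loader):   # same placeholder loader as above
    loss = training_step(batch)         # hypothetical: forward pass + loss
    assert torch.isfinite(loss), f"loss is already NaN at step {step}"
    optimizer.zero_grad()
    loss.backward()
    # a large gradient norm right before the NaN step would point to
    # divergence (e.g. too high a learning rate) rather than a bad loss
    norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    print(f"step {step}: loss {loss.item():.4f}, grad norm {float(norm):.4f}")
    optimizer.step()
```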
What could be the reason for this NaN loss?