New to neural networks and PyTorch.
I have 300 replay memories in each mini-batch. I've seen people calculate one loss for all 300 replay memories, but that doesn't really make sense to me. The 300 replay memories come from very different game states, so why would it make sense to combine the 300 differences between predictions and targets into one value? Do the gradients get separated into 300 branches, one per entry in the mini-batch, when the model backpropagates? Here's a sketch of the two options I'm comparing, right after this paragraph.
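To make the question concrete, here's a minimal sketch (the network, shapes, and data are made-up placeholders, not my real setup) of "one combined loss" versus "300 separate losses":

```python
import torch
import torch.nn as nn

# toy stand-ins for my real setup (names and shapes are made up)
batch = torch.randn(300, 4)    # 300 replay memories, 4 state features each
targets = torch.randn(300, 1)  # one target value per memory
net = nn.Linear(4, 1)          # placeholder for my actual network

preds = net(batch)             # shape [300, 1]

# option A: one combined loss for the whole mini-batch (a scalar)
loss_combined = nn.functional.mse_loss(preds, targets)

# option B: 300 separate losses, one per replay memory, shape [300, 1]
losses = nn.functional.mse_loss(preds, targets, reduction='none')

# averaging the per-memory losses gives option A back
print(loss_combined.item(), losses.mean().item())
```

Is option A really just the average of option B, and does calling `backward()` on that scalar do the right thing per memory?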
For example, still using mini-batches of 300 replay memories each: my policy network outputs a probability distribution over 10 actions, i.e. a 300 x 10 tensor, and my target probability distribution has the same shape. I want to find the cross-entropy loss between my predictions and targets. Should I compute 300 cross-entropy losses between 300 prediction-target pairs of size-[10] tensors, or 1 cross-entropy loss between a single prediction-target pair of size-[3000] tensors, if that makes sense? Also, how should I implement this in PyTorch, and what shape of loss should I expect to get? A sketch of the per-pair version I have in mind follows.
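Here's roughly what I mean by the per-pair version (random tensors stand in for my real predictions and targets; I'm not sure this is the idiomatic way to write it):

```python
import torch

# stand-ins: 300 predicted and 300 target distributions over 10 actions
preds = torch.softmax(torch.randn(300, 10), dim=1)
targets = torch.softmax(torch.randn(300, 10), dim=1)

# per-pair cross entropy: one loss per replay memory, shape [300]
per_sample = -(targets * torch.log(preds)).sum(dim=1)

# reduced to a scalar, since backward() needs a single value
loss = per_sample.mean()

print(per_sample.shape, loss.shape)  # torch.Size([300]) torch.Size([])
```

Is `per_sample` (shape [300]) or the reduced scalar the "loss" I should expect here, and is reducing with `mean()` the right move before calling `loss.backward()`?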