Consider a simple neural network that takes input of dimension NxF and produces output of dimension NxC, where N, F, and C are the number of samples, features, and output neurons, respectively. Since this is a multi-class classification problem, the output layer uses softmax with a cross-entropy loss. I am having trouble understanding how the gradients are calculated for backpropagation. My gradient calculation steps are given below. Could someone please clarify where I am going wrong?

[Images: gradient_calculation_part1, gradient_calculation_part2]

VM_AI

1 Answer

I made a mistake when computing the gradient of the softmax. Its dimension is NxC, not NxCxC, so everything lines up correctly.
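To make the shapes concrete, here is a minimal NumPy sketch, not the code from the question. It assumes a single linear layer `Z = X @ W` with no bias and integer class labels `y`; the names `X`, `W`, `y`, and the helper functions are hypothetical. The per-sample softmax Jacobian is CxC (so NxCxC for the batch), but once it is contracted with the NxC gradient of the loss with respect to the softmax output, the result is NxC and reduces to `(P - Y) / N`.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax; subtracting the row max is for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_logits(p, y):
    # Combined softmax + mean cross-entropy gradient w.r.t. the logits.
    # Shape N x C: (P - one_hot(y)) / N.
    n = p.shape[0]
    g = p.copy()
    g[np.arange(n), y] -= 1.0
    return g / n

def grad_logits_via_jacobian(p, y):
    # Same gradient, computed the long way through the explicit Jacobian.
    n, c = p.shape
    # dL/dP, shape N x C: nonzero only at the true class of each sample.
    dL_dp = np.zeros_like(p)
    dL_dp[np.arange(n), y] = -1.0 / (n * p[np.arange(n), y])
    # Softmax Jacobian, shape N x C x C: jac[n, i, j] = p_i * (delta_ij - p_j).
    jac = p[:, :, None] * (np.eye(c)[None] - p[:, None, :])
    # Contracting over i collapses N x C x C back to N x C.
    return np.einsum("ni,nij->nj", dL_dp, jac)

# Hypothetical sizes and random data, only to check shapes.
rng = np.random.default_rng(0)
N, F, C = 5, 4, 3
X = rng.normal(size=(N, F))
W = rng.normal(size=(F, C))
y = rng.integers(0, C, size=N)

Z = X @ W                # logits, N x C
P = softmax(Z)           # probabilities, N x C
dZ = grad_logits(P, y)   # gradient w.r.t. logits, N x C
dW = X.T @ dZ            # gradient w.r.t. weights, F x C (same shape as W)
```

Both routes give the same NxC array, so the NxCxC Jacobian never needs to be stored in practice, and `dW` has the same shape as `W`.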

VM_AI