Consider a simple neural network that takes input data of dimension NxF and outputs NxC, where N, F, and C are the number of samples, features, and output neurons respectively. Since this is a multi-class classification problem, softmax with cross-entropy loss is used. I am having trouble understanding how the gradients are calculated for backpropagation. I have given the gradient calculation steps below. Could someone please clarify where I am going wrong?
1 Answer
I had miscalculated the gradient of the softmax. Its dimension is NxC, not NxCxC, so all the shapes line up correctly.
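For anyone who wants to check the shapes, here is a minimal NumPy sketch. It is not the original network: it assumes a single linear layer with hypothetical names `W` and `b`, one-hot targets `Y`, and a loss averaged over the N samples. The Jacobian of softmax alone is CxC per sample, but once it is contracted with the cross-entropy gradient, the gradient with respect to the logits reduces to `(P - Y) / N`, which is NxC.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(X, Y, W, b):
    # Assumed setup: one linear layer followed by softmax + cross-entropy.
    N = X.shape[0]
    Z = X @ W + b                      # logits, shape (N, C)
    P = softmax(Z)                     # probabilities, shape (N, C)
    loss = -np.sum(Y * np.log(P)) / N  # mean cross-entropy over the batch
    dZ = (P - Y) / N                   # gradient w.r.t. logits, shape (N, C)
    dW = X.T @ dZ                      # gradient w.r.t. weights, shape (F, C)
    db = dZ.sum(axis=0)                # gradient w.r.t. bias, shape (C,)
    return loss, dZ, dW, db

# Small random example; the sizes are arbitrary.
rng = np.random.default_rng(0)
N, F, C = 5, 4, 3
X = rng.normal(size=(N, F))
Y = np.eye(C)[rng.integers(0, C, size=N)]  # one-hot targets
W = rng.normal(size=(F, C)) * 0.1
b = np.zeros(C)

loss, dZ, dW, db = loss_and_grads(X, Y, W, b)
print(dZ.shape, dW.shape, db.shape)
```

Comparing `dW` against a finite-difference estimate of the loss is a quick way to confirm that the analytic gradient, and not just its shape, is right.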

VM_AI