I am learning how to differentiate the softmax function, following this article: https://towardsdatascience.com/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1

In the article's example, the (4x1) softmax output is turned into a (4x4) Jacobian matrix. How am I meant to use this in backpropagation? The error formula (first rule) takes a Hadamard product, and the shapes do not match.

Assuming I am using the mean squared error, the gradient of C with respect to a is a 4x1 vector, but I can't take its Hadamard product with a 4x4 matrix.
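To make the shape mismatch concrete, here is a minimal NumPy sketch. The values of `z` and `y` are made up, the cost is assumed to be C = 0.5 * sum((a - y)^2), and vectors are stored as 1-D arrays of length 4 rather than 4x1 columns. It shows the general chain rule, where the 4x4 Jacobian enters through a matrix-vector product rather than an elementwise one; it is an illustration under those assumptions, not the article's code.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(a):
    # J[i, j] = d a_i / d z_j = a_i * (delta_ij - a_j)
    return np.diag(a) - np.outer(a, a)

z = np.array([1.0, 2.0, 3.0, 4.0])  # made-up pre-activations
y = np.array([0.0, 0.0, 0.0, 1.0])  # made-up one-hot target

a = softmax(z)               # shape (4,)
grad_a = a - y               # dC/da for the assumed cost, shape (4,)
J = softmax_jacobian(a)      # shape (4, 4)

# Chain rule: dC/dz_j = sum_i dC/da_i * da_i/dz_j, i.e. J^T times grad_a.
delta = J.T @ grad_a         # shape (4,), same shape as z

print(J.shape, grad_a.shape, delta.shape)
```

When the activation acts on each element independently (as a sigmoid does), the Jacobian is diagonal and this matrix-vector product reduces to the Hadamard product in the usual formula. Softmax mixes all inputs, so its off-diagonal entries are nonzero.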

I have read that I may have to use only the diagonal of the Jacobian, but I still don't understand why, or whether that is the right thing to do.
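One way to test the diagonal-only idea is to compare it against a finite-difference gradient. The sketch below assumes NumPy, made-up values for `z` and `y`, and the cost C = 0.5 * sum((a - y)^2); the helper names are hypothetical. For these assumed inputs, the full Jacobian product agrees with the numerical gradient, while the diagonal-only Hadamard product does not.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def cost(z, y):
    # Assumed cost: half the summed squared error on the softmax output.
    a = softmax(z)
    return 0.5 * np.sum((a - y) ** 2)

z = np.array([1.0, 2.0, 3.0, 4.0])  # made-up pre-activations
y = np.array([0.0, 0.0, 0.0, 1.0])  # made-up one-hot target

a = softmax(z)
grad_a = a - y
J = np.diag(a) - np.outer(a, a)     # J[i, j] = a_i * (delta_ij - a_j)

full = J.T @ grad_a                 # uses every entry of the Jacobian
diag_only = np.diag(J) * grad_a     # Hadamard product with a_i * (1 - a_i)

# Central finite differences of the cost with respect to each z_j.
eps = 1e-6
numeric = np.array([
    (cost(z + eps * e, y) - cost(z - eps * e, y)) / (2 * eps)
    for e in np.eye(len(z))
])

print("full      :", full)
print("diag only :", diag_only)
print("numeric   :", numeric)
```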

Subeen Regmi
