
I am creating a basic neural network (also my first one) for handwritten digit recognition, without any framework (like TensorFlow, PyTorch, ...), using the backpropagation algorithm.

My NN has 784 inputs and 10 outputs, so for the last layer I have to use softmax.
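For reference, the row-wise softmax I have in mind looks roughly like this (a minimal, numerically stable sketch; my actual activation_functions.softmax may differ in the details):

import numpy as np

def softmax(Z):
    # Z: pre-activations, shape (batch_size, 10)
    Z_shifted = Z - np.max(Z, axis=1, keepdims=True)   # subtract the row max for numerical stability
    expZ = np.exp(Z_shifted)
    return expZ / np.sum(expZ, axis=1, keepdims=True)  # each row sums to 1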

Because of some memory errors, I currently have my images in shape (300, 784) and my labels in shape (300, 10). From these I calculate the loss with categorical cross-entropy. Now we get to my problem: in backpropagation, I need to manually compute the first derivative of the loss with respect to the last activation value. I am doing it like this:

dAl = -(np.divide(Y, Al) - np.divide(1 - Y, 1 - Al))
# Y  = test labels
# Al = activation values from my last layer
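For completeness, the categorical cross-entropy loss itself is computed along these lines (a sketch, averaged over the batch here; the epsilon clipping is only there to avoid log(0)):

import numpy as np

def categorical_cross_entropy(Y, Al, eps=1e-12):
    # Y:  one-hot labels,                shape (300, 10)
    # Al: activations of the last layer, shape (300, 10)
    m = Y.shape[0]                            # number of samples
    Al_clipped = np.clip(Al, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(Y * np.log(Al_clipped)) / m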

After that my backpropagation can start, and the last layer is the softmax one:

def SoftmaxDerivative(dA, Z):
    # Z is the output of np.dot(A_prev, W) + b,
    #   where A_prev is the activation value from the previous layer,
    #   W is the weight matrix and b is the bias
    # dA is the derivative of the activation function value
    x = activation_functions.softmax(dA)
    s = x.reshape(-1, 1)
    dZ = np.diagflat(s) - np.dot(s, s.T)
    return dZ
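For comparison, the per-sample form of the softmax Jacobian that I have seen in references is diag(s) - s s^T for a single softmax output vector s of shape (10,). A small sketch of that (the helper name is just for illustration):

import numpy as np

def softmax_jacobian_single(s):
    # s: softmax output for ONE sample, shape (10,)
    s = s.reshape(-1, 1)                    # column vector, shape (10, 1)
    return np.diagflat(s) - np.dot(s, s.T)  # Jacobian, shape (10, 10)

Applied sample by sample, this would give 300 matrices of shape (10, 10).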

1. Is this function working properly?

In the end, I would like to compute the derivatives of the weights and biases, so I am using this:

dW = (1/m)*np.dot(dZ, A_prev.T)
#m is A_prev.shape[1] -> 10
db = (1/m)*np.sum(dZ, axis = 1, keepdims = True)

BUT it fails on dW, because dZ.shape is (3000, 3000) (compared to A_prev.shape, which is (300, 10)). From this I assume there are only 3 possible outcomes:

  1. My Softmax backward is wrong

  2. dW is wrong

  3. I have some other bug completely somewhere else

Any help would be really appreciated!

Lukas
hey - when dealing with shape issues (or asking for help with them), it's always useful to add the dimensions of all tensors involved :). A couple of small observations: (1) your first derivative doesn't seem to check for division by zero, (2) your SoftmaxDerivative() doesn't use the Z input at all. Assuming dA.shape = (300, 10), the correct output shape should be (300, 300)... so maybe you are multiplying somewhere and end up with dA.shape = (3000, 10)? – trdavidson Dec 02 '19 at 10:27

1 Answer


I faced the same problem recently. I'm not sure but maybe this question will help you: Softmax derivative in NumPy approaches 0 (implementation)
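From what I understand, when softmax is paired with categorical cross-entropy you don't need the full Jacobian at all: the gradient of the loss with respect to the pre-activation Z simplifies to Al - Y. A rough sketch using the names from your question (not tested against your code; it assumes one-hot Y and the forward pass Z = np.dot(A_prev, W) + b):

import numpy as np

def softmax_cce_backward(Al, Y, A_prev):
    # Al:     softmax output of the last layer,  shape (300, 10)
    # Y:      one-hot labels,                    shape (300, 10)
    # A_prev: activations of the previous layer, shape (300, n_prev)
    m = Y.shape[0]                              # batch size
    dZ = Al - Y                                 # combined softmax + cross-entropy gradient, shape (300, 10)
    dW = np.dot(A_prev.T, dZ) / m               # shape (n_prev, 10), matches W
    db = np.sum(dZ, axis=0, keepdims=True) / m  # shape (1, 10), matches b
    return dZ, dW, db

This keeps dZ at (300, 10), so the shapes of dW and db line up with W and b.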

Denzel