I have a softmax layer (only the activation itself, without the linear part that multiplies inputs by weights), and I want to implement a backward pass for it.
I have found many tutorials/answers on SO that deal with it, but they all seem to use X as a (1, n_inputs) vector. I want to use it as an (n_samples, n_inputs) array and still have a correct vectorized implementation of the forward/backward pass.
I have written the following forward pass, normalizing the output for each row/sample (is it correct?):
import numpy as np

X = np.asarray([
    [0.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0]], dtype=np.float32)

def prop(self, X):
    # exponentiate, then normalize each row to sum to 1
    s = np.exp(X)
    s = s.T / np.sum(s, axis=1)
    return s.T
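To convince myself the normalization really is per-row, I checked that every row of the output sums to 1. The keepdims form below is my assumption of an equivalent, transpose-free way to write the same thing:

import numpy as np

X = np.asarray([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=np.float32)

# equivalent row-wise softmax, avoiding the double transpose
s = np.exp(X)
P = s / np.sum(s, axis=1, keepdims=True)

print(np.allclose(P.sum(axis=1), 1.0))  # True: each row sums to 1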
It gives me the final result of forward propagation (including other layers) as:
Y = np.asarray([
    [0.5       , 0.5       ],
    [0.87070241, 0.12929759],
    [0.97738616, 0.02261384],
    [0.99200957, 0.00799043]], dtype=np.float32)
So, this is the output of the softmax, if it is correct. Now, how should I write the backward pass?
I have derived the derivative of the softmax to be:

1) if i = j: p_i * (1 - p_j)
2) if i != j: -p_i * p_j
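To sanity-check the two cases: for a single sample they pack into one (n_inputs, n_inputs) Jacobian. A minimal sketch, using the first row of Y above:

import numpy as np

p = np.asarray([0.5, 0.5], dtype=np.float32)  # first row of Y

# Jacobian for ONE sample: p_i*(1 - p_i) on the diagonal, -p_i*p_j off it
J = np.diag(p) - np.outer(p, p)
print(J)
# [[ 0.25 -0.25]
#  [-0.25  0.25]]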
I've tried to compute the derivative as:
ds = np.diag(Y.flatten()) - np.outer(Y, Y)
But it results in an 8x8 matrix, which does not make sense for the subsequent backpropagation... What is the correct way to write it?
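For reference, the per-sample loop below is what I believe the correct semantics should be (dY is my assumed name for the (n_samples, n_inputs) gradient coming back from the next layer); what I'm after is a vectorized equivalent of this:

import numpy as np

def backprop_loop(Y, dY):
    # one (n_inputs, n_inputs) Jacobian per sample, applied to that
    # sample's incoming gradient; J is symmetric, so J @ dY[i] == J.T @ dY[i]
    dX = np.empty_like(Y)
    for i in range(Y.shape[0]):
        J = np.diag(Y[i]) - np.outer(Y[i], Y[i])
        dX[i] = J @ dY[i]
    return dX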