Implementation of softmax derivative

Asked Mar 09 '20 at 21:10

Active Mar 09 '20 at 21:52

Viewed 341 times

I know there are already multiple similar questions out there, but I still don't really understand the derivative of the softmax function. This is how I implemented the softmax function in Java:

public double[] activation(double[] input) {
    // Exponentiate every input and accumulate the normalizing sum.
    double[] exp = new double[input.length];
    double sum = 0;
    for (int neuron = 0; neuron < exp.length; neuron++) {
        exp[neuron] = Math.exp(input[neuron]);
        sum += exp[neuron];
    }

    // Divide each exponential by the sum so the outputs add up to 1.
    double[] output = new double[input.length];
    for (int neuron = 0; neuron < output.length; neuron++) {
        output[neuron] = exp[neuron] / sum;
    }

    return output;
}

And this is what my derivative currently looks like:

public double[] derivative(double[] input) {
    double[] softmax = activation(input);

    double[] output = new double[input.length]; 
    for(int neuron = 0; neuron < output.length; neuron++) {
        output[neuron] = softmax[neuron] * (1d - softmax[neuron]);
    }

    return output;
}

I know that something is still missing in the derivative. As far as I understand it, I need to add a case distinction. I often read about i == j or i != j, but I'm not sure what i and j refer to.

I really hope you can help me understand what exactly is missing. Thank you!

edited Mar 09 '20 at 21:52

Jannik

Here is a Python example to look at: https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ – Jocke Mar 09 '20 at 21:36
Thanks, but what exactly are j and i in this example? – Jannik Mar 09 '20 at 21:50
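
A minimal sketch of the i/j case distinction, assuming the usual convention: i is the index of the softmax output being differentiated, and j is the index of the input it is differentiated with respect to. Because every output depends on every input, the derivative is an n-by-n matrix (the Jacobian) rather than a vector. Entry [i][j] is s_i * (1 - s_i) when i == j and -s_i * s_j when i != j. The derivative in the question computes only the i == j entries, i.e. the diagonal. The class name SoftmaxJacobian and its method names are hypothetical; softmax repeats the question's activation method.

```java
import java.util.Arrays;

public final class SoftmaxJacobian {

    /** Softmax, same computation as activation(...) in the question. */
    public static double[] softmax(double[] input) {
        double[] exp = new double[input.length];
        double sum = 0;
        for (int k = 0; k < input.length; k++) {
            exp[k] = Math.exp(input[k]);
            sum += exp[k];
        }
        double[] output = new double[input.length];
        for (int k = 0; k < input.length; k++) {
            output[k] = exp[k] / sum;
        }
        return output;
    }

    /**
     * Full derivative of softmax: jacobian[i][j] is the partial derivative
     * of output i with respect to input j.
     */
    public static double[][] jacobian(double[] input) {
        double[] s = softmax(input);
        int n = s.length;
        double[][] jacobian = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    // Output i with respect to its own input.
                    jacobian[i][j] = s[i] * (1d - s[i]);
                } else {
                    // Output i with respect to a different input j.
                    jacobian[i][j] = -s[i] * s[j];
                }
            }
        }
        return jacobian;
    }

    public static void main(String[] args) {
        double[][] jac = jacobian(new double[] {1.0, 2.0, 3.0});
        for (double[] row : jac) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each row of the resulting matrix sums to zero (up to rounding), because the softmax outputs always add up to 1. In backpropagation this matrix would typically be multiplied with the gradient coming from the loss, instead of using an element-wise product as with sigmoid.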
