
I know there are already multiple similar questions out there, but I still don't really understand the derivative of the softmax function. This is how I implemented the softmax function in Java:

public double[] activation(double[] input) {
    double[] exp = new double[input.length];
    double sum = 0;
    for(int neuron = 0; neuron < exp.length; neuron++) {
        exp[neuron] = Math.exp(input[neuron]);
        sum += exp[neuron];
    }

    double[] output = new double[input.length]; 
    for(int neuron = 0; neuron < output.length; neuron++) {
        output[neuron] = exp[neuron] / sum;
    }

    return output;
}
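A side note on the activation itself: `Math.exp` overflows to `Infinity` for inputs above roughly 709, and then `exp[neuron] / sum` becomes `Infinity / Infinity = NaN`. A common fix, sketched here under the assumption that the rest of the network is unchanged, is to subtract the largest input first. This gives the same result because softmax(z) = softmax(z - c) for any constant c. The class name `StableSoftmax` is hypothetical.

```java
class StableSoftmax {
    // Softmax with the max-subtraction trick: shifting every input by the same
    // constant does not change the result, but keeps Math.exp from overflowing.
    public static double[] activation(double[] input) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : input) {
            max = Math.max(max, v);
        }

        double[] exp = new double[input.length];
        double sum = 0;
        for (int neuron = 0; neuron < input.length; neuron++) {
            exp[neuron] = Math.exp(input[neuron] - max); // largest term becomes exp(0) = 1
            sum += exp[neuron];
        }

        double[] output = new double[input.length];
        for (int neuron = 0; neuron < input.length; neuron++) {
            output[neuron] = exp[neuron] / sum;
        }
        return output;
    }

    public static void main(String[] args) {
        // The unshifted version would compute Math.exp(1000) = Infinity here and return NaN.
        double[] s = activation(new double[] {1000.0, 1000.0});
        System.out.println(s[0] + " " + s[1]); // prints "0.5 0.5"
    }
}
```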

And that's what my derivative currently looks like:

public double[] derivative(double[] input) {
    double[] softmax = activation(input);

    double[] output = new double[input.length]; 
    for(int neuron = 0; neuron < output.length; neuron++) {
        output[neuron] = softmax[neuron] * (1d - softmax[neuron]);
    }

    return output;
}
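The loop above produces one value per neuron, but each softmax output depends on every input, so the full derivative is an n×n matrix (the Jacobian). The loop computes only its diagonal. A sketch of the derivation via the quotient rule, writing z for `input`, s for the softmax output and n for the vector length (this notation is chosen here, not taken from the code), with δ_ij = 1 if i = j and 0 otherwise:

```latex
s_i = \frac{e^{z_i}}{\sum_{k=1}^{n} e^{z_k}}, \qquad
\frac{\partial s_i}{\partial z_j}
  = \frac{\delta_{ij}\, e^{z_i} \sum_{k} e^{z_k} - e^{z_i} e^{z_j}}{\left(\sum_{k} e^{z_k}\right)^2}
  = s_i \left(\delta_{ij} - s_j\right)

\frac{\partial s_i}{\partial z_j} =
\begin{cases}
  s_i \left(1 - s_i\right) & \text{if } i = j \\
  -\, s_i \, s_j           & \text{if } i \neq j
\end{cases}
```

The i = j case is exactly what `output[neuron] = softmax[neuron] * (1d - softmax[neuron])` computes. The i ≠ j entries are the part that is missing.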

I know that something is still missing in the derivative. As far as I understand, I need to add a case distinction. I often read about i == j and i != j, but I'm not sure what i and j refer to.
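A minimal sketch of that case distinction, assuming the derivative is returned as an n×n Jacobian (a `double[][]` rather than the original `double[]`, so the method signature changes). Here `i` indexes the output `softmax[i]` and `j` indexes the input `input[j]` it is differentiated with respect to. The class name `SoftmaxJacobian` and the helper `backward` are hypothetical, not part of any library.

```java
class SoftmaxJacobian {
    // Softmax with max subtraction for numerical stability.
    static double[] softmax(double[] input) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : input) max = Math.max(max, v);
        double[] out = new double[input.length];
        double sum = 0;
        for (int k = 0; k < input.length; k++) {
            out[k] = Math.exp(input[k] - max);
            sum += out[k];
        }
        for (int k = 0; k < input.length; k++) out[k] /= sum;
        return out;
    }

    // jacobian[i][j] = d softmax[i] / d input[j]
    //   i == j:  s_i * (1 - s_i)
    //   i != j: -s_i * s_j
    public static double[][] derivative(double[] input) {
        double[] s = softmax(input);
        int n = s.length;
        double[][] jacobian = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    jacobian[i][j] = s[i] * (1d - s[i]);
                } else {
                    jacobian[i][j] = -s[i] * s[j];
                }
            }
        }
        return jacobian;
    }

    // Backpropagation without building the matrix: given the upstream gradient
    // dL/ds, returns dL/dz with dL/dz_j = s_j * (g_j - sum_i g_i * s_i).
    public static double[] backward(double[] input, double[] upstream) {
        double[] s = softmax(input);
        double dot = 0;
        for (int i = 0; i < s.length; i++) dot += upstream[i] * s[i];
        double[] grad = new double[s.length];
        for (int j = 0; j < s.length; j++) grad[j] = s[j] * (upstream[j] - dot);
        return grad;
    }
}
```

In backpropagation you usually need the product of the upstream gradient with this matrix rather than the matrix itself, which is what `backward` computes in O(n). If softmax is followed by a cross-entropy loss with a one-hot target y, the combined gradient with respect to the input simplifies further to s - y.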

I really hope you can help me understand what exactly is missing. Thank you!

Jannik
