I know there are already multiple similar questions out there, but I still don't really understand the derivative of the softmax function. This is how I implemented the softmax function in Java:
    public double[] activation(double[] input) {
        double[] exp = new double[input.length];
        double sum = 0;
        for (int neuron = 0; neuron < exp.length; neuron++) {
            exp[neuron] = Math.exp(input[neuron]);
            sum += exp[neuron];
        }
        double[] output = new double[input.length];
        for (int neuron = 0; neuron < output.length; neuron++) {
            output[neuron] = exp[neuron] / sum;
        }
        return output;
    }
And that's what my derivative currently looks like:
    public double[] derivative(double[] input) {
        double[] softmax = activation(input);
        double[] output = new double[input.length];
        for (int neuron = 0; neuron < output.length; neuron++) {
            output[neuron] = softmax[neuron] * (1d - softmax[neuron]);
        }
        return output;
    }
I know that there's still something missing in the derivative. As far as I understand it, I need to add a case distinction: I often read about the cases `i == j` and `i != j`, but I'm not sure what `i` and `j` refer to.
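To make the question concrete, here is a minimal sketch of what I think the case distinction could look like. It rests on my own assumption that `i` is the index of the softmax output and `j` is the index of the input it is differentiated with respect to, so that the derivative is a full matrix (the Jacobian) with entries `s[i] * (1 - s[i])` when `i == j` and `-s[i] * s[j]` when `i != j`. The class name `SoftmaxJacobian` and the method names are placeholders I made up for illustration, and `softmax` is just a static copy of my `activation` method.

```java
final class SoftmaxJacobian {

    // Static copy of activation(): exponentiate each input and normalize by the sum.
    static double[] softmax(double[] input) {
        double[] exp = new double[input.length];
        double sum = 0;
        for (int k = 0; k < input.length; k++) {
            exp[k] = Math.exp(input[k]);
            sum += exp[k];
        }
        double[] output = new double[input.length];
        for (int k = 0; k < input.length; k++) {
            output[k] = exp[k] / sum;
        }
        return output;
    }

    // Assumed reading: jacobian[i][j] is the derivative of output i with respect to input j.
    static double[][] jacobian(double[] input) {
        double[] s = softmax(input);
        int n = s.length;
        double[][] jacobian = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    // Same index: this is the term my derivative() already computes.
                    jacobian[i][j] = s[i] * (1d - s[i]);
                } else {
                    // Different indices: the cross term that derivative() leaves out.
                    jacobian[i][j] = -s[i] * s[j];
                }
            }
        }
        return jacobian;
    }
}
```

If this reading is right, my current `derivative` method returns only the diagonal of that matrix, which would explain why something seems to be missing.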
I really hope you can help me understand what exactly is missing. Thank you!