
This question is regarding SoftMax's derivatives.

I looked around for the SoftMax function and found this useful, language-agnostic resource:

[Image: the SoftMax formula, SoftMax(x_i) = exp(x_i - max(x)) / Σ_j exp(x_j - max(x))]

which translates to C/C++ nicely:

// Numerically stable SoftMax: subtract the max before exponentiating.
void TransformToSoftMax(const DoubleListType &inputs, DoubleListType &outputs, int NumberOfNeurons)
{
  double sum = 0.0;
  double maxvalue = inputs[0];

  // Find the largest input so exp() cannot overflow.
  for (int i = 1; i < NumberOfNeurons; i++)
      maxvalue = max(inputs[i], maxvalue);

  // The sum must use the same shifted inputs as the outputs below,
  // otherwise the results no longer sum to 1.
  for (int i = 0; i < NumberOfNeurons; i++)
      sum += exp(inputs[i] - maxvalue);

  for (int i = 0; i < NumberOfNeurons; i++)
      outputs[i] = exp(inputs[i] - maxvalue) / sum;
}
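For reference, here is how I call it (a minimal sketch; I'm assuming DoubleListType is a typedef for std::vector<double>, and that these declarations precede TransformToSoftMax in the file):

#include <cmath>
#include <vector>
#include <cstdio>
using namespace std;                 // for max() and exp() as used above
typedef vector<double> DoubleListType;

int main()
{
  DoubleListType inputs = { 1.0, 2.0, 3.0 };
  DoubleListType outputs(inputs.size());

  TransformToSoftMax(inputs, outputs, (int)inputs.size());
  for (int i = 0; i < (int)outputs.size(); i++)
      printf("%f\n", outputs[i]);    // the three values sum to 1.0
  return 0;
}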

Unfortunately, I don't have the derivative; the source didn't provide one. My internet searches are turning up some screwy results, like this from the SIMD Library documentation online, which I know has to be wrong.

[Image: the derivative claimed by the SIMD Library documentation, df(y) = y * (1 - y)]

I found many examples, thick with Python code, mentioning vectors and matrices with almost no mention of the network or the neurons themselves. They almost require a person to learn Python and read the code the way the responder wrote it, just to see things like what was passed to the example and why.

Is it even possible to explain the derivative in clear steps with just the mention of neurons, layers, and the network (like the pictured description of applying SoftMax), or are matrices, vectors, and "np"s the only way to describe it? If so, please give a quick "this is what you have to do".

Adrian E
  • it is definitely what you *should* do. Thinking of NNs in terms of actual neurons and some information being pushed around is a cute intuition from years ago, but for practical reasons it is an extremely limiting one that does not scale well to modern sizes or more complex architectures. – lejlot Aug 16 '23 at 21:09
  • @lejlot - Thank you, sir, for responding. But here are 3 points: 1) Not everyone has created a NN in exactly the same way, so the params passed in the design procedure I'm looking at now are not the same as in a few others I have been looking at. 2) C++ is not Python, and there are dramatic differences in design when coding; I really don't need another language under my belt. 3) A C# version only had the same formula as the SIMD one, df(y) = y * (1 - y), which I know is wrong. HENCE my request for a more "just give me the concepts" answer, like the slide for applying SoftMax. – Adrian E Aug 16 '23 at 23:03

1 Answer


The problem here is your assumption that there is "the derivative." SoftMax does not have one derivative, because it has multiple inputs. If you look at all the other activation functions, you will see that they are defined as simple scalar functions of x, so in their case the derivative really is just df/dx.
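For example (my illustration, not from any of your sources), the sigmoid is a plain scalar function, so each neuron gets exactly one derivative value:

#include <cmath>

// A scalar activation: each output depends on exactly one input.
double Sigmoid(double x)
{
  return 1.0 / (1.0 + exp(-x));
}

// Its derivative is a single number per neuron: df/dx = y * (1 - y).
double SigmoidDerivative(double x)
{
  double y = Sigmoid(x);
  return y * (1.0 - y);
}

Incidentally, that y * (1 - y) expression is exactly what the SIMD docs give for SoftMax; as shown below, it is only the diagonal part of the full picture.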

The MLDawn page mentioned in the comment shows 9 derivatives for three neurons: every output has a partial derivative with respect to every input. That is clear proof that a single derivative does not exist. So yes, you need something like a matrix (the Jacobian) to hold the 3x3 set of derivatives.
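In neuron terms, and as a minimal sketch rather than a definitive implementation (the function name is mine; DoubleListType and the SoftMax outputs are taken from your question), each entry answers "how much does output i move when input j wiggles":

#include <vector>

// Fills jacobian[i][j] with d(Output_i) / d(Input_j).
// 'outputs' must already hold the values from TransformToSoftMax.
void SoftMaxDerivatives(const DoubleListType &outputs,
                        std::vector< std::vector<double> > &jacobian,
                        int NumberOfNeurons)
{
  jacobian.assign(NumberOfNeurons, std::vector<double>(NumberOfNeurons, 0.0));
  for (int i = 0; i < NumberOfNeurons; i++)
      for (int j = 0; j < NumberOfNeurons; j++)
      {
          if (i == j)
              jacobian[i][j] = outputs[i] * (1.0 - outputs[i]);  // diagonal: the y * (1 - y) formula
          else
              jacobian[i][j] = -outputs[i] * outputs[j];         // off-diagonal terms it omits
      }
}

So the SIMD formula is not wrong so much as incomplete: for three neurons it gives you the 3 diagonal entries and silently drops the other 6.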

Side note: I get the impression that your understanding of neural networks is rather unusual. You are looking for "a derivative", so presumably you are doing something that requires one. The one application I know of is back-propagation, but that requires a much deeper understanding of how learning in neural networks happens. This makes answering your question hard: you obviously have gaps in your knowledge (that is why you are asking questions), but it is quite unclear what you do understand.
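If back-propagation is indeed the goal, there is good news: you never have to store that matrix in code. During the backward pass, each SoftMax output i receives an error value dLoss/dOutput_i from the layer above, and applying all NxN derivatives to those errors collapses into two simple loops. A hedged sketch (again, the names and the DoubleListType typedef are assumptions on my part):

// Back-propagates errors through SoftMax without building the matrix.
// gradOutputs[i] = dLoss/dOutput_i, handed down from the layer above.
// gradInputs[i]  = dLoss/dInput_i, what this layer passes further back.
void SoftMaxBackward(const DoubleListType &outputs,
                     const DoubleListType &gradOutputs,
                     DoubleListType &gradInputs,
                     int NumberOfNeurons)
{
  // First, the outputs-weighted sum of all upstream errors.
  double dot = 0.0;
  for (int j = 0; j < NumberOfNeurons; j++)
      dot += gradOutputs[j] * outputs[j];

  // Then each input's error: Output_i * (its own error minus that sum).
  // This is exactly the Jacobian above applied to the upstream errors.
  for (int i = 0; i < NumberOfNeurons; i++)
      gradInputs[i] = outputs[i] * (gradOutputs[i] - dot);
}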

MSalters