
I am currently turning my binary classification model into a multi-class classification model. Bear with me, I am very new to PyTorch and machine learning.

Most of what I state here I know from the following video: https://www.youtube.com/watch?v=7q7E91pHoW4&t=654s

  1. What I have read is that CrossEntropyLoss already has the softmax function built in, so my output layer is linear.

  2. What I then read/saw is that I can just choose my model's prediction by taking torch.max() of my model output (which comes from my last linear layer). This feels weird because I have some negative outputs, and I thought I needed to apply the softmax function first, but it seems to work correctly without it.

So now the big, confusing question I have is: when would I use the softmax function? Would I only use it when my loss doesn't have it implemented? But then I would choose my prediction based on the outputs of the softmax layer, which wouldn't be the same as with the linear output layer.

Thank you guys for every answer this gets.

patrick823

2 Answers


For calculating the loss with CrossEntropyLoss you do not need softmax, because CrossEntropyLoss already includes it. However, to turn the model outputs into probabilities you still need to apply softmax.
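One detail worth noting, since the question mentions negative outputs: softmax is monotonic, so the largest raw logit always maps to the largest probability, which is why torch.max() on the raw outputs gives the same prediction. A small sketch with made-up numbers:

```python
import torch

# Hypothetical raw outputs (logits) for a batch of 2 samples, 3 classes.
# Negative values are fine: softmax is monotonic, so the largest logit
# always becomes the largest probability.
logits = torch.tensor([[-1.2, 0.3, 2.5],
                       [0.8, -0.5, -2.0]])

probs = torch.softmax(logits, dim=1)            # rows sum to 1
pred_from_logits = torch.argmax(logits, dim=1)  # same as torch.max(...).indices
pred_from_probs = torch.argmax(probs, dim=1)

print(pred_from_logits)                          # tensor([2, 0])
print(torch.equal(pred_from_logits, pred_from_probs))  # True
```

So skipping softmax only changes the scale of the outputs, never which class wins.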

Let's say you didn't apply softmax at the end of your model and trained it with cross entropy. Then, when you want to evaluate your model on new data and use the outputs for classification, you can manually apply softmax to the outputs at that point. There will be no problem; this is how it is usually done.

Training()
MODEL ----> FC LAYER ---> raw outputs ---> CrossEntropy Loss

Eval()
MODEL ----> FC LAYER ---> raw outputs ---> Softmax ---> Probabilities
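The diagram above can be sketched in code. The model, data, and hyperparameters here are all made up for illustration (4 input features, 3 classes, a single linear layer):

```python
import torch
import torch.nn as nn

# Illustrative setup: a bare FC layer producing raw logits for 3 classes.
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)              # dummy batch of 8 samples
y = torch.randint(0, 3, (8,))      # dummy integer class labels

# Training: feed raw logits straight into the loss -- no softmax here.
model.train()
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Eval: apply softmax manually only if you want probabilities.
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)

print(probs.sum(dim=1))            # each row sums to 1
```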
Enes Kuz
  • Thanks for your answer. But during training you usually track progress as well, for example accuracy, which means you would need to manually apply softmax too, right? But then why is softmax never applied manually in any example, like in the video I linked? I understand that the logits output cannot be interpreted nicely and that the softmax outputs are probabilities. – patrick823 Dec 10 '21 at 13:28
  • Check the part at 14:50 of the video you sent. He says not to apply softmax in the model for the loss. – Enes Kuz Dec 11 '21 at 18:10
  • I understand the confusion, but softmax is a simple operation, so you can apply it in different places; there is no single absolute way of writing the code. For example, if you don't want to manually apply softmax, you can add it to the model's calculations, but when you want to calculate the cross-entropy loss you don't take the final (softmaxed) output, you take the one before it. – Enes Kuz Dec 11 '21 at 18:14

Yes, you need to apply softmax on the output layer. When you are doing binary classification you are free to use relu, sigmoid, tanh, etc. as the activation function. But when you are doing multi-class classification, softmax is required because the softmax activation function distributes the probability across the output nodes, so you can easily conclude that the output node with the highest probability belongs to a particular class. Thank you. Hope this is useful!
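To illustrate what "distributes the probability" means, here is a tiny sketch with made-up logits: softmax maps arbitrary (even negative) values to non-negative numbers that sum to 1, one per output node:

```python
import torch

# Made-up logits for a single sample with 3 output nodes.
logits = torch.tensor([2.0, 1.0, -0.1])
probs = torch.softmax(logits, dim=0)

print(probs)               # non-negative, largest logit -> largest probability
print(probs.sum())         # tensor(1.)
print(torch.argmax(probs)) # tensor(0), the predicted class
```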

  • This does not address the original question. Furthermore, in the same way you describe that softmax is needed for multivariate classification, you could argue that sigmoid is necessary for binary classification. – Christian Jul 06 '22 at 08:40