
I was following TensorFlow's Quickstart guide and noticed that it discourages using the softmax function as the activation function in the last layer. The explanation follows:

While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.

Can anyone expand on this explanation? Everything I have found on the topic recommends using the softmax function in the last layer, counter to TensorFlow's documentation. Has something changed recently that would render this guidance outdated and incorrect?

Thanks for any insight.
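Edit: to make the quoted warning concrete, here is a small NumPy sketch of the numerical issue I believe the guide is alluding to (the logit values are illustrative, not from any real model). Computing softmax first and then taking the log, as a cross-entropy loss would have to do after a softmax output layer, can overflow; working from the raw logits with the log-sum-exp trick stays finite:

```python
import numpy as np

logits = np.array([1000.0, 0.0])  # illustrative large logits

# Naive path: softmax first, then log (what a loss sees after a softmax layer).
# np.exp(1000.0) overflows to inf, so inf/inf gives nan.
naive_probs = np.exp(logits) / np.exp(logits).sum()
naive_log_probs = np.log(naive_probs)  # contains nan

# Stable path: compute log-probabilities directly from the logits
# using the log-sum-exp trick (subtract the max before exponentiating).
m = logits.max()
stable_log_probs = logits - m - np.log(np.exp(logits - m).sum())
print(naive_log_probs)   # nan appears
print(stable_log_probs)  # finite values
```

This is why a loss that receives logits can be computed stably, while one that receives softmax outputs cannot always be.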

  • 1
    Check [this](https://ai.stackexchange.com/questions/20214/why-does-tensorflow-docs-discourage-using-softmax-as-activation-for-the-last-lay#:~:text=softmax%20in%20as%20the%20activation,when%20using%20a%20softmax%20output.). – Innat Jan 11 '21 at 22:09
  • Thanks, based on that link and digging through some documentation and other posts, I've come up with the following: Best practice is to avoid explicitly adding softmax as the last layer during training when using cross entropy as a loss function. We need to specify from_logits=True and the loss function will automatically apply the softmax during training. Afterwards, we are free to wrap the trained model and attach the softmax at the end so that it returns probabilities to aid in interpretation. Do I have that right? – Bryan Conklin Jan 12 '21 at 16:54
  • It's discouraged; this activation function should not be used for any kind of model other than classification models. You can also check this reference [link](https://www.tensorflow.org/api_docs/python/tf/keras/activations/softmax) for the `Softmax` activation function's definition and usage, with an example. –  Oct 19 '21 at 14:42
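  • To summarize the practice described above as code (a minimal sketch; the layer sizes and random data are illustrative, not from the Quickstart): train on raw logits with `from_logits=True`, then wrap the trained model with a `Softmax` layer for interpretation.

    ```python
    import numpy as np
    import tensorflow as tf

    # The last Dense layer has no activation: it outputs raw logits.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(3),  # logits, no softmax here
    ])

    # from_logits=True tells the loss to fold softmax into the cross-entropy
    # computation, which it can do in a numerically stable way.
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

    # Illustrative random training data.
    x = np.random.rand(32, 4).astype("float32")
    y = np.random.randint(0, 3, size=(32,))
    model.fit(x, y, epochs=1, verbose=0)

    # After training, attach a Softmax layer so the model returns probabilities.
    probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
    probs = probability_model(x[:1]).numpy()  # each row sums to ~1
    ```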
