I was following TensorFlow's Quickstart guide and noticed it discourages using the softmax function as the activation in the last layer. The explanation follows:
While this can make the model output more directly interpretable, this approach is discouraged as it's impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.
Can anyone expand on this explanation? Everything I have found on the topic recommends using softmax in the last layer, counter to TensorFlow's documentation. Has something changed recently that would render the usual guidance outdated?
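If it helps, here is a toy NumPy sketch of what I assume the docs mean by "numerically stable" (this is my own guess at the issue, not taken from the guide): computing a softmax first and then taking its log can underflow to -log(0), whereas a loss computed directly from logits can use the log-sum-exp trick and stay finite.

```python
import numpy as np

# Extreme logits: the softmax saturates and small probabilities
# underflow to exactly 0.0 in float64.
logits = np.array([1000.0, 0.0, -1000.0])
true_class = 2  # suppose the correct class is the unlikely one

# "Softmax in the last layer" path: probabilities first, then log.
shifted = logits - logits.max()      # even with the usual max trick...
probs = np.exp(shifted) / np.exp(shifted).sum()
naive_loss = -np.log(probs[true_class])   # -log(0.0) -> inf

# "Logits + loss(from_logits=True)" path: log-softmax computed
# directly with log-sum-exp, which never forms the tiny probability.
log_probs = shifted - np.log(np.exp(shifted).sum())
stable_loss = -log_probs[true_class]      # finite (2000.0 here)

print(naive_loss, stable_loss)
```

As I understand it, this is why the guide suggests emitting raw logits and letting the loss function fold the softmax in, but I would appreciate a fuller explanation.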
Thanks for any insight.