
What is the difference between keras.activations.softmax and keras.layers.Softmax? Why are there two definitions of the same activation function?

keras.activations.softmax: https://keras.io/activations/

keras.layers.Softmax: https://keras.io/layers/advanced-activations/

Amir Saniyan

1 Answer

They are equivalent in terms of what they do. In fact, the Softmax layer calls activations.softmax under the hood:

def call(self, inputs):
    # The Softmax layer simply delegates to the softmax activation function
    return activations.softmax(inputs, axis=self.axis)

However, the difference is that the Softmax layer can be used directly as a layer:

from keras.layers import Softmax

# input_tensor is any Keras tensor, e.g. the output of a previous layer
soft_out = Softmax()(input_tensor)

In contrast, activations.softmax cannot be used directly as a layer. Instead, you pass it as the activation function of another layer through the activation argument:

from keras import activations
from keras.layers import Dense

# n_units is the desired number of output units
dense_out = Dense(n_units, activation=activations.softmax)
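
To see the equivalence concretely, here is a minimal sketch (written against tf.keras; the tensor values and shapes are arbitrary) showing that the layer and the function produce identical outputs:

import numpy as np
import tensorflow as tf
from tensorflow.keras import activations
from tensorflow.keras.layers import Softmax

x = tf.constant(np.random.rand(2, 5), dtype=tf.float32)

layer_out = Softmax()(x)         # softmax applied as a layer
fn_out = activations.softmax(x)  # softmax applied as a function

# Both paths compute the same values
np.testing.assert_allclose(layer_out.numpy(), fn_out.numpy(), rtol=1e-6)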

Further, note that an advantage of the Softmax layer is that it takes an axis argument, so you can compute the softmax over any axis of the input instead of only its last axis (which is the default):

soft_out = Softmax(axis=desired_axis)(input_tensor)
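
For example, for a rank-3 input of shape (batch, timesteps, classes), passing axis=1 normalizes over the timestep axis instead of the last one. A small sketch with arbitrary shapes:

import tensorflow as tf
from tensorflow.keras.layers import Softmax

x = tf.random.normal((2, 3, 4))    # (batch, timesteps, classes)

out = Softmax(axis=1)(x)           # normalize over the timestep axis

# Each slice along axis 1 now sums to 1:
print(tf.reduce_sum(out, axis=1))  # tensor of ones with shape (2, 4)
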
today
  • But the Softmax layer cannot specify precision, which means that if I use mixed precision with a Softmax layer as the output layer, I cannot convert float16 back to float32 as required by TF in order to preserve accuracy in the loss (https://www.tensorflow.org/guide/keras/mixed_precision#training_the_model_with_a_custom_training_loop). Any work-around? Thanks. – Theron Jan 04 '20 at 02:03
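
One possible work-around for the comment above, sketched from the TensorFlow mixed-precision guide (assuming TF ≥ 2.4; the layer sizes here are arbitrary): Keras layers accept a dtype argument, so the output Softmax can be forced to compute in float32 while the rest of the model runs under the mixed_float16 policy:

import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(10,))
x = layers.Dense(64, activation="relu")(inputs)  # computed in float16
x = layers.Dense(5)(x)                           # float16 logits
# Setting dtype="float32" on the last layer casts its inputs to float32,
# so the softmax (and hence the loss) is computed in full precision:
outputs = layers.Softmax(dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)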