What difference does it make to use sigmoid instead of softmax? (autoencoders, Keras)

Question

I came across this problem while training an autoencoder (a multilayer perceptron). Here is my code:

from keras.models import Sequential
from keras.layers import Dense

# AE encoding arch
model = Sequential()
model.add(Dense(units=2000, activation='relu', input_shape=(784,)))
model.add(Dense(units=1200, activation='relu'))
model.add(Dense(units=500, activation='relu'))

# latent representation (lower dim representation)
model.add(Dense(units=10, activation='sigmoid'))  # mark this activation

# AE decoding arch
model.add(Dense(units=500, activation='relu'))
model.add(Dense(units=1200, activation='relu'))
model.add(Dense(units=2000, activation='relu'))
model.add(Dense(units=784))  # no activation given, so linear output for the 784 input features
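A quick tally of the layer sizes shows how narrow the bottleneck is. This is a sketch in plain Python; `LAYER_WIDTHS`, `dense_params` and `count_params` are illustrative names, not Keras API. A Dense layer with bias has n_in × n_out weights plus n_out biases, and every reconstruction must pass through the 10-unit latent layer, so that one layer's activation constrains everything the decoder can see. Assuming the default `use_bias=True`, the totals should agree with what `model.summary()` reports for the model above.

```python
# Layer widths of the autoencoder above: input, encoder, 10-unit latent code, decoder, output.
LAYER_WIDTHS = [784, 2000, 1200, 500, 10, 500, 1200, 2000, 784]


def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit, as for a Dense layer with a bias."""
    return n_in * n_out + n_out


def count_params(widths):
    """Sum the parameters of consecutive fully connected layers."""
    return sum(dense_params(a, b) for a, b in zip(widths[:-1], widths[1:]))


encoder = count_params(LAYER_WIDTHS[:5])  # 784 -> 2000 -> 1200 -> 500 -> 10
total = count_params(LAYER_WIDTHS)
print(f"encoder: {encoder:,}  total: {total:,}")  # encoder: 4,576,710  total: 9,154,194
```

Roughly 4.6 million encoder parameters end up feeding just 10 numbers, which is why the activation on the latent layer matters so much here.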

The code above works well. Earlier, I used 'softmax' as the activation of the latent-representation Dense layer:

model.add(Dense(units=10, activation='softmax'))  # mark this activation

With softmax, training seemed to be stuck in a local minimum: the loss was not going down.

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
60000/60000 [==============================] - 7s 123us/step - loss: 0.0901 - val_loss: 0.0874
Epoch 2/20
60000/60000 [==============================] - 6s 106us/step - loss: 0.0872 - val_loss: 0.0875
Epoch 3/20
60000/60000 [==============================] - 6s 106us/step - loss: 0.0872 - val_loss: 0.0882
Epoch 4/20
60000/60000 [==============================] - 6s 106us/step - loss: 0.0872 - val_loss: 0.0875
Epoch 5/20
60000/60000 [==============================] - 6s 105us/step - loss: 0.0871 - val_loss: 0.0875

What is going on here? Why does sigmoid work but softmax does not? Don't both serve the same purpose?
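One concrete way to see that the two functions do not behave the same is to change a single pre-activation and watch how the outputs move. This is a minimal NumPy sketch; `sigmoid` and `softmax` are hand-written here for illustration, not the Keras implementations, and the three-element input is arbitrary.

```python
import numpy as np


def sigmoid(z):
    # Applied independently to each element.
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Normalised over the whole vector; subtracting the max is only for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()


z = np.array([1.0, 2.0, 3.0])
z_bumped = z.copy()
z_bumped[0] += 5.0  # change only the first pre-activation

d_sig = sigmoid(z_bumped) - sigmoid(z)
d_soft = softmax(z_bumped) - softmax(z)
print(d_sig)   # only the first entry changes
print(d_soft)  # every entry changes
```

Sigmoid squashes each unit on its own, so the other two outputs stay exactly the same; softmax renormalises the whole vector, so raising one unit pushes all the others down.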

Sigmoid works on individual activations; softmax works over all activations of the layer. I think what you are trying is at least a very unusual use case for softmax. — maxy, Apr 28 '19 at 07:38
Hi, thanks maxy. That makes sense: the softmax outputs always sum to 1, which is not necessarily true for sigmoid. — yin yang, Apr 28 '19 at 07:47
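The sum-to-one constraint can be checked numerically. In the sketch below, the NumPy functions are hand-written for illustration and the 10-dimensional `z` is a random stand-in for the latent pre-activations, not data from the model above. It also shows a consequence of the constraint. Softmax ignores any constant added to all of its inputs, and its Jacobian, J[i, j] = s_i (δ_ij − s_j), passes no gradient along that all-ones direction. As a result, a 10-unit softmax code has only 9 free dimensions. This illustrates the constraint; it is not a verified diagnosis of why the loss stalled.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def softmax_jacobian(z):
    # J[i, j] = d softmax_i / d z_j = s_i * (delta_ij - s_j)
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)


rng = np.random.default_rng(0)
z = rng.normal(size=10)  # stand-in for the 10 latent pre-activations

print(softmax(z).sum())  # 1.0 up to rounding, for any z
print(sigmoid(z).sum())  # unconstrained: anywhere in (0, 10)

# Shifting every pre-activation by the same constant leaves softmax unchanged.
print(np.allclose(softmax(z), softmax(z + 3.0)))  # True

# Backprop through softmax: for any upstream gradient g, the gradient w.r.t. z
# sums to zero, i.e. nothing flows along the all-ones direction.
g = rng.normal(size=10)
grad_z = softmax_jacobian(z).T @ g
print(np.isclose(grad_z.sum(), 0.0))  # True
```

Sigmoid units have no such coupling. Each of the 10 latent values can move independently, which leaves the decoder a less constrained code to work with.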
