
I am going to train a neural network (e.g., a feed-forward network) whose output is a single real value representing a probability (and thus in the [0, 1] interval). Which activation function should I use for the last layer (i.e., the output node)?

If I don't use any activation function and just output tf.matmul(last_hidden_layer, weights) + biases, the result may include negative values, which is not acceptable since the output is supposed to be a probability and so the prediction should also be a probability. If I use tf.nn.softmax or tf.nn.softplus, the model always returns 0 on the test set. Any suggestions?

boomz

1 Answer


The easiest way is to use the sigmoid activation on the output, as it squashes any real value into the (0, 1) range. For training you can then use mean squared error or a similar loss, or binary cross-entropy. In general, binary cross-entropy tends to work better for probability outputs.
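As a minimal sketch in the low-level TensorFlow style of the question (last_hidden_layer, weights, biases, and labels stand in for the tensors in your own graph, not code from this thread):

    import tensorflow as tf

    # Raw logits, as in the question.
    logits = tf.matmul(last_hidden_layer, weights) + biases

    # Sigmoid squashes the logits into (0, 1), so the output can be
    # read as a probability.
    predictions = tf.sigmoid(logits)

    # For training, pass the *raw logits* to the loss op: it applies
    # the sigmoid internally in a numerically stable way.
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                logits=logits))

Note that the cross-entropy op takes the logits rather than the sigmoid output; computing the log of the sigmoid yourself is prone to numerical overflow and underflow.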

Dr. Snoopy
  • Thanks Matias. That makes sense, but when I use sigmoid as the activation function for the last node, all the predictions become zero. Any idea? – boomz Dec 01 '16 at 17:00
  • @boomz Not from so little information. I guess there is something wrong in your network or training; I hope you are not initializing the weights to zero :) – Dr. Snoopy Dec 01 '16 at 22:38
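
To illustrate the last comment, here is a hedged sketch of the zero-initialization pitfall; hidden_size is a placeholder, not a name from the thread:

    import tensorflow as tf

    # All-zero weights give every hidden unit identical outputs and
    # identical gradients, so the network cannot break symmetry
    # during training.
    bad_weights = tf.Variable(tf.zeros([hidden_size, 1]))

    # A small random initialization breaks the symmetry.
    weights = tf.Variable(
        tf.truncated_normal([hidden_size, 1], stddev=0.1))
    biases = tf.Variable(tf.zeros([1]))  # zero biases are fine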