
I am going to train a neural network (e.g., a feed-forward network) whose output is a single real value representing a probability (and thus in the [0, 1] interval). Which activation function should I use for the last layer (i.e., the output node)?

If I don't use any activation function and just output tf.matmul(last_hidden_layer, weights) + biases, the result may include negative values, which is not acceptable since the output is supposed to be a probability and so the prediction should also be a probability. If I use tf.nn.softmax or tf.nn.softplus, the model always returns 0 on the test set. Any suggestions?

boomz

1 Answer


The easiest way is to use the sigmoid activation on the output, as it squashes any real value into the (0, 1) range. For training you can then use mean squared error or a similar loss, or binary cross-entropy. In general, binary cross-entropy tends to work better for probability outputs.
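As a minimal sketch in the low-level TensorFlow style of the question (last_hidden_layer, weights, biases, and labels stand in for the tensors in your own graph, not code from this thread):

    import tensorflow as tf

    # Raw logits, as in the question.
    logits = tf.matmul(last_hidden_layer, weights) + biases

    # Sigmoid squashes the logits into (0, 1), so the output can be
    # read as a probability.
    predictions = tf.sigmoid(logits)

    # For training, pass the *raw logits* to the loss op: it applies
    # the sigmoid internally in a numerically stable way.
    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                logits=logits))

Note that the cross-entropy op takes the logits rather than the sigmoid output; computing the log of the sigmoid yourself is prone to numerical overflow and underflow.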

Dr. Snoopy
  • Thanks Matias. That makes sense, but when I use sigmoid as the activation function for the last node, all the predictions become zero. Any idea? – boomz Dec 01 '16 at 17:00
  • @boomz Not from so little information. I guess there is something wrong in your network or training; I hope you are not initializing the weights to zero :) – Dr. Snoopy Dec 01 '16 at 22:38
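
To illustrate the last comment, here is a hedged sketch of the zero-initialization pitfall; hidden_size is a placeholder, not a name from the thread:

    import tensorflow as tf

    # All-zero weights give every hidden unit identical outputs and
    # identical gradients, so the network cannot break symmetry
    # during training.
    bad_weights = tf.Variable(tf.zeros([hidden_size, 1]))

    # A small random initialization breaks the symmetry.
    weights = tf.Variable(
        tf.truncated_normal([hidden_size, 1], stddev=0.1))
    biases = tf.Variable(tf.zeros([1]))  # zero biases are fine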