When using neural networks for classification, it is said that:
- You generally want a softmax cross-entropy output, as this gives you a probability for each of the possible options.
- In the common case where there are only two options, you want sigmoid instead, which is the same thing except that it avoids redundantly outputting both p and 1-p (a quick numeric check of this equivalence follows the list).
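To check my own understanding of that second point, here is a quick numpy sketch (nothing to do with my model code, just the arithmetic): a two-way softmax over the logits [z, 0] should give the same probability as a sigmoid applied to z:
import numpy as np
z = 1.7  # arbitrary example logit
two_way = np.exp([z, 0.0]) / np.exp([z, 0.0]).sum()  # softmax over [z, 0]
print(two_way[0])                # probability of the first option
print(1.0 / (1.0 + np.exp(-z)))  # sigmoid(z), prints the same value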
The way to calculate softmax cross-entropy in TensorFlow seems to be along the lines of:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
So the output can be connected directly to the minimization code, which is good.
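For context, this is roughly how I wire that cost into training; the single fully-connected layer, the made-up shapes, and the choice of optimizer below are just stand-ins for my real network, not the actual code:
import tensorflow as tf  # TF1-style graph API (tf.compat.v1 in TF2)
x = tf.placeholder(tf.float32, [None, 10])  # 10 input features (made-up shape)
y = tf.placeholder(tf.float32, [None, 3])   # 3 classes, one-hot labels (made-up shape)
w = tf.Variable(tf.zeros([10, 3]))
b = tf.Variable(tf.zeros([3]))
prediction = tf.matmul(x, w) + b  # raw logits; no softmax applied here
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(cost)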
The code I have for sigmoid output, likewise based on various tutorials and examples, is along the lines of:
p = tf.sigmoid(tf.squeeze(...))
cost = tf.reduce_mean((p - y)**2)
I would have thought the two should be similar in form, since they are doing the same job in almost the same way, but the code fragments above look almost completely different. Furthermore, the sigmoid version explicitly squares the error while the softmax version doesn't. (Is the squaring happening somewhere inside the implementation of softmax_cross_entropy_with_logits, or is something else going on?)
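For comparison, here is the shape I would have expected the binary case to take if it simply mirrored the softmax version, assuming tf.nn.sigmoid_cross_entropy_with_logits is the right analogue (prediction below is a stand-in name for my network's raw, pre-sigmoid output, the same tensor I currently pass into tf.sigmoid):
logits = tf.squeeze(prediction)  # 'prediction' is a stand-in for my network's raw output
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=y))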
Is one of the above simply incorrect, or is there a reason why they need to be completely different?