I am fairly new to loss functions and I have a multi-label classification problem with 800 independent binary outputs (800 neurons at the output that do not affect each other; the target for each is either 0 or 1). Looking at the documentation here: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
It seems that the network is expected to produce "logits", i.e. raw outputs with a linear activation, and the sigmoid needed for the binary classification is applied inside the loss function.
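To make the question concrete, here is a minimal sketch of what I understand the documented approach to be (TF 2.x; the shapes and variable names are just placeholders I made up):

```python
import tensorflow as tf

# Placeholder sizes for illustration only.
batch_size, num_features, num_labels = 32, 128, 800

features = tf.random.normal([batch_size, num_features])
labels = tf.cast(tf.random.uniform([batch_size, num_labels]) > 0.5, tf.float32)

# Final layer has NO activation -- it produces raw logits.
dense = tf.keras.layers.Dense(num_labels)  # activation=None by default
logits = dense(features)

# The sigmoid is applied *inside* the loss function.
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(loss)
```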
Looking at the loss function for the softmax activation (tf.nn.softmax_cross_entropy_with_logits), a similar approach is used. I am wondering why the activation function is not applied directly to the network outputs, so that the loss function would receive probabilities. Why does the loss function instead take the linear outputs (logits) and apply the activation internally?
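For comparison, this is the alternative I was expecting, with the sigmoid as the last activation of the network and the cross-entropy computed by hand on the resulting probabilities (again a hypothetical sketch, not something from the docs):

```python
import tensorflow as tf

batch_size, num_labels = 32, 800
labels = tf.cast(tf.random.uniform([batch_size, num_labels]) > 0.5, tf.float32)
logits = tf.random.normal([batch_size, num_labels])  # stand-in for network output

# Activation applied in the network rather than in the loss.
probs = tf.sigmoid(logits)

# Manual binary cross-entropy on probabilities.
eps = 1e-7  # clip to avoid log(0)
manual_loss = -tf.reduce_mean(
    labels * tf.math.log(probs + eps)
    + (1.0 - labels) * tf.math.log(1.0 - probs + eps)
)
```

Is there a reason the first version (logits passed to the loss) is preferred over this one?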