
This is the guide to making a custom estimator in TensorFlow: https://www.tensorflow.org/guide/custom_estimators

The hidden layers are made using tf.nn.relu:

# Build the hidden layers, sized according to the 'hidden_units' param.
for units in params['hidden_units']:
    net = tf.layers.dense(net, units=units, activation=tf.nn.relu)

I altered the example a bit to learn XOR, with hidden_units=[4] and n_classes=2. When the activation function is changed to tf.nn.sigmoid, the example works just as well. Why is that? Is it still giving correct results because the XOR inputs are just zeros and ones?

Both activation functions give smooth loss curves that converge toward zero.
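For context, here is roughly the setup I'm testing, stripped down to plain TF ops instead of the Estimator API (the optimizer, learning rate, and step count are just placeholder choices of mine, not from the guide):

import numpy as np
import tensorflow as tf

# XOR data: four 0/1 input rows and their XOR labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([0, 1, 1, 0], dtype=np.int32)

def train_xor(activation):
    """Train a 2-4-2 network on XOR and return the final loss."""
    tf.reset_default_graph()
    inputs = tf.placeholder(tf.float32, [None, 2])
    labels = tf.placeholder(tf.int32, [None])
    # One hidden layer of 4 units, matching hidden_units=[4].
    net = tf.layers.dense(inputs, units=4, activation=activation)
    # Output layer: one logit per class (n_classes=2).
    logits = tf.layers.dense(net, units=2, activation=None)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(0.1).minimize(loss)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(500):
            _, loss_val = sess.run([train_op, loss], {inputs: X, labels: y})
    return loss_val

print('relu:   ', train_xor(tf.nn.relu))
print('sigmoid:', train_xor(tf.nn.sigmoid))

Both runs end with a loss near zero, matching the loss curves described above.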

Dee

1 Answer


Regarding the XOR problem: ReLU addresses the vanishing gradient problem, where the error value from backpropagation vanishes as it passes back through deep hidden layers.

So sigmoid still works if you use just one hidden layer.


Sigmoid outputs a value between 0 and 1, and its derivative is at most 0.25. By the chain rule, the error value backpropagated from the output layer is multiplied by that derivative at every layer, so it becomes a very small value in the layers far from the output.
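A quick numeric illustration of that shrinking, in plain NumPy (the 10-layer depth and the x = 0 point are just example values I picked):

import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)  # derivative of sigmoid; its maximum is 0.25 at x = 0

# Backpropagation multiplies the error by this derivative at every layer,
# so even in the best case the signal shrinks by at least 4x per layer.
error = 1.0
for _ in range(10):
    error *= sigmoid_grad(0.0)
print(error)  # about 9.5e-07 after 10 layers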

[Plot of the two activation functions: the blue line is ReLU and the yellow line is sigmoid.]

ReLU outputs x itself when x is greater than 0, so its derivative there is 1 and the error value can reach the 1st layer without shrinking.
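The same toy calculation with ReLU's derivative (the pre-activation value 2.0 is just an arbitrary positive example):

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # derivative of ReLU

error = 1.0
for _ in range(10):
    error *= relu_grad(2.0)  # any positive pre-activation gives gradient 1
print(error)  # still 1.0 after 10 layers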

yaho cho
  • Thanks, is ReLU a must if using more than 1 hidden layer? – Dee May 11 '19 at 10:29
  • @datdinhquoc Yes, not just 1. But ReLU is fine for 1 hidden layer too, so I recommend ReLU for hidden layers. There are various activation functions; basically, it depends on your model. For example, for binary classification with a NN, you need to use sigmoid for the output layer (see the sketch below). – yaho cho May 11 '19 at 10:42
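To illustrate that last comment, a minimal sketch of a sigmoid output head for binary classification in the same TF 1.x style (the shapes and placeholder names are mine; note the estimator in the question uses n_classes=2 with a softmax head instead):

import tensorflow as tf

hidden = tf.placeholder(tf.float32, [None, 4])   # stand-in for the last hidden layer
labels = tf.placeholder(tf.float32, [None, 1])   # 0/1 labels
logit = tf.layers.dense(hidden, units=1, activation=None)  # single output unit
prob = tf.nn.sigmoid(logit)                      # probability of the positive class
loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logit)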