
I am following the official TensorFlow with Keras tutorial and I got stuck here: Predict house prices: regression - Create the model

Why is an activation function used for a task where a continuous value is predicted?

The code is:

def build_model():
    model = keras.Sequential([
        keras.layers.Dense(64, activation=tf.nn.relu,
                           input_shape=(train_data.shape[1],)),
        keras.layers.Dense(64, activation=tf.nn.relu),
        keras.layers.Dense(1)
    ])

    optimizer = tf.train.RMSPropOptimizer(0.001)

    model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
    return model
petezurich

1 Answer


The general reason for using non-linear activation functions in hidden layers is that, without them, no matter how many layers or how many units per layer, the network would behave just like a simple linear unit. This is nicely explained in this short video by Andrew Ng: Why do you need non-linear activation functions?
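You can verify this collapse with a few lines of plain NumPy: composing two layers that have no activation is algebraically identical to a single linear layer (the 13 input features and 64 hidden units below are chosen only to mirror the shapes in your model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(64, 13)), rng.normal(size=64)
W2, b2 = rng.normal(size=(1, 64)), rng.normal(size=1)

x = rng.normal(size=13)

# Forward pass through the two stacked linear layers
y_two_layers = W2 @ (W1 @ x + b1) + b2

# The same computation collapsed into ONE linear layer
W = W2 @ W1          # combined weights
b = W2 @ b1 + b2     # combined bias
y_one_layer = W @ x + b

assert np.allclose(y_two_layers, y_one_layer)
```

No matter how many such layers you stack, the result is always expressible as a single `W @ x + b`, which is exactly why the hidden layers need a non-linearity like relu.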

In your case, if you look more closely, you'll see that the activation function of your final layer is not the relu of your hidden layers but linear (the default when you don't specify anything, as here):

keras.layers.Dense(1)

From the Keras docs:

Dense

[...]

Arguments

[...]

activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (ie. "linear" activation: a(x) = x).

which is indeed what is expected for a regression network with a single continuous output.
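A toy sketch (plain NumPy, with made-up pre-activation values) of why the identity is the right choice here: any non-linearity on the output layer would restrict the range of the predictions, e.g. a relu output could never produce a negative value, while the linear activation passes the full real line through:

```python
import numpy as np

def linear(x):
    # Keras' default "linear" activation: a(x) = x
    return x

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical pre-activation values of the final Dense(1) unit
raw_outputs = np.array([-2.5, 0.0, 3.7])

# linear keeps every value, so the network can predict any real number
print(linear(raw_outputs))

# relu would clip the negative prediction to 0 -- wrong for general regression
print(relu(raw_outputs))
```

For targets that are known to be non-negative (such as house prices), a relu output can work too, but the linear activation is the safe, standard default for regression.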

desertnaut