
I wrote a very basic TensorFlow model where I want to predict a number:

import tensorflow as tf
import numpy as np


def HW_numbers(x):
    y = (2 * x) + 1
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float)
y = np.array(HW_numbers(x))

model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=30)

print(model.predict([10.0]))

The code above works fine. But if I add an activation function to the Dense layer, the prediction becomes weird. I have tried 'relu', 'sigmoid', 'tanh', etc.

My question is: why? What exactly is the activation function doing in that single layer that messes up the prediction? I am using TensorFlow 2.0.


2 Answers


Your network consists of just one neuron. With no activation function, all it does is multiply your input by the neuron's weight and add a bias; with enough training, the weight converges to roughly 2 and the bias to roughly 1, which reproduces y = 2x + 1.
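To verify, here is a minimal sketch (assuming TensorFlow 2.x; the larger epoch count is just to let the parameters settle) that trains the question's single-neuron model without an activation and prints the learned weight and bias:

import numpy as np
import tensorflow as tf

# Same data as in the question, as a column vector.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float).reshape(-1, 1)
y = 2 * x + 1

model = tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=2000, verbose=0)

# Dense(1) learns y = w * x + b; for this data w -> ~2 and b -> ~1.
w, b = model.layers[0].get_weights()
print("weight:", w.flatten(), "bias:", b)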

  • With relu as the activation function, only positive values are propagated through the network. If the neuron's weight happens to be initialized with a negative number, the pre-activation w * x + b is negative for every (positive) input, relu clips it to zero, the gradient is zero as well, and the output stays stuck at zero. So with relu you have roughly a 50:50 chance of getting good results.
  • With tanh and sigmoid, the output of the neuron is limited to [-1, 1] and [0, 1] respectively, so it can never exceed 1, while your targets range from 3 to 15.

So for such a small neural network, these activation functions don't match the problem, as the sketch below shows.
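A minimal sketch of the relu failure mode (again assuming TensorFlow 2.x; rerun it a few times, since the outcome depends on the sign of the random initial weight):

import numpy as np
import tensorflow as tf

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float).reshape(-1, 1)
y = 2 * x + 1

# Same single neuron, but with relu on its output. If the randomly
# initialized weight is negative, w * x + b is negative for every training
# input, relu clips it to 0, the gradient is 0 as well, and the neuron
# never recovers (a "dead" neuron). With a positive initial weight it
# trains just like the linear model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, activation='relu', input_shape=[1])
])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(x, y, epochs=2000, verbose=0)

print(model.predict(np.array([[10.0]])))  # either roughly [[21.]] or stuck at [[0.]]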


Currently, you are learning a linear function. Since it can be represented by a single neuron, a single neuron is all you need to learn it. An activation function, on the other hand, is there:

to learn and make sense of really complicated, non-linear functional mappings between the inputs and the response variable. It introduces non-linear properties to the network. Its main purpose is to convert the input signal of a node into an output signal, which is then used as an input by the next layer in the stack.

Hence, since you have just a single neuron here (a special case), there is no next layer to pass the value on to; the input, hidden, and output layers are effectively merged into one. The activation function is therefore not helpful in your case, unless you want to make a decision based on the output of the neuron.
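For completeness, a small sketch of where an activation normally goes, assuming TensorFlow 2.x (the 8-unit hidden layer, Adam optimizer, and epoch count are illustrative choices, not anything from the question): relu is applied on a hidden layer while the output layer stays linear, and the network still recovers y = 2x + 1.

import numpy as np
import tensorflow as tf

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], dtype=float).reshape(-1, 1)
y = 2 * x + 1

# relu introduces non-linearity between the layers; the output layer has
# no activation, so it can produce any real value for the regression.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu', input_shape=[1]),
    tf.keras.layers.Dense(1)  # linear output
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss='mean_squared_error')
model.fit(x, y, epochs=1000, verbose=0)

# x = 10 lies outside the training range, so expect something near, but
# not exactly, 21.
print(model.predict(np.array([[10.0]])))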
