I know how the step transfer function works but how does the linear transfer function work? What equation do you use? Relate answer to AND gate with two inputs and a bias
2 Answers
First of all, in general you want to apply linear transfer function only in the output layer of an MLP and "never" in the hidden layers, where non-linear transfer functions are typically used (logistic function, step. etc.).
Linear transfer function (in the form of f(x) = x for pure linear or purelin as it is mentioned in literature) is typically used for function approximation / regression tasks (this is intuitive because step and logistic functions give binary results where the linear function gives continuous results).
Non- linear transfer functions are used for classification tasks.

- 462
- 1
- 9
- 24
Non-linear transfer function(aka: activation function) is the most important factor which assigns the nonlinear approximation capability to the simple fully connected multilayer neural network.
Nevertheless, 'linear' activation function, of course, is one of the many alternatives you might want to adopt. But the problem is, pure linear transfer(f(x) = x) in hidden layers doesn't make sense for us, which means it may be 'in vain' if we try to train a network whose hidden units are activated by pure linear function.
We may understand this process with the following:
Assuming f(x)=x is our activation function, and we try to train a single hidden layer network having 2 input units(x1,x2), 3 hidden units(a1,a2,a3) and 1 output unit(y). Hence, the network tries to approximate the function :
# hidden units
a1 = f(w11*x1+w12*x2+b1) = w11*x1+w12*x2+b1
a2 = f(w21*x1+w22*x2+b2) = w21*x1+w22*x2+b2
a3 = f(w31*x1+w32*x2+b3) = w31*x1+w32*x2+b3
# output unit
y = c1*a1+c2*a2+c3*a3+b4
if we combine all these equations, it turns out:
y = c1(w11*x1+w12*x2+b1) + c2(w21*x1+w22*x2+b2) + c3(w31*x1+w32*x2+b3) + b4
= (c1*w11+c2*w21+c3*w31)*x1 + (c1*w12+c2*w22+c3*w32)*x2 + (c1*b1+c2*b2+c3*b3+b4)
= A1*x1+A2*x2+C
As shown above, linear activation degenerate the network into a single input-output linear product, regardless of the structure of the network. What was done during the training process is factorizing A1, A2 and C into various factors.
Even one very popular quasi-linear activation function call Relu in deep neural network is also rectified. In other words, no pure linear activation in hidden layers is used unless you want to factorize coefficients.

- 21
- 5