This is one of the most interesting concepts that I came across while learning neural networks. Here is how I understood it:
The input $Z_l$ to a layer can be written as the product of that layer's weight matrix and the vector of outputs of the nodes in the previous layer. Thus $Z_l = W_l A_{l-1}$, where $Z_l$ is the input to layer $l$. Now $A_l = F(Z_l)$, where $F$ is the activation function of layer $l$. If the activation function is linear, then $A_l$ is simply a constant factor $k$ times $Z_l$. Substituting this back layer by layer (the constant factors can be absorbed into the weight matrices), we can write $Z_l$ as:

$$Z_l = W_l W_{l-1} W_{l-2} \cdots W_1 X,$$
where $X$ is the input vector. So the output $Y$ is ultimately just a product of a few matrices applied to the input vector of a particular data instance. That chain of weight matrices can always be collapsed into a single resultant matrix, so the output $Y$ can be written as $W^{\top} X$. This is nothing but the linear equation we come across in linear regression.
Therefore, if all the layers have linear activations, the output is only a linear combination of the inputs and can be written using a simple linear equation.
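
As a quick sanity check, here is a minimal NumPy sketch of this idea (the layer sizes, random weights, and omission of bias terms are arbitrary choices for illustration): a forward pass through several linear layers gives exactly the same output as a single collapsed weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-layer network with linear activations, i.e. F(z) = z.
# Layer sizes are made up purely for illustration.
W1 = rng.standard_normal((5, 4))  # layer 1: 4 inputs -> 5 units
W2 = rng.standard_normal((3, 5))  # layer 2: 5 -> 3
W3 = rng.standard_normal((1, 3))  # layer 3: 3 -> 1 output

x = rng.standard_normal(4)        # one input instance X

# Forward pass with linear activations: A_l = Z_l = W_l A_{l-1}
a1 = W1 @ x
a2 = W2 @ a1
y_network = W3 @ a2

# Collapse the stack into one resultant matrix W = W3 W2 W1
W = W3 @ W2 @ W1
y_collapsed = W @ x

print(np.allclose(y_network, y_collapsed))  # True: the whole network is one linear map
```

Adding bias terms would not change the conclusion; the network would then collapse to a single affine map $Y = WX + b$, which is still just linear regression.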