First of all, I think you should forget the idea of "ON" or "OFF", because that is not usually how it works: the result of such a function does not have to be binary. Threshold activation functions exist, but they are not the only ones. The sigmoid function, for instance, maps the reals to the open interval (0, 1). Once it is applied, and unless you add a threshold on top of it, your neuron always outputs something, however small or large, that is neither 0 nor 1.
Take the example of the linear activation function: the output can then be any real number, so the idea of on/off isn't relevant at all.
The goal of such a function is to add complexity to the model and to make it non-linear. If you had a neural network without these functions, the output would just be a linear weighted sum of the inputs plus a bias, which is often not expressive enough to solve problems (the example of simulating a XOR gate with a network is often used; you won't manage it without activation functions). With activation functions, you can use whichever non-linearity you want: tanh, sigmoid, ReLU...
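To see the linearity point concretely, here is a small NumPy sketch (the weights and inputs are made-up numbers, purely for illustration): stacking two layers with no activation collapses into a single linear map, while putting a sigmoid in between does not.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights/biases for two tiny layers (illustrative only)
W1, b1 = np.array([[0.5, -1.0], [0.2, 0.3]]), np.array([0.1, -0.2])
W2, b2 = np.array([[1.5, 0.7]]), np.array([0.05])

x = np.array([0.4, 0.9])

# Without an activation, two layers are equivalent to ONE linear layer:
no_activation = W2 @ (W1 @ x + b1) + b2
W_combined, b_combined = W2 @ W1, W2 @ b1 + b2
print(np.allclose(no_activation, W_combined @ x + b_combined))  # True

# With a sigmoid in between, the output is no longer a linear map of x:
with_activation = W2 @ sigmoid(W1 @ x + b1) + b2
```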
That being said, the answer is 1 and 3.
If you take a random neuron $n$ in a hidden layer, its input is a sum of values weighted by weights, plus a bias (also multiplied by a weight, often called $w_0$); the neuron then applies the activation function to that sum. Imagine the weighted values coming from the previous neurons are 0.5 and 0.2, and the weighted bias is 0.1. You then apply a function, let's take the sigmoid, to 0.5 + 0.2 + 0.1 = 0.8. That gives about 0.69.
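In code, that single neuron does nothing more than this (0.5, 0.2 and 0.1 are the made-up values from the example above):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

weighted_inputs = 0.5 + 0.2   # weighted values from the previous layer
weighted_bias = 0.1           # bias * w0
print(sigmoid(weighted_inputs + weighted_bias))  # ~0.69 (0.6899...)
```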
The output of the neuron is the result of that function. Each neuron of the next layer will compute a weighted sum of the outputs of the current layer, including the output of our neuron. Note that each neuron of the next layer has its own weights between the previous layer and itself. Then, the neurons of the next layer apply an activation function (not necessarily the same as the current layer's) to produce their own outputs. So, informally, such a neuron does something like activ_func(.. + .. + 0.69*weight_n + ..).
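Continuing the example, one neuron of the next layer would do something like this (its weights and the other outputs of the current layer are invented, just to show the shape of the computation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Invented weights of ONE neuron in the next layer (one per neuron
# of the current layer), plus an invented weighted bias
w = [0.3, -0.8, 1.2]
outputs_current_layer = [0.69, 0.45, 0.12]   # 0.69 is our neuron's output
weighted_bias = 0.2

z = sum(wi * oi for wi, oi in zip(w, outputs_current_layer)) + weighted_bias
print(sigmoid(z))   # that neuron's own output, passed on to the layer after
```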
In other words, each layer takes as its values the result of the activation function applied to the weighted sum of the values of the neurons of the previous layer, plus a weighted bias. If you managed to read that without suffocating, you can apply this definition recursively for each layer (except the input layer, of course).
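Written compactly, with $a^{(l)}$ the vector of values of layer $l$, $W^{(l)}$ its weight matrix, $b^{(l)}$ its bias term and $f^{(l)}$ its activation function (this is just the usual notation for the sentence above):

$$a^{(l)} = f^{(l)}\!\left(W^{(l)} a^{(l-1)} + b^{(l)}\right), \qquad a^{(0)} = x \text{ (the input).}$$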