4

I am dipping my toe into neural networks and starting with some basic perceptrons. In one video, this guy is explaining how to make a machine that can 'learn' how to distinguish two arrays. He explains the training process, but just shoves all of his inputs and weights into the sigmoid function. I did some research on the sigmoid function and was wondering why it is used in machine learning and why programmers use it to test their inputs.

Darrow Hartman
  • 4,142
  • I don't think this fits on SO. You can try [Data Science](https://datascience.stackexchange.com) or [AI](https://ai.stackexchange.com). – gmds May 27 '19 at 01:25
  • Sigmoid normalizes values that can be very large into fixed bounds, for example 0-1, losing the differences between very large values while keeping them for small ones. Use it if you need to normalize something but don't have a maximum value. – user8426627 May 27 '19 at 01:33
  • Did you check 3Blue1Brown's channel on YouTube? – mikuszefski May 27 '19 at 08:42

3 Answers

4

This function's job is to squash numbers into the range between 0 and 1, usually for supervised classification problems. For example, in a binary supervised classification problem, where there are only two labels (as in the picture below), a single data point that lies far from the others would affect the separator line too much.

But when we use the sigmoid function, we can see that a data point far from the others won't affect the separator too much.

This function can also give you a probability. For example, if you have a new data point to predict, you can use the line and see how likely it is that the point belongs to a given label. (Take a look at the picture to understand this better.)

Picture link: https://pasteboard.co/IgLjcYN.jpg

NOTE: labels -> y, features -> x
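
To make the probability interpretation concrete, here is a minimal sketch in Python (the weights, bias, and feature values below are made up purely for illustration): a linear score w·x + b is passed through the sigmoid, and the result is read as the probability of the positive label.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical learned parameters and a new data point.
w = [0.8, -1.5]   # one weight per feature
b = 0.3           # bias
x = [2.0, 1.0]    # features of the point to classify

score = sum(wi * xi for wi, xi in zip(w, x)) + b
prob = sigmoid(score)   # probability that the label y is 1
print(prob)             # ~0.6, so this point leans towards label 1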

1

Sigmoid is a non-linear activation function widely used in Logistic Regression and Artificial Neural Networks. Here is a Python implementation:

import math

def sigmoid(x):
    # Maps any real input to the open interval (0, 1).
    return 1 / (1 + math.exp(-x))
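
As a side note, math.exp(-x) overflows for large negative x (roughly x < -709 in CPython), so a numerically stable variant is sometimes used; here is a minimal sketch:

import math

def stable_sigmoid(x):
    # Branch on the sign of x so math.exp is only ever called with a
    # non-positive argument, which cannot overflow.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so exp(x) < 1
    return z / (1 + z)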

If the input is negative, the output is smaller than 0.5; if the input is positive, the output is greater than 0.5; and an input of exactly 0 gives 0.5.

(Image: the sigmoid curve)

Uses in Machine Learning:

In machine learning, when we want to learn a relationship between some features and a binary label, we use a sigmoid function at the output layer (the one which produces the final outputs). As the output ranges between 0 and 1, we can set a decision boundary and determine whether the predicted label is 0 or 1, as the sketch below shows.
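
A minimal sketch of that thresholding step (the helper name predict_label and the 0.5 cut-off are illustrative choices, not fixed conventions):

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict_label(score, threshold=0.5):
    # score is the raw (pre-activation) output of the network.
    prob = sigmoid(score)
    return 1 if prob >= threshold else 0

print(predict_label(-2.0))  # 0, since sigmoid(-2.0) is ~0.12
print(predict_label(1.5))   # 1, since sigmoid(1.5) is ~0.82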

Sigmoids were also used in the hidden layers of Artificial Neural Networks. Sigmoid produces an activation based on its inputs (from the previous layer), which is then multiplied by the weights of the succeeding layer to produce further activations. If sigmoid receives a large positive value, it gives a nearly saturated output of 1; for a large negative value, it produces an output near 0. Hence, it behaves like a soft threshold on its input.
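
Here is a minimal sketch of one such hidden layer (the weights, biases, and inputs are made-up values; sigmoid is the same function defined above):

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    # One dense layer: weighted sum plus bias per neuron, then sigmoid.
    return [
        sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]

inputs = [0.5, -1.0]
weights = [[0.4, 0.9],    # weights of neuron 1
           [-0.7, 0.2]]   # weights of neuron 2
biases = [0.1, -0.3]
print(layer_forward(inputs, weights, biases))  # two activations, each in (0, 1)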

Also, since the output is between 0 and 1, it can be interpreted as the probability of a particular class.

Some particular problems with sigmoid (and its replacement with ReLU):

Sigmoid suffers from the vanishing gradient problem. The gradients of the NN's output with respect to the parameters become so small that the NN takes ever smaller steps towards the minimum of the loss function and eventually stops learning.

Also, inputs of very large magnitude are mapped to the extremes, i.e. 0 or 1, where the sigmoid is nearly flat, so changes in parameters like the weights and biases produce almost no change in the model's output.
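
A short sketch makes this concrete: the derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 and collapses towards 0 for inputs of large magnitude.

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0))    # 0.25, the largest the gradient can ever be
print(sigmoid_grad(5))    # ~0.0066
print(sigmoid_grad(10))   # ~0.000045; the gradient has all but vanished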

This problem was tackled by the use of ReLU, which does not squash its positive inputs (unlike the sigmoid), so the gradient on that side does not vanish and learning can continue.
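
For comparison, a minimal sketch of ReLU and its gradient:

def relu(x):
    # Passes positive inputs through unchanged and zeroes out negative ones.
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 for negative ones; on the positive side the
    # gradient never shrinks towards zero the way the sigmoid's does.
    return 1.0 if x > 0 else 0.0

print(relu(10.0), relu_grad(10.0))   # 10.0 1.0 (no squashing, no vanishing)
print(relu(-3.0), relu_grad(-3.0))   # 0.0 0.0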

Shubham Panchal
  • 4,061
0

Sigmoid is one of the possible activation functions. The purpose of a squashing activation function like the sigmoid is to squeeze all possible values, whatever their magnitude, into the same fixed range, as the short demo below shows.
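
A quick demonstration of that squashing (the input values are arbitrary):

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

for x in (-20, -1, 0, 1, 20):
    print(x, sigmoid(x))  # every output lands strictly between 0 and 1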

Here's a good article - https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6

Maxim Volgin
  • 3,957