I'm confused about activation functions. Why is ReLU so widely used when, in the end, its mapping is still a line? Using sigmoid or tanh makes the decision boundary a squiggle that can fit the data well, but doesn't ReLU map a line (aW + b) to a line as well? How can that fit the data better?
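To make my confusion concrete, here is a tiny toy network I put together (the weights are just made-up numbers, not from any real model). Each unit computes an affine map Wx + b and ReLU only clips the negatives, so I would expect the output to stay a straight line:

```python
import numpy as np

# Toy 1-hidden-layer network with made-up weights, just to show
# what I mean by "ReLU maps a line (Wx + b) to a line".
x = np.linspace(-3, 3, 7).reshape(-1, 1)   # 1-D inputs

W1 = np.array([[1.0, -2.0]])               # hypothetical weights
b1 = np.array([0.5, 1.0])
W2 = np.array([[1.0], [1.0]])
b2 = np.array([-0.5])

pre_act = x @ W1 + b1                      # affine: a line per hidden unit
hidden = np.maximum(0.0, pre_act)          # ReLU
out = hidden @ W2 + b2                     # network output

print(out.ravel())                         # is this still just a line?
```

Is my picture of what ReLU does here wrong, or am I missing why this kind of mapping ends up fitting data better than sigmoid/tanh?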