I'm building an ANN from a tutorial. In the tutorial, sigmoid and dsigmoid are defined as follows:
sigmoid(x) = tanh(x)
dsigmoid(x) = 1-x*x
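For concreteness, here is how that pair would look in Python (a minimal sketch; the tutorial only gives the formulas, so the code itself is my own framing):

```python
import math

def sigmoid(x):
    # the tutorial's "sigmoid": actually the hyperbolic tangent
    return math.tanh(x)

def dsigmoid(x):
    # the tutorial's derivative, exactly as given: 1 - x^2
    return 1.0 - x * x
```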
However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):
dsigmoid(x) = sech(x)*sech(x)
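For comparison, the textbook derivative would look like this in the same style (dsigmoid_exact is my own name, not from the tutorial; sech(x) = 1/cosh(x)):

```python
import math

def dsigmoid_exact(x):
    # d/dx tanh(x) = sech(x)^2 = 1 / cosh(x)^2
    return 1.0 / math.cosh(x) ** 2
```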
When using 1-x*x, the training does converge, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.
The question is: why does 1-x*x work (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations holds wrong weights)?
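I don't have the tutorial's exact training loop in front of me, but tutorial-style backprop code I've seen calls the derivative roughly like this (a hypothetical single-neuron sketch; every name and number below is illustrative, not the tutorial's code):

```python
import math

def sigmoid(x):
    return math.tanh(x)

def dsigmoid(y):
    return 1.0 - y * y

# one gradient step for a single neuron; all values are made up
w, lr = 0.5, 0.1              # weight and learning rate
x_in, target = 1.0, 0.2       # one training sample
out = sigmoid(w * x_in)       # forward pass
error = target - out
delta = error * dsigmoid(out)   # note: the derivative is fed the neuron's output here
w += lr * delta * x_in
print(w)
```

Swapping dsigmoid for dsigmoid_exact at that call site is what breaks convergence for me.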