
I'm making an ANN from a tutorial. In the tutorial, sigmoid and dsigmoid are defined as follows:

sigmoid(x) = tanh(x)

dsigmoid(x) = 1-x*x
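
A minimal sketch of this setup (Python and the exact function shapes are assumptions; the tutorial's actual code may differ):

    import math

    def sigmoid(x):
        return math.tanh(x)   # the tutorial's "sigmoid" is really tanh

    def dsigmoid(x):
        return 1.0 - x * x    # the tutorial's "derivative"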

However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):

dsigmoid(x) = sech(x)*sech(x)

When using 1-x*x, the training converges, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.
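
The replacement then amounts to something like this (a sketch of the change, not the tutorial's code; Python's math module has no sech, so it is written via cosh):

    def dsigmoid(x):
        return 1.0 / math.cosh(x) ** 2   # sech(x)^2, the textbook derivative of tanh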

The question is: why does 1-x*x work (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations holds wrong weights)?

Dee

1 Answer


In the first set of formulas, the derivative is expressed as a function of the function value, that is

tanh'(x) = 1 - tanh(x)^2 = dsigmoid(sigmoid(x))

As that is probably how it is used and implemented in the existing code, with dsigmoid called on the stored output of sigmoid rather than on its input, you will get the wrong derivative if you replace it with the "right" formula, which expects the input x.
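
To make that concrete, a minimal sketch of the usual back-propagation call site (the names are illustrative, not taken from the question's code; the key is which value actually reaches dsigmoid):

    import math

    def sigmoid(x):
        return math.tanh(x)

    def dsigmoid(y):
        # expects the stored activation y = sigmoid(x), not the raw input x:
        # 1 - y*y == 1 - tanh(x)^2 == sech(x)^2 == tanh'(x)
        return 1.0 - y * y

    x = 0.7
    y = sigmoid(x)                  # the forward pass stores the activation y

    print(dsigmoid(y))              # 1 - tanh(0.7)^2 ~= 0.6347, the correct tanh'(0.7)
    print(1 / math.cosh(x) ** 2)    # sech(0.7)^2     ~= 0.6347, the same value
    print(1 / math.cosh(y) ** 2)    # sech(y)^2 at the activation ~= 0.7082 -- wrong

Expressing the derivative through the already computed activation also saves recomputing tanh in the backward pass, which is why tutorials write it as 1 - y*y in the first place.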

Lutz Lehmann
  • yeah, y = sigmoid(x); mathematically, dsigmoid should also be applied to x to get the gradient, d = dsigmoid(x). So in the tanh case the formula is written that way, in terms of y – Dee Jul 31 '17 at 08:37