I'm building an ANN from a tutorial. In the tutorial, sigmoid and dsigmoid are defined as follows:
sigmoid(x) = tanh(x)
dsigmoid(x) = 1-x*x
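For concreteness, here is how that pair would look in Python (a minimal sketch; the tutorial only gives the formulas, so the code itself is my own framing):

```python
import math

def sigmoid(x):
    # the tutorial's "sigmoid": actually the hyperbolic tangent
    return math.tanh(x)

def dsigmoid(x):
    # the tutorial's derivative, exactly as given: 1 - x^2
    return 1.0 - x * x
```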
However, by definition, dsigmoid is the derivative of the sigmoid function, so it should be (http://www.derivative-calculator.net/#expr=tanh%28x%29):
dsigmoid(x) = sech(x)*sech(x)
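For comparison, the textbook derivative would look like this in the same style (dsigmoid_exact is my own name, not from the tutorial; sech(x) = 1/cosh(x)):

```python
import math

def dsigmoid_exact(x):
    # d/dx tanh(x) = sech(x)^2 = 1 / cosh(x)^2
    return 1.0 / math.cosh(x) ** 2
```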
When using 1-x*x, the training does converge, but when I use the mathematically correct derivative, i.e. sech squared, the training process doesn't converge.
The question is: why does 1-x*x work (the model trains to the correct weights), while the mathematical derivative sech^2(x) doesn't (the model obtained after the maximum number of iterations holds wrong weights)?
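I don't have the tutorial's exact training loop in front of me, but tutorial-style backprop code I've seen calls the derivative roughly like this (a hypothetical single-neuron sketch; every name and number below is illustrative, not the tutorial's code):

```python
import math

def sigmoid(x):
    return math.tanh(x)

def dsigmoid(y):
    return 1.0 - y * y

# one gradient step for a single neuron; all values are made up
w, lr = 0.5, 0.1              # weight and learning rate
x_in, target = 1.0, 0.2       # one training sample
out = sigmoid(w * x_in)       # forward pass
error = target - out
delta = error * dsigmoid(out)   # note: the derivative is fed the neuron's output here
w += lr * delta * x_in
print(w)
```

Swapping dsigmoid for dsigmoid_exact at that call site is what breaks convergence for me.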