
I'm doing a neural network task with the sigmoid activation function. My network's input is an image (MNIST dataset), and because each image is 28*28, converting the images to vectors gives me an N*784 matrix. Multiplying this large matrix by the weight matrix produces large positive and negative numbers, which I then have to pass to a sigmoid function. I use expit() as the sigmoid function, and my problem is this:

Inputs approaching 30 already give results near 1 in expit(): for example, expit(28) returns 0.99999999 and expit(29) returns 1.0, as does anything above 29. But my weighted sums are above 30, so some outputs become 1 and others 0 in the first cycle of learning, and in effect there is no learning at all.

What should I do? Is 29 effectively the sigmoid's upper bound, and I can't change that? Do I have to change my image dimensions to overcome this?

Fcoder
  • Which version of scipy are you using? expit was numerically unstable until 0.14 (https://github.com/scipy/scipy/issues/3385) – Lukasz Tracewski Nov 26 '16 at 15:29
  • I don't know; I installed it via pip two days ago. – Fcoder Nov 26 '16 at 15:30
  • Check with python -c 'import scipy; print(scipy.__version__)'. I assume you are using scipy's expit, right? – Lukasz Tracewski Nov 26 '16 at 15:32
  • The scipy version is 0.18.1. My main problem is that when, for example, a 1000*1000 numpy matrix is multiplied, the result may contain large numbers, and those large numbers aren't useful for the sigmoid function. I don't know what to do. Right now I divide the result by 1000 and use that, but I don't know how reasonable and valid that is. It seems nobody has faced this problem on the web! – Fcoder Nov 26 '16 at 15:36
  • Your question is about numerical stability, which is what I am trying to answer. But you're right, the sigmoid is not very useful in such cases. That's why people are not using it, and hence why you have problems finding a solution on the web :). How about using cross-entropy instead? With the sigmoid you will inevitably run into '1' in any finite-precision calculation. – Lukasz Tracewski Nov 26 '16 at 15:43
  • @LukaszTracewski: thanks, I'm just learning, and I'm reading about sigmoids right now. I ran into this problem and found the sigmoid a poor fit for this application. This was my first test. Thank you for helping me. If you write up an explanation, I can accept it as the correct answer. – Fcoder Nov 26 '16 at 15:47

1 Answer


As discussed in the comments section, the real problem turned out to be the sigmoid itself, which is not suited to such cases. In any finite-precision calculation one will face the described saturation problem; on one system it appears at 29, on another at 38.
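To see the saturation concretely, here is a small sketch: once the input exceeds the resolution of float64, expit returns exactly 1.0 and all gradient information is lost. Keeping pre-activations small, e.g. by scaling initial weights by 1/sqrt(fan_in), avoids the regime entirely (the variable names below are illustrative, not from the question):

```python
import numpy as np
from scipy.special import expit

# expit saturates to exactly 1.0 once 1 + exp(-x) rounds to 1 in
# float64 -- around x = 37 (the exact threshold is platform-dependent).
print(expit(28.0))   # very close to 1, but still below 1.0
print(expit(40.0))   # exactly 1.0 -- the gradient here is zero

# A common remedy: scale the initial weights by 1/sqrt(fan_in) so the
# weighted sums stay in the sigmoid's responsive range.
rng = np.random.default_rng(0)
X = rng.random((64, 784))                       # batch of 64 flattened MNIST-sized images
W = rng.standard_normal((784, 30)) / np.sqrt(784)
z = X @ W
print(np.abs(z).max())                          # small, far from saturation
```

With this scaling the pre-activations stay within a few units of zero, where the sigmoid still has a usable slope.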

One way to tackle the problem is to use the softmax activation function, which is less susceptible to such issues. Mind that you might encounter similar challenges with the cost function.
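A minimal sketch of a numerically stable softmax: subtracting the row maximum before exponentiating means the largest exponent is exp(0), so even huge logits cannot overflow (this is the standard max-subtraction trick, not code from the question):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis.

    Subtracting the maximum leaves the result mathematically unchanged
    (the factor cancels in the ratio) but prevents overflow in exp().
    """
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Even with very large logits the probabilities are finite and sum to 1.
logits = np.array([1000.0, 1001.0, 1002.0])
p = softmax(logits)
print(p)        # finite values, ordered like the logits
print(p.sum())  # 1.0
```

A naive `np.exp(z) / np.exp(z).sum()` on these logits would overflow to inf and return NaNs.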

Slightly off-topic, but you might want to check how the problem is resolved in e.g. TensorFlow. It has some nice tutorials for beginners.

Lukasz Tracewski