
I am new to machine learning and neural networks. I am trying to do text classification using a neural network from scratch. In my dataset there are 7500 documents, each labeled with one of seven classes, and about 5800 unique words. I am using one hidden layer with 4000 neurons and the sigmoid activation function, with learning rate = 0.1 and no dropout.

During training, after about 2 to 3 epochs, a warning is displayed:

RuntimeWarning: overflow encountered in exp

The resulting output list then looks like this for every input, with every class at zero except one:

[  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   4.11701866e-10]

sigmoid function:

import numpy as np

def sigmoid(x):
    # logistic sigmoid; np.exp(-x) overflows for strongly negative x
    output = 1 / (1 + np.exp(-x))
    return output

def sigmoid_output_to_derivative(output):
    # derivative of the sigmoid, expressed in terms of its output
    return output * (1 - output)

How can I fix this? Can I use a different activation function?

Here is my full code: https://gist.github.com/coding37/a5705142fe1943b93a8cef4988b3ba5f

Umair

2 Answers


It is not easy to give a precise answer, since the possible causes are manifold and hard to reconstruct from here, but I'll give it a try:

So it seems you're experiencing underflow, meaning that the weights of your neurons scale your input vector x to values that lead to zero outputs from the sigmoid function. A naive suggestion would be to increase the precision from float32 to float64, but I guess you are already at that precision.
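For the warning itself: np.exp(-x) overflows for strongly negative x, even though the mathematical result of the sigmoid is simply a value very close to 0 or 1. A minimal sketch of a numerically stable sigmoid (my own illustration, not code from your gist) that only ever calls exp on non-positive arguments:

import numpy as np

def stable_sigmoid(x):
    # Evaluate the sigmoid piecewise so np.exp only ever sees non-positive
    # arguments, which cannot overflow.
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # exp(-x) <= 1 here
    ex = np.exp(x[~pos])                       # exp(x) <= 1 here
    out[~pos] = ex / (1.0 + ex)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))   # [0.  0.5 1. ] with no warning

Note that this only silences the warning; if the weighted inputs stay that large in magnitude, the units are still saturated, which is the actual learning problem.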

Have you played around with the learning rate and/or tried an adaptive learning rate? (See https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for some examples.) For a start, maybe try more iterations with a lower learning rate.
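For illustration, a minimal sketch of a simple step-decay schedule (the function name and constants are made up; the adaptive methods in the linked article are more sophisticated):

def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    # halve the learning rate every `epochs_per_drop` epochs
    return initial_lr * (drop ** (epoch // epochs_per_drop))

for epoch in range(30):
    lr = step_decay(0.1, epoch)   # 0.1 for epochs 0-9, 0.05 for 10-19, 0.025 for 20-29
    # ... use lr for this epoch's weight updates ...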

Also: are you using sigmoid functions in your output layer? The added non-linearity could drive your neurons into saturation, which is likely your problem here.
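To see what saturation means in numbers (a small self-contained illustration using the same sigmoid as in your question):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

out = sigmoid(20.0)        # ~0.9999999979: the unit is saturated near 1
grad = out * (1 - out)     # ~2.1e-09: the local gradient is effectively zero
print(out, grad)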

Have you checked your gradients? This can also sometimes help in tracking down errors (http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization).
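A rough sketch of such a gradient check with numpy (central differences; loss_fn is a placeholder for whatever function returns your scalar loss for a given parameter array):

import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    # Central-difference estimate of d loss / d params, to compare against
    # the gradient computed by backprop (they should agree very closely).
    grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params.flat[i]
        params.flat[i] = orig + eps
        loss_plus = loss_fn(params)
        params.flat[i] = orig - eps
        loss_minus = loss_fn(params)
        params.flat[i] = orig
        grad.flat[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Toy check with loss(w) = sum(w**2), whose true gradient is 2*w:
w = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda p: np.sum(p ** 2), w))   # approximately [ 2. -4.  6.]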

Alternatively, you could check whether your training improves with other activation functions, e.g. a linear one for a start.
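A sketch of common drop-in alternatives (again just an illustration, not code from your gist); whichever you pick, use its matching derivative in the backpropagation step:

import numpy as np

def linear(x):
    return x

def linear_deriv(x):
    return np.ones_like(x)

def tanh(x):
    return np.tanh(x)

def tanh_deriv(output):
    return 1 - output ** 2        # derivative in terms of the tanh output

def relu(x):
    return np.maximum(0, x)

def relu_deriv(x):
    return (x > 0).astype(float)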

thogra
  • Can you post the full code here? It will help to understand the problem. – Taimur Islam Feb 15 '18 at 11:27
  • If I increase or decrease the number of neurons in the hidden layer, how will it affect the error rate? – Umair Feb 15 '18 at 13:54
  • Hard to say... if you're using no regularization, more neurons make the network prone to overfitting (https://en.wikipedia.org/wiki/Overfitting). Decreasing the number of neurons may make the network not expressive enough for the task at hand. – thogra Feb 15 '18 at 14:37

Since probabilities in machine learning tend to be very small and computations on them lead to even smaller values (leading to underflow errors), it is good practice to do your computations with logarithmic values.

Using float64 types isn't bad but will also fail eventually.

So instead of, for example, multiplying two small probabilities, you should add their logarithms. The same goes for other operations like exp().
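A small illustration with numpy (the numbers are made up, just to show the effect):

import numpy as np

p, q = 1e-200, 1e-200
print(p * q)                       # 0.0: the product underflows even in float64
print(np.log(p) + np.log(q))       # about -921.03: the same product, kept in log space

# Adding probabilities that are stored as logs, without leaving log space:
print(np.logaddexp(np.log(p), np.log(q)))   # log(p + q), computed stably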

Every machine learning framework I know either returns logarithmic model parameters by default or has a method for that, or you can just use the built-in math functions.

C. Doe