Regularized Logistic Regression Cost Function Log(1-p) = inf

Question

I am trying to implement the logistic regression cost function.
I tested my implementation and it works fine on several datasets.
But when I tried it on a new dataset, I found that the second part of the equation below (term2) is always infinite. The problem is that the value passed to np.log() is 0, so np.log() returns -inf and the cost comes out as inf. This happens because sigmoid(hypothesis(x,theta)) evaluates to exactly 1, so 1 - sigmoid(hypothesis(x,theta)) is 0.

term1 = -y*(np.log(sigmoid(hypothesis(x,theta))))
term2 = ((1-y)*(np.log(1 - sigmoid(hypothesis(x,theta)))))
infunc1 = term1 - term2 
infunc2 = (lambda_*np.sum(theta[1:]**2))/(2*m)
j = (np.sum(infunc1)/m)+infunc2
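
The symptom can be reproduced in isolation. The snippet below is a minimal sketch, assuming `sigmoid` is the usual 1/(1+exp(-z)) (the question does not show its own definition) and using made-up scores `z`. For a large enough score, float64 rounds the sigmoid to exactly 1.0, and the log of 1 - 1.0 is -inf.

```python
import numpy as np

def sigmoid(z):
    # Assumed definition; the question does not include its own sigmoid.
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative scores only: small, moderate, and large.
z = np.array([0.0, 10.0, 40.0])
p = sigmoid(z)

# exp(-40) is far below float64 machine epsilon, so 1 + exp(-40) rounds to 1.0
# and the last probability is exactly 1.0.
with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
    log_term = np.log(1 - p)        # last entry is log(0) = -inf

print(p)
print(log_term)
```

With y = 0 for that sample, term2 is -inf and term1 - term2 becomes +inf, which matches the cost described above.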

The first solution I can think of is to add a very small value to the argument of the log so that it is never 0. But I do not know whether this is correct (based on this question).

What should I do when multiplying the features by the weights produces a value that makes the argument of log zero?

Thanks for any advice. Happy coding

Your intuition is correct: adding an epsilon solves the issue. — Imanol Luengo, Oct 14 '18 at 15:37
Thank you. Should I add this small value to all of the elements of the matrix, or just to the elements that cause the `inf` answer? What happens if I do one rather than the other? — M. Doosti Lakhani, Oct 14 '18 at 15:39
You could add epsilon to all the values, or use numpy masked operations (to avoid nans and infs), e.g. `np.ma.log(...)`. — Imanol Luengo, Oct 14 '18 at 16:06
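
For illustration, here is a rough sketch of the masked-operation route mentioned in the last comment. The arrays `p` and `y` are made-up values standing in for the sigmoid outputs and labels; they are not from the question. `np.ma.log` masks entries whose input is not positive instead of returning -inf, and `sum()` on a masked array skips the masked entries.

```python
import numpy as np

# Hypothetical sigmoid outputs and labels; the last probability has saturated to 1.0.
p = np.array([0.2, 0.9, 1.0])
y = np.array([0.0, 1.0, 0.0])

log_p = np.ma.log(p)        # all inputs are positive, so nothing is masked
log_1mp = np.ma.log(1 - p)  # the entry where 1 - p == 0 is masked instead of -inf

# Per-sample cross-entropy; the mask propagates through the arithmetic.
per_sample = -y * log_p - (1 - y) * log_1mp

# sum() ignores masked entries, so the saturated sample contributes nothing.
cost = per_sample.sum() / len(y)
print(per_sample)
print(cost)
```

One thing to keep in mind with this approach: the masked samples are silently left out of the sum, so a badly misclassified point that would have produced an infinite cost no longer contributes to it at all.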

score 0 · Answer 1 · answered Oct 20 '18 at 17:53

As the commenters said, you just need to add a very small value to the argument of the log function:

epsilon = 1e-5
term1 = -y*(np.log(epsilon+sigmoid(hypothesis(x,theta))))
term2 = ((1-y)*(np.log(epsilon+1 - sigmoid(hypothesis(x,theta)))))
infunc1 = term1 - term2 
infunc2 = (lambda_*np.sum(theta[1:]**2))/(2*m)
j = (np.sum(infunc1)/m)+infunc2
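
As an alternative to epsilon, the log terms can be computed from the raw score rather than from the sigmoid output, which avoids taking the log of 0 in the first place. This is a different technique from the one above, shown only as a sketch. It relies on the identities log(sigmoid(z)) = -log(1 + exp(-z)) and log(1 - sigmoid(z)) = -log(1 + exp(z)), evaluated with `np.logaddexp`. It assumes `hypothesis(x, theta)` is the linear score `x @ theta` and that `y` and `theta` are 1-D arrays; neither is shown in the question. The function name `stable_cost` is made up for this example.

```python
import numpy as np

def stable_cost(theta, x, y, lambda_):
    """Regularized logistic cost computed from the scores z, not from sigmoid(z)."""
    m = y.shape[0]
    z = x @ theta  # assumption: hypothesis(x, theta) is the linear score x @ theta

    # np.logaddexp(0, t) computes log(1 + exp(t)) without overflow, so:
    #   -log(sigmoid(z))     = logaddexp(0, -z)
    #   -log(1 - sigmoid(z)) = logaddexp(0,  z)
    term1 = y * np.logaddexp(0.0, -z)
    term2 = (1 - y) * np.logaddexp(0.0, z)

    # Same regularization term as above; theta[0] (the bias) is not penalized.
    reg = lambda_ * np.sum(theta[1:] ** 2) / (2 * m)
    return np.sum(term1 + term2) / m + reg

# Made-up data with extreme scores (z = 100 and z = -100), both misclassified.
x = np.array([[1.0, 100.0], [1.0, -100.0]])
y = np.array([0.0, 1.0])
theta = np.array([0.0, 1.0])
print(stable_cost(theta, x, y, lambda_=0.0))  # finite, even though sigmoid(100) rounds to 1.0
```

Unlike the epsilon version, this does not cap the cost of a confidently wrong prediction; it simply returns the large but finite value.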
