9

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.

This is my code.

import numpy as np
from scipy.special import expit


def cost(X,y,theta,regTerm):
    (m,n) = X.shape
    J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
    return J

def gradient(X,y,theta,regTerm):
    (m,n) = X.shape
    grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
    return np.asarray(grad)

def train(X,y,regTerm,learnRate,epsilon,k):
    (m,n) = X.shape
    theta = np.zeros((k,n))
    for i in range(0,k):
        previousCost = 0;
        currentCost = cost(X,y,theta[i,:],regTerm)
        while(np.abs(currentCost-previousCost) > epsilon):
            print(theta[i,:])
            theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
            print(theta[i,:])
            previousCost = currentCost
            currentCost = cost(X,y,theta[i,:],regTerm)
    return theta

trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)

I can verify that cost and gradient return values of the right dimensions (cost returns a scalar, and gradient returns a 1-by-n row vector), but I get the warning

RuntimeWarning: divide by zero encountered in log
  J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])

Why is this happening, and how can I avoid it?

Oria Gruber

6 Answers

18

The proper solution here is to add some small epsilon to the argument of the log function. What worked for me was:

epsilon = 1e-5  # small constant to keep the argument of log away from exactly 0

def cost(X, y, theta):
    m = X.shape[0]
    yp = expit(X @ theta)
    cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
    return cost

7

You can clean up the formula by appropriately using broadcasting, the operator * for element-wise products of vectors, and the operator @ for matrix multiplication, and by breaking it up as suggested in the comments.

Here is your cost function:

def cost(X, y, theta, regTerm):
    m = X.shape[0]  # number of training examples (could also come from y.shape or p.shape)
    p = expit(X @ theta)
    log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
    J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
    return J

You can clean up your gradient function along the same lines.
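
For instance, a minimal sketch of what the cleaned-up gradient could look like (assuming y and theta are 1-D arrays of shapes (m,) and (n,), and a regTerm / m scaling on the penalty term, which is an assumption since the original code applies no scaling there):

import numpy as np
from scipy.special import expit

def gradient(X, y, theta, regTerm):
    m = X.shape[0]
    p = expit(X @ theta)                      # predicted probabilities, shape (m,)
    grad = X.T @ (p - y) / m                  # gradient of the average log-loss, shape (n,)
    reg = (regTerm / m) * np.concatenate(([0], theta[1:]))  # don't penalize the bias term
    return grad + reg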

By the way, are you sure you want np.linalg.norm(theta[1:])? If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.
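
For reference, that squared-norm penalty would look like this (a standalone sketch with made-up values, just to show the term):

import numpy as np

theta = np.array([0.5, -1.2, 0.7])   # hypothetical parameter vector
regTerm, m = 2.0, 100                # hypothetical regularization strength and sample count
l2_penalty = regTerm * np.sum(theta[1:] ** 2) / (2 * m)
# equivalently: regTerm * np.linalg.norm(theta[1:]) ** 2 / (2 * m)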

Stephen Rauch
Alicia Garcia-Raboso

3

Cause:

This is happening because, in some cases, the value of the sigmoid function expit(np.dot(X, theta)) becomes exactly equal to 1 in floating point (it saturates when np.dot(X, theta) is large).

Cost function:

J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])

Now, consider the following part in the above code snippet:

np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))

Here, you are computing 1 - expit(np.dot(X, theta)) when the sigmoid output is exactly 1. That effectively becomes log(1 - 1) = log(0), which is undefined.
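
You can reproduce that saturation directly with a small standalone check:

import numpy as np
from scipy.special import expit

z = np.array([10.0, 40.0])
p = expit(z)
print(p)              # second entry is exactly 1.0: float64 saturates for large z
print(np.log(1 - p))  # log(1 - 1.0) = log(0) = -inf, triggering "divide by zero encountered in log"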

2

I'm guessing your data has negative values in it. You can't log a negative.

import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan

There are a lot of different ways to transform your data that should help, if this is the case.
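
If that were the case, one simple (hypothetical) transformation is to shift the offending values into a strictly positive range before taking logs:

import numpy as np

x = np.array([-3.0, -1.0, 0.0, 2.0])   # made-up data with negative entries
x_shifted = x - x.min() + 1.0          # shift so every value is >= 1
print(np.log(x_shifted))               # no warnings: all arguments are positive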

Jeff
  • No. I just checked. No negative values in the data anywhere. – Oria Gruber Jun 30 '16 at 14:05
  • In that case, you're doing both subtraction and division inside that enormous line that the error comes up on, so that's where I would look next. Also it would make it much more comprehensible to split that line apart into steps I think. It would also make it easier to debug. – Jeff Jun 30 '16 at 14:08
  • For example, calculate the denominator separately and assign it to a variable, then divide on that line by the variable. But before that put some intermediate output that prints the denominator for you, so you can see if it is indeed zero. If it is you can split it apart more to track down how it's getting there. – Jeff Jun 30 '16 at 14:13
  • That's the thing: the only fraction I have is the /m bit, and the denominator is not zero there. Inside the log I don't divide by anything. – Oria Gruber Jun 30 '16 at 14:16
  • Split it apart and look. Obviously something is doing what isn't expected. It's good practice to use intermediate output on this sort of thing anyway, to make sure every step is doing what you expect. – Jeff Jun 30 '16 at 14:21
  • @OriaGruber: Until you split up the code and/or print the values involved, you lack sufficient evidence to claim that the values are valid. Do the basic debugging; make it a habit. You're going to need the skills as you progress in programming. – Prune Jun 30 '16 at 17:35

1

def cost(X, y, theta):
    yp = expit(X @ theta)
    cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
    return cost

The warning originates from np.log(yp) when yp == 0 and from np.log(1 - yp) when yp == 1. One option is to filter out these values and not pass them into np.log; a sketch of that idea follows below. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the answers above).
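
One common way to do that is to clip yp away from 0 and 1 before taking logs (clipping rather than literally filtering; the 1e-15 bound below is an arbitrary choice):

import numpy as np
from scipy.special import expit

def cost(X, y, theta):
    yp = expit(X @ theta)
    yp = np.clip(yp, 1e-15, 1 - 1e-15)  # keep probabilities strictly inside (0, 1)
    return -np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))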

entithat
samisaf

0

Add an epsilon (a tiny value) to the argument of the log so that it won't be a problem at all. But I am not sure whether it will give accurate results or not.
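
As a quick sanity check on accuracy, with made-up probabilities the shift from the epsilon is negligible except right at 0:

import numpy as np

eps = 1e-5
yp = np.array([0.0, 0.3, 0.9])                # hypothetical predicted probabilities
print(np.log(yp + eps))                       # finite everywhere: log(1e-5) instead of -inf
print(np.log(yp[1:] + eps) - np.log(yp[1:]))  # differences are on the order of 1e-5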