Python Logistic Regression / Hessian. Getting a divide by zero error and a singular matrix error

Question

code:

def sigmoid(x):
    return 1.0/(1+np.exp(-x)) 

def cost(x,y,th):
    pro = sigmoid(np.dot(x,th))
    result = sum(-y * np.log(pro) - (1-y) * np.log(1-pro))   
    result = result/len(x) #len: number of feature rows
    return result

def gradient(x,y,th):
    xTrans = x.transpose()                                      
    sig = sigmoid(np.dot(x,th))                              
    grad = np.dot(xTrans, ( sig - y ))                          
    grad = grad / len(x) #len: number of feature rows  
    return grad
def hessian(x,y,th):
    xTrans = x.transpose()                                      
    sig = sigmoid(np.dot(x,th))                              
    result = (1.0/len(x) * np.dot(xTrans, x) * np.diag(sig) * np.diag(1 - sig) )   
    return result
def updateTh(x,y,th):
    hessianInv = np.linalg.inv(hessian(x,y,th))                         
    grad = gradient(x,y,th)                                  
    th = th - np.dot(hessianInv, grad)                     
    return th

m = 80 #number of x rows
x = np.ones([m,3])
y = np.empty([m,1], dtype = int)
th = np.zeros([3,1])
hessianResult = np.identity(3) #identity 3x3


with open('exam.csv','r') as csvfile:
            i = 0
            reader = csv.reader(csvfile)
            next(reader) #skip header            
            for line in reader:
                x[i][1] = line[0]
                x[i][2] = line[1]
                y[i][0] = line[2]
                i+=1

#m = x.shape[0] #number of x rows
for i in range(10):
    costResult = cost(x,y,th)
    hessianResult = hessian(x,y,th)
    grad = gradient(x,y,th)
    th = updateTh(x,y,th)

If I loop more than 28 times, I get a divide by 0 issue with the "sum" part of my cost function, and I also get an error saying a matrix cannot be inversed because it is singular. Not sure what is wrong, following exact algorithm given by my professor. The data set is a list of 80 student entries, with two exam scores per entry and a binary 1 or 0 for whether or not the student was admitted to a college.

I am guessing it has something to do with your .csv data file, because I made my own file with random grades data, and your script runs fine when used on it. Would be hard to say without knowing how the data in your file is organized, or what the ranges for the grades are, and what determines the 1 or 0 for the final column data. — Wattholm, Oct 26 '19 at 02:40

score 0 · Answer 1 · answered Oct 26 '19 at 02:07

0

First of all, you can remove i=0 and i+=1 form the lines where you read the file. Just replace for line in reader: with for i, line in enumerate(reader):

enumerate will make i start from 0 and increase at every new line.

However, I can't spot anything wrong in your code, so my guess is that there is some issue with the initialization of the variables.

In particular: you ask x to be a (m,3) shaped array but you initialize only two columns. Also, you take th to be a matrix of zeros at the beginning. Then the first thing you do it compute the cost, which means first of all compute np.dot(x,th). I am afraid that this is gonna be independent of the data your professor gave you, because th is full of zeros. This would also explain why you always get the error at the same iteration.

answered Oct 26 '19 at 02:07

GRquanti

527
8
23

I've seen people mostly initialize th/theta matrix to be zeros. If I start with ones, I immediately get a singular matrix error. Also I am not sure what you mean about the x matrix. It has 3 columns, and the first column is all 1's in the end for bias values. – Andrew Ray Oct 26 '19 at 02:09
ok i got why you don't fill the 0-column of `x`. you can try with a random initialization of `th` but it's not the solution to your problem. However, which is the step that gives you the inversion error? I can see you call the hessian function many times – GRquanti Oct 26 '19 at 02:16
You can ignore all function calls in that loop and just look at the updateTh call. That is what gives the error on the 29th loop, and really the other functions don't even need to be called in the loop. HOWEVER, the 29th loop is also the loop where the cost function will also give an error, which is a divide by zero error. Somehow the theta matrix I am getting is causing these issues, but I don't see what's wrong – Andrew Ray Oct 26 '19 at 02:19
the cost function gives the divide by zero error, yes. But there are no divisions in that function. I am quite sure that what happens is the following: you call updateTh, and in turn it calls the inversion of the hessian. THAT is when you get a division by zero, when you try to invert the hessian. Look at the line of the Th function where you get the division by zero. Isn't the first one, when you invert the hessian? I bet it is. However, this (obviously) means that the hessian is not invertible, and the reasons can be two: your prof is fooling you (doubt it) OR there is a mistake somehwere – GRquanti Oct 26 '19 at 02:24
The divide by zero error absolutely never shows up unless I use the cost function and I isolated it specifically to the "result = sum(-y * np.log(pro) - (1-y) * np.log(1-pro)) " portion https://i.imgur.com/ZdUqXg9.png the inverse error is on the call to the updateTh function which tries to do an inverse – Andrew Ray Oct 26 '19 at 02:28
This is weird because you ALSO get an inversion error. However, if you are sure you isolated the error, it must be due to a zero in the log. Print the value of `pro` – GRquanti Oct 26 '19 at 02:32
Here is the weird thing. I used an online calculator and got very similar results when I only loop 10 times: https://i.imgur.com/TOnkUdC.png and subsequent loops after 10 are mostly the same, then suddenly starts changing. I have no idea why lol – Andrew Ray Oct 26 '19 at 02:33
What about the value of `pro` in the cost function? If you are sure the error is there then it must be 0 or 1 – GRquanti Oct 26 '19 at 02:38
Now I need to figure out how to plot the curve correctly, it's not working lol – Andrew Ray Oct 26 '19 at 02:41

score 0 · Answer 2 · answered Oct 26 '19 at 02:54

Used this as sample data for the .csv (80 entries, not including the header):

Grade 1,Grade 2,Admit
83,95,1
87,93,1
92,91,1
94,88,0
81,97,0
88.3,92.5,1
88.6,92.4,0
88.9,92.3,0
89.2,92.2,0
89.5,92.1,0
89.8,92,1
90.1,91.9,1
90.4,91.8,1
90.7,91.7,1
91,91.6,0
91.3,91.5,0
91.6,91.4,1
91.9,91.3,0
92.2,91.2,0
92.5,91.1,0
92.8,91,0
93.1,90.9,1
93.4,90.8,1
93.7,90.7,1
94,90.6,0
94.3,90.5,0
94.6,90.4,1
94.9,90.3,0
95.2,90.2,0
95.5,90.1,0
95.8,90,0
96.1,89.9,1
96.4,89.8,1
96.7,89.7,0
97,89.6,0
97.3,89.5,0
97.6,89.4,1
97.9,89.3,1
98.2,89.2,0
98.5,89.1,0
98.8,89,0
99.1,88.9,1
99.4,88.8,1
99.7,88.7,0
100,88.6,0
100.3,88.5,1
100.6,88.4,1
100.9,88.3,0
101.2,88.2,0
101.5,88.1,0
101.8,88,1
102.1,87.9,1
102.4,87.8,0
102.7,87.7,0
103,87.6,0
103.3,87.5,1
103.6,87.4,1
103.9,87.3,0
104.2,87.2,0
104.5,87.1,0
104.8,87,1
105.1,86.9,1
105.4,86.8,0
105.7,86.7,0
106,86.6,1
106.3,86.5,1
106.6,86.4,0
106.9,86.3,0
107.2,86.2,0
107.5,86.1,1
107.8,86,1
108.1,85.9,0
108.4,85.8,0
108.7,85.7,0
109,85.6,1
109.3,85.5,1
109.6,85.4,0
109.9,85.3,0
110.2,85.2,0
110.5,85.1,1

Edited your script only to add the required imports, and to print some output:

import numpy as np
import csv

def sigmoid(x):
    return 1.0/(1+np.exp(-x)) 

def cost(x,y,th):
    pro = sigmoid(np.dot(x,th))
    result = sum(-y * np.log(pro) - (1-y) * np.log(1-pro))   
    result = result/len(x) #len: number of feature rows
    return result

def gradient(x,y,th):
    xTrans = x.transpose()                                      
    sig = sigmoid(np.dot(x,th))                              
    grad = np.dot(xTrans, ( sig - y ))                          
    grad = grad / len(x) #len: number of feature rows  
    return grad
def hessian(x,y,th):
    xTrans = x.transpose()                                      
    sig = sigmoid(np.dot(x,th))                              
    result = (1.0/len(x) * np.dot(xTrans, x) * np.diag(sig) * np.diag(1 - sig) )   
    return result
def updateTh(x,y,th):
    hessianInv = np.linalg.inv(hessian(x,y,th))                         
    grad = gradient(x,y,th)                                  
    th = th - np.dot(hessianInv, grad)                     
    return th

m = 80 #number of x rows
x = np.ones([m,3])
y = np.empty([m,1], dtype = int)
th = np.zeros([3,1])
hessianResult = np.identity(3) #identity 3x3


with open('exam.csv','r') as csvfile:
            i = 0
            reader = csv.reader(csvfile)
            next(reader) #skip header            
            for line in reader:
                x[i][1] = line[0]
                x[i][2] = line[1]
                y[i][0] = line[2]
                i+=1

#m = x.shape[0] #number of x rows
for i in range(40):
    print("Entry #",i,": ",x[i][1],", ",x[i][2],", ",y[i][0], sep = '')
    costResult = cost(x,y,th)
    print(costResult)
    hessianResult = hessian(x,y,th)
    print(hessianResult)
    grad = gradient(x,y,th)
    print(grad)
    th = updateTh(x,y,th)
    print(th)

It should run and output fine on the command line (Windows), if you just copy my csv data, and use that as test input. So the only difference I can see between what I did and what you did -- which may be the cause of the errors -- is in the structuring of the grades data file itself.

are you suggesting the professor fooled them? lol. However I agree that there seems to be no mistake in the code — GRquanti, Oct 26 '19 at 03:23
Thanks, but it's weird I can completely fill the matrix just fine. I don't think it should be an issue with the input file, because it seems perfectly structured when I test displaying my matrices — Andrew Ray, Oct 26 '19 at 03:35
@GRquanti No I am definitely not suggesting that, haha. I just want to see the csv file because it seems to work with the sample I made. And I do find it strange that the script works flawlessly on my end. It seems I have hit a proverbial wall in my testing because I cannot reproduce the OP's errors. The only thing I can think of to do is ask for the input file so that I may be able to get the same erroneous results, and continue trying to resolve the issue. — Wattholm, Oct 27 '19 at 00:32

Python Logistic Regression / Hessian. Getting a divide by zero error and a singular matrix error

2 Answers2