
I am implementing Andrew Ng's machine learning course in Python. In programming exercise 2, on the first question, I am getting the right answers for the cost function and gradient, but when calculating the optimized theta I get a disastrous answer!

I have already tried my best but have not been able to find the error.

import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def cost_compute(theta, x, y):
    J = (-1/m) * np.sum(np.multiply(Y, np.log(sigmoid(X @ theta))) 
        + np.multiply((1-Y), np.log(1 - sigmoid(X @ theta))))
    return J

[m, n] = X.shape
X = np.hstack( (np.ones((m,1)) , X) )
Y = Y[:, np.newaxis]
theta = np.zeros((n+1,1))

def grad( theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)  
    return temp

temp = opt.fmin_tnc(func=cost_compute, x0=theta.flatten(), fprime=grad, args=(X, Y.flatten()))

print(temp)

The expected cost is 0.693 and I am getting it. My gradient also matches the expected answer exactly. But the optimized theta I am getting is array([4.42735730e-05, 5.31690927e-03, 4.98646266e-03]), giving me a new cost of around 60 (instead of 0.203)!


2 Answers


I did some tests by changing the shapes of the arrays, flattening and reshaping them, but nothing worked.

Since we pass a one-dimensional theta into fmin_tnc (by flattening it), I thought of changing the gradient function on the assumption that it will receive a one-dimensional theta instead of a 3*1 array.

Earlier, it was

def grad( theta, X, Y):
    temp = (1/m) * X.T @ (sigmoid(X @ theta) - Y)  
    return temp

Now, it is

def grad(theta, X, Y):
    # theta arrives from fmin_tnc as a 1-D array; restore the column shape so
    # the subtraction broadcasts correctly against the (m, 1) array Y
    temp = (1/m) * (X.T @ (sigmoid(X @ theta[:, np.newaxis]) - Y))
    return temp

Now it works!
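
For what it's worth, the shape juggling can be avoided entirely by keeping every array one-dimensional, since fmin_tnc always hands func and fprime a flattened 1-D theta. Here is a minimal, self-contained sketch on synthetic data (the data generation, the clip constant, and the variable names are illustrative assumptions, not part of the exercise):

import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    # theta and y are both 1-D, so plain dot products work without reshaping
    h = np.clip(sigmoid(X @ theta), 1e-10, 1 - 1e-10)  # guard against log(0)
    return (-1 / len(y)) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))

def grad(theta, X, y):
    h = sigmoid(X @ theta)                     # shape (m,)
    return (1 / len(y)) * (X.T @ (h - y))      # shape (n,)

rng = np.random.default_rng(0)
m = 100
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])        # intercept + 2 features
y = (X[:, 1] + X[:, 2] + rng.normal(size=m) > 0).astype(float)   # noisy 1-D labels

theta0 = np.zeros(X.shape[1])                  # 1-D initial theta
result = opt.fmin_tnc(func=cost, x0=theta0, fprime=grad, args=(X, y))
print(result[0], cost(result[0], X, y))

With 1-D inputs, both the cost and the gradient return the shapes fmin_tnc expects, and no reshaping is needed anywhere.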


The problem is that you are calling np.sum together with np.multiply instead of using e.g. np.dot; these operations are in general not equivalent.

The np.multiply operation calculates the element-wise product, while np.dot calculates the proper matrix product; see this answer on SO by Anuj Gautam:

np.dot is the dot product of two matrices.

|A B| . |E F| = |A*E+B*G A*F+B*H|
|C D|   |G H|   |C*E+D*G C*F+D*H|

Whereas np.multiply does an element-wise multiplication of two matrices.

|A B| ⊙ |E F| = |A*E B*F|
|C D|   |G H|   |C*G D*H|

To calculate the cross entropy loss, the matrix multiplication is needed.
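
A tiny numeric illustration of the difference (the arrays here are arbitrary):

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(np.multiply(A, B))   # element-wise:   [[ 5 12] [21 32]]
print(np.dot(A, B))        # matrix product: [[19 22] [43 50]]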

Changing your cost function to

def cost_compute(theta, X, Y):
    J = (-1/m) * (np.dot(Y.T, np.log(sigmoid(X @ theta))) 
        + np.dot((1-Y).T, np.log(1 - sigmoid(X @ theta))))
    return J

gives the desired result for me:

>> cost_compute(temp[0], X, Y)
array([0.2034977])

In addition, the casing of the arguments x and y of your cost_compute function is wrong, as you use the capitalized versions X and Y inside the function.

  • Thanks for pointing out the capitalisation error. But using sum and multiply wasn't wrong in this specific case, since Y and sigmoid(X @ theta) both have shape m*1; if I were to use the dot product, I would have to write ``` Y.T @ np.log(sigmoid(X @ theta)) ``` – Dan Kyuso Jun 23 '19 at 11:35