
I have the following code to minimize the Cost Function with its gradient.

import scipy.optimize
from numpy import shape, zeros, random

def trainLinearReg( X, y, lamda ):
    # theta = zeros( (shape(X)[1], 1) )
    theta = random.rand( shape(X)[1], 1 )   # random initialization of theta

    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta,
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    return result[1], result[0]

But I am getting this warning:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 8403387632289934651424768.000000
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3

My computeCost and computeGradient are defined as

def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))

    return J[0]

def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m     = shape(y)[0]
    J     = 0
    grad  = zeros( shape(theta) )

    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta

    return grad.flatten()
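
For reference, these two functions implement the regularized linear-regression cost and gradient below (note that theta[0] is regularized as well, since the theta[0] = 0.0 line is commented out):

J(\theta) = \frac{1}{2m}\,(X\theta - y)^T (X\theta - y) + \frac{\lambda}{2m}\,\theta^T \theta

\nabla J(\theta) = \frac{1}{m}\,X^T (X\theta - y) + \frac{\lambda}{m}\,\theta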

I have reviewed these similar questions:

scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”

scipy.optimize.fmin_cg: "Desired error not necessarily achieved due to precision loss."

scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"

But I still cannot solve my problem. How can I make the minimization converge instead of getting stuck at the first iteration?


  • Is it happening with `lamda=0`? – lejlot Nov 22 '15 at 13:38
  • @lejlot nope, I have tried both lamda=0.0 and lamda=1.0. And actually, the assignment requires lamda=0 – fluency03 Nov 22 '15 at 13:41
  • you should probably attach your data as well, as it simply seems that you get an extremely big value of `J`, so maybe your data is not preprocessed correctly? And you have huge values in X or y? – lejlot Nov 22 '15 at 13:46
  • @lejlot actually, yes. the data values are quite large since the data are polynomially processed. But this is required, as *Instructions: Given a vector X, return a matrix X_poly where the p-th column of X contains the values of X to the p-th power.* – fluency03 Nov 22 '15 at 13:53
  • @lejlot however, after the polynomial processing, the data are normalized. – fluency03 Nov 22 '15 at 14:01
  • It looks like it is not normalized enough; add max, min, mean values of both X and y to your question (or maybe histograms?) – lejlot Nov 22 '15 at 14:02
  • @lejlot you are right. thanks. I have edited my question and gave the answer. – fluency03 Nov 22 '15 at 14:14
  • You should post the "ANSWER" section of your question as an actual answer and then accept it so that the question doesn't remain open. – ali_m Nov 23 '15 at 01:46
  • also remove the "Answer" section from the question. – Tejas Shetty Mar 06 '20 at 10:11

4 Answers


ANSWER:

I solved this problem based on @lejlot's comments on the question. He was right: the data set X was too large because I did not return the normalized values to the correct variable. Even though this was a small mistake, it shows where to look when you run into such problems: a cost function value this large suggests something is wrong with the data set.

The previous wrong one:

X_poly            = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

The correct one:

X_poly            = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly            = c_[ones((m, 1)), X_poly]

where X_poly is then used for training as

cost, theta = trainLinearReg(X_poly, y, lamda)
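
For context, polyFeatures and featureNormalize are not shown in the question; below is a minimal sketch of what they are assumed to do (expand X into powers 1..p, then standardize each column):

import numpy as np

def polyFeatures(X, p):
    # hypothetical sketch: column i holds X raised to the power i (i = 1..p)
    X = np.asarray(X).reshape(-1, 1)
    return np.hstack([X ** i for i in range(1, p + 1)])

def featureNormalize(X):
    # hypothetical sketch: zero-mean, unit-variance scaling of each column
    mu    = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    return (X - mu) / sigma, mu, sigma

The bug was that the normalized matrix was stored in X_norm and never used, so the raw polynomial features, with huge values, were passed on to trainLinearReg, which is why the cost blew up.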

My implementation of scipy.optimize.fmin_cg also failed with the above-mentioned error for some initial guesses. I then switched to the BFGS method and it converged.

 scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})

It seems this error is still unavoidable with CG when CG ends up with a non-descent direction.
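
Applied to the functions from the question, a BFGS call could look like the sketch below (assuming X, y and lamda are defined as in the question, with computeCost and computeGradient unchanged; theta0 and res are names used here for illustration):

import numpy as np
from scipy.optimize import minimize

theta0 = np.zeros(np.shape(X)[1])   # 1-D initial guess

res = minimize(computeCost, theta0, args=(X, y, lamda),
               method='BFGS', jac=computeGradient,
               options={'maxiter': 200, 'disp': True})

cost, theta = res.fun, res.x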


I too faced this problem, and even after searching a lot for solutions nothing worked because the solutions were not clearly explained.

Then I read the documentation for scipy.optimize.fmin_cg, where it is clearly stated that the parameter x0 must be a 1-D array.

My approach was the same as yours: I passed a 2-D matrix as x0, and I always got a precision error or a divide-by-zero error and the same warning you got.

Then I changed my approach: I passed theta as a 1-D array and converted it into a 2-D matrix inside the computeCost and computeGradient functions. That worked for me and I got the results I expected.

My solution for logistic regression:

import numpy as np
import scipy.optimize as opt

# m (the number of training examples) and features (the number of columns of X)
# are assumed to be defined earlier; X and Y are assumed to be pandas objects,
# hence the .values calls.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def computeCost(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    cost = np.multiply(y, np.log(hx)) + np.multiply((1 - y), np.log(1 - hx))
    return -(np.sum(cost)) / m

def computeGradient(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    grad = np.zeros(features)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    error = hx - y
    for i in range(0, features, 1):
        term = np.multiply(error, x[:, i])
        grad[i] = np.sum(term) / m
    return grad

theta = np.zeros(features)   # the x0 passed to the optimizer must be a 1-D array

result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))

print(computeCost(result[0], X, Y))

Note again that theta has to be a 1-D array.

So, in your code, change theta in trainLinearReg to a 1-D array, e.g. theta = random.randn(features) (where features is the number of columns of X), as sketched below.
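
Applied to the question's trainLinearReg, that change could look like this (a sketch only; the reshape calls inside computeCost and computeGradient stay as they are):

import numpy as np
import scipy.optimize

def trainLinearReg(X, y, lamda):
    theta = np.random.randn(np.shape(X)[1])   # 1-D initial guess instead of an (n, 1) array

    result = scipy.optimize.fmin_cg(computeCost, fprime=computeGradient, x0=theta,
                                    args=(X, y, lamda), maxiter=200, disp=True,
                                    full_output=True)
    return result[1], result[0]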


I faced this problem today.

I then noticed that my cost function was implemented the wrong way and was producing very large, badly scaled errors, which is what caused scipy to fail in this way. I hope this helps someone like me.
