
I'm doing the Coursera Machine Learning MOOC by Andrew Ng, Exercise 2: Logistic Regression in Python, here: https://github.com/dibgerge/ml-coursera-python-assignments/blob/master/Exercise2/exercise2.ipynb

I have worked through all of the logistic regression and regularization parts, but I am stuck at the final part, which varies the degree of regularization by changing lambda.

Here is the original code:

# Imports used by this snippet; costFunctionReg, predict, plotData, and the
# utils module are defined earlier in the assignment notebook/repo
import numpy as np
from scipy import optimize
from matplotlib import pyplot
import utils

# Initialize fitting parameters
initial_theta = np.zeros(X.shape[1])

# Set regularization parameter lambda to 1 (you should vary this)
lambda_ = 1

# set options for optimize.minimize
options= {'maxiter': 100}

res = optimize.minimize(costFunctionReg,
                        initial_theta,
                        args=(X, y, lambda_),
                        jac=True,
                        method='TNC',
                        options=options)

# the fun property of OptimizeResult object returns
# the value of costFunction at optimized theta
cost = res.fun

# the optimized theta is in the x property of the result
theta = res.x

utils.plotDecisionBoundary(plotData, theta, X, y)
pyplot.xlabel('Microchip Test 1')
pyplot.ylabel('Microchip Test 2')
pyplot.legend(['y = 1', 'y = 0'])
pyplot.grid(False)
pyplot.title('lambda = %0.2f' % lambda_)

# Compute accuracy on our training set
p = predict(theta, X)

print('Train Accuracy: %.1f %%' % (np.mean(p == y) * 100))
print('Expected accuracy (with lambda = 1): 83.1 % (approx)\n')
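One thing worth verifying (it is not shown in the excerpt above) is that costFunctionReg really returns a correct gradient: with jac=True, optimize.minimize trusts the analytic gradient, and TNC in particular can silently stall on a wrong one while CG limps along. Below is a minimal sketch using scipy.optimize.check_grad; cost_function_reg here is my own reference implementation of the regularized logistic cost, standing in for the assignment's costFunctionReg, and the data is synthetic:

```python
import numpy as np
from scipy import optimize

def cost_function_reg(theta, X, y, lambda_):
    # Reference regularized logistic cost + gradient (theta[0] unregularized);
    # a stand-in for the assignment's costFunctionReg, written for illustration
    m = y.size
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lambda_ / m) * theta[1:]
    return J, grad

# Synthetic data: intercept column plus 3 random features
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
y = rng.integers(0, 2, size=50).astype(float)
theta0 = rng.normal(scale=0.1, size=4)

# Compare the analytic gradient against finite differences
err = optimize.check_grad(
    lambda t: cost_function_reg(t, X, y, 1.0)[0],
    lambda t: cost_function_reg(t, X, y, 1.0)[1],
    theta0)
print('gradient error:', err)
```

If the reported error is much larger than about 1e-5, the gradient is likely buggy, which would explain a gradient-hungry solver like TNC quitting early.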

This always outputs a training accuracy of 66.1 %, no matter what lambda I feed in. However, if I change the method from 'TNC' to 'CG', the training accuracy is 83.1 % as expected, and it increases when I decrease lambda.
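When one solver behaves and another does not, the first diagnostic is the OptimizeResult itself: res.success and res.message report whether the solver actually converged, which the accuracy number hides. A small self-contained sketch (cost_grad and the synthetic data are my own, not the assignment's code) comparing TNC and CG on the same problem:

```python
import numpy as np
from scipy import optimize

def cost_grad(theta, X, y, lambda_):
    # Regularized logistic cost and gradient (theta[0] not regularized)
    m = y.size
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12  # guard against log(0)
    J = (-y @ np.log(h + eps) - (1 - y) @ np.log(1 - h + eps)) / m \
        + (lambda_ / (2 * m)) * np.sum(theta[1:] ** 2)
    grad = X.T @ (h - y) / m
    grad[1:] += (lambda_ / m) * theta[1:]
    return J, grad

# Synthetic, roughly linearly separable data with an intercept column
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)

results = {}
for method in ('TNC', 'CG'):
    res = optimize.minimize(cost_grad, np.zeros(3), args=(X, y, 1.0),
                            jac=True, method=method,
                            options={'maxiter': 100})
    results[method] = res
    # success/message reveal whether the solver actually converged
    print(method, res.success, res.message)
```

On a correct, convex cost both methods should land on essentially the same minimum; if TNC reports failure (or stops after very few iterations) in the real assignment, res.message is the place to look before blaming lambda.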

Any idea why this happens? I appreciate your help in advance!

xzx
  • Some of the old interfaces/routines have the "clever" habit of caching the last problem, probably so that repeated calls go faster without an initialization phase. However, they cache too much. This does not matter if you only have one problem instance per script, but it becomes noticeable in coding environments like IPython/Jupyter. Try whether restarting the kernel after changing the parameter helps. In general, be sure to use the newest version of `scipy` and the newest methods in that version. See https://stackoverflow.com/a/59393068/3088138 for a likely similar problem. – Lutz Lehmann May 25 '20 at 06:54
  • Thanks. I tried running it outside of the notebook and still got the error. It seems there are some behavioral differences between the solvers, TNC vs. CG... but I'm no expert on this... – xzx May 25 '20 at 23:09

0 Answers