I am trying to implement logistic regression for a binary classification problem from scratch in Python. My results do not match those produced by sklearn's implementation, as you can see in this example. Note that the two lines look "similar", but they are clearly not the same.
I took care of what is mentioned in this answer: both sklearn and my implementation (i) fit the intercept term and (ii) do not apply regularization (penalty='none'). Also, while sklearn runs 100 iterations to train the model (by default), I am running 10000 with a rather small learning rate of 0.01. I tried different combinations of values, but the problem does not seem to depend on this.
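For reference, the sklearn setup described above would look like this (fit_intercept=True and max_iter=100 are sklearn's defaults; note that recent sklearn versions spell the option penalty=None instead of penalty='none'):

from sklearn.linear_model import LogisticRegression

# Regularization disabled so that sklearn fits the same model I fit by hand
clf = LogisticRegression(penalty='none', fit_intercept=True, max_iter=100)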
At the same time, I notice that, even before comparing with sklearn, the results I obtain with my own implementation already seem wrong: the decision regions are clearly off in some cases. You can see an example in this image.
This last point suggests that the problem lies entirely in my own code. Here is my code (it generates new datasets at each run and plots the results):
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
def create_training_set():
    # Two 2-D Gaussian blobs, 100 points per class
    X0, y = make_blobs(n_samples=[100, 100],
                       centers=None,
                       n_features=2,
                       cluster_std=1)
    y = y.reshape(-1, 1)  # make y a column vector
    # X_train gets a leading column of ones for the intercept term
    return np.hstack([np.ones((X0.shape[0], 1)), X0]), X0, y
def create_test_set(X0):
    # Dense grid covering the training data, used to draw the decision regions
    xx, yy = np.meshgrid(np.arange(X0[:, 0].min() - 1, X0[:, 0].max() + 1, 0.1),
                         np.arange(X0[:, 1].min() - 1, X0[:, 1].max() + 1, 0.1))
    X_test = np.c_[xx.ravel(), yy.ravel()]
    X_test = np.hstack([np.ones((X_test.shape[0], 1)), X_test])  # add intercept column
    return xx, yy, X_test
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def apply_gradient_descent(theta, X, y, max_iter=1000, alpha=0.1):
    # Batch gradient descent on the cross-entropy cost
    m = X.shape[0]
    cost_iter = []
    for _ in range(max_iter):
        p_hat = sigmoid(np.dot(X, theta))
        # J = -1/m * [y^T log(p_hat) + (1 - y)^T log(1 - p_hat)]
        cost_J = -1 / float(m) * (np.dot(y.T, np.log(p_hat)) + np.dot((1 - y).T, np.log(1 - p_hat)))
        # dJ/dtheta = 1/m * X^T (p_hat - y)
        grad_J = 1 / float(m) * np.dot(X.T, p_hat - y)
        theta -= alpha * grad_J
        cost_iter.append(float(cost_J))
    return theta, cost_iter
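# Sanity check of the function above on a tiny, made-up, linearly separable set
# (X_check / y_check are hypothetical data, used only for illustration):
# the cross-entropy cost is convex, so with this step size it should decrease monotonically.
X_check = np.array([[1., 0.], [1., 1.], [1., 3.], [1., 4.]])  # intercept column + one feature
y_check = np.array([[0.], [0.], [1.], [1.]])
_, cost_check = apply_gradient_descent(np.zeros((2, 1)), X_check, y_check)
assert all(c2 <= c1 for c1, c2 in zip(cost_check, cost_check[1:]))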
fig, ax = plt.subplots(10, 2, figsize=(10, 30))
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

max_iter = 10000
alpha = 0.1
all_cost_history = []
for n_fil in range(10):
    X_train, X0, y = create_training_set()
    xx, yy, X_test = create_test_set(X0)

    # Left column: my implementation
    theta, cost_evolution = apply_gradient_descent(np.zeros((X_train.shape[1], 1)), X_train, y, max_iter, alpha)
    all_cost_history.append(cost_evolution)
    y_pred = np.where(sigmoid(np.dot(X_test, theta)) > 0.5, 1, 0)
    y_pred = y_pred.reshape(xx.shape)
    ax[n_fil, 0].pcolormesh(xx, yy, y_pred, cmap=cmap_light)
    ax[n_fil, 0].scatter(X0[:, 0], X0[:, 1], c=y.ravel(), cmap=cmap_bold, alpha=1, edgecolor="black")

    # Right column: sklearn (it expects a 1-D target array)
    y = y.reshape(X_train.shape[0], )
    clf = LogisticRegression(penalty='none').fit(X0, y)  # regularization disabled, as described above
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax[n_fil, 1].pcolormesh(xx, yy, Z, cmap=cmap_light)
    ax[n_fil, 1].scatter(X0[:, 0], X0[:, 1], c=y, cmap=cmap_bold, alpha=1, edgecolor="black")

plt.show()
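A quick way to check convergence and to compare the fitted parameters directly (a minimal sketch that reuses theta, clf and all_cost_history from the last run of the loop above) would be:

# The stored cost histories should flatten out well before max_iter is reached
plt.figure(figsize=(6, 4))
for cost_evolution in all_cost_history:
    plt.plot(cost_evolution)
plt.xlabel("iteration")
plt.ylabel("cost J")
plt.show()

# theta[0] should be comparable to clf.intercept_ and theta[1:] to clf.coef_
print("my theta:", theta.ravel())
print("sklearn :", np.r_[clf.intercept_, clf.coef_.ravel()])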