I am implementing logistic regression in Python with numpy. I have generated the following data set:
import numpy as np

# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000
# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000
# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
X = np.concatenate((r0,r1))
I then implemented the sigmoid hypothesis with the following functions:
def logistic_function(x):
    """ Applies the logistic function to x, element-wise. """
    return 1.0 / (1 + np.exp(-x))

def logistic_hypothesis(theta):
    # Returns a function that maps a feature matrix x to sigmoid(x_with_bias . theta)
    return lambda x: logistic_function(np.dot(generateNewX(x), theta.T))

def generateNewX(x):
    # Prepend a column of ones so that theta[0] acts as the bias term
    x = np.insert(x, 0, 1, axis=1)
    return x
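As a quick sanity check of the sigmoid on its own, here is a minimal sketch that evaluates it on a few hand-picked inputs. The test values and the name `vals` are only illustrative and are not part of my actual code:

```python
import numpy as np

def logistic_function(x):
    """Applies the logistic function to x, element-wise."""
    return 1.0 / (1 + np.exp(-x))

# Illustrative inputs: strongly negative, zero, strongly positive
vals = logistic_function(np.array([-10.0, 0.0, 10.0]))

print(vals)
print(vals[1])  # the sigmoid of 0 is exactly 0.5
```

For moderate inputs like these, each result lies strictly between 0 and 1.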
After applying logistic regression, I found out that the best thetas are:
best_thetas = [-0.9673200946417307, -1.955812236119612, -5.060885703369424]
However, when I apply the logistic hypothesis with these thetas, the output appears to contain numbers that are not inside the interval [0, 1].
Example:
data = logistic_hypothesis(np.asarray(best_thetas))(X)
print(data)
This gives the following result:
[2.67871968e-11 3.19858822e-09 3.77845881e-09 ... 5.61325410e-03
2.19767618e-01 6.23288747e-01]
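One way to check whether any of these values really falls outside [0, 1] is to look at the minimum and maximum directly, and to print a value without scientific notation (a suffix such as `e-11` means "times 10 to the power of -11"). The following is only a sketch under assumptions: the seed, the stand-in data drawn for class 0 only, and the names `rng`, `probs`, and `in_range` are not from my original code.

```python
import numpy as np

def logistic_function(x):
    """Applies the logistic function to x, element-wise."""
    return 1.0 / (1 + np.exp(-x))

# Assumption: fixed seed and class-0 data only, just to get a reproducible sample
rng = np.random.default_rng(0)
X = rng.multivariate_normal([2.0, 3.0], [[5, -4], [-4, 4]], 1000)

theta = np.array([-0.9673200946417307, -1.955812236119612, -5.060885703369424])

# Prepend the bias column and evaluate the hypothesis
X_with_bias = np.insert(X, 0, 1, axis=1)
probs = logistic_function(X_with_bias @ theta)

# Smallest and largest predicted value
print(probs.min(), probs.max())

# True only if every value lies inside [0, 1]
in_range = bool(np.all((probs >= 0) & (probs <= 1)))
print(in_range)

# The first value from the output above, written in positional notation
print(np.format_float_positional(2.67871968e-11))
```

Calling `np.set_printoptions(suppress=True)` before `print(data)` is another option. It asks NumPy to print small floating-point values in fixed-point notation instead of scientific notation.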
Can someone help me understand what has gone wrong with my implementation? I cannot understand why I am getting such big values. Isn't the sigmoid function supposed to give results only in the interval [0, 1]?