I'm using both the Scikit-Learn and Seaborn logistic regression functions -- the former for extracting model info (i.e. log-odds, parameters, etc.) and the later for plotting the resulting sigmoidal curve fit to the probability estimations.
Maybe my intuition is incorrect for how to interpret this plot, but I don't seem to be getting results as I'd expect:
#Build and visualize a simple logistic regression
ap_X = ap[['TOEFL Score']].values
ap_y = ap['Chance of Admit'].values
ap_lr = LogisticRegression()
ap_lr.fit(ap_X, ap_y)
def ap_log_regplot(ap_X, ap_y):
plt.figure(figsize=(15,10))
sns.regplot(ap_X, ap_y, logistic=True, color='green')
return None
ap_log_regplot(ap_X, ap_y)
plt.xlabel('TOEFL Score')
plt.ylabel('Probability')
plt.title('Logistic Regression: Probability of High Chance by TOEFL Score')
plt.show
Seems alright, but then I attempt to use the predict_proba
function in Scikit-Learn to find the probabilities of Chance to Admit
given some arbitrary value for TOEFL Score
(in this case 108, 104, and 112):
eight = ap_lr.predict_proba(108)[:, 1]
four = ap_lr.predict_proba(104)[:, 1]
twelve = ap_lr.predict_proba(112)[:, 1]
print(eight, four, twelve)
Where I get:
[0.49939019] [0.44665597] [0.55213799]
To me, this seems to indicate that a TOEFL Score of 112 gives an individual a 55% chance of being admitted based on this data set. If I were to extend a vertical line from 112 on the x-axis to the sigmoid curve, I'd expect the intersection at around .90.
Am I interpreting/modeling this correctly? I realize that I'm using two different packages to calculate the model coefficients but with another model using a different data set, I seem to get correct predictions that fit the logistic curve.
Any ideas or am I completely modeling/interpreting this inaccurately?