I have a data set with a binary variable [Yes/No] and a continuous variable (X). I'm trying to build a model that classifies [Yes/No] from X.
From my data set, when X = 0.5, 48% of the observations are Yes. However, I know the true probability of Yes should be 50% when X = 0.5. When I fit a logistic regression model, the fitted probability at that point is not 50%, i.e. P[Yes | X = 0.5] != 0.5.
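To make the setup concrete, here is a minimal sketch of the kind of fit I mean. It uses synthetic stand-in data (not my real data set), and the helper names `sigmoid` and `fit_logistic`, the slope of 4, and the sample size of 2000 are all assumptions made only for illustration. The data are generated so that the true P[Yes | X = 0.5] is exactly 0.5, yet an unconstrained fit will generally land near 0.5 rather than exactly on it.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, n_iter=25):
    """Fit P(Yes | x) = sigmoid(a + b*x) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * x
            # Negative Hessian (observed information)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 Newton system by hand
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Synthetic stand-in data (NOT the real data set):
# the true P(Yes | X = 0.5) is exactly 0.5 by construction.
rng = random.Random(0)
xs = [rng.random() for _ in range(2000)]
ys = [1 if rng.random() < sigmoid(4.0 * (x - 0.5)) else 0 for x in xs]

a, b = fit_logistic(xs, ys)
p_at_half = sigmoid(a + b * 0.5)
print(f"fitted P(Yes | X = 0.5) = {p_at_half:.3f}")
```

Nothing in this fit forces the curve through the point (0.5, 0.5); the intercept and slope are both free, so the fitted value at X = 0.5 is whatever the sample happens to give.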
How can I correct this? I guess all probabilities will be slightly underestimated if the fitted curve does not pass through the correct point.
Is it correct to simply add a bunch of observations to my sample to adjust the proportion?
It does not have to be logistic regression; LDA, QDA, etc. are also of interest.
I have searched Stack Overflow, but only found topics regarding linear regression.