Good question! The fitting process in logistic regression is a search procedure that looks for the beta coefficients that minimize the discrepancy between the probabilities predicted by the model (continuous values) and the observed labels in the data (discrete values).
In logistic regression, you model probabilities using a logistic function (also known as a sigmoid function):
XB = B0 + B1 * X1 + B2 * X2 + ... + BN * XN
p(X) = e^(XB) / (1 + e^(XB))
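As a minimal NumPy sketch, those two formulas together look something like this (`logistic_proba` is just an illustrative name, not an sklearn method):

```python
import numpy as np

def logistic_proba(X, betas):
    """Predicted probability p(X) for each row of X.

    betas[0] is the intercept B0; betas[1:] are B1..BN.
    """
    xb = betas[0] + X @ betas[1:]      # XB = B0 + B1*X1 + ... + BN*XN
    return 1.0 / (1.0 + np.exp(-xb))   # algebraically equal to e^XB / (1 + e^XB)
```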
The algorithm finds the beta coefficients that best fit the data via Maximum Likelihood estimation, which amounts to minimizing a cost function. For logistic regression, the cost that maximum likelihood actually corresponds to is the negative log-likelihood (log loss):

sum -[y_i * ln P(X_i) + (1 - y_i) * ln(1 - P(X_i))]

Other cost functions that measure the same disagreement between predicted probabilities and observed labels include the squared error and the absolute error:

sum (P(X_i) - y_i)^2

sum |P(X_i) - y_i|
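A quick sketch of these costs in NumPy (hypothetical helper names; `y` holds the 0/1 labels and `p` the predicted probabilities):

```python
import numpy as np

def squared_error_cost(p, y):
    return np.sum((p - y) ** 2)

def absolute_error_cost(p, y):
    return np.sum(np.abs(p - y))

def log_loss_cost(p, y):
    # Negative log-likelihood: the cost maximum likelihood estimation minimizes.
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```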
An initial set of betas is picked (often at random), the cost is calculated, and the algorithm then picks a new set of betas that yields a lower cost. It stops searching for new betas when the decrease in cost falls below a given threshold (set by the `tol` parameter in sklearn).
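As a rough illustration of that loop, here is plain gradient descent on the log loss with a `tol`-style stopping rule. This is a sketch with a hand-picked learning rate (it assumes reasonably scaled features), not sklearn's actual implementation:

```python
import numpy as np

def fit_logistic(X, y, lr=0.01, tol=1e-4, max_iter=10_000):
    """Gradient descent on the log loss; stop when the cost decrease < tol."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for B0
    betas = np.zeros(X1.shape[1])               # starting point (could be random)
    prev_cost = np.inf
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ betas)))
        p = np.clip(p, 1e-12, 1 - 1e-12)
        cost = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        if prev_cost - cost < tol:               # decrease below threshold: stop
            break
        prev_cost = cost
        betas -= lr * (X1.T @ (p - y))           # gradient of the log loss
    return betas
```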
How the model converges to the final set of coefficients depends on the `solver` parameter. Each solver uses a different optimization routine to find the betas, but they usually arrive at (roughly) the same result.
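For example, fitting the same data with two different solvers and the same tolerance should give very similar coefficients:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Different solvers, same tolerance: the fitted betas come out very close.
for solver in ("lbfgs", "liblinear"):
    model = LogisticRegression(solver=solver, tol=1e-4).fit(X, y)
    print(solver, model.intercept_, model.coef_)
```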