
I understand that both the LinearRegression and SGDRegressor classes from scikit-learn perform linear regression. However, only SGDRegressor uses Gradient Descent as its optimization algorithm.

Then what is the optimization algorithm used by LinearRegression, and what are the other significant differences between these two classes?

2 Answers


LinearRegression always uses least squares as its loss function.

For SGDRegressor you can specify a loss function, and it uses Stochastic Gradient Descent (SGD) to fit the model. With SGD you run through the training set one data point at a time and update the parameters according to the gradient of the error.
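
As a rough illustration, here is a minimal sketch of such a per-sample update for the squared-error loss, written in plain NumPy (this is not scikit-learn's actual implementation; the learning rate and toy data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # toy feature matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

theta = np.zeros(3)                            # parameters to learn
lr = 0.01                                      # fixed learning rate (assumed)

for epoch in range(50):
    for xi, yi in zip(X, y):                   # one data point at a time
        error = xi @ theta - yi                # prediction error on this point
        theta -= lr * error * xi               # gradient of 0.5 * error**2

print(theta)  # approaches the true coefficients [1.0, -2.0, 0.5]
```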

In simple words: you can train SGDRegressor on a training dataset that does not fit into RAM, and you can update an SGDRegressor model with a new batch of data without retraining on the whole dataset.
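
For example, here is a minimal sketch of out-of-core training with SGDRegressor's partial_fit method (the batch size, number of batches, and synthetic data are placeholders for data you would stream from disk):

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(loss="squared_error", random_state=0)

rng = np.random.default_rng(0)
for _ in range(10):                        # e.g. 10 batches streamed from disk
    X_batch = rng.normal(size=(1000, 5))
    y_batch = X_batch @ np.arange(1.0, 6.0) + rng.normal(size=1000)
    model.partial_fit(X_batch, y_batch)    # incremental update, no full retrain

print(model.coef_, model.intercept_)
```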

Danylo Baibak
  • Why is there a change in `SGDRegressor` output, i.e. in the coefficients and intercept, when using `squared_error` as the loss function? Is it possible to get the same coefficients from SGDRegressor as well? Which loss function would help in doing so? – shaik moeed Jan 09 '22 at 11:30

To understand the algorithm used by LinearRegression, we must keep in mind that there is (in favorable cases) an analytical solution, i.e. a closed-form formula, for the coefficients that minimize the least-squares error:

theta = (X'X)^(-1) X'Y    (1)

where X' is the transpose of the matrix X.

In the case of non-invertibility, the inverse can be replaced by the Moore-Penrose pseudo-inverse, computed via singular value decomposition (SVD). Even when X'X is invertible, the SVD approach is faster and more numerically stable than applying formula (1) directly.
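
To make this concrete, here is a small sketch (not scikit-learn's exact code) comparing formula (1), the SVD-based pseudo-inverse, and LinearRegression on synthetic data; all three agree on a well-conditioned problem:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)

# Formula (1): fails if X'X is singular
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Moore-Penrose pseudo-inverse via SVD: stable even when X'X is not invertible
theta_svd = np.linalg.pinv(X) @ y

# scikit-learn, which relies on an SVD-based least-squares solver for dense input
theta_sklearn = LinearRegression(fit_intercept=False).fit(X, y).coef_

print(np.allclose(theta_normal, theta_svd))   # True
print(np.allclose(theta_svd, theta_sklearn))  # True
```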

PS - No LaTeX (MathJax) on Stack Overflow???

-- Pierre (from France)

pbejian