I.e. will the output of GD be an approximation to the LS-determined value, or are these equivalent problems with identical output? Does it perhaps depend on the type of regression: linear, logistic, etc.?
1 Answer
First of all, not all regressions are "least squares", so the question only makes sense for "least squares regression", which (for linear models) translates to linear regression (and ridge/lasso if we add specific soft constraints).
Once this is settled, we can address the main question: does a gradient-based technique converge to the same solution as the ordinary least squares method? I assume that by "least squares" you mean the closed-form solution of least squares. And the answer is "under some assumptions, yes". These assumptions are as follows:
- your learning rate is small enough,
- you perform large enough number of iterations,
- you have infinite precision arithmetic.
While the first one is relatively easy to check (there are theorems giving you nice bounds, such as the step size being at most 2/L for functions with an L-Lipschitz gradient), the remaining two are harder to pin down: the required number of iterations cannot be determined in advance (although you can bound the expected error as a function of the iteration count), and infinite precision is... well... impossible. See the sketch below for a concrete comparison.
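A minimal numerical sketch of this claim, assuming NumPy and synthetic data made up purely for illustration: gradient descent on the squared-error loss, with a step size below 2/L and enough iterations, lands on the same coefficients as the closed-form OLS solution, up to floating-point precision.

```python
import numpy as np

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Closed-form OLS solution: solve (X^T X) w = X^T y
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the loss (1/n) * ||Xw - y||^2.
# Its gradient is Lipschitz with constant L = 2 * lambda_max(X^T X) / n,
# so any step size below 2/L guarantees convergence; we use 1/L.
L = 2 * np.linalg.eigvalsh(X.T @ X).max() / n
step = 1.0 / L
w_gd = np.zeros(d)
for _ in range(10_000):                      # "large enough" number of iterations
    grad = 2 * X.T @ (X @ w_gd - y) / n      # gradient of the squared-error loss
    w_gd -= step * grad

# Tiny, but not exactly zero: finite iterations + finite precision
print(np.max(np.abs(w_gd - w_closed)))
```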
The analogous statement does not hold for logistic regression, which does not even have a closed-form solution to begin with.
