
Need an MLlib expert to help explain the linear regression code. In LeastSquaresGradient.compute:

override def compute(
    data: Vector,
    label: Double,
    weights: Vector,
    cumGradient: Vector): Double = {
  val diff = dot(data, weights) - label
  axpy(diff, data, cumGradient)
  diff * diff / 2.0
}

cumGradient is accumulated using axpy, which is simply y += a * x; here that means cumGradient += diff * data.
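For reference, here is my understanding of what that call does, as a rough plain-Scala sketch over Arrays (not the actual MLlib/BLAS implementation):

// Stand-in for BLAS axpy: y += a * x, updating y in place.
// Arrays stand in for MLlib Vectors here; this is only an illustration.
def axpySketch(a: Double, x: Array[Double], y: Array[Double]): Unit = {
  var i = 0
  while (i < x.length) {
    y(i) += a * x(i)
    i += 1
  }
}

// So axpy(diff, data, cumGradient) adds diff * data(j) to cumGradient(j) for every feature j.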

I thought about it for a long time but can't make the connection to the gradient calculation as defined in the gradient descent documentation. In theory, the gradient is the slope of the loss with respect to a change in one particular weight parameter, but I don't see anything in this axpy call that remotely resembles that.

Can someone shed some light?

    I believe this question belongs on [stats.stackexchange.com](http://stats.stackexchange.com/). – zero323 Sep 02 '15 at 02:23

1 Answer


It is not really a programming question, but to give you some idea of what is going on: the cost function for least squares regression is defined as

J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2

where theta is the weights vector, x^(i) and y^(i) are the feature vector and label of the i-th data point, and m is the number of data points.

The partial derivative of the above cost function with respect to a single weight theta_j is:

\frac{\partial J(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right) x_j^{(i)}
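To see where that comes from, apply the chain rule to a single term of the sum:

\frac{\partial}{\partial \theta_j} \, \frac{1}{2} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 = \left( \theta^T x^{(i)} - y^{(i)} \right) x_j^{(i)}

which, for one data point, is exactly diff * data(j) in the code above.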

and, collecting these partial derivatives over all components of theta, the full gradient is:

\nabla_\theta J(\theta) = \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right) x^{(i)}

It should be obvious that the above is equivalent to cumGradient += diff * data accumulated over all data points, and, to quote Wikipedia:

in a rectangular coordinate system, the gradient is the vector field whose components are the partial derivatives of f
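If it helps, here is a small plain-Scala sketch (toy data and Arrays instead of MLlib Vectors, so nothing here is the actual MLlib code) that accumulates diff * data over all points and compares it with the component-wise partial-derivative formula:

object GradientCheck extends App {
  // Toy dataset: three points with two features each, plus labels and a weight vector.
  val xs = Array(Array(1.0, 2.0), Array(0.5, -1.0), Array(3.0, 0.0))
  val ys = Array(2.0, 0.0, 1.5)
  val theta = Array(0.1, -0.3)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // What LeastSquaresGradient does: for each point, cumGradient += diff * data.
  val cumGradient = Array(0.0, 0.0)
  for ((x, y) <- xs.zip(ys)) {
    val diff = dot(x, theta) - y                        // prediction minus label
    for (j <- x.indices) cumGradient(j) += diff * x(j)  // the axpy(diff, x, cumGradient) step
  }

  // The analytic gradient: dJ/dtheta_j = sum_i (theta . x_i - y_i) * x_i(j).
  val analytic = theta.indices.map { j =>
    xs.zip(ys).map { case (x, y) => (dot(x, theta) - y) * x(j) }.sum
  }

  println(cumGradient.mkString(", "))  // the accumulated vector ...
  println(analytic.mkString(", "))     // ... matches the formula component by component
}

Both printed vectors should be identical, which is exactly the equivalence described above.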

zero323
    Excellent. My confusion was not seeing that x and y are both Vectors with the features as elements. Now the code does perfectly match the math. – bhomass Sep 04 '15 at 01:23