0

I am having difficulty reading open source mllib code for SGD with L2 regularization.

The code is

class SquaredL2Updater extends Updater {
override def compute(
  weightsOld: Vector,
  gradient: Vector,
  stepSize: Double,
  iter: Int,
  regParam: Double): (Vector, Double) = {
// add up both updates from the gradient of the loss (= step) as well as
// the gradient of the regularizer (= regParam * weightsOld)
// w' = w - thisIterStepSize * (gradient + regParam * w)
// w' = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
val thisIterStepSize = stepSize / math.sqrt(iter)
val brzWeights: BV[Double] = weightsOld.toBreeze.toDenseVector
brzWeights :*= (1.0 - thisIterStepSize * regParam)
brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
val norm = brzNorm(brzWeights, 2.0)

(Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
}

The part I am having trouble with is

brzWeights :*= (1.0 - thisIterStepSize * regParam)

the breeze lib has documentation that explains the :*= operator

/** Mutates this by element-wise multiplication of b into this. */
final def :*=[TT >: This, B](b: B)(implicit op: OpMulScalar.InPlaceImpl2[TT, B]): This = {
 op(repr, b)
 repr
}

it looks like its just multiplication of a vector by a scalar.

The formula I found for gradient in case of L2 regularization is

L2 gradient

How does the code represent this gradient in this update? Can someone help please.

zero323
  • 322,348
  • 103
  • 959
  • 935
bhomass
  • 3,414
  • 8
  • 45
  • 75

1 Answers1

0

Ok, I figured it out. The updater equation is

enter image description here

rearranging terms gives

enter image description here

recognizing the last term is just the gradient

enter image description here

This is equivalent to the code which has

brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)

breaking that out

brzWeights = brzWeights + -thisIterStepSize * gradient.toBreeze

in the previous line, brzWeights :*= (1.0 - thisIterStepSize * regParam)

which means brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam)

so, finally

brzWeights = brzWeights * (1.0 - thisIterStepSize * regParam) + (-thisIterStepSize) * gradient.toBreeze

Now the code and equation match within a normalization factor, which I believe is taken care of in the following line.

bhomass
  • 3,414
  • 8
  • 45
  • 75