Which is the correct implementation of regularization in octave?

Question

I'm currently taking Andrew Ng's machine learning course and I try implementing the stuff as I learn so as not to forget them, I just finished regularization (chapter 7). I know that theta 0 is updated normally, separate from other parameters, however, I am not sure which of these is the correct implementation.

Implementation 1: in my gradient function, after computing the regularization vector, change theta 0 part to 0 so when it is added to the total, it is as if theta 0 was never regularized.

Implementation 2: store theta in a temp variable: _theta, update it with a reg_step of 0 (so it's as if there's no regularization), store the new theta 0 in a temp variable: t1, then update the original theta value with my desired reg_step and replace theta 0 with t1 (value from non-regularized update).

below is my code for the first implementation, it's not meant to be advanced, I'm just practicing: I'm using octave which is 1-index, so theta(1) is theta(0)

function ret = gradient(X,Y,theta,reg_step),
  H = theta' * X;
  dif = H-Y;
  mul = dif .* X;
  total = sum(mul,2);
  m=(size(Y)(1,1));

  regular = (reg_step/m)*theta;
  regular(1)=0;

  ret = (total/m)+regular,
endfunction

Thanks in advance.

It's unclear what your question is. Could you please [edit] the post to include an actual question? In English, questions are sentences ending with a question mark, ?, which can receive an answer. This also helps us volunteers to see what the problem is and what you need help with. See [ask] for reference. — Adriaan, Sep 22 '20 at 12:04

score 1 · Accepted Answer · answered Oct 17 '20 at 08:10

A slight tweak to the first implementation worked for me.

First, calculate regularization for every theta. Then go on to perform gradient step and later you can change the first entry of the matrix containing gradients manually to ignore regularization for theta_0.

% Calculate regularization
regularization = (reg_step / m) * theta;

% Gradient Step
gradients = (1 / m) * (X' * (predictions - y)) + regularization;

% Ignore regularization in theta_0 
gradients(1) = (1 / m) * (X(:, 1)' * (predictions - y));

Which is the correct implementation of regularization in octave?

1 Answers1