I'm calcualting the weights for a linear regression with weight-decay, i.e. normally I am trying to find beta = (X'X + lambda I)^-1 X'Y
where X
has n
rows of D
features each and Y
is a vector of outputs for each row of X
.
I've been fitting without a bias term by using:
def wd_fit(A, y, lamb=0):
n_col = A.shape[1]
return np.linalg.lstsq(A.T.dot(A) + lamb * np.identity(n_col), A.T.dot(y))
I'd like to also calculate a bias or intercept term for the fit, instead of having it pass through the origin. I'd like to keep the same call to lstsq, so if there's some matrix transform I can carry out, that would be ideal. My inclination is to append column of 1s somewhere, so that X_mod
say would then have D+1
features where the last relates to the intercept value, but I'm not quite sure where that should be or even if it's correct.