
I have been trying to create a Randomized Lasso function in R, but it does not generate the same results as the Python sklearn randomized lasso function. I am applying the same philosophy here but cannot see where the difference comes from. The code was adapted from this code: randomized lasso function in R.

Here are the code and sample data:

# generate synthetic data
set.seed(100)
size = 750
x = matrix(runif(14*size),ncol=14)
y = 10 * sin(pi*x[,1]*x[,2]) + 20*(x[,3]-0.5)^2 + 10*x[,4] + 5*x[,5] + runif(1,0,1)
nbootstrap = 200
nsteps = 20
alpha = 0.2

dimx <- dim(x)
n <- dimx[1]
p <- dimx[2]
halfsize <- as.integer(n/2)
freq <- matrix(0,1,p)

for (i in seq(nbootstrap)) {

  # Randomly reweight each variable
  xs <- t(t(x)*runif(p,alpha,1))

  # Randomly split the sample in two sets
  perm <- sample(dimx[1])
  i1 <- perm[1:halfsize]
  i2 <- perm[(halfsize+1):n]

  # run the randomized lasso on each sample and check which variables are selected
  cv_lasso <- lars::cv.lars(xs[i1,],y[i1],plot.it=FALSE, mode = 'step')
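  # which.max on a logical vector returns the first TRUE: the first step whose
  # CV error minus one standard error falls at or below the minimum CV error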
  idx <- which.max(cv_lasso$cv - cv_lasso$cv.error <= min(cv_lasso$cv))
  coef.lasso <- coef(lars::lars(xs[i1,],y[i1]))[idx,]
  freq <- freq + abs(sign(coef.lasso))

  cv_lasso <- lars::cv.lars(xs[i2,],y[i2],plot.it=FALSE, mode = 'step')
  idx <- which.max(cv_lasso$cv - cv_lasso$cv.error <= min(cv_lasso$cv))
  coef.lasso <- coef(lars::lars(xs[i2,],y[i2]))[idx,]
  freq <- freq + abs(sign(coef.lasso))
  print(freq)
}

# normalize frequency to [0,1]
freq <- freq/(2*nbootstrap)

The results should be comparable to those shown in this table: stability in python. However, neither this approach nor the original R code from the first hyperlink finds the correlated features X11 to X14. I am not sure which part of my R code is not working properly.
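For reference, this is the small snippet I use to inspect the result (a sketch assuming the freq and p defined above; the X1 to X14 labels are just positional names I assign):

# name each normalized frequency by its column position and sort in decreasing order
sort(setNames(as.vector(freq), paste0("X", 1:p)), decreasing = TRUE)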

Kuang Liang

1 Answer


First, thank you for posting this question; I enjoyed learning about stability selection while looking over your code and references. Second, you will probably kick yourself when you see this answer. I think your code is valid, but you forgot to create the strongly correlated features of the "Friedman #1" regression dataset. The Python code from your second link is as follows:

#"Friedamn #1” regression problem
Y = (10 * np.sin(np.pi*X[:,0]*X[:,1]) + 20*(X[:,2] - .5)**2 +
     10*X[:,3] + 5*X[:,4] + np.random.normal(0,1))
# Add 4 additional correlated variables (correlated with X1-X4)
X[:,10:] = X[:,:4] + np.random.normal(0, .025, (size,4))

Your code does not include this second step. Therefore, all but the first five features are noise, and they are correctly excluded by your stability selection algorithm.
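In R, the missing step would look something like this (a minimal sketch, assuming the x and size defined in your question; note that Python's zero-based X[:,10:] corresponds to columns 11 through 14 in R):

# add 4 correlated variables by copying the first 4 columns and adding small Gaussian noise
x[, 11:14] <- x[, 1:4] + matrix(rnorm(size * 4, mean = 0, sd = 0.025), nrow = size, ncol = 4)

With this step in place, X11 to X14 become near-duplicates of X1 to X4, which is exactly the kind of correlated structure the stability table is measuring.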

Ian Wesley