I have been trying to create a Randomized Lasso function in R but it doesn't seem to generate the same results as the Python sklearn randomized lasso function. I'm applying the same philosophy here but couldn't understand the difference. The code was modified based on this code: randomized lasso function in R.
Here are the code and sample data:
# generate synthetic data
set.seed(100)
size = 750
x = matrix(runif(14*size),ncol=14)
y = 10 * sin(pi*X[,1]*X[,2]) + 20*(X[,3]-0.5)**2 + 10*X[,4] + 5*X[,5] + runif(1,0,1)
nbootstrap = 200
nsteps = 20
alpha = 0.2
dimx <- dim(x)
n <- dimx[1]
p <- dimx[2]
halfsize <- as.integer(n/2)
freq <- matrix(0,1,p)
for (i in seq(nbootstrap)) {
# Randomly reweight each variable
xs <- t(t(x)*runif(p,alpha,1))
# Ramdomly split the sample in two sets
perm <- sample(dimx[1])
i1 <- perm[1:halfsize]
i2 <- perm[(halfsize+1):n]
# run the randomized lasso on each sample and check which variables are selected
cv_lasso <- lars::cv.lars(xs[i1,],y[i1],plot.it=FALSE, mode = 'step')
idx <- which.max(cv_lasso$cv - cv_lasso$cv.error <= min(cv_lasso$cv))
coef.lasso <- coef(lars::lars(xs[i1,],y[i1]))[idx,]
freq <- freq + abs(sign(coef.lasso))
cv_lasso <- lars::cv.lars(xs[i2,],y[i2],plot.it=FALSE, mode = 'step')
idx <- which.max(cv_lasso$cv - cv_lasso$cv.error <= min(cv_lasso$cv))
coef.lasso <- coef(lars::lars(xs[i1,],y[i1]))[idx,]
freq <- freq + abs(sign(coef.lasso))
print(freq)
}
# normalize frequence in [0,1]
freq <- freq/(2*nbootstrap)
The results should be comparable to similar to the results shown in this table (stability) stability in python. However, this approach and the original R code show in the first hyperlink reference didn't find the correlated features X11 to X14. Not sure which part not working properly in my R code.