After Lasso: Store remaining variables as new dataframe (using R)

Question

First of all, thank you very much for your interest and time. My question (using R): to predict yvar, I have run a lasso regression, which reduced the set of x variables from 736 to 30.

lasso.mod = glmnet(x, y, alpha = 1)
cv.out = cv.glmnet(x, y, alpha = 1)
lasso.bestlam = cv.out$lambda.min
tmp_coef = coef(cv.out, s = lasso.bestlam)

varnames = data.frame(name = tmp_coef@Dimnames[[1]][tmp_coef@i])
mylist = list(name = tmp_coef@Dimnames[[1]][tmp_coef@i])

Hence, I have the remaining variable names as a data frame and also as a list. How is it possible to create a new data frame which has these remaining 30 variables and their observations in it? In other words: How can I get a subset of my original data which does not contain 737 variables but only 31?

I think this should be quite easy, but I have spent more than two hours on it without success.

Best wishes, Thomas

This seems to be a standard column selection problem. Take your old dataframe and select the columns in your list as a vector. E.g. `mtcars[, c("mpg", "cyl")]` will select these two columns from the `mtcars` dataset. — coffeinjunky, May 04 '17 at 14:48
Searching this site for help with column selection will provide several answers for you. — BLT, May 04 '17 at 14:49
The problem is that the variables remaining after the lasso may change (depending on some other things I will do before running the lasso). Therefore, I do not want to write out 30 variables by hand every time. But thanks for your time and consideration. — Thomas_Econ, May 04 '17 at 15:40
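
The column selection suggested in the comments does not require typing names by hand: the names can live in a character vector that is built programmatically. A minimal sketch on the built-in `mtcars` data, where the vector `candidates` is a hypothetical stand-in for whatever names the lasso step returns:

```r
# Hypothetical names standing in for the variables kept by the lasso.
# "(Intercept)" is included to show how a non-column name is filtered out.
candidates <- c("(Intercept)", "mpg", "cyl", "wt")

# Keep only the names that are actual columns of the data frame.
keep <- intersect(candidates, names(mtcars))

# Select the columns by name; drop = FALSE keeps a data frame even for one column.
subset_df <- mtcars[, keep, drop = FALSE]

names(subset_df)
```

Since `keep` is recomputed each time, the selection follows the lasso result automatically when the set of surviving variables changes.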

score 0 · Answer 1 · answered May 04 '17 at 14:49

Cannot test your solution as I do not have the data, but this should do the trick:

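# @i holds zero-based row indices, so shift by 1; drop the intercept, which is not a column of x
varnames <- setdiff(tmp_coef@Dimnames[[1]][tmp_coef@i + 1], "(Intercept)")
as.data.frame(cbind(x[, varnames], y))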

thothal
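
Because the original data is not available, the whole workflow can be checked on simulated data instead. The following is a sketch under stated assumptions, not the asker's actual setup: the sizes `n` and `p`, the column names `V1` to `V20`, and the true coefficients are all invented for illustration. It reads the names from the coefficient vector itself rather than from the `@i` slot.

```r
library(glmnet)

set.seed(1)
n <- 100  # hypothetical number of observations
p <- 20   # hypothetical number of candidate predictors

# Simulated predictors with column names, and a response driven by V1 and V5.
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("V", 1:p)))
y <- 3 * x[, "V1"] - 2 * x[, "V5"] + rnorm(n)

# Cross-validated lasso, as in the question.
cv.out <- cv.glmnet(x, y, alpha = 1)

# Dense named vector of coefficients at lambda.min.
b <- as.matrix(coef(cv.out, s = "lambda.min"))[, 1]

# Names of the nonzero coefficients, without the intercept.
selected <- setdiff(names(b)[b != 0], "(Intercept)")

# New data frame holding only the selected predictors plus the response.
new_df <- data.frame(x[, selected, drop = FALSE], y = y)
```

Selecting by name requires that `x` has column names; if it does not, the positional approach is the alternative.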

score 0 · Answer 2 · answered May 04 '17 at 14:50

The slot tmp_coef@i holds zero-based row indices, so the names of the remaining variables are tmp_coef@Dimnames[[1]][tmp_coef@i + 1]. This vector also contains "(Intercept)" as the first item (assuming the intercept is nonzero, as it usually is). If you discard it with [-1], you can extract the columns:

x <- as.data.frame(x[, tmp_coef@Dimnames[[1]][tmp_coef@i + 1][-1]])

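Even simpler, you can use the indices in tmp_coef@i directly. Because they are zero-based and row 0 is the intercept, the remaining values are exactly the column positions in x: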

x <- as.data.frame(x[, tmp_coef@i[-1]])
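
The zero-based indexing of the `@i` slot is easy to get wrong, so here is a small illustration on a hand-built sparse column. The object `toy_coef` and its names `x1` to `x4` are hypothetical stand-ins for the result of `coef()`; only the Matrix package is assumed.

```r
library(Matrix)

# Hypothetical 5 x 1 sparse coefficient column: a nonzero intercept plus
# nonzero coefficients for x2 and x4 (rows 1, 3 and 5 in R's 1-based terms).
toy_coef <- sparseMatrix(
  i = c(1, 3, 5), j = c(1, 1, 1), x = c(0.5, 1.2, -0.7),
  dims = c(5, 1),
  dimnames = list(c("(Intercept)", "x1", "x2", "x3", "x4"), "s1")
)

# The slot stores zero-based row indices of the nonzero entries.
idx <- toy_coef@i

# Indexing the names with the raw slot silently drops the 0 and shifts the rest.
wrong <- toy_coef@Dimnames[[1]][idx]

# Adding 1 gives the names of the nonzero rows, intercept included.
right <- toy_coef@Dimnames[[1]][idx + 1]

# Dropping the intercept entry leaves the column positions in the predictor matrix.
cols <- idx[-1]
```

Here `wrong` holds the names of `x1` and `x3`, which are not the nonzero coefficients, while `right` and `cols` both point at `x2` and `x4`.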
