Lambda Issue, or cross validation

Question

I am doing double cross validation with LASSO of glmnet package, however when I plot the results I am getting lambda of 0 - 150000 which is unrealistic in my case, not sure what is wrong I am doing, can someone point me in the right direction. Thanks in advance!

calcium = read.csv("calciumgood.csv", header=TRUE)
dim(calcium)
n = dim(calcium)[1]
calcium = na.omit(calcium)
names(calcium)

library(glmnet)  # use LASSO model from package glmnet 
lambdalist = exp((-1200:1200)/100)  # defines models to consider


fulldata.in = calcium
x.in = model.matrix(CAMMOL~. - CAMLEVEL - AGE,data=fulldata.in)
y.in = fulldata.in[,2]
k.in = 10 
n.in = dim(fulldata.in)[1]
groups.in = c(rep(1:k.in,floor(n.in/k.in)),1:(n.in%%k.in))  
set.seed(8)
cvgroups.in = sample(groups.in,n.in)  #orders randomly, with seed (8) 
#LASSO cross-validation
cvLASSOglm.in = cv.glmnet(x.in, y.in, lambda=lambdalist, alpha = 1, nfolds=k.in, foldid=cvgroups.in)
plot(cvLASSOglm.in$lambda,cvLASSOglm.in$cvm,type="l",lwd=2,col="red",xlab="lambda",ylab="CV(10)")
whichlowestcvLASSO.in = order(cvLASSOglm.in$cvm)[1];     min(cvLASSOglm.in$cvm)
bestlambdaLASSO = (cvLASSOglm.in$lambda)[whichlowestcvLASSO.in];     bestlambdaLASSO
abline(v=bestlambdaLASSO)
bestlambdaLASSO  # this is the lambda for the best LASSO model
LASSOfit.in = glmnet(x.in, y.in, alpha = 1,lambda=lambdalist)  # fit the model across possible lambda
LASSObestcoef = coef(LASSOfit.in, s = bestlambdaLASSO); LASSObestcoef # coefficients for the best model fit

Could you advise where to get calciumgood.csv? – Artem Jul 24 '18 at 12:43 — Artem, Jul 24 '18 at 12:43

score 0 · Answer 1 · answered Jul 29 '18 at 05:41

I found the dataset you referring at Calcium, inorganic phosphorus and alkaline phosphatase levels in elderly patients.

Basically the data are "dirty", and it is a possible reason why the algorithm does not converge properly. E.g. there are 771 year old patients, bisides 1 and 2 for male and female, there is 22 for sex encodeing etc.

As for your case you removed only NAs.

You need to check data.frame imported types as well. E.g. instead of factors it could be imported as integers (SEX, Lab and Age group) which will affect the model.

I think you need: 1) cleanse the data; 2) if doesnot work submit *.csv file

Lambda Issue, or cross validation

1 Answers1