
I am running the following (truncated) code using glmnet in R:

# do a lot of things to create the design matrix called x.design

> glmnet(x.design, y, thresh=1e-11)

where x.design is an n x p design matrix with n > p, and y is an n x 1 vector of responses obtained by kernel density estimation. Both x.design and y contain real entries. I get the following error message when I run my code:

Error in if (nulldev == 0) stop("y is constant; gaussian glmnet fails at standardization step") : missing value where TRUE/FALSE needed

I have visited and read

Running glmnet package in R, getting error "missing value where TRUE/FALSE needed", maybe due to missing values?

However, I could not figure out a way to fix my issue.

Could someone suggest a solution please?

NM_

3 Answers


It seems that your response vector y is constant. glmnet tries to standardize it (presumably by subtracting the mean and then dividing by the standard deviation) and cannot, because the standard deviation is 0. Print y and its variance to be sure.

You should also check your kernel estimation procedure.
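A quick way to follow that advice in base R is to reproduce the check quoted in the error message. This is a minimal sketch, assuming (from the `if (nulldev == 0)` line in the question) that nulldev is the sum of squared deviations of y from its mean with all weights equal to 1; the helpers null_deviance and diagnose_y are hypothetical. It separates a constant y, which makes nulldev exactly 0, from a y containing NA, NaN or Inf, which makes nulldev NA or NaN.

```r
# Hypothetical helpers that mimic the `if (nulldev == 0)` check quoted in the
# error message. Assumption: nulldev is the sum of squared deviations of y
# from its mean, with all observation weights equal to 1.
null_deviance <- function(y) {
  sum((y - mean(y))^2)
}

# Classify y: "non-finite" when nulldev is NA/NaN (y has NA, NaN or Inf),
# "constant" when nulldev is exactly 0, and "ok" otherwise.
diagnose_y <- function(y) {
  nulldev <- null_deviance(y)
  if (is.na(nulldev)) {   # is.na() is TRUE for both NA and NaN
    "non-finite"
  } else if (nulldev == 0) {
    "constant"
  } else {
    "ok"
  }
}

# Print the quantities suggested above alongside the classification.
y <- c(0.2, NA, 0.9)
print(y)
print(var(y))          # NA as soon as y contains a missing value
print(diagnose_y(y))
```

Because `if ()` cannot evaluate an NA condition, the "non-finite" case is the one that produces "missing value where TRUE/FALSE needed"; a constant y would instead reach the explicit stop() message about y being constant.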

P. Camilleri

Try removing missing values (NAs) from your data with na.omit(data).
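Since na.omit() works on a single object, a separate design matrix and response vector have to be combined first so that their rows stay aligned. A minimal sketch, assuming the question's x.design and y; the helper drop_incomplete_rows and the toy data are made up for illustration. Note that na.omit() drops rows containing NA or NaN, but not Inf.

```r
# Hypothetical helper: drop every row with a missing value in either the
# response or any predictor column before calling glmnet.
drop_incomplete_rows <- function(x, y) {
  # Bind y as the first column so rows stay aligned, then let na.omit()
  # remove any row containing NA or NaN in any column.
  combined <- na.omit(cbind(y, x))
  list(
    x = combined[, -1, drop = FALSE],  # predictors, still a matrix
    y = combined[, 1]                  # response vector
  )
}

# Made-up example: row 2 has an NA predictor, row 3 a NaN response.
x_toy <- matrix(c(1, 2, 3, 4, 5, 6, 7, NA, 9), nrow = 3)
y_toy <- c(0.1, 0.4, NaN)

clean <- drop_incomplete_rows(x_toy, y_toy)
print(clean$x)
print(clean$y)

# With the real data (glmnet package loaded):
# clean <- drop_incomplete_rows(x.design, y)
# glmnet(clean$x, clean$y, thresh = 1e-11)
```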

pissall

A more general answer to this question is that, unlike other regression functions in R such as lm (which drops incomplete rows by default), glmnet does not handle missing values of any kind, be they NAs, NaNs or otherwise, as described here. It only works with complete cases.

So the solution I propose for the error above is to remove every row of the input matrix x.design that corresponds to a non-numeric value (NA, NaN, etc.) in the response vector y. Something like this would do:

x.design <- x.design[grep("\\d", y), , drop = FALSE]

This code uses a regular expression to find the elements of y whose character representation contains a digit (a literal number; NA, NaN and Inf contain none) and keeps only the corresponding rows of the input matrix, i.e. the rows that glmnet can actually use.

Then subset the response vector the same way and you are good to go. The order matters: subset x.design first, because its row index is computed from the unfiltered y:

y <- y[grep("\\d", y)]
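The grep() filter only inspects y, so NA or Inf values inside x.design would still reach glmnet. A base-R alternative, sketched below under the assumption that any non-finite entry in either object should be dropped, uses is.finite() on both at once; the helper keep_finite_rows and the toy data are hypothetical.

```r
# Hypothetical helper: keep only rows where y and every column of the
# design matrix are finite. Unlike na.omit(), is.finite() is FALSE for
# NA, NaN, Inf and -Inf alike.
keep_finite_rows <- function(x, y) {
  keep <- is.finite(y) & rowSums(!is.finite(x)) == 0
  list(x = x[keep, , drop = FALSE], y = y[keep])
}

# Made-up example: row 2 has a missing response, row 3 an infinite predictor.
x_toy <- matrix(c(1, 2, 3, 4,
                  5, 6, Inf, 8), nrow = 4)
y_toy <- c(0.5, NA, 1.5, 2.5)

clean <- keep_finite_rows(x_toy, y_toy)
print(clean$y)

# For comparison, the grep() filter keeps rows 1, 3 and 4 here,
# because it only looks at y:
print(grep("\\d", y_toy))

# With the real data (glmnet package loaded):
# clean <- keep_finite_rows(x.design, y)
# glmnet(clean$x, clean$y, thresh = 1e-11)
```

Subsetting both objects inside one function also removes the ordering pitfall, since the row index is computed once before either object is modified.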
J.Ofoaks