I get this error when fitting a logistic regression with glmnet() and family="binomial":

> data <- read.csv("DAFMM_HE16_matrix.csv", header=F)
> x <- as.data.frame(data[,1:3])
> x <- model.matrix(~.,data=x)
> y <- data[,4]

> train=sample(1:dim(x)[1],287,replace=FALSE)

> xTrain=x[train,]
> xTest=x[-train,]
> yTrain=y[train]
> yTest=y[-train]

> fit = glmnet(xTrain,yTrain,family="binomial")

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
one multinomial or binomial class has 1 or 0 observations; not allowed

Any help would be greatly appreciated. I've searched the internet and haven't been able to find anything that helps.

EDIT:

Here's what data looks like:

> data
          V1       V2    V3      V4
1   34927.00   156.60 20321  -12.60
2   34800.00   156.60 19811  -18.68
3   29255.00   156.60 19068    7.50
4   25787.00   156.60 19608    6.16
5   27809.00   156.60 24863   -0.87
...
356 26495.00 12973.43 11802    6.35
357 26595.00 12973.43 11802   14.28
358 26574.00 12973.43 11802    3.98
359 25343.00 14116.18 11802   -2.05
  • Are you sure your `yTrain` contains at least 2 distinct values? – Hong Ooi May 01 '15 at 21:10
  • @HongOoi Absolutely. There are 287 distinct values and I checked to make sure it wasn't a matrix and is a vector. –  May 01 '15 at 21:18
  • @HongOoi I also tried just running glmnet(x,y,family="binomial") which yielded the same error. –  May 01 '15 at 21:49
  • Well, hang on; your `V4` variable appears to be continuous, not binary. You can't fit a logistic model with that. – Hong Ooi May 01 '15 at 22:16
  • @HongOoi ahhhhhh! Gotcha, this worked (and makes sense). I could run glmnet() without family="binomial", but it broke when I included it. I added this code and it worked: `trigger = 5; y <- ifelse(data$V4 > trigger,1,0)` –  May 01 '15 at 22:27
  • @HongOoi put this as an answer and I'll accept it –  May 01 '15 at 22:32
  • This error can also occur legitimately (when the target variable is a factor), e.g. in cv.glmnet, for some choices of random seed, especially with severe class imbalance, when one of the CV folds does in fact end up with only 0 or 1 observations of a class. Since that occurs randomly, you have to handle it gracefully. – smci Jul 03 '15 at 02:01
  • @groutgauss I ran into the same problem. Where did you add the code `trigger = 5; y <- ifelse(data$V4 > trigger,1,0)`? – Bob May 08 '18 at 03:02
  • @Bob if you are running a 'binomial' model then you have to make sure your response is binary (either 1 or 0) and not a continuous variable. So add the cutoff and switch to binary after importing the data but before running the model –  May 09 '18 at 16:00
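
To make the fix from the comments concrete, here is a minimal sketch of where the conversion goes: after loading the data, before calling glmnet(). The original CSV is not available, so the data frame below is simulated and only mimics the column names `V1` to `V4`; the cutoff of 5 is the one quoted in the comments, and the `class_counts` check is an added assumption about how one might verify the response first.

```r
# Simulated stand-in for the CSV in the question (the real file is not available here).
set.seed(1)
data <- data.frame(V1 = rnorm(359, 28000, 3000),
                   V2 = runif(359, 150, 15000),
                   V3 = rnorm(359, 15000, 4000),
                   V4 = rnorm(359, 0, 10))

trigger <- 5                              # cutoff quoted in the comments
y <- ifelse(data$V4 > trigger, 1, 0)      # binary response: 1 if V4 exceeds the cutoff

# glmnet rejects a class with 1 or 0 observations, so check the counts first.
class_counts <- table(y)
print(class_counts)
stopifnot(length(class_counts) == 2, all(class_counts >= 2))

# Predictor matrix without the intercept column (glmnet fits its own intercept).
x <- model.matrix(~ ., data = data[, 1:3])[, -1]

# Only attempt the fit if glmnet is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, family = "binomial")
}
```

If the `stopifnot()` check fails on real data, the cutoff leaves one class nearly empty and a different threshold (or more data) is needed.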

2 Answers

I think it is caused by the levels of your factor variable. Suppose the factor has 10 levels and one of them has only one record: try removing that level. You can use `drop.levels` from the gdata package.
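
As a rough illustration of that idea, the sketch below removes the rows belonging to under-populated classes and then drops the empty levels. It uses base R's `droplevels()` rather than the gdata function, and the toy `x` and `y` are made up for the example.

```r
# Toy factor response: level "c" has a single record.
y <- factor(c("a", "a", "a", "b", "b", "b", "c"))
x <- matrix(seq_len(14), ncol = 2)

counts <- table(y)
keep <- y %in% names(counts)[counts >= 2]   # rows whose class has at least 2 records

x_kept <- x[keep, , drop = FALSE]
y_kept <- droplevels(y[keep])               # base R; removes the now-empty level "c"

print(table(y_kept))
```

Filtering `x` and `y` with the same `keep` vector keeps the rows aligned, which matters because glmnet matches them by position.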

Ashique PS
prahlad
  • Or, if the data on which you are training is a fraction of the total data set (as it should be) - use more data as your training set, until the error disappears. You can try to estimate whether you have under-populated classes by doing `table(myData$responseColumn)` – radumanolescu Nov 28 '19 at 20:16

This usually comes down to the structure of the data and its response variable. Sometimes the response has more than two outcomes. In other cases the response is binary, but one class has far more observations than the other, which is the class imbalance problem, and the error then appears when the data is split for training and testing. So either convert the response variable to binary if it has more than two outcomes, or fit a multinomial model instead of a binomial one. Hope this helps.
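
For the class imbalance case, one possible workaround (not something proposed in this thread) is to assign cross-validation folds per class, so that no training split loses a class entirely, and pass them to cv.glmnet through its `foldid` argument. The data below is simulated and the fold count of 3 is arbitrary.

```r
set.seed(42)
y <- c(rep(0, 95), rep(1, 5))          # severely imbalanced toy response
x <- matrix(rnorm(100 * 3), ncol = 3)

nfolds <- 3
foldid <- integer(length(y))
for (cls in unique(y)) {
  idx <- which(y == cls)
  # Deal the shuffled members of this class round-robin across the folds.
  foldid[idx[sample.int(length(idx))]] <- rep_len(seq_len(nfolds), length(idx))
}

# The training set for fold k is everything outside fold k;
# check that each class keeps at least 2 rows there.
train_ok <- sapply(seq_len(nfolds), function(k) {
  all(table(factor(y[foldid != k], levels = unique(y))) >= 2)
})
print(train_ok)

# Only attempt the fit if glmnet is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  cvfit <- glmnet::cv.glmnet(x, y, family = "binomial", foldid = foldid)
}
```

With very small classes glmnet may still warn even when it no longer errors, so collecting more data for the rare class remains the more robust fix.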