
Fitting a linear-regression model using least squares on my training dataset works fine.

library(Matrix)
library(tm)
library(glmnet)
library(e1071)
library(SparseM)
library(ggplot2)

trainingData <- read.csv("train.csv", sep = ",", header = FALSE, stringsAsFactors = FALSE)
testingData  <- read.csv("test.csv",  sep = ",", header = FALSE, stringsAsFactors = FALSE)

lm.fit <- lm(as.factor(V42) ~ ., data = trainingData)
linearMPrediction <- predict(lm.fit, newdata = testingData, se.fit = TRUE)
mean((linearMPrediction$fit - testingData[, 20:41])^2)
linearMPrediction$residual.scale

However, when I try to fit a ridge-regression model on my training dataset as follows:

x <- model.matrix(as.factor(V42) ~ ., data = trainingData)
y <- as.factor(trainingData$V42)
# alpha = 0 selects the ridge penalty in glmnet; alpha = 1 would be the lasso
ridge <- glmnet(x, y, family = "multinomial", alpha = 0, lambda.min.ratio = 1e-2)

I get the following error with both the multinomial and the binomial family:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  : 
  one multinomial or binomial class has 1 or 0 observations; not allowed

Am I missing something? Any comment would be greatly appreciated. By the way, here is a portion of what my data looks like.

> trainingData$V42[1:50]
 [1] "normal"      "normal"      "neptune"     "normal"      "normal"      "neptune"     "neptune"     "neptune"     "neptune"     "neptune"     "neptune"    
[12] "neptune"     "normal"      "warezclient" "neptune"     "neptune"     "normal"      "ipsweep"     "normal"      "normal"      "neptune"     "neptune"    
[23] "normal"      "normal"      "neptune"     "normal"      "neptune"     "normal"      "normal"      "normal"      "ipsweep"     "neptune"     "normal"     
[34] "portsweep"   "normal"      "normal"      "normal"      "neptune"     "normal"      "neptune"     "neptune"     "neptune"     "normal"      "normal"     
[45] "normal"      "neptune"     "teardrop"    "normal"      "warezclient" "neptune"  

> x
      (Intercept)    V1 V2tcp V2udp V3bgp V3courier V3csnet_ns V3ctf V3daytime V3discard V3domain V3domain_u V3echo V3eco_i V3ecr_i V3efs V3exec V3finger V3ftp
1               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
2               1     0     0     1     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
3               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
4               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
5               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
6               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
7               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
8               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
9               1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0
10              1     0     1     0     0         0          0     0         0         0        0          0      0       0       0     0      0        0     0

> y[1:50]
 [1] normal      normal      neptune     normal      normal      neptune     neptune     neptune     neptune     neptune     neptune     neptune     normal     
[14] warezclient neptune     neptune     normal      ipsweep     normal      normal      neptune     neptune     normal      normal      neptune     normal     
[27] neptune     normal      normal      normal      ipsweep     neptune     normal      portsweep   normal      normal      normal      neptune     normal     
[40] neptune     neptune     neptune     normal      normal      normal      neptune     teardrop    normal      warezclient neptune    
22 Levels: back buffer_overflow ftp_write guess_passwd imap ipsweep land loadmodule multihop neptune nmap normal phf pod portsweep rootkit satan smurf spy ... warezmaster

> table(y)
y
           back buffer_overflow       ftp_write    guess_passwd            imap         ipsweep            land      loadmodule        multihop         neptune 
            196               6               1              10               5             710               1               1               2            8282 
           nmap          normal             phf             pod       portsweep         rootkit           satan           smurf             spy        teardrop 
            301           13449               2              38             587               4             691             529               1             188 
    warezclient     warezmaster 
            181               7 
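The classes that trigger the error can be read off the table(y) output above. As a minimal base-R sketch, with the counts copied by hand from that output, the condition glmnet rejects (a class with fewer than 2 observations) picks out four classes. As far as I know, recent glmnet versions also warn when a class has fewer than 8 observations, so the second check lists the classes that would still trigger that warning; treat that threshold as an assumption.

```r
# Class counts copied by hand from the table(y) output above
counts <- c(back = 196, buffer_overflow = 6, ftp_write = 1, guess_passwd = 10,
            imap = 5, ipsweep = 710, land = 1, loadmodule = 1, multihop = 2,
            neptune = 8282, nmap = 301, normal = 13449, phf = 2, pod = 38,
            portsweep = 587, rootkit = 4, satan = 691, smurf = 529, spy = 1,
            teardrop = 188, warezclient = 181, warezmaster = 7)

# Classes with 0 or 1 observations: these make glmnet stop with the error above
fatal <- names(counts)[counts < 2]
print(fatal)

# Classes with fewer than 8 observations (assumed warning threshold, see lead-in)
risky <- names(counts)[counts < 8]
print(risky)
```

On the real data, `names(which(table(y) < 2))` performs the same check without copying the counts.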
Desta Haileselassie Hagos

1 Answer


Some of your classes have only a single observation (ftp_write, land, loadmodule and spy each appear exactly once in table(y)). glmnet does not allow this, as the error message clearly states.
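One way out is to remove the classes that are too small (or merge them into a catch-all class) before fitting. Below is a minimal base-R sketch; `drop_rare_classes` is a hypothetical helper, not part of glmnet, and the toy data are made up for illustration.

```r
# Hypothetical helper (not part of glmnet): keep only the rows whose class has
# at least min_n observations, then drop the emptied factor levels so that
# glmnet never sees a class with 0 observations.
drop_rare_classes <- function(x, y, min_n = 2) {
  y <- as.factor(y)
  counts <- table(y)
  keep_levels <- names(counts)[counts >= min_n]
  keep_rows <- y %in% keep_levels
  list(x = x[keep_rows, , drop = FALSE],
       y = droplevels(y[keep_rows]))
}

# Toy data (made up): class "c" has a single observation
set.seed(42)
toy_x <- matrix(rnorm(12), nrow = 6)
toy_y <- c("a", "a", "b", "b", "b", "c")

kept <- drop_rare_classes(toy_x, toy_y, min_n = 2)
table(kept$y)
```

With the question's objects this would be `kept <- drop_rare_classes(x[, -1], y, min_n = 2)` followed by `glmnet(kept$x, kept$y, family = "multinomial", alpha = 0)`. Dropping the intercept column from `model.matrix` is optional since glmnet fits its own intercept, and `alpha = 0` is what gives a ridge penalty, while `alpha = 1` is the lasso. The `droplevels` call matters: without it the unused levels stay in the factor with a count of 0, which glmnet also rejects. Test rows from the removed classes can no longer be predicted correctly, so merging them may be preferable if they matter.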

David C.
lejlot
  • Hello. Which R package would you recommend for ridge regression? glmnet, bigRR, MASS, something else? Can any of them deal with repeated measures (random effects)? – skan Jun 04 '16 at 18:49
  • Ridge regression is such a basic model that it does not matter which package you use; just pick the one whose API is easiest for you. Ridge regression does not assume anything about "repeated measures", so you will be just fine (assuming your data is generated correctly in general). – lejlot Jun 04 '16 at 19:23
  • I mean I want the method to work on the fixed-effect coefficients without removing the random-effect coefficients. Also, the results won't be the same if the program takes into account that several measurements were taken on each individual. – skan Jun 04 '16 at 19:32
  • This answer also seems to apply when `cv.glmnet` is used on only 2 observations of class `1` and 2 observations of class `0`. Thinking about it: if you cross-validate on only 4 samples, some training split will always end up with just 1 sample of a class. – David C. Dec 27 '17 at 21:07
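The last comment's reasoning can be checked by brute force. The base-R sketch below assumes 3 folds (as far as I know, the smallest `nfolds` that `cv.glmnet` accepts) and enumerates every way to split the 4 samples into 3 non-empty folds, recording for each split the smallest class count left in any training portion.

```r
# Two observations of class "1" and two of class "0", as in the comment
y <- factor(c(1, 1, 0, 0))
nfolds <- 3

# Every assignment of the 4 samples to the 3 folds ...
grid <- as.matrix(expand.grid(rep(list(seq_len(nfolds)), length(y))))
# ... keeping only assignments in which every fold is non-empty
grid <- grid[apply(grid, 1, function(f) length(unique(f)) == nfolds), , drop = FALSE]

# For one fold assignment: the smallest class count in any training portion
worst_train_count <- function(foldid) {
  min(sapply(seq_len(nfolds), function(k) min(table(y[foldid != k]))))
}

res <- apply(grid, 1, worst_train_count)
table(res)
max(res)  # 1
```

Every assignment has a fold holding a single sample, so at best some training portion keeps only 1 observation of a class, and when a 2-sample fold contains both members of one class, that class disappears from training entirely. Choosing a stratified `foldid` cannot avoid this with so few samples.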