I am trying to figure out how to run my fitted regression model on the held-out part of my dataset so that I can build a confusion matrix, but I cannot work out what I am doing wrong.

studentreport<-read.csv("C:\\Users\\Joseph\\Downloads\\studentreport dataset full imp.csv",header=T,sep=",")
studentreport<-data.frame(studentreport)

smp_size <- floor(0.75 * nrow(studentreport))

set.seed(123)
train_ind <- sample(seq_len(nrow(studentreport)), size = smp_size)

train <- studentreport[train_ind, ]
test <- studentreport[-train_ind, ]

fitreport<-glm(train)
Fitstart=glm(Enrolling~1,data=train)

Report<-step(Fitstart,direction="forward",scope=formula(fitreport))

predict(Report, newdata = test,type ="response")

When I run that predict call, I get this error:

"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor State has new levels AP"

dput: Report studentreport

Saurabh Chauhan
  • Hi, most likely the problem is that you have a factor in your dataframe and not all levels of this factor are present in the test data set. See also this question here https://stackoverflow.com/questions/16493920/how-can-i-ensure-that-a-partition-has-representative-observations-from-each-leve – Cettt Aug 03 '18 at 06:23
  • Please google "Error in model.frame.default new levels". This has been asked many times and the error states the problem. The variable contains levels that were not part of the design matrix the model has been trained on because they were either not in the training dataset or were removed because observations were removed due to missing values in other variables. – Roland Aug 03 '18 at 06:23
  • So the code predict(Report, newdata=test, type response) is the correct way to write it? – Timenight113 Aug 03 '18 at 07:17
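
The diagnosis in the comments can be checked directly by comparing the factor values in the two partitions. The following is only a minimal sketch on made-up data: the toy data frame, its column names other than State and Enrolling, and the choice to drop the offending test rows are all assumptions, not something taken from the original dataset.

```r
# Hypothetical toy data standing in for studentreport (columns are assumptions).
set.seed(123)
toy <- data.frame(
  Enrolling = rbinom(40, 1, 0.5),
  GPA       = runif(40, 2, 4),
  State     = factor(c(rep(c("CA", "NY", "TX"), length.out = 39), "AP"))
)
train <- toy[1:30, ]
test  <- toy[31:40, ]  # row 40 carries the value "AP", which train never contains

# For every factor column, list the values that occur in test but not in train.
factor_cols <- names(train)[sapply(train, is.factor)]
unseen <- lapply(factor_cols, function(col) {
  setdiff(unique(as.character(test[[col]])), unique(as.character(train[[col]])))
})
names(unseen) <- factor_cols
print(unseen)

# One possible workaround (an assumption, not the only option):
# keep only the test rows whose factor values were seen during training.
keep <- rep(TRUE, nrow(test))
for (col in factor_cols) {
  keep <- keep & test[[col]] %in% train[[col]]
}
test_ok <- test[keep, ]

# predict() now works because no new level reaches the fitted model.
fit  <- glm(Enrolling ~ GPA + State, data = train, family = binomial)
pred <- predict(fit, newdata = test_ok, type = "response")
```

Dropping rows is the simplest fix but loses observations. A stratified split, as in the question linked by Cettt, avoids the problem at the partitioning step instead.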

1 Answer

I reproduced the code you posted. Since I could not find an Enrolling column in your data, I fit the glm on the GPATypeWeighted column instead, just to check the model. Prediction ran without error.

library(leaps)
library(caret)
studentreport <- dget("https://drive.google.com/uc?authuser=0&id=1PHpkhPpEjIt-apCJpzvAKAlWZTPX7Evv&export=download")
studentreport <- data.frame(studentreport)
smp_size <- floor(0.75 * nrow(studentreport))

set.seed(123)
train_ind <- sample(seq_len(nrow(studentreport)), size = smp_size)

train <- studentreport[train_ind, ]
test <- studentreport[-train_ind, ]

fitreport <- glm(train)
Fitstart = glm(GPATypeWeighted ~ 1, data = train)

Report <- step(Fitstart, direction="forward", scope = formula(fitreport))

predict(Report, newdata = test, type ="response")

Output:

           3            4            5            7           13           14           16           23           27           36 
1.000000e+00 1.000000e+00 1.000000e+00 1.804986e-15 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
          37           43           44           56           57           60           62           64           66           69 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 2.097525e-15 
          70           79           82           86           91           92           93           96           97          100 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
         101          108          112          114          115          116          117          120          123          138 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 2.199615e-15 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
         140          148          155          157          158          161          164          165          174          177 
1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 
         180          185          187          200          203          204          207          214          215          216 
1.000000e+00 1.000000e+00 1.000000e+00 1.756027e-15 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00 1.686952e-15 1.000000e+00 
         222          239          248 
1.000000e+00 1.000000e+00 1.000000e+00 
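
Since the stated goal is a confusion matrix, the predicted values still have to be turned into classes and compared with the observed ones. Here is a rough sketch of that step on simulated data. It assumes the response is coded 0/1, that the model is fitted with family = binomial (the code above uses glm's default gaussian family), and that 0.5 is an acceptable cutoff. The toy data frame and the GPA predictor are hypothetical.

```r
# Hypothetical data: a 0/1 response driven by a single numeric predictor.
set.seed(123)
n   <- 200
toy <- data.frame(GPA = runif(n, 2, 4))
toy$Enrolling <- rbinom(n, 1, plogis(-6 + 2 * toy$GPA))

# Same 75/25 split as in the question.
train_ind <- sample(seq_len(n), size = floor(0.75 * n))
train <- toy[train_ind, ]
test  <- toy[-train_ind, ]

# Logistic regression, so type = "response" returns probabilities.
fit  <- glm(Enrolling ~ GPA, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Classify at an assumed cutoff of 0.5; fixing the levels keeps the table 2 x 2
# even if one class is never predicted.
pred_class <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual     <- factor(test$Enrolling, levels = c(0, 1))

conf_mat <- table(Predicted = pred_class, Actual = actual)
print(conf_mat)

# Share of test rows on the diagonal, i.e. classified correctly.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The caret package loaded above also provides confusionMatrix(), which takes the predicted and reference factors and reports further statistics. The base R table is shown here only to keep the sketch free of dependencies.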
Artem