I am facing the following error when using the modelr add_predictions function.
modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): fe.lead.surgeon has new levels ....
In my understanding, this is a common issue that arises when you build a prediction model on a training dataset and apply it to a test dataset, since the test dataset may contain factor levels that were not present in the training dataset. However, I am using the same sample both to fit the model and to get the predicted values, and I still get this error.
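For reference, the usual train/test version of this error can be sketched as follows. The data frames `train` and `test` and the columns `y` and `g` are made up for illustration and are not part of my real data.

```r
# Toy data: all names here are hypothetical.
train <- data.frame(
  y = c(1, 2, 3, 4, 5, 6),
  g = factor(c("A", "A", "B", "B", "A", "B"))
)
test <- data.frame(g = factor(c("A", "C")))  # "C" never appears in train

fit <- lm(y ~ g, data = train)

# The model only remembers the levels it saw while fitting
print(fit$xlevels)

# predict() stops because "C" is not among the stored levels;
# the error message is captured here instead of aborting the script
res <- tryCatch(
  predict(fit, newdata = test),
  error = function(e) conditionMessage(e)
)
print(res)
```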
Specifically, here is the code I am using. I would appreciate any insight into why this error occurs and how to solve it.
# indep is a vector of independent variable names
# dep is a vector of dependent variable names
# id.case is the id variable
# sample is my dataset.
eq <-
  paste(indep, collapse = ' + ') %>%
  paste(dep, ., sep = ' ~ ') %>%
  as.formula
s <-
  lm(eq, data = sample %>% select(-id.case))
pred <-
  sample %>%
  modelr::add_predictions(s) %>%
  select(id.case, pred)
At the request of @SimoneBianchi, I am providing a reproducible example here.
Reproducible example
library(tidyverse)
library(tibble)
library(data.table)
rename <- dplyr::rename
select <- dplyr::select
set.seed(10002)
id <- sample(1:1000, 1000, replace=F)
set.seed(10003)
fe1 <- sample(c('A','B','C'), 1000, replace=T)
set.seed(10001)
fe2 <- sample(c('a','b','c'), 1000, replace=T)
set.seed(10001)
cont1 <- sample(1:300, 1000, replace=T)
set.seed(10004)
value <- sample(1:30, 1000, replace=T)
sample <-
  data.frame(id, fe1, fe2, cont1, value)
dep <- 'value'
indep <-
  c('fe1', 'fe2', 'cont1')
eq <-
  paste(indep, collapse = ' + ') %>%
  paste(dep, ., sep = ' ~ ') %>%
  as.formula
s <-
  lm(eq, data = sample %>% select(-id))
pred <-
  sample %>%
  modelr::add_predictions(s) %>%
  select(id, pred)
Update and Workaround
One workaround I found is to use the fitted function instead of the modelr function. However, I would still like to learn why the regression automatically drops some factor levels from a factor variable. If anyone knows, please leave a comment.
pred <-
  sample %>%
  cbind(pred = fitted(s))
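One caveat with this workaround: fitted only returns values for the rows that lm actually used, so the result can be shorter than the data when some rows have missing values. A minimal sketch on made-up data (the data frame `d` and its columns are hypothetical), assuming base R's na.omit and na.exclude behave as documented:

```r
# Toy data (hypothetical): one row has a missing predictor.
d <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1),
  x = c(1, 2, NA, 4, 5)
)

# na.omit (the usual default) silently drops the incomplete row
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
# na.exclude also drops it for fitting, but pads fitted() with NA
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

n_omit <- length(fitted(fit_omit))  # fewer values than nrow(d)
n_excl <- length(fitted(fit_excl))  # same length as nrow(d)
print(c(rows = nrow(d), omit = n_omit, exclude = n_excl))

# With na.exclude the lengths match, so the column can be attached directly
d$pred <- fitted(fit_excl)
print(d)
```

With na.exclude, the row that was left out of the fit simply gets NA as its fitted value, which keeps the predictions aligned with the original rows.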
Closing: Problem found with the dataset
I found that the observations with the "new levels" in the corresponding factor variable (the one named in the error) contained NA values. After I fixed the NAs, the original code worked fine. So it was a problem with the dataset rather than the code!
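For anyone who runs into the same thing, here is my understanding of the mechanism as a small sketch on made-up data (the data frame `d` and the columns `y`, `g`, `x` are hypothetical). It assumes lm's usual handling of missing values: rows with NA are dropped before fitting, so a factor level that occurs only in those rows is never seen by the model, and predicting on the full data then reports it as a new level.

```r
# Toy data (hypothetical): level "C" of g occurs only in a row where x is NA.
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 6, 7),
  g = factor(c("A", "B", "A", "B", "A", "B", "C")),
  x = c(1, 3, 2, 5, 4, 7, NA)
)

# Row 7 is dropped because x is NA, so the model never sees level "C"
fit <- lm(y ~ g + x, data = d, na.action = na.omit)

# Compare the levels the model kept with the levels present in the data
kept    <- fit$xlevels$g
dropped <- setdiff(levels(d$g), kept)
print(dropped)

# Predicting on the full data hits the unseen level; capture the message
msg <- tryCatch(
  predict(fit, newdata = d),
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible fix: predict only on rows with no missing model variables
ok <- complete.cases(d[, c("y", "g", "x")])
d$pred <- NA_real_
d$pred[ok] <- predict(fit, newdata = d[ok, ])
print(d)
```

Comparing fit$xlevels with the levels in the data is a quick way to see which levels were lost during fitting. Whether to drop, impute, or otherwise fix the NA rows depends on the dataset.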
Thank you all for trying to help me out.