I am getting the following error when using the modelr add_predictions function.

modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): fe.lead.surgeon has new levels ....

In my understanding, this is a common issue when you fit a prediction model on a training dataset and apply it to a test dataset, since the test dataset may contain factor levels that were not present in the training dataset. However, I am using the same sample to fit the model and to get the predicted values, and I still get this error.
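To illustrate the usual train/test situation described above, here is a minimal sketch. The data frames `train` and `test` and the columns `g` and `y` are hypothetical toy names, not part of my real data, and base `predict()` is used in place of `modelr::add_predictions()` to keep the example dependency-free.

```r
# Hypothetical toy data: the level "C" appears only in the test set.
train <- data.frame(g = c("A", "A", "B", "B"), y = c(1, 2, 3, 5))
test  <- data.frame(g = c("A", "C"))

fit <- lm(y ~ g, data = train)

# predict() fails because "C" was not among the levels seen during fitting.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(
  predict(fit, newdata = test),
  error = function(e) conditionMessage(e)
)
print(res)
```

The captured message should be the same kind of "has new levels" complaint as in the error above.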

Specifically, here is the code I am using. I would appreciate any insight into why this error occurs and how to solve it.

# indep is a character vector of independent variable names
# dep is the name of the dependent variable
# id.case is the id variable
# sample is my dataset

  eq <- 
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula  
  
  s <-
    lm(eq, data = sample %>% select(-id.case))
  
  pred <- 
    sample %>% 
    modelr::add_predictions(s) %>% 
    select(id.case, pred) 

As requested by @SimoneBianchi, I am providing a reproducible example here.

Reproducible example

  library(tidyverse)
  library(tibble)
  library(data.table)
  
  rename <- dplyr::rename
  select <- dplyr::select
  
  set.seed(10002)
  id <- sample(1:1000, 1000, replace=F)
  
  set.seed(10003)
  fe1 <- sample(c('A','B','C'), 1000, replace=T)
  
  set.seed(10001)
  fe2 <- sample(c('a','b','c'), 1000, replace=T)
  
  set.seed(10001)
  cont1 <- sample(1:300, 1000, replace=T)
  
  set.seed(10004)
  value <- sample(1:30, 1000, replace=T)
  
  sample <-   
    data.frame(id, fe1, fe2, cont1, value) 

  dep <- 'value'
  
  indep <- 
    c('fe1','fe2', 'cont1')
  
  
  eq <- 
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula  
  
  s <-
    lm(eq, data = sample %>% select(-id))
  
  pred <- 
    sample %>% 
    modelr::add_predictions(s) %>% 
    select(id, pred)

Update and Workaround

One workaround I found is to use the fitted function instead of the modelr function. However, I would still like to learn why the regression automatically drops some factor levels from a factor variable. If anyone knows, please leave a comment.

  pred <- 
    sample %>% 
    cbind(pred = fitted(s))
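One caveat with this workaround, sketched below under the assumption that the data contain incomplete rows: with the default `na.action` (`na.omit`), `fitted()` only returns values for the rows that were actually used in the fit, so it may not line up with the original data. Fitting with `na.action = na.exclude` pads the fitted values with NA instead. The data frame `dat` and its columns are hypothetical toy names.

```r
# Hypothetical toy data with one incomplete row (y is NA in row 3).
dat <- data.frame(
  id = 1:5,
  x  = c(1, 2, 3, 4, 5),
  y  = c(2.1, 3.9, NA, 8.2, 9.8)
)

# Default na.action (na.omit): fitted() covers only the complete rows.
fit_omit <- lm(y ~ x, data = dat)
length(fitted(fit_omit))

# na.exclude: fitted() is padded with NA, so it lines up with the data rows.
fit_excl <- lm(y ~ x, data = dat, na.action = na.exclude)
dat$pred <- fitted(fit_excl)
print(dat)
```

With `na.exclude`, the row that was dropped from the fit simply gets an NA prediction rather than shifting the other values.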

Closing: Problem found with the dataset

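I found that some observations contained NA values, and those rows carried levels of the factor variable that did not appear in any complete row. Since lm() drops incomplete rows before fitting by default, the model never saw those levels, and they were reported as new levels at prediction time, which caused the error. After I fixed the NAs, the original code worked fine. So it was a problem with the dataset rather than the code!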
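For anyone who runs into the same thing, here is a minimal sketch of how this can happen even when the model is fitted and applied on the same data. The data frame `dat` and the columns `g` and `y` are hypothetical toy names, and base `predict()` stands in for `modelr::add_predictions()`, which, judging by the error message, runs the same prediction code underneath.

```r
# Hypothetical toy data: level "C" occurs only in a row whose outcome is NA.
dat <- data.frame(
  id = 1:6,
  g  = c("A", "A", "B", "B", "A", "C"),
  y  = c(1, 2, 3, 5, 4, NA)
)

# lm() drops the incomplete row by default, so the model never sees "C".
fit <- lm(y ~ g, data = dat)
print(fit$xlevels$g)  # levels the model knows about

# Predicting on the full data then fails with the "new levels" error.
msg <- tryCatch(
  {
    predict(fit, newdata = dat)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# One possible fix: predict only for rows whose levels the model has seen.
known <- dat[dat$g %in% fit$xlevels$g, ]
known$pred <- predict(fit, newdata = known)
print(known)
```

Comparing `fit$xlevels` with the levels present in the data is a quick way to spot which levels were lost when incomplete rows were dropped.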

Thank you all for trying to help me out.

J.K.
  • Please provide some data to reproduce the error, thanks! – Simone Bianchi Feb 09 '22 at 14:11
  • @SimoneBianchi Unfortunately, I cannot provide the exact sample data for this as the data is confidential. Also, if I try it with another dataset, I believe that it will work as the sample remains the same for making the model and making the predictions. I am wondering if anyone encountered this issue and how he/she solved the issue. – J.K. Feb 09 '22 at 14:15
  • Can you provide just a few lines of data, change the variable names, and check if the error is still there? Otherwise I do not know if we can help you: is the problem in your data, in your code, or somewhere else? – Simone Bianchi Feb 09 '22 at 14:21
  • @SimoneBianchi I will do so and let you know! – J.K. Feb 09 '22 at 14:22
  • @SimoneBianchi I edited the post and now it has a reproducible example. It works fine, and I wonder what could be an issue. – J.K. Feb 09 '22 at 14:28
  • It seems that the regression automatically ignores some factor levels in a factor variable. – J.K. Feb 09 '22 at 14:40
  • @SimoneBianchi I found the error Simone! Thank you for trying to help me out. – J.K. Feb 09 '22 at 14:58
  • You're welcome. I guess there was something in the data. Good luck with the rest of the analysis – Simone Bianchi Feb 09 '22 at 15:33