I am using the randomForest package to classify a binary outcome variable. I first coerced every variable to numeric and then used na.roughfix to handle missing values:

data <- read.csv("data.csv")
data <- lapply(data, as.numeric)
data <- na.roughfix(data) 

Then I run the model:

model <- randomForest(as.factor(outcome) ~ V1 + V2...+ VN, 
         data=data, 
         importance=TRUE,
         ntree=500)

and I get the following error:

Error in na.fail.default(list(as.factor(outcome) = c(2L, 2L, 1L, : missing values in object

The na.roughfix imputation should have taken care of this, right? I have gotten it to work before, and other answers on this site suggest it should work. Any suggestions?

bencrosier

1 Answer

Your lapply line didn't do what you expected. Its result is no longer a data frame, just a list. As a result, the data.frame method of na.roughfix isn't dispatched; the default method is used instead, and it simply returns its first argument if that argument isn't atomic (which your list clearly isn't).

The somewhat sneaky way to convert each column to numeric but retain the data frame property would be:

data[] <- lapply(data,as.numeric)

Alternatively, you could simply convert it back via as.data.frame.
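
As a minimal sketch of the difference (the toy data frame `df` and its values are made up for illustration and are not the data from the question; the na.roughfix step only runs if randomForest is installed):

```r
# Hypothetical stand-in for data.csv: character columns with one missing value
df <- data.frame(outcome = c("1", "0", "1", "0"),
                 V1 = c("1.5", NA, "3.0", "4.5"),
                 stringsAsFactors = FALSE)

# lapply() always returns a plain list, so the data.frame class is lost
as_list <- lapply(df, as.numeric)
class(as_list)   # "list"

# Assigning into df[] replaces the columns but keeps the data frame
as_df <- df
as_df[] <- lapply(df, as.numeric)
class(as_df)     # "data.frame"

if (requireNamespace("randomForest", quietly = TRUE)) {
  # List input: the default method hands the object back untouched
  fixed_list <- randomForest::na.roughfix(as_list)
  # Data frame input: NAs in numeric columns are replaced by the column median
  fixed_df <- randomForest::na.roughfix(as_df)
  stopifnot(anyNA(fixed_list$V1), !anyNA(fixed_df$V1))
}
```

Checking `class(data)` right before the na.roughfix call is a quick way to confirm which method will be dispatched.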

joran
  • thanks for the response. I had actually tried the `as.data.frame` solution before. I retried it and gave `data[] <- lapply(data,as.numeric)` a run too, and both still spit up the same error. – bencrosier Aug 26 '15 at 15:14
  • @bencrosier Well, then if you want more specific help you'll have to provide a reproducible example. – joran Aug 26 '15 at 15:16
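
When putting together a reproducible example like the one requested, a quick per-column NA count can help narrow things down. The sketch below uses a made-up data frame `df` and shows one possible way NAs can survive median imputation: a column that is entirely NA has no median to fill in. This is only an assumption about what might be happening, not a diagnosis of the data in the question.

```r
# Hypothetical data: V1 has one NA, V2 is missing everywhere
df <- data.frame(outcome = c(1, 0, 1, 0),
                 V1 = c(1.5, NA, 3.0, 4.5),
                 V2 = c(NA_real_, NA_real_, NA_real_, NA_real_))

# Count missing values per column and show only the affected columns
na_counts <- colSums(is.na(df))
na_counts[na_counts > 0]

# Columns with no observed values at all
all_na <- names(df)[colSums(!is.na(df)) == 0]
all_na                        # "V2"

# The median of an all-NA column is itself NA, so filling with it changes nothing
median(df$V2, na.rm = TRUE)   # NA
```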