I want to do cross-validation for a random forest regression, but I'm not sure how. This is my code so far:

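library(dplyr)         # select(), filter(), %>%
library(tidyr)         # drop_na()
library(randomForest)  # randomForest(), rfcv(), varImpPlot()
library(rfUtilities)   # rf.crossValidation(), rf.regression.fit()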
# Read Data
base <- readxl::read_xlsx(c:\ File)

# Pull columns to use in the model
base_cl <- select(base, 
                  Id = PLA_WTWPartyID, 
                  Ind = Global_reference_Industry, 
                  Num__Ind = NumInd,
                  Retention = Retention_AL,
                  Limit = Limit_AL,
                  Exposure = Exposure_AL,
                  #RL_Exposure = Risk_level_Exposure,
                  LPremium = Liab_Premuim_AL,
                  Haz_Gp = HazardGp_AL,
                  LPick = Loss_Pick_AL,
                  #RL_LPick = Level_Loss_Pick,
                  Rate = Rate_AL,
                  lob = AL_R,
                  Date = AL_R_Date) 

#Clean Data
base_cl$Ind[is.na(base_cl$Ind)] <- "Other"
base_cl$Limit[base_cl$Limit == "0"] <- NA
base_cl$Exposure[base_cl$Exposure == "0"] <- NA

#Remove Rate outliers
base_cl$Rate <- remove_outliers(base_cl$Rate)

base_cl <- base_cl %>%
  filter(lob == "1") %>%
  filter(Date == "1") %>%
  drop_na(Limit)%>%
  drop_na(Exposure) %>%
  drop_na(LPremium) %>%
  drop_na(Retention) %>%
  drop_na(Rate)

# Formula_3 is the model formula (its definition is not shown here)
output.forest <- randomForest(Formula_3, base_cl, ntree = 400, keep.forest = TRUE,
                              importance = TRUE, localImp = TRUE, mtry = 6)

print(output.forest)
rf.regression.fit(output.forest)
varImpPlot(output.forest, sort = TRUE)

# randomForest::rfcv() takes cv.fold (number of folds) and a step in (0, 1);
# p, normalize, bootstrap, trace and method are not rfcv() arguments
RF_CV_2 <- rfcv(trainx = base_cl[, 4:9], trainy = base_cl[[10]],
                cv.fold = 5, step = 0.5)

This last call gives me an error:

RF <- rf.crossValidation(output.forest, base_cl, p = 0.1, n = 99, seed = NULL,
                         normalize = FALSE, bootstrap = FALSE, trace = FALSE, ntree = 400)

Error in sample.int(length(x), size, replace, prob) : object 'sample.sizes' not found

I don't know how to fix this so that it runs. Can you help me build a function, or fix my code, to run cross-validation, maybe with k = 5 or 10?

IRTFM
Carlos Tellez
  • Possible duplicate of [How to perform random forest/cross validation in R](https://stackoverflow.com/questions/19760169/how-to-perform-random-forest-cross-validation-in-r) – Boxuan May 06 '19 at 14:47
  • There's not much point in offering all that code since you have not offered the data that it's being run on. You should, however, include the name of the package that has the function that throws the error. – IRTFM May 06 '19 at 15:05

1 Answer

Searching with Google on:

 rf.crossValidation "Error in sample.int(length(x), size, replace, prob) : object 'sample.sizes' not found" 

... we find that the bug was fixed in February, but you will need to install the development version from GitHub to get the fix (for example with remotes::install_github("jeffreyevans/rfUtilities")). See the bug report and response at: https://github.com/jeffreyevans/rfUtilities/issues/4
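
Since the question also asks for a function that runs k-fold cross-validation with k = 5 or 10, here is a minimal sketch that does it by hand with randomForest, independent of rfUtilities. The helper name kfold_rf and the toy data frame are made up for illustration and are not part of either package; the sketch assumes the response in the formula is a plain column name (not a transformed term).

```r
library(randomForest)

# Hypothetical helper (not part of randomForest or rfUtilities):
# k-fold cross-validation for a random forest regression.
kfold_rf <- function(formula, data, k = 5, ntree = 400, seed = 1, ...) {
  set.seed(seed)
  # Name of the response; assumes an untransformed column on the left-hand side
  response <- all.vars(formula)[1]
  # Randomly assign every row to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  rmse <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    # Fit on k - 1 folds, predict on the held-out fold
    fit  <- randomForest(formula, data = train, ntree = ntree, ...)
    pred <- predict(fit, newdata = test)
    rmse[i] <- sqrt(mean((test[[response]] - pred)^2))
  }
  list(rmse_per_fold = rmse, mean_rmse = mean(rmse))
}

# Synthetic stand-in data, since the original data set is not available
set.seed(42)
toy <- data.frame(x1 = runif(200), x2 = runif(200), x3 = runif(200))
toy$y <- 3 * toy$x1 - 2 * toy$x2 + rnorm(200, sd = 0.1)

cv <- kfold_rf(y ~ x1 + x2 + x3, toy, k = 5, ntree = 100)
print(cv$rmse_per_fold)
print(cv$mean_rmse)
```

With the data from the question, the call would presumably look like kfold_rf(Formula_3, base_cl, k = 10, mtry = 6), assuming Formula_3 is a formula whose response is a column of base_cl. Extra arguments such as mtry are passed through to randomForest().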

IRTFM