I'm trying to figure out the following: I'm running Random Forest to classify habitat use, with GPS data from 17 animals. My data frame contains habitat variables, such as aspect and canopy cover, at each used animal location and at each unused, random location. Each used location is also identified by the ID number of the animal (in a column called "lynx"), and a column called "usvsa" codes used locations as 1 and unused locations as 0. Here's the top of my spatial points data frame, called sdata3:
lynx usvsa aspect canopy_cover clearcut_area cti deciduous dist_draw dist_ridge
311 1 252.3302 55.3704 0 7.311823 0 90.0000 484.66483
311 1 263.1394 55.1528 0 6.857203 0 324.4996 305.94116
311 1 249.6992 72.9272 0 6.612025 0 364.9658 212.13203
311 1 194.4459 50.4428 0 6.330615 0 108.1665 67.08204
Ok. So, I'd like to use jackknifing to run Random Forest 17 times (one run per individual), leaving a different animal out each time. The idea is to compare the results across runs to make sure no single animal has a disproportionately large effect on the model. I've been reading about the "bootstrap" package and its jackknife function: jackknife(x, theta, ...)
I get that I need to write a function for theta, but I can't figure out how to put it all together so that each Random Forest run leaves one animal out. Here is my Random Forest model: randomForest(y ~ ., data = sdata3, ntree = b, importance = TRUE, norm.votes = TRUE, proximity = TRUE)
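Here's my best guess at how theta might work: pass the vector of unique animal IDs as x, so that jackknife() drops one animal at a time and theta refits the forest on the remaining animals' rows. The names dat, ids, and theta are just my own, and I'm not sure this is the intended use of the function:

```r
library(bootstrap)
library(randomForest)

dat <- sdata3@data               # attribute columns of the SpatialPointsDataFrame
ids <- unique(dat$lynx)          # one entry per animal

# jackknife() calls theta on ids with one animal removed each time,
# so theta refits the forest on the remaining animals' rows and
# returns the statistic to track -- here the final OOB error.
# (b, my ntree value, is assumed to be set as above.)
theta <- function(keep) {
  sub <- dat[dat$lynx %in% keep, ]
  rf <- randomForest(factor(usvsa) ~ . - lynx, data = sub,
                     ntree = b, importance = TRUE)
  rf$err.rate[nrow(rf$err.rate), "OOB"]
}

jk <- jackknife(ids, theta)
jk$jack.values                   # one OOB error per left-out animal
```

I used factor(usvsa) on the left-hand side so randomForest runs in classification mode, and "- lynx" to keep the ID column out of the predictors. Does that look like a sane way to set up theta?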
I'd like to compare the importance values and the OOB error of each run.
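Since jackknife() seems to hand back only a single numeric statistic, I also sketched a plain leave-one-animal-out loop that stores both the importance matrix and the OOB error from each run. Again, dat, ids, oob, and imp are just names I made up, b is assumed to be set as above, and I'm assuming the random locations also carry a lynx ID (otherwise they'd need to be kept in every run):

```r
library(randomForest)

dat <- sdata3@data                 # attribute table of the SpatialPointsDataFrame
ids <- unique(dat$lynx)

oob <- numeric(length(ids))        # final OOB error from each run
imp <- vector("list", length(ids)) # importance matrix from each run

for (i in seq_along(ids)) {
  train <- dat[dat$lynx != ids[i], ]             # leave animal i out
  rf <- randomForest(factor(usvsa) ~ . - lynx, data = train,
                     ntree = b, importance = TRUE)
  oob[i] <- rf$err.rate[nrow(rf$err.rate), "OOB"]
  imp[[i]] <- importance(rf)
}
names(oob) <- names(imp) <- ids
```

Then I could look at range(oob) and compare the stored importance matrices across runs. Is the loop approach reasonable, or is there a way to get the per-run importance values out of jackknife() itself?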
Any tips would be appreciated!!