I am working on running a Random Forest model. I've imported point data representing used and unused sites and created a raster stack from raster GIS layers. I've then created a SpatialPointsDataFrame containing all of my used and unused points, with their underlying raster values attached.
require(sp)
require(rgdal)
require(raster)
#my raster stack
xvariables <- stack(rlist) #rlist = a list of raster layers
# Reading in the spatial used and unused points.
ldata <- readOGR(dsn=paste(path, "DATA", sep="/"), layer=used_avail)
str(ldata@data)
#Attach raster values to point data.
v <- as.data.frame(extract(xvariables, ldata))
ldata@data = data.frame(ldata@data, v[match(rownames(ldata@data), rownames(v)),])
Next I plan to run a Random Forest using this data. The problem is that I have a very large data set (over 40,000 data points). I need to subsample my data, but I am having a hard time figuring out how to do this. I've tried using the sample() function, but I think it won't work because I have a SpatialPointsDataFrame? I'm new to R and would really appreciate any ideas.
Thanks!
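One possible approach, sketched on toy data: sample() works on a vector, so a common pattern is to draw row indices and then subset the SpatialPointsDataFrame with [rows, ], which keeps coordinates and attributes aligned. Everything below is illustrative rather than taken from the real data set: the toy object stands in for ldata, the column names used and elev are hypothetical, and the sizes 200 and 100 are arbitrary.

```r
library(sp)

set.seed(42)  # make the random draw reproducible

# Toy stand-in for ldata: 1000 random points with a hypothetical 0/1 'used' column
n <- 1000
toy <- data.frame(x = runif(n), y = runif(n),
                  used = rbinom(n, 1, 0.3), elev = rnorm(n))
coordinates(toy) <- ~ x + y  # promotes toy to a SpatialPointsDataFrame

# Simple random subsample: draw row indices, then subset rows of the spatial object
idx <- sample(nrow(toy@data), size = 200)
toy_sub <- toy[idx, ]

# Stratified alternative: at most 100 rows from each class of 'used'
rows_by_class <- split(seq_len(nrow(toy@data)), toy@data$used)
idx_strat <- unlist(lapply(rows_by_class, function(i) {
  i[sample.int(length(i), min(100, length(i)))]
}), use.names = FALSE)
toy_strat <- toy[idx_strat, ]
```

With the real data, the same indexing would be applied to ldata in place of toy. The stratified version may be worth considering if used and unused points are very unbalanced, since a plain random draw keeps whatever imbalance the full data set has.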