I have two datasets: one with 488,286 rows of longitude and latitude coordinates, and a second with 245,077 rows of longitude and latitude coordinates. The second also has additional data relating to the coordinates. For each point in the first dataset, I want to find the closest point in the second. I cannot share the raw data, so for the sake of simplicity I will generate some random points here:
df1 <- cbind(runif(488286, min = -180, max = -120), runif(488286, min = 50, max = 85))
df2 <- cbind(runif(245077, min = -180, max = -120), runif(245077, min = 50, max = 85))
I first tried calling the distm function on both full datasets at once, but the data was too large, so I then broke it down row by row like this:
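library(geosphere)
# For each point in df1, compute the distance to every point in df2
# and keep the row index of the nearest one.
closest <- apply(df1, 1, function(x) {
  mat <- distm(x, df2, fun = distVincentyEllipsoid)
  which.min(mat)
})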
I think this works, but it is so slow that I have only seen results for a subset of the data. On the full data I left it running for 2 days and it did not finish. I really need a quicker way of doing this. It doesn't have to use distm; anything that is faster and still accurate would do.
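For what it's worth, here is a back-of-the-envelope estimate of why the single distm call on the full data is out of reach. It assumes the result would be held as a dense matrix with one 8-byte double per pair, which is my assumption rather than something I measured:

```r
# Rough size of the full pairwise distance matrix.
# Assumption: dense storage, one 8-byte double per (df1 row, df2 row) pair.
n1 <- 488286                    # rows in the first dataset
n2 <- 245077                    # rows in the second dataset
n_pairs <- n1 * n2              # number of pairwise distances
size_gb <- n_pairs * 8 / 1e9    # size in decimal gigabytes
n_pairs
size_gb
```

That is about 1.2e11 distances, on the order of 1 TB, which is why I am going through df1 one row at a time instead.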
Thanks in advance!