I have two sets of coordinates and am trying to find, for each point in the first set, the closest point in the second. One dataset has about 1 million records and the other nearly half a million, so I am looking for a more efficient way to do this and would welcome suggestions.
A 10-row sample of the first dataset (dput output) is:
structure(list(longitude = c(-2.5168477762, -2.5972432832, -2.5936692407,
-2.5943475677, -2.5923214528, -2.5919014869, -2.5913454553, -2.5835739992,
-2.5673150195, -2.5683356381), latitude = c(51.4844052488, 51.45278562,
51.4978889752, 51.4979844501, 51.4983813479, 51.4982126232, 51.4964350456,
51.4123728037, 51.4266239227, 51.4265740193)), .Names = c("longitude",
"latitude"), row.names = c(NA, 10L), class = "data.frame")
A 10-row sample of the second dataset (dput output) is:
structure(list(longitude = c(-3.4385392589, -3.4690321528, -3.2723981534,
-3.3684012246, -3.329625956, -3.3093349806, 0.8718409198, 0.8718563602,
0.8643998472, 0.8644153057), latitude = c(51.1931124311, 51.206897181,
51.1271423704, 51.1618047221, 51.1805971356, 51.1663567178, 52.896084336,
52.896092955, 52.9496082626, 52.9496168824)), .Names = c("longitude",
"latitude"), row.names = 426608:426617, class = "data.frame")
I have looked at the approx and findInterval functions in R but did not fully understand how they work. What I am trying to do is take each coordinate from dataset1 and compare it against all the coordinates in dataset2 to find the closest match. Currently I am using two nested for loops, but this takes forever because of the size of the data.
The code I have tried is given below:
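library(geosphere)  # provides distm() and distVincentyEllipsoid()

cns <- function(x, y)
{
  # b[i] holds the row index in y of the point closest to row i of x
  b <- integer(nrow(x))
  for (i in seq_len(nrow(x)))
  {
    # a[j] is the distance in metres from row i of x to row j of y
    a <- numeric(nrow(y))
    for (j in seq_len(nrow(y)))
    {
      a[j] <- distm(c(x$longitude[i], x$latitude[i]),
                    c(y$longitude[j], y$latitude[j]),
                    fun = distVincentyEllipsoid)
    }
    # which.min() returns a single index even when several distances tie
    b[i] <- which.min(a)
  }
  return(y[b, ])
}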
The function above takes one point from dataset1, calculates its distance to every point in dataset2, finds the minimum distance, and returns the coordinates of the dataset2 point at that distance.
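The inner loop described above could in principle be replaced by one vectorised distance calculation per point of dataset1. The following is only a minimal sketch in base R, assuming that the haversine (great-circle) distance on a sphere is an acceptable stand-in for distVincentyEllipsoid. The names haversine_m and nearest_index are hypothetical helpers, and the two small data frames are made-up values for illustration, not the real data.

```r
# Great-circle (haversine) distance in metres from one point to many points.
# Assumption: a spherical Earth of radius 6378137 m, used here in place of
# distVincentyEllipsoid; results differ slightly from the ellipsoidal distance.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# Hypothetical helper: for each row of x, the row index of the nearest row of y.
# The distance to all rows of y is computed in a single vectorised call.
nearest_index <- function(x, y) {
  vapply(seq_len(nrow(x)), function(i) {
    which.min(haversine_m(x$longitude[i], x$latitude[i],
                          y$longitude, y$latitude))
  }, integer(1))
}

# Made-up example data (not the real datasets)
x <- data.frame(longitude = c(-2.59, 0.87),
                latitude  = c(51.45, 52.90))
y <- data.frame(longitude = c(-3.44, 0.86, -2.60),
                latitude  = c(51.19, 52.95, 51.45))

idx <- nearest_index(x, y)  # row indices into y
matches <- y[idx, ]         # closest dataset2 point for each dataset1 point
```

This still performs nrow(x) * nrow(y) distance evaluations, so it only removes the per-pair function call overhead; it does not change the overall amount of work.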
I am thinking that parallel processing might be a way to achieve this in a reasonable time. Any suggestions are welcome.
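As a rough sketch of the parallel idea, the rows of dataset1 could be split into chunks and each chunk handled by a separate worker using the parallel package that ships with R. This assumes a haversine distance in place of distVincentyEllipsoid, and that forking via mclapply is available (it is not on Windows, where the sketch falls back to one core). The names haversine_m and nearest_index_parallel are hypothetical, and the example data are made up.

```r
library(parallel)  # ships with base R; provides mclapply() and splitIndices()

# Haversine distance in metres from one point to many points.
# Assumption: spherical Earth, used instead of distVincentyEllipsoid.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# Hypothetical helper: nearest row of y for each row of x, computed in chunks.
nearest_index_parallel <- function(x, y, cores = 2L) {
  # mclapply() relies on forking, which Windows does not support
  if (.Platform$OS.type == "windows") cores <- 1L
  # Split the row numbers of x into one contiguous chunk per worker
  chunks <- splitIndices(nrow(x), cores)
  res <- mclapply(chunks, function(rows) {
    vapply(rows, function(i) {
      which.min(haversine_m(x$longitude[i], x$latitude[i],
                            y$longitude, y$latitude))
    }, integer(1))
  }, mc.cores = cores)
  # Chunks are returned in order, so the indices line up with the rows of x
  unlist(res, use.names = FALSE)
}

# Made-up example data (not the real datasets)
x <- data.frame(longitude = c(-2.59, 0.87),
                latitude  = c(51.45, 52.90))
y <- data.frame(longitude = c(-3.44, 0.86, -2.60),
                latitude  = c(51.19, 52.95, 51.45))

idx_par <- nearest_index_parallel(x, y, cores = 2L)
matches_par <- y[idx_par, ]
```

Parallelism divides the run time by at most the number of cores, so with 1 million by half a million pairs it is probably worth combining with a cheaper distance or a spatial index rather than relying on it alone.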
Regards,