I have two data frames that each have three variables: location_id, latitude, and longitude. For every location_id in the first data frame, I have to find the closest location_id in the second data frame, along with the distance between the two locations.
I've tried using expand.grid to build every possible combination of the two data frames, which worked. But when I tried to merge the latitudes and longitudes from the original data frames onto that combined list, I ran out of memory (there are 7000 location_ids in the first data frame and 5000 location_ids in the second).
I found a function that calculates the distance between two points elsewhere on Stack Overflow:
    earth.dist <- function (long1, lat1, long2, lat2)
    {
      rad <- pi/180
      a1 <- lat1 * rad
      a2 <- long1 * rad
      b1 <- lat2 * rad
      b2 <- long2 * rad
      dlon <- b2 - a2
      dlat <- b1 - a1
      a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
      c <- 2 * atan2(sqrt(a), sqrt(1 - a))
      R <- 6378.145
      d <- R * c
      return(d)
    }
but I'm having a hard time applying it in the context of this problem. Any help is appreciated!
EDIT:
The sets of data look exactly like this:
    location_id  LATITUDE   LONGITUDE
    211099       32.40913   -99.78064
    333547       32.45192  -100.39325
    369561       32.47458   -99.69176
    123141       33.68169   -96.60887
    386913       33.99921   -96.40743
    123331       31.96173   -83.75830
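To make the goal concrete, here is a rough sketch of one way the function might be applied without building the full grid. It is only an illustration under some assumptions: the names df1, df2, nearest and result are hypothetical, and splitting the sample rows above into two data frames of three rows each is made up for the example. The idea is to loop over the rows of the first data frame and let earth.dist work on all rows of the second at once, so only one vector of distances (5000 values in my real data) is held at a time.

```r
# Same haversine formula as earth.dist above, repeated so the sketch is self-contained
earth.dist <- function(long1, lat1, long2, lat2) {
  rad <- pi / 180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- sin(dlat / 2)^2 + cos(a1) * cos(b1) * sin(dlon / 2)^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  6378.145 * c  # distance in km
}

# Hypothetical split of the sample rows into two data frames (assumption)
df1 <- data.frame(
  location_id = c(211099, 333547, 369561),
  LATITUDE    = c(32.40913, 32.45192, 32.47458),
  LONGITUDE   = c(-99.78064, -100.39325, -99.69176)
)
df2 <- data.frame(
  location_id = c(123141, 386913, 123331),
  LATITUDE    = c(33.68169, 33.99921, 31.96173),
  LONGITUDE   = c(-96.60887, -96.40743, -83.75830)
)

# For each row of df1, compute the distance to every row of df2 in one
# vectorised call, then keep only the nearest match
nearest <- lapply(seq_len(nrow(df1)), function(i) {
  d <- earth.dist(df1$LONGITUDE[i], df1$LATITUDE[i],
                  df2$LONGITUDE, df2$LATITUDE)
  j <- which.min(d)
  data.frame(location_id = df1$location_id[i],
             nearest_id  = df2$location_id[j],
             distance_km = d[j])
})
result <- do.call(rbind, nearest)
print(result)
```

This gives one row per location_id in the first data frame, with the id of the closest location in the second data frame and the distance in km. I have not checked how this scales to the full 7000 x 5000 case.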