
I'm trying to find the distance between multiple cities using the distHaversine function in the geosphere package. This function requires several arguments:

The longitude and latitude of the first place. The longitude and latitude of the second place. The radius of the earth in whatever unit (I'm using r = 3961 for miles).

When I input this as a vector, it works easily:

HongKong <- c(114.17, 22.31)
GrandCanyon <- c(-112.11, 36.11)

library(geosphere)
distHaversine(HongKong, GrandCanyon, r=3961)
#[1] 7399.113 distance in miles

However, my actual datasets look like this:

library(dplyr)
location1 <- tibble(person = c("Sally", "Jane", "Lisa"),
current_loc = c("Bogota Colombia", "Paris France", "Hong Kong China"),
lon = c(-74.072, 2.352, 114.169),
lat = c(4.710, 48.857, 22.319))

location2 <- tibble(destination = c("Atlanta United States", "Rome Italy", "Bangkok Thailand", "Grand Canyon United States"),
              lon = c(-84.388, 12.496, 100.501, -112.113),
              lat = c(33.748, 41.903, 13.756, 36.107))

What I want is columns that say how far each destination is from each person's current location.

I know there has to be a way using purrr's pmap_dbl(), but I'm unable to figure it out.

Bonus points if your code uses the tidyverse and if there's any easy way to make a column that identifies the closest destination. Thank you!

In an ideal world, I would get this:

solution <- tibble(person = c("Sally", "Jane", "Lisa"),
                    current_loc = c("Bogota Colombia", "Paris France", "Hong Kong China"),
                    lon = c(-74.072, 2.352, 114.169),
                    lat = c(4.710, 48.857, 22.319),
                   dist_Atlanta = c(1000, 2000, 7000),
                   dist_Rome = c(2000, 500, 3000),
                   dist_Bangkok = c(7000, 5000, 1000),
                   dist_Grand = c(1500, 4000, 7500),
                   nearest = c("Atlanta United States", "Rome Italy", "Bangkok Thailand"))

Note: The numbers in the dist columns are random; however, they would be the output from the distHaversine() function. The name of those columns is arbitrary--it does not need to be called that. Also, if the nearest column is out of the scope of this question, I think that I can figure that one out.

J.Sabree

2 Answers


distHaversine compares points pairwise (the first point in p1 with the first point in p2, and so on) rather than computing every combination, so we need to send each combination of location1 and location2 rows to the function one by one. One way using sapply would be

library(geosphere)

# Outer sapply loops over people, inner sapply over destinations;
# t() puts people in rows and destinations in columns
location1[paste0("dist_", stringr::word(location2$destination))] <-
  t(sapply(seq_len(nrow(location1)), function(i)
    sapply(seq_len(nrow(location2)), function(j) {
      distHaversine(location1[i, c("lon", "lat")], location2[j, c("lon", "lat")], r = 3961)
    })))

location1$nearest <- location2$destination[apply(location1[5:8], 1, which.min)]

location1

# A tibble: 3 x 9
#  person current_loc         lon   lat dist_Atlanta dist_Rome dist_Bangkok dist_Grand nearest              
#  <chr>  <chr>             <dbl> <dbl>        <dbl>     <dbl>        <dbl>      <dbl> <chr>                
#1 Sally  Bogota Colombia  -74.1   4.71        2114.     5828.       11114.      3246. Atlanta United States
#2 Jane   Paris France       2.35 48.9         4375.      687.        5871.      5329. Rome Italy           
#3 Lisa   Hong Kong China  114.   22.3         8380.     5768.        1075.      7399. Bangkok Thailand  
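
As a possible alternative to the nested sapply calls (a sketch, not part of the answer above): geosphere also provides distm(), which builds the full matrix of distances between two sets of points. distm() takes the distance function through its fun argument, so the radius is fixed here in a small wrapper. The names people, places, and hav_miles are made up for this example; the coordinates are copied from the question.

```r
library(geosphere)

# Example coordinates copied from the question (longitude first, then latitude)
people <- data.frame(
  person = c("Sally", "Jane", "Lisa"),
  lon = c(-74.072, 2.352, 114.169),
  lat = c(4.710, 48.857, 22.319)
)
places <- data.frame(
  destination = c("Atlanta United States", "Rome Italy",
                  "Bangkok Thailand", "Grand Canyon United States"),
  lon = c(-84.388, 12.496, 100.501, -112.113),
  lat = c(33.748, 41.903, 13.756, 36.107)
)

# Hypothetical wrapper that fixes the Earth radius in miles
hav_miles <- function(p1, p2) distHaversine(p1, p2, r = 3961)

# Rows are people, columns are destinations
d <- distm(as.matrix(people[, c("lon", "lat")]),
           as.matrix(places[, c("lon", "lat")]),
           fun = hav_miles)
dimnames(d) <- list(people$person, places$destination)

# Closest destination for each person
nearest <- places$destination[apply(d, 1, which.min)]
```

Because the result is a plain matrix, it adapts to any number of destinations without hard-coded column positions.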
Ronak Shah
  • thank you! Quick question: For your last line when you call to location1[5:8], is there a way to make R pick up on the names instead of the number? The form that I'm using this for is variable with the number of locations that I'll be identifying each week. Regardless, thank you! – J.Sabree Jun 05 '19 at 14:10
  • 1
    @J.Sabree yes, it depends on the name you give in the previous `sapply` step. So in this case as we are giving name with prefix `"dist"` so we can do `location2$destination[apply(location1[grep("^dist", names(location1))], 1, which.min)]` – Ronak Shah Jun 05 '19 at 14:13
  • I've tried applying this code to other datasets, and it works for some but not for others. I tried applying it to another dataset, and I received this error: Error in hub_locations_list$hub_loc[apply(dataset2[grep("^dist", : invalid subscript type 'list' (Note: I confirmed that this dataset2 IS class dataframe). Is there a reason why it only works in some cases? – J.Sabree Jun 07 '19 at 19:04
  • @J.Sabree not sure but have you used the syntax correctly? `hub_locations_list$hub_loc[apply(dataset2[grep("^dist", names(dataset2))], 1, which.min)]` – Ronak Shah Jun 08 '19 at 11:45
  • yes I checked the format, and it's giving an error only for some datasets. For instance, when the class function yields "spec_tbl_df", it works on the dataset, but when it is just tbl_df, tbl, and data.frame, it does not work. Do you know why that is or how to change the class of tibbles to spec_tbl_df? I've tried unlist, and it is not working. – J.Sabree Jun 12 '19 at 19:54
  • Hmmm...not sure. Maybe you can try and convert everything into dataframe then to be on the safer side. Just wrap `data.frame()` once you have read your data. – Ronak Shah Jun 13 '19 at 07:40
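
For the error discussed in the comments above, one possible cause (an assumption, since the failing dataset is not shown) is a row whose distance columns are all NA. which.min() returns a zero-length result for such a row, apply() then returns a list instead of an integer vector, and indexing with a list raises "invalid subscript type 'list'". The sketch below selects the distance columns by name prefix and returns NA for such rows. The helper name nearest_destination and the toy data are made up for illustration.

```r
# Toy data: the second row has no usable distances (hypothetical values)
dists <- data.frame(
  person = c("A", "B", "C"),
  dist_Atlanta = c(2114, NA, 8380),
  dist_Rome = c(5828, NA, 5768),
  dist_Bangkok = c(11114, NA, 1075)
)
destinations <- c("Atlanta United States", "Rome Italy", "Bangkok Thailand")

# Hypothetical helper: nearest destination per row, NA when a row has no distances
nearest_destination <- function(df, destinations, prefix = "^dist") {
  # Find the distance columns by name instead of by position
  dist_cols <- grep(prefix, names(df))
  idx <- vapply(seq_len(nrow(df)), function(i) {
    row <- unlist(df[i, dist_cols])
    if (all(is.na(row))) NA_integer_ else which.min(row)
  }, integer(1))
  destinations[idx]
}

dists$nearest <- nearest_destination(dists, destinations)
```

vapply() is used here because it always returns an integer vector of the expected length, which avoids the list result that apply() can produce.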

Using the tidyverse and a map function from purrr, as you asked, I found a solution, all in one pipeline.

library(tidyverse)
library(geosphere)

# renaming lon and lat variables in each df

location1 <- location1 %>%
  rename(lon.act = lon, lat.act = lat)

location2 <- location2 %>%
  rename(lon.dest = lon, lat.dest = lat)

# getting distances
merge(location1, location2, all = TRUE) %>%
  group_by(person,current_loc, destination) %>%
  nest() %>%
  mutate( act = map(data, `[`, c("lon.act", "lat.act")) %>%
            map(as.numeric),
          dest = map(data, `[`, c("lon.dest", "lat.dest")) %>%
            map(as.numeric),
          dist = map2(act, dest, ~distHaversine(.x, .y, r = 3961))) %>%
  unnest(data, dist) %>%
  group_by(person) %>%
  mutate(mindis = dist == min(dist))
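
Since the question asked for pmap_dbl() and one column per destination, here is a rough sketch of how that could look. It is not the code above: it starts again from the question's original tibbles and assumes reasonably recent dplyr and tidyr (for slice_min() and pivot_wider()). The names pairs, long, wide, and result, and the lon_from/lat_from/lon_to/lat_to columns, are made up for this example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(geosphere)

location1 <- tibble(person = c("Sally", "Jane", "Lisa"),
                    current_loc = c("Bogota Colombia", "Paris France", "Hong Kong China"),
                    lon = c(-74.072, 2.352, 114.169),
                    lat = c(4.710, 48.857, 22.319))
location2 <- tibble(destination = c("Atlanta United States", "Rome Italy",
                                    "Bangkok Thailand", "Grand Canyon United States"),
                    lon = c(-84.388, 12.496, 100.501, -112.113),
                    lat = c(33.748, 41.903, 13.756, 36.107))

# Pair every person with every destination; rename first to avoid name clashes
pairs <- crossing(
  rename(location1, lon_from = lon, lat_from = lat),
  rename(location2, lon_to = lon, lat_to = lat)
)

# pmap_dbl() walks the four coordinate columns in parallel, one row at a time
long <- pairs %>%
  mutate(dist = pmap_dbl(
    list(lon_from, lat_from, lon_to, lat_to),
    function(lon_from, lat_from, lon_to, lat_to) {
      distHaversine(c(lon_from, lat_from), c(lon_to, lat_to), r = 3961)
    }
  ))

# Closest destination per person
nearest <- long %>%
  group_by(person) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(person, nearest = destination)

# One distance column per destination, e.g. `dist_Rome Italy`
wide <- long %>%
  select(person, destination, dist) %>%
  pivot_wider(names_from = destination, values_from = dist, names_prefix = "dist_")

# Join back onto location1 so its original row order is kept
result <- location1 %>%
  left_join(wide, by = "person") %>%
  left_join(nearest, by = "person")
```

The distance column names contain spaces here because they come straight from the destination strings; shortening them first (for example with stringr::word(), as in the other answer) would give tidier names.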

Johan Rosa
  • thanks for sending this, but I get an error: Error: Can't find columns `lon.act`, `lat.act` in `.data`. – J.Sabree Jun 05 '19 at 14:18
  • Sorry, now it works. I just changed the names of the `lon` and `lat` variables in each df before merging the datasets. – Johan Rosa Jun 05 '19 at 14:26