I have one set of coordinates for the residences of different individuals and another set for the drop-off boxes for their ballots. I'm trying to find the distance between each residence and its nearest drop-off box. Below is the code I have so far; it was adapted from another Stack Overflow example. However, it is not very efficient: the dataset I'm working with has millions of rows, and the code builds every possible combination of voter and drop-off box coordinates before picking the smallest distance. Is there a more efficient way to do this?
What I currently have:
# Made-Up Data
library(geosphere)
library(tidyverse)
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
long=c(-43.17536, -43.17411, -43.36605),
lat=c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long=c(-43.19155, -43.33636, -67.45666),
lat=c(-22.90353, -22.87253, -26.78901))
# Code to find the distance between voters and the dropoff boxes
# Reorganize into new data frames first.
# First, the voters:
voter_addresses <- data.frame(voter_id = as.character(geo_voters$voter_id),
lon_address = geo_voters$long,
lat_address = geo_voters$lat
)
# Second, the polling locations:
polling_address <- data.frame(place_number = 1:nrow(geo_dropoff_boxes),
lon_place = geo_dropoff_boxes$long,
lat_place = geo_dropoff_boxes$lat
)
# Create nested dfs (one row per id, coordinates in a list-column):
voter_nest <- nest(voter_addresses, voter_coords = -voter_id)
polling_nest <- nest(polling_address, polling_coords = -place_number)
# Combine for combinations:
data_master <- crossing(voter_nest, polling_nest)
# Calculate shortest distance (distm returns meters, converted to km below):
shortest_dist <- data_master %>%
mutate(dist = map2_dbl(voter_coords, polling_coords, distm)) %>%
group_by(voter_id) %>%
filter(dist == min(dist)) %>%
mutate(dist_km = dist/1000,
voter_id = as.character(voter_id)) %>%
select(voter_id, dist_km)