I have 3 shape files.

  1. The first (shp_family) has a family_code column and a geometry column (POINT) giving the location of each family. In this example I have 100 observations, each with a unique family_code and its location (POINT).
  2. The second (shp_school) has a school_name column and a geometry column (POINT) giving the location of each school. In this example I have 50 observations, each with a unique school_name and its location (POINT).
  3. The third (shp_city) has a census_track_code column and a geometry column (POLYGON).

For each family_code, I want to find the GPS distance (by car, for example) to each school location, and then determine whether that family is within a threshold of 1.5 km (1 if within range, 0 if not).

The main problem: I managed to do this using osrmTable and extracting the matrix of distances, but as I understand it, each family_code makes an API request for each school location. So one family_code looks up 50 distances, and in this example I would have 5,000 requests. (I don't know if I am right here.)

In my original data, shp_family has 70,000 observations of unique families and their locations, and shp_school has 200 locations. osrmTable doesn't work there, I think because I am making about 14,000,000 requests (70,000 × 200).
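To make the arithmetic explicit, here is a minimal sketch of how the number of origin-destination pairs grows. `n_pairs` is a hypothetical helper name, not part of osrm. Whether these pairs are sent as separate HTTP requests or as one large table query depends on how osrmTable and the server handle them, which I have not verified; either way the work grows as the product of the two counts.

```r
# Hypothetical helper: number of cells in a full family x school distance matrix.
n_pairs <- function(n_family, n_school) {
  n_family * n_school
}

small_case <- n_pairs(100, 50)     # the small example
full_case  <- n_pairs(70000, 200)  # the full data set

print(c(small = small_case, full = full_case))
```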

Is there a way around this? The code I used, which works with a small sample, is below. The test I ran worked with 50 observations each of shp_family and shp_school. As for shp_city, every location (POINT) in shp_family and shp_school falls inside the city boundary.


    library(tidyverse)
    library(osrm)
    library(sf)

    # Distance matrix (driving distances in metres).
    # src and dst are named explicitly because the first positional
    # argument of osrmTable() is loc, not src.
    d_matrix <- osrmTable(src = shp_family, dst = shp_school,
                          measure = "distance", osrm.profile = "car")

    # Take the distances from the result (rows = families, columns = schools)
    distance <- d_matrix$distances

    # Threshold of 1.5 km, in metres
    threshold <- 1500

    # Check whether each family is within the threshold of at least one school
    shp_family$inside_limit <- apply(distance, 1, function(x) any(x <= threshold, na.rm = TRUE))

    # Binary transformation: 1 inside limit, 0 outside limit
    shp_family <- shp_family %>% mutate(inside_limit = as.integer(inside_limit))


I think there should be a way for each family_code to compute distances to only a few schools (mainly those that are closest, not all of them). But I think I would still hit the API limit.

My idea was to restrict the search before running it, but I don't know how.

I already tried buffers around the schools, but the main idea was to stick with GPS distance.
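One possible way to restrict the search first, sketched under assumptions: a driving distance can never be shorter than the straight-line distance, so any family farther than 1.5 km in a straight line from every school can be flagged 0 without routing at all. The sketch below uses `sf::st_is_within_distance()` for that prefilter on made-up coordinates standing in for shp_family and shp_school. `flag_by_driving_distance` is a hypothetical helper, and its `chunk_size = 100` is an arbitrary placeholder, not a documented server limit.

```r
library(sf)

# Synthetic stand-ins for shp_family and shp_school (coordinates are made up).
shp_family <- st_as_sf(
  data.frame(family_code = 1:4,
             lon = c(-46.630, -46.640, -46.700, -46.631),
             lat = c(-23.550, -23.555, -23.600, -23.551)),
  coords = c("lon", "lat"), crs = 4326
)
shp_school <- st_as_sf(
  data.frame(school_name = c("A", "B"),
             lon = c(-46.632, -46.641),
             lat = c(-23.551, -23.556)),
  coords = c("lon", "lat"), crs = 4326
)

threshold <- 1500  # metres

# For each family, the indices of schools within 1.5 km in a straight line.
near <- st_is_within_distance(shp_family, shp_school, dist = threshold)

# Families with no school in straight-line range cannot be in driving range.
candidate <- lengths(near) > 0
shp_family$inside_limit <- 0L

# Hypothetical helper: route only the given families, in chunks, so that each
# osrmTable() call stays small. Needs a reachable OSRM server to run.
flag_by_driving_distance <- function(families, schools, threshold, chunk_size = 100) {
  inside <- integer(nrow(families))
  rows <- seq_len(nrow(families))
  for (idx in split(rows, ceiling(rows / chunk_size))) {
    d <- osrm::osrmTable(src = families[idx, ], dst = schools,
                         measure = "distance", osrm.profile = "car")$distances
    inside[idx] <- as.integer(apply(d, 1, function(x) any(x <= threshold, na.rm = TRUE)))
  }
  inside
}

# Only the candidates are routed (left commented out: it calls the server).
# shp_family$inside_limit[candidate] <-
#   flag_by_driving_distance(shp_family[candidate, ], shp_school, threshold)
```

The prefilter does not change the result, only the amount of routing: families it discards would have been 0 anyway. How much it saves depends on how many families live within 1.5 km of some school.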

[Image: Main Idea]

Pedro TTL
    Set up your own `osrm` server instance and increase the `max-table-size` parameter. See [here](https://stackoverflow.com/questions/75791459/r-calculating-the-distance-between-two-geographical-points/75859028#75859028) for how to set up the instance and [here](https://stackoverflow.com/questions/74997208/how-to-speed-up-code-that-finds-shortest-driving-time-between-two-sets-of-points/74998234#74998234) for how to increase the max table size. – SamR Jul 14 '23 at 18:15

    You only need to check if the family is within distance of **any** school? I'd go with 1.5 km isodistance polygons for each school, union them, and check which family locations intersect. Or create a list of distance matrices, one for each school and only to families within a 1.5 km circular buffer, and flag families already within driving distance of any school to exclude them from the other distance matrices. `osrmTable(shp_family, shp_school)` sounds like a worst-case cross-join type of thing; with some optimization I'd expect to end up with 70k..100k `osrm` requests. Or approach the problem with `sfnetworks` – margusl Jul 14 '23 at 19:33
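The isodistance suggestion above could look roughly like the sketch below. It assumes osrm 4.0.0 or later, where `osrmIsodistance()` is available, and a reachable OSRM server. `flag_by_isodistance` is a hypothetical helper name. As far as I understand, the isodistance polygon is interpolated from a grid of routed points, so families close to the 1.5 km edge may be classified differently than with exact routed distances.

```r
library(sf)

# Hypothetical helper: flag families that fall inside the union of the
# schools' driving-distance zones. Needs a reachable OSRM server to run.
flag_by_isodistance <- function(families, schools, threshold = 1500) {
  # One 0..threshold isodistance polygon per school (one call per school).
  zones <- lapply(seq_len(nrow(schools)), function(i) {
    osrm::osrmIsodistance(loc = schools[i, ], breaks = c(0, threshold),
                          osrm.profile = "car")
  })
  # Merge all school zones into a single geometry in the families' CRS.
  zone <- st_union(st_make_valid(do.call(rbind, zones)))
  zone <- st_transform(zone, st_crs(families))
  # 1 if the family point lies inside the merged zone, 0 otherwise.
  as.integer(lengths(st_intersects(families, zone)) > 0)
}

# Usage (left commented out: it calls the server once per school).
# shp_family$inside_limit <- flag_by_isodistance(shp_family, shp_school)
```

With this approach the number of server calls scales with the number of schools (200) rather than with the number of families (70,000); the point-in-polygon check itself runs locally.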
