sp::over(). Does the dot belong to one of the polygons identified with an OGRGeoJSON file?

Question

I'm trying to get a boolleans vector, where for example, v[i] =1 tells me if an i-th point (latitude longitude pair, present inside a train dataframe) falls within one of the geographical areas identified by an OGRGeoJSON file.

The OGR file is structured roughly like this:

District 1: 24 polygonal
District 2: 4 polygonal
District 3: 27 polygonal
District 4: 18 polygonal
District 5: 34 polygonal

That's what I tried to do.

However, the results obtained are not correct because the polygonal that is generated is a mix of all the various areas present in the OGR file.

library(rgdal)
library(httr)
library(sp)

r <- GET('https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON')
nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)

#New York City polygonal
pol_lat <- c(nyc_neighborhoods_df$lat)
pol_long <- c(nyc_neighborhoods_df$long)
xy <- cbind(pol_lat, pol_long)
p = Polygon(xy)
ps = Polygons(list(p),1)
pol = SpatialPolygons(list(ps))

#Points to analyse (pair of coordinates)
ny_lat <- c(train$pickup_latitude, train$dropoff_latitude)
ny_long <- c(train$pickup_longitude, train$dropoff_longitude)
ny_coord <- cbind(ny_lat, ny_long)
pts <- SpatialPoints(ny_coord)

#Query: Does the point to analyze fall in or out NYC?
over(pts, pol, returnList = TRUE)

How can I fix this to get the correct result?

@dww, I think that `nyc_neighborhoods_df` is `nyc_neighborhoods`, and they just renamed objects at some point and didn't finish the job. @pofferbacco, can you provide the output for `head(train)`? Or describe where we can find this data? — AColeman, Feb 23 '20 at 17:56

AColeman · Accepted Answer · 2020-02-23T18:07:47.467

sp is an older package which is being phased out in favor of the newer "Simple Features" sf package. Let me know if you are open to using the pipe operator %>% from the magrittr package, as it works nicely with the sf package (as does dplyr and purrr).

Using sf, you could do:

library(sf)

# Replace this with the path to the geojson file
geojson_path <- "path/to/file.geojson"

boroughs <- sf::st_read(dsn = geojson_path, stringsAsFactors = FALSE)

Now making a very simple spatial point object to stand in for the "trains" data.

# Make test data.frame

test_df <- 
  data.frame(
  # Random test point I chose, a couple of blocks from Central Park
      a = "manhattan_point", 
      y = 40.771959, 
      x = -73.964128, 
      stringsAsFactors = FALSE)

# Turn the test_df into a spatial object
test_point <-
  sf::st_as_sf(
    test_df,
    # The coords argument tells the st_as_sf function
    # what columns store the longitude and latitude data
    # which it uses to associate a spatial point to each
    # row in the data.frame
    coords = c("x", "y"), 
    crs = 4326 # WGS84
    )

Now we are ready to determine what polygon(s) our point falls in:

# Get the sparse binary predicate. This will give a list with as 
# many elements as there are spatial objects in the first argument, 
# in this case, test_point, which has 1 element.
# It also has attributes which detail what the relationship is
# (intersection, in our case) 
sparse_bin_pred <- sf::st_intersects(test_point, boroughs)

# Output the boro_name that matched. I think the package purrr
# offers some more intuitive ways to do this, but
lapply(
  sparse_bin_pred, 
  function(x) boroughs$boro_name[x]
  )

That last part outputs:

[[1]]
[1] "Manhattan"

from memory `sf::st_join(..., left = FALSE )` will give you a direct inner-join of points in the polygons, so no need for the `lapply` at the end. — SymbolixAU, Feb 23 '20 at 21:04
it worked great! I just started with R, and I didn't know the sf library. Definitely much faster than sp. Thanks for the help, I'd offer you a beer more than gladly! — pofferbacco, Feb 24 '20 at 22:02

sp::over(). Does the dot belong to one of the polygons identified with an OGRGeoJSON file?

1 Answers1