3

What is the quickest way of cropping (clipping) a complex SpatialPolygonsDataFrame with the @data preserved in R using another, possibly complex, SpatialPolygon? I know of two ways (shown under). The raster way is quicker for less complex SpatialPolygonsDataFrames and returns a SpatialPolygonsDataFrame as in the example. It gets slow with large SpatialPolygonsDataFrames, though. The rgeos way is quicker for large SpatialPolygonsDataFrame, but it sometimes fails with very complex geometries and does not return SpatialPolygonsDataFrames as default.

I have not paid attention to the GIS development in R lately and there might well be quicker ways of doing this by now.

The example polygons are small to honor people's computers and bandwidth. Consider the "real" polygons 50-1000 MB.

Setup:

library(rnaturalearth)
library(sp)
library(raster)
library(rgeos)

dt <- rnaturalearth::ne_countries()
clip_boundary <- sp::SpatialPolygons(list(sp::Polygons(
list(sp::Polygon(data.frame(lon = c(0, 180, 180, 0), lat = c(40, 40, 80, 80)))), ID = 1)))

The rgeos way:

system.time({
clipped.dt <- rgeos::gIntersection(dt, clip_boundary, byid = TRUE)
ids <- sapply(slot(clipped.dt, "polygons"), function(x) slot(x, "ID"))
ids <- sapply(strsplit(ids, " "), "[", 1) 

tmp.df <- data.frame(dt@data[ids,])
names(tmp.df) <- names(dt@data)

out <- sp::SpatialPolygonsDataFrame(clipped.dt, tmp.df, match.ID = FALSE)
})

#   user  system elapsed 
#  0.069   0.002   0.074 

class(out)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"

The raster way:

system.time({
  out <- raster::crop(dt, clip_boundary)
})

#   user  system elapsed 
#  0.042   0.001   0.043 

class(out)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"

A plot for a reference (not relevant for the question):

plot(out)

enter image description here

Mikko
  • 7,530
  • 8
  • 55
  • 92

1 Answers1

3

Much of R's geospatial work can now be done using the sf package.

https://r-spatial.github.io/sf/index.html

It looks like it might be a touch faster than raster and sp.

library(rnaturalearth)
library(sp)
#library(raster)
#library(rgeos)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
library(ggplot2)

# Original data 
dt <- rnaturalearth::ne_countries()
clip_boundary <- sp::SpatialPolygons(list(sp::Polygons(
  list(sp::Polygon(data.frame(lon = c(0, 180, 180, 0), lat = c(40, 40, 80, 80)))), ID = 1)))

# Original data as sf objects
dt_sf <- st_as_sf(dt)
boundary_sf <- st_as_sf(clip_boundary)

# clip 
clipped <- st_crop(dt_sf, boundary_sf)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries

#plot
ggplot(clipped) + geom_sf()


system.time({
  out <- st_crop(dt_sf, boundary_sf)
})
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
#>    user  system elapsed 
#>   0.019   0.000   0.018

system.time({
  out <- raster::crop(dt, clip_boundary)
})
#> Warning in raster::crop(dt, clip_boundary): non identical CRS
#>    user  system elapsed 
#>   0.526   0.000   0.525

system.time({
  clipped.dt <- rgeos::gIntersection(dt, clip_boundary, byid = TRUE)
  ids <- sapply(slot(clipped.dt, "polygons"), function(x) slot(x, "ID"))
  ids <- sapply(strsplit(ids, " "), "[", 1) 

  tmp.df <- data.frame(dt@data[ids,])
  names(tmp.df) <- names(dt@data)

  out <- sp::SpatialPolygonsDataFrame(clipped.dt, tmp.df, match.ID = FALSE)
})
#> Warning in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td,
#> unaryUnion_if_byid_false, : spgeom1 and spgeom2 have different proj4 strings
#>    user  system elapsed 
#>   0.045   0.000   0.045

Created on 2020-04-13 by the reprex package (v0.3.0)

mrhellmann
  • 5,069
  • 11
  • 38
  • Thanks for pointing out the `sf` package. I had not thought of it. In the application I use the cropping, the shapes must be returned in `sp` format. The conversions make the `sf` solution slightly slower than the `rgeos` one (system times for one dataset I have: `rgeos` ~ 7 sec, `sf` ~ 9 sec and `raster` > 80 sec). However, your suggestion might well be the quickest so far if one already operates using `sf` objects. I will leave the question open for a while before accepting your answer. Maybe someone comes up with a faster solution we have not thought of. – Mikko Apr 14 '20 at 06:28
  • The `sp` -> `sf` -> `sp` will slow things down. I didn't realize `sp` was a requirement in your case. R usually isn't very fast, but you can always throw a bigger computer at it. I've found the `sf` way of doing spatial operations really is much more intuitive than `sp`. Good luck. – mrhellmann Apr 14 '20 at 12:34