
I'm currently working with climate data from https://www.worldclim.org/data/worldclim21.html and I'm trying to obtain the max & min temperature for the 196 provinces of Peru. So far I've used this code:

library(sf)
library(raster)

my_shape <- st_read("multipolygon shapefile with 196 provinces inside Peru")
r <- raster("raster file of temp with 1km2 grid")
r2 <- crop(r, extent(my_shape))

So far this gives me the shape of Peru. From this I would like to obtain the average (max & min) temperature for each of the 196 provinces and end up with a dataframe of 196 observations. Is there an efficient way of doing so?

Alfonso R.
  • Hi Alfonso! Please provide a reproducible example so that anyone trying to help can copy-paste your code, run it, and reproduce your steps. It would also help to include pictures along with your explanation, to make your issue a bit clearer. – yuliaUU Apr 02 '21 at 18:20

2 Answers


You can use zonal() for that, something along these lines:

library(raster)

my_shape <- shapefile("multipolygon shapefile with 196 provinces inside Peru")
r <- raster("raster file of temp with 1km2 grid")
r2 <- crop(r, my_shape)
# rasterize the provinces so each cell holds a province ID
zones <- rasterize(my_shape, r2)
# mean of r2 within each zone, one row per province
zonal(r2, zones, fun = "mean")
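
Since the question asks for the minimum and maximum per province, here is a minimal sketch building on the code above (it assumes zones carries one zone per province; the zone IDs and column positions come from zonal()'s matrix output):

# reuse the rasterized zones for several statistics
z_min <- zonal(r2, zones, fun = "min")
z_max <- zonal(r2, zones, fun = "max")

# zonal() returns a matrix with a "zone" column and one value column;
# combine into a single data.frame with one row per province
temps <- data.frame(zone = z_min[, "zone"],
                    tmin = z_min[, 2],
                    tmax = z_max[, 2])

Rasterizing is the expensive step, so computing zones once and reusing it for each statistic keeps this fast.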
Robert Hijmans

You can use raster::extract() to summarize raster data based on your polygons.
I often find it necessary to namespace-qualify extract() to be sure you are getting the raster version, since other packages (e.g. tidyr) also define an extract().

library(sf) # st_read() comes from sf (not sp)
library(raster)  

my_shape <- st_read("multipolygon shapefile with 196 provinces inside Peru")
r <- raster("raster file of temp with 1km2 grid")
## not necessary to crop source raster

my_shape$mean <- raster::extract(r, my_shape, fun = mean, na.rm = TRUE)  

This should add an attribute column to your province polygon layer. You can do the same with fun = min and fun = max.
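
For the original min/max question specifically, a sketch along these lines should yield the 196-row data frame (assuming my_shape is the sf object returned by st_read() above; tmin/tmax are just illustrative column names):

my_shape$tmin <- raster::extract(r, my_shape, fun = min, na.rm = TRUE)
my_shape$tmax <- raster::extract(r, my_shape, fun = max, na.rm = TRUE)

# drop the geometry to get a plain data.frame, one row per province
temps_df <- sf::st_drop_geometry(my_shape)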

If your raster layer is large, or you want to do this several times, you will probably get better performance with alternative packages like velox.

library(velox)

vs <- velox("raster file of temp with 1km2 grid")
my_shape$mean2 <- vs$extract(my_shape, fun = function(x) mean(x, na.rm = TRUE))

The velox approach is much faster (see the benchmark below).

Another alternative is the stars package, using aggregate() to summarize the min and max data; see How to extract values from a raster using polygons with the R stars package?
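
A minimal sketch of the stars approach for the min/max question (rstar and the CRS alignment are set up exactly as in the benchmark code below):

tmin <- aggregate(rstar, my_shape, min, na.rm = TRUE)
tmax <- aggregate(rstar, my_shape, max, na.rm = TRUE)
st_as_sf(tmin)  # convert back to sf if you want a data frame with geometries

To compare timings, a quick microbenchmark of the mean-only versions of all four approaches: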

library(stars)
library(microbenchmark)

raster_file <- "raster file of temp with 1km2 grid"
rstar <- read_stars(raster_file)
st_crs(rstar) <- st_crs(my_shape)

f1 <- function() {
  my_shape$mean <- raster::extract(r, my_shape, fun = mean, na.rm = TRUE)
}
f2 <- function() {
  vs <- velox(raster_file)
  my_shape$mean2 <- vs$extract(my_shape, fun = function(x) mean(x, na.rm = TRUE))
}
f3 <- function() {
  zones <- rasterize(my_shape, r)
  raster::zonal(r, zones)
}
f4 <- function() {
  x <- aggregate(rstar, my_shape, mean, na.rm = TRUE)
  st_as_sf(x)
}

res <- microbenchmark(f1(), f2(), f3(), f4(), times = 3)
res
# Unit: seconds
# expr        min         lq       mean     median       uq        max neval
# f1() 172.283024 174.644299 178.920625 177.005575 182.2394 187.473276     3
# f2()   3.339221   3.374647   3.396874   3.410072   3.4257   3.441327     3
# f3() 170.981220 179.481681 182.520730 187.982142 188.2905 188.598829     3
# f4() 167.076081 173.427232 179.332300 179.778383 185.4604 191.142437     3

velox's extract() is the clear performance winner; the other three approaches perform very similarly.

Most of the processing time for zonal is spent rasterizing the polygon layer, so it would beat out extract and aggregate if you reuse the rasterized zones for multiple metrics. Note that reading the raster from file is included in the velox timing, but not in the other versions.

Brian Fisher