
I'm dealing with large raster stacks that I need to resample and clip. I read a list of TIFF files and create a stack:

library(raster)

files <- list.files(path=".", pattern="tif", all.files=FALSE, full.names=TRUE)
s <- stack(files)                          # 365-layer stack
r <- raster("raster.tif")                  # template raster at the target resolution
s_re <- resample(s, r, method='bilinear')  # resample the stack to r's grid
e <- extent(-180, 180, -60, 90)
s_crop <- crop(s_re, e)                    # clip to the study extent

This process takes days to complete! However, it's much faster using ArcPy and Python. My question is: why is the process so slow in R, and is there a way to speed it up? (I used the snow package for parallel processing, but that didn't help either.) These are the r and s layers:

> r
class       : RasterLayer 
dimensions  : 3000, 7200, 21600000  (nrow, ncol, ncell)
resolution  : 0.05, 0.05  (x, y)
extent      : -180, 180, -59.99999, 90.00001  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 

> s
class       : RasterStack 
dimensions  : 2160, 4320, 9331200, 365  (nrow, ncol, ncell, nlayers)
resolution  : 0.08333333, 0.08333333  (x, y)
extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
Geo-sp
  • Hard to comment on what might be causing this without having your raster files in hand. That said, I've occasionally sped up raster operations many-fold by using the GDAL functions wrapped by [**gdalUtils**](https://cran.r-project.org/web/packages/gdalUtils/index.html). Here, I'd probably use [`gdal_translate()`](http://www.gdal.org/gdal_translate.html), setting resolution via its `-tr` argument and desired resampling algorithm via its `-r` argument. Not sure how it handles `RasterStack`s, but it should deal with `RasterLayer`s (or, I'm guessing, `*.tif*` files on disk) just fine. – Josh O'Brien Nov 09 '15 at 20:48
  • Make sure you use the latest version of raster, as the speed of `resample` has recently been improved a lot (perhaps not enough). It would also be helpful if you show `s` and `r` so that we can see what is going on. For multicore, you could try beginCluster etc (see the sketch after these comments) – Robert Hijmans Nov 10 '15 at 00:48
  • @JoshO'Brien Thanks, I'll try gdal, but raster would be more convenient. – Geo-sp Nov 10 '15 at 14:47
  • @RobertH I'm using raster version 2.4-20. I added the information for r and s layers. Thanks! – Geo-sp Nov 10 '15 at 14:48
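A minimal sketch of the beginCluster route Robert Hijmans mentions above, assuming resample picks up the cluster automatically; the core count (4) is arbitrary and the object names s and r come from the question:

library(raster)

beginCluster(4)                                    # start a snow-style cluster for raster to use
s_re   <- resample(s, r, method='bilinear')        # resample the 365-layer stack to r's grid
s_crop <- crop(s_re, extent(-180, 180, -60, 90))   # then clip to the study extent
endCluster()                                       # shut the cluster down when done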

2 Answers


I second @JoshO'Brien's suggestion to use GDAL directly, and gdalUtils makes this straightforward.

Here's an example using double precision grids of the same dimensions as yours. For 10 files, it takes ~55 sec on my system. It scales linearly, so you'd be looking at about 33 minutes for 365 files.

library(gdalUtils)
library(raster)

# Create 10 rasters with random values, and write to temp files
ff <- replicate(10, tempfile(fileext='.tif'))
writeRaster(stack(replicate(10, {
  raster(matrix(runif(2160*4320), 2160), 
         xmn=-180, xmx=180, ymn=-90, ymx=90) 
})), ff, bylayer=TRUE)

# Define clipping extent and res
e <- bbox(extent(-180, 180, -60, 90))
tr <- c(0.05, 0.05)

# Use gdalwarp to resample and clip 
# Here, ff is my vector of tempfiles, but you'll use your `files` vector
# The clipped files are written to the same file names as your `files`
#  vector, but with the suffix "_clipped". Change as desired.
system.time(sapply(ff, function(f) {
  gdalwarp(f, sub('\\.tif', '_clipped.tif', f), tr=tr, r='bilinear', 
           te=c(e), multi=TRUE)
}))

##    user  system elapsed 
##    0.34    0.64   54.45 

You can parallelise further with, e.g., parLapply:

library(parallel)
cl <- makeCluster(4)
clusterEvalQ(cl, library(gdalUtils))
clusterExport(cl, c('tr', 'e'))

system.time(parLapply(cl, ff, function(f) {
  gdalwarp(f, sub('\\.tif', '_clipped.tif', f), tr=tr, 
           r='bilinear', te=c(e), multi=TRUE)
}))

##    user  system elapsed 
##    0.05    0.00   31.92

stopCluster(cl)

At 32 sec for 10 grids (using 4 simultaneous processes), you're looking at about 20 minutes for 365 files. Actually, it should be faster than that, since two threads were probably doing nothing at the end (10 is not a multiple of 4).

jbaums
  • Thanks! This works fine, but if I use that to read all the files I get a memory error, while with raster I don't have that problem. Do you have any suggestions? – Geo-sp Nov 12 '15 at 23:38
  • parLapply/sapply does that. – Geo-sp Nov 12 '15 at 23:42
  • @DNM - I can't imagine why... on my system, GDAL processes the grid in chunks. My RAM usage barely budges when I run either of the above. Each GDAL process seems to use about 100-200MB of RAM for me (so the parallel version with 4 processes uses maybe up to about 1GB). Do you already have very little free RAM? – jbaums Nov 12 '15 at 23:49
  • This is strange! Right now I have 26 GB of free memory. – Geo-sp Nov 12 '15 at 23:52
  • `Error: cannot allocate vector of size 12.7 Gb` – Geo-sp Nov 13 '15 at 14:58
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/95095/discussion-between-jbaums-and-dnm). – jbaums Nov 14 '15 at 00:06
  • I've tried it and noticed two things. 1. On macOS Mojave 10.14.5, I had to set a "Driver" for the function to see the terminal commands that it was sending. See the `of ="GTiff-raster-"` in `sapply(predictors, [...] multi=TRUE, of="GTiff-raster-")})`. This gives me a warning and does not run the command, but I was able to copy-paste the terminal command and it worked: `/usr/local/Cellar/osgeo-gdal/3.0.2/bin/gdalwarp -multi -te -118 -55 -34 32 -tr 0.002 0.002 -r "bilinear" -of GTiff PATHTOTHEORIGINAL.tif PATHTOTHEOUTPUT_clipped.tif` 2. It made the file 12.35 times bigger and I don't know why... – M. Beausoleil Jan 16 '20 at 14:03
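Regarding that last comment: one likely cause of the size increase is that gdalwarp writes an uncompressed GeoTIFF by default, whereas the source file may have been compressed or of a narrower data type. A sketch of passing a creation option through gdalUtils::gdalwarp; the COMPRESS=DEFLATE value is just one common choice, not something from this thread:

library(gdalUtils)

# Same call as in the answer above, but asking GDAL to compress the output GeoTIFF
gdalwarp(f, sub('\\.tif', '_clipped.tif', f), tr=tr, r='bilinear',
         te=c(e), multi=TRUE, co='COMPRESS=DEFLATE')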

For comparison, this is what I get:

library(raster)
r <- raster(nrow=3000, ncol=7200, ymn=-60, ymx=90) 
s <- raster(nrow=2160, ncol=4320) 
values(s) <- 1:ncell(s)
s <- writeRaster(s, 'test.tif')

x <- system.time(resample(s, r, method='bilinear'))
#   user  system elapsed 
#  15.26    2.56   17.83 

Ten files would take about 150 seconds, so I would expect that 365 files would take roughly 1.5 hr --- but I did not try that.
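To extend that estimate to the full job, here is a rough sketch of the obvious per-file loop; the file listing, output names, and writing each result to disk via crop's filename argument are assumptions, not something tested here:

library(raster)

files <- list.files(path=".", pattern="tif", full.names=TRUE)
r <- raster("raster.tif")                # template grid
e <- extent(-180, 180, -60, 90)          # clipping extent

out <- lapply(files, function(f) {
  x <- resample(raster(f), r, method='bilinear')
  crop(x, e, filename=sub('\\.tif$', '_resampled.tif', f), overwrite=TRUE)
})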

Robert Hijmans