5

I am working with a huge amount of data in daily tif files (thousands of daily files). I am analyzing the mean of the raster in a shapefile area, repeated over potentially thousands of shape layers.

My current .tif files are for an entire country when in reality I only need a small area of the of each tif file for each shapefile layer. Each shape layer has a different set of the days to extract from, i.e. data frame like this:

df <- data.frame(shplayer=c("shp_layerbuffer1","shp_layerbuffer2", "shp_layerbuffer3"), start=c("2000_02_18", "2004_03_19", "2010_08_20"), end=c("2010_01_09", "2005_03_15", "2012_09_04"))

Is there a way, in R, to crop a .tif (or any raster type file) BEFORE reading the file? This way I could read just the cropped area of each of the tif files

I envisoned something like (repeating across the entire set of dates):

library(sf)
library(raster)
shp_layerbuffer1 <- st_read("myshpfolder", layer="shp_layerbuffer1", quiet=T)

###EXAMPLE BUT doesn't work to crop the raster as it comes in
tempraster <- raster::raster("mytif_2000_02_18.tif", ext=extent(shp_layerbuffer1))

Then the usual velox (or raster) extract, repeat.

loki
  • 9,816
  • 7
  • 56
  • 82
Neal Barsch
  • 2,810
  • 2
  • 13
  • 39

2 Answers2

5

In such a case, I would create a virtual raster file (VRT) with the gdalUtils package. It is the quickest way to create a subset (or even virtual mosaic) of one or multiple raster data sets. For this VRT you can set the desired extent using the sf::st_bbox.

Minimal (not reproducible) example would be:

library(gdalUitls)
library(raster)
library(sf)

shp_layerbuffer1 <- st_read("myshpfolder", layer="shp_layerbuffer1", quiet=T)

gdalbuildvrt(gdalfile = "yourraster.tif", 
             output.vrt = "tmp.vrt", 
             te = st_bbox(shp_layerbuffer1))

tempraster <- raster("tmp.vrt")

The VRT basically creates a virtual canvas for a new "raster" file. It then references the corresponding rows and columns from the (one or multiple) underlying raster file(s). Hence, you do not need to create a whole new raster but simply reference existing files. The file size of your VRT should not exceed some kBytes.

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
loki
  • 9,816
  • 7
  • 56
  • 82
  • Hi Loki, Thanks so much for this! I keep getting ```Error in paste(parameter_values[[which(names(parameter_values) == X)]], : non-string argument to internal 'paste'``` any ideas why? – Neal Barsch Jun 14 '18 at 07:39
  • So check this out: if you crop once in raster it goes way faster, but not on the file load. Also thanks for your work on this! – Neal Barsch Jun 15 '18 at 22:58
  • ```test7 <- system.time(mclapply(seq_along(x1), function(x){ ras<-velox("testras_so.tif");ras$crop(poly);ras$extract(poly, fun=function(t) mean(t,na.rm=T)) })) test8 <- system.time(mclapply(seq_along(x1), function(x){ ras<-velox("testras_so.tif");ras$crop(st_buffer(st_sfc(st_point(c(x1[x],y1[x]), dim="XY"),crs=32651),200000));ras$extract(poly, fun=function(t) mean(t,na.rm=T)) })) > test7 user system elapsed 0.011 0.016 4.450 > test8 user system elapsed 0.006 0.012 4.333 ``` – Neal Barsch Jun 15 '18 at 22:59
4

If you're interested in the data rather than a raster object (or vrt/tiff), then this could be an approach:

If you "load" a raster from file, it doesn't necessarily mean that it gets loaded into memory, as the documentation of raster explains:

In many cases, e.g. when a RasterLayer is created from a file, it does (initially) not contain any cell (pixel) values in (RAM) memory, it only has the parameters that describe the RasterLayer. You can access cell-values with getValues, extract and related functions. You can assign new values with setValues and with replacement.

So you could first "index" the raster and then use getValuesBlock to read the values which fall into your area of interest.

library(raster)

f <- system.file('external/test.grd',package = 'raster')

# index
r <- raster(f)

# check if in memory

inMemory(r)
#[1] FALSE # output

# this would be an extent from your overlapping shapefile
e <- extent(r,58,68,40,50)

# get cells from extent; either use cells as index directly or convert to rowcol

rowcol <- rowColFromCell(r,cellsFromExtent(r,e))

v <- getValuesBlock(r,row=rowcol[1,1],nrows=(rowcol[nrow(rowcol),1] - rowcol[1,1]),
                    col=rowcol[1,2],ncols=(rowcol[nrow(rowcol),2] - rowcol[1,2]))

v
# [1] 295.392 225.710 258.595 310.336 324.666 322.702 307.217 283.611 263.987 156.609 322.978 297.565 301.797 315.971
# [15] 323.605 326.920 317.894 297.138 270.027 225.769 337.503 323.388 314.900 308.877 314.556 343.555 344.035 315.400
# [29] 291.188 270.876 337.866 324.632 307.220 278.294 264.379 348.519 356.456 322.450 301.790 285.815 331.383 318.950
# [43] 299.246 262.026 230.869 294.012 320.274 312.777 300.513 285.317 321.075 311.447 296.952 278.519 270.279 283.797
# [57] 296.415 298.861 293.150 277.573 306.692 300.772 289.376 273.141 258.457 258.638 272.966 283.977 284.621 271.690
# [71] 286.749 286.617 275.618 247.307 198.884 193.865 240.465 268.687 277.303 271.431 260.336 271.458 263.977 231.071
# [85] 161.407 154.181 222.417 258.672 271.711 272.642 235.458 258.810 257.763 239.553 209.409 205.905 234.162 255.668
# [99] 266.260 270.532

EDIT:

# check if raster was loaded after getValuesBlock
inMemory(r)
#[1] FALSE
Val
  • 6,585
  • 5
  • 22
  • 52
  • @loki No I don't believe so. Calling `raster` doesn't load the data, just metadata. Only `getValues` or the like will load data into memory ... this is where the subset comes in. – Val Jun 14 '18 at 07:42
  • No worries ... Personally I really like `vrt` ... could be just a bit overkill for this purpose - but really comes down to personal taste IMO :) – Val Jun 14 '18 at 07:46
  • You are absolutely right. I learned to love vrts and use them quite a lot lately – loki Jun 14 '18 at 13:42
  • I'm not sure vrts are any faster than reading straight tifs into velox, at least for my purposes of raster extraction. See my tests here:https://stackoverflow.com/questions/50870080/fastest-way-to-extract-a-raster-in-r-improve-the-time-of-my-reproducible-code – Neal Barsch Jun 15 '18 at 06:54
  • 1
    @NealBarsch Your original question was how to partially read files (aka crop extents) into memory. Both solutions answered it, and if you liked loki 's answer, I don't see a reason un-accepting it. The performance aspect seems to be covered in your subsequent [question](https://stackoverflow.com/questions/50870080/fastest-way-to-extract-a-raster-in-r-improve-the-time-of-my-reproducible-code) – Val Jun 15 '18 at 07:02
  • I was mostly trying to stimulate some more debate on what is fastest :) – Neal Barsch Jun 15 '18 at 07:05