How to properly crop() raster data extent in R

Question

I'm trying to crop some raster data and do some calculations (getting the mean sea surface temperature, specifically).

However, when comparing cropping the extent of the raster data before doing the calculations gives me the same result as doing the calculations before cropping the resulting data.

The original extent of the raster data is -180, 180, -90, 90 (xmin, xmax, ymin, ymax), and I need to crop it to any desired region defined by latitude and longitude coordinates.

This is the script I'm doing tests with:

library(raster) # Crop raster data
library(stringr)

# hadsstR functions ----------------------------------------
load_hadsst <- function(file = "./HadISST_sst.nc") {
    b <- brick(file)
    NAvalue(b) <- -32768 # Land
    return(b)
}

# Transform basin coordinates into numbers
morph_coords <- function(coords){
    coords[1] = ifelse(str_extract(coords[1], "[A-Z]") == "W", - as.numeric(str_extract(coords[1], "[^A-Z]+")),
                                         as.numeric(str_extract(coords[1], "[^A-Z]+")) )
    coords[2] = ifelse(str_extract(coords[2], "[A-Z]") == "W", - as.numeric(str_extract(coords[2], "[^A-Z]+")),
                                         as.numeric(str_extract(coords[2], "[^A-Z]+")) )
    coords[3] = ifelse(str_extract(coords[3], "[A-Z]") == "S", - as.numeric(str_extract(coords[3], "[^A-Z]+")),
                                         as.numeric(str_extract(coords[3], "[^A-Z]+")) )
    coords[4] = ifelse(str_extract(coords[4], "[A-Z]") == "S", - as.numeric(str_extract(coords[2], "[^A-Z]+")),
                                         as.numeric(str_extract(coords[4], "[^A-Z]+")) )
    return(coords)
}

# Comparison test ------------------------------------------
hadsst.raster <- load_hadsst(file = "~/Hadley/HadISST_sst.nc")

x <- hadsst.raster
nms <- names(x)
months <- c("01","02","03","04","05","06","07","08","09","10","11","12")
coords <- c("85E", "90E", "5N", "10N")
coords <- morph_coords(coords)
years = 1970:1974
range = 5:12

# Crop before calculating mean
x <- crop(x, extent(as.numeric(coords[1]), as.numeric(coords[2]),
                                        as.numeric(coords[3]), as.numeric(coords[4])))
xMeans <- vector(length = length(years)-1,mode='list')
for (ix in seq_along(years[1:length(years)])){
    xMeans[[ix]] <- mean(x[[c(sapply(range,function(x) grep(paste0(years[ix],'.',months[x]),nms)))]], na.rm = T)
}
mean.brick1 <- do.call(brick,xMeans)

# Calculate mean before cropping
x <- hadsst.raster
xMeans <- vector(length = length(years)-1,mode='list')
for (ix in seq_along(years[1:length(years)])){
    xMeans[[ix]] <- mean(x[[c(sapply(range,function(x) grep(paste0(years[ix],'.',months[x]),nms)))]], na.rm = T)
}
mean.brick2 <- do.call(brick,xMeans)
mean.brick2 <- crop(mean.brick2, extent(as.numeric(coords[1]), as.numeric(coords[2]),
                                                                                as.numeric(coords[3]), as.numeric(coords[4])))

# Compare the two rasters
mean.brick1 - mean.brick2

This is the output of mean.brick1 - mean.brick2:

class       : RasterBrick 
dimensions  : 5, 5, 25, 5  (nrow, ncol, ncell, nlayers)
resolution  : 1, 1  (x, y)
extent      : 85, 90, 5, 10  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 
data source : in memory
names       : layer.1, layer.2, layer.3, layer.4, layer.5 
min values  :       0,       0,       0,       0,       0 
max values  :       0,       0,       0,       0,       0

As you can see, both RasterBricks are exactly the same, which should be impossible for any arbitrary choice of coordinates, as exemplified below with a small matrix:

Is there something I'm doing wrong? Cropping the data before doing calculations with them should unequivocally give me different results.

I'm not sure if I understand your goal/question. If you crop your raster, you're only reducing the extent. So it should not really make a difference (for the values) if you first calculate the mean for all pixels and then discard some by cropping the raster, or if you crop first and the calculate the mean. The difference would be purely the efficiency of the operation — Val, May 07 '17 at 09:16
of course, that is if you want to calculate the mean over the layers of the brick (not the pixels within a layer) which I think is what you want — Val, May 07 '17 at 11:21
The goal would be to calculate the mean only to the cropped area, much in the same way that the mean is only calculated using the desired range of months. — Alfredo Hernández, May 07 '17 at 11:44
But you want a raster layer per averaged range of months, right? So you need to make a distinction between *temporal* and *spatial* averaging. With cropping you merely subset your area of interest, it's not a spatial averaging procedure — Val, May 07 '17 at 11:58
I thought that by selecting the spatial subset as the new extent of the raster, the temporal average would be done using only the pixels in the subset. — Alfredo Hernández, May 07 '17 at 12:01
That is technically true, but the temporal averaging is done over the third (time) dimension for each pixel independently. So the spatial subset means that you just have *less* pixels which the temporal averaging needs to be done on, potentially giving you a computational benefit. Therefore it's best practice to select an area of interest before you start the processing to avoid doing costly computation which you would't use in the end — Val, May 07 '17 at 12:04
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/143602/discussion-between-val-and-alfredo-hernandez). — Val, May 07 '17 at 12:14

score 3 · Accepted Answer · edited May 23 '17 at 11:33

Ok, I'll continue from my post in your previous question:

We start out with the full hadsst.raster brick (which for having a reproducible example, can be fake created with the first part of my solution in my previous answer).

So this dataset has the dimensions 180, 360, 516, meaning 180 rows, 360 columns and 516 temporal layers.

Technically, a raster being a matrix, this could be how it looks like:

Just a bunch of matrix layers (516 to be precise), where each pixel is exactly aligned. Here I only have three example layers, the rest is indicated by the three dots.

So if we do temporal averaging, we basically extract all the values for a single pixel and take the mean (or any other averaging operation) of them. This is indicated here by the red squares.

This also shows why cropping does not influence the temporal averaging:

If we say the orange square is our extent of interest and we perform the cropping operation before the averaging, we basically discard all values around this square. After that we again take all the values for each pixel over all layers and perform our average.

This should make now clear, why it doesn't matter when you discard the pixel around the orange square. You could also calculate the average for them and discard the values afterwards, leaving you with just the values of your orange square. It just doesn't make any real sense if you're already sure you won't need them for further calculations. Regardless, the values inside the square won't be affected.

When we talk about spatial averaging, it generally means averaging over pixel within a single layer, in this case probably over the values inside the orange rectangle.

Two common operations for that are

focal averaging (also known as neighbourhood averaging)
aggregation

The focal averaging will take will take for each pixel the average of all values of a defined number of adjacent pixels (most common is a 3x3square, where the pixel to be defined is the central one).

The aggregation is literally taking a number of pixel and combining them into a bigger pixel. This means that not only the value of this pixel will be averaged, but also that the resulting raster will have less individual pixels and a coarser resolution.

Alright, coming to the actual solution for you:

I assume you have an area of interest defined by an extent aoi:

aoi <- extent(xmin,xmax,ymin,ymax)

The first thing you would do is crop the initial brick to reduce the computational burden:

hadsst.raster_crp <- crop(hadsst.raster,aoi)

The next step is the temporal averaging, where we use the function I've defined in the solution from my other post:

hadsst.raster_crp_avg <- hadSSTmean(hadsst.raster_crp, 1969:2011, first.range = 11:12, second.range = 1:4)

Alright, now you have your temporal averages just for your region of interest. The next step depends on what your ultimate goal is. As far as I understood, you just need a single average per temporal average for your region of interest.

If that is the case, it might be the right time to leave the actual raster domain and continue with base R:

res <- lapply(1:nlayers(hadsst.raster_crp_avg),function(ix) mean(as.matrix(hadsst.raster_crp_avg[[ix]])))

This will give you a list with as many elements as your brick hadsst.raster_crp_avg has.

Using lapply, we iterate through the layers, converting each layer into a matrix and then calculating the mean over all elements leaving us with a single value per averaged-timestep for the entire area of interest.

Going further you can use unlistto convert it to a vector and the add it to a data.frame or perform any other operation you like.

Hopefully that was clear and this is what you were looking for.

Best

Happy it helped, please don't forget to accept the solution ;) — Val, May 07 '17 at 14:08
I'd like to point out that given the structure of my data, `na.rm = T` is necessary both in the `hadSSTmean()` and when doing `res <- lapply( ..., mean(... , na.rm = T))`. — Alfredo Hernández, May 07 '17 at 18:08

How to properly crop() raster data extent in R

1 Answers1