Calculate number of NA and non-NA values in a raster but with new resolution and extent

Question

I need to calculate the number of NA and non-NA values in a raster with original resolution of 1 x 0.00811 but aggregated to 2 degrees and with a new extent.

The original raster (available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.052q5, see output 1) was merged with another dataset (unfortunately, not open-source) to produce a raster in Output 2:

Output 1

class      : RasterLayer 
dimensions : 19142, 35738, 684096796  (nrow, ncol, ncell)
resolution : 0.01, 0.00811  (x, y)
extent     : -178.6931, 178.6869, -65.29534, 89.94628  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : HumanFootprintWGS84.tif 
names      : HumanFootprintWGS84 
values     : 0, 50  (min, max)

Output 2

dimensions : 19142, 35738, 684096796  (nrow, ncol, ncell)
resolution : 0.01, 0.00811  (x, y)
extent     : -178.6931, 178.6869, -65.29534, 89.94628  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : r_tmp_2022-02-10_145403_42352_01781.grd 
names      : layer 
values     : 0.6730382, 1  (min, max)

The raster I used for re-sampling was a dummy and is as follows:

Output 3

class      : RasterLayer 
dimensions : 65, 180, 11700  (nrow, ncol, ncell)
resolution : 2, 2  (x, y)
extent     : -180, 180, -65, 65  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs

Unfortunately, the merged and therefore the re-sampled raster has a lot of NA values on the coastline and near the lakes; and the averaging method in st_warp converts NA values to 0s, which skews the calculation of the mean of some of the cells.

I've decided to adjust the values resulting from averaging by multiplying the averaged values by Number of Non-NA values/(Number of Non-NAs - number of NAs), i.e.:

average x {Number of non-NA values/(Number of non-NA values/ number of NA values)}

For that I need to know the number of non-NA and NA values is the original merged raster (Output 2) but at 2 degree resolution and the extent in Output 3 (-180, 180, -65, 65).

I'm new to mapping and rasters so apologies if this is a basic question.

I've tried to rasterize with coordinates of original dataset and the dummy grid, but this would not get me the number of NA values, just the number of cells (plus the merged raster is over 5.1 Gb). I've tried to trim NA values (stupid idea), and st_warp but without averaging (uses nearest neighbour anyway).

If anyone has any ideas or have got a more elegant solution, I'd really appreciate it.

Thank you sincerely,

Edit: Through trial and error, I found that it makes a difference for both re-sampling (thank you again, @Robert Hijmans) and st_warp, whether the values are loaded in the memory or not.

Here are example outputs, with and without values loaded:

terra::resample with averaging without data loaded in memory

terra::resample with averaging with data loaded in memory

perhaps you look for `freq(r, value=NA)`. Or (probably less efficient) `cellStats(is.na(r), sum)` — dww, Feb 10 '22 at 15:16
Thank you. Probably. The trick is to measure this over the merged values but with the new 2 degree resolution and and new extent. I tried aggregating but annoyingly I would need to multiply but a non-integer to get to 2 degree which I believe is not possible with aggregate. — Tania Barychka, Feb 10 '22 at 15:50
if you need to change resolution by non-integer value, see here. https://stackoverflow.com/a/37956798/2761575 — dww, Feb 10 '22 at 16:20
Please provide a reproducible dataset next time rather than pictures of the data: https://youtu.be/3EID3P1oisg — Shawn Hemelstrand, Feb 18 '22 at 07:46

Robert Hijmans · Answer 1 · 2022-02-14T17:48:14.610

0

Example data

library(terra)
#terra 1.5.20
library(geodata)
w <- world(path=".")
# input raster
x <- rast(res=3)
x <- rasterize(w, x, field=1)

# output raster
r <- rast(res=10)

Solution:

rs <- resample(x, r, "sum")

rs
#class       : SpatRaster 
#dimensions  : 18, 36, 1  (nrow, ncol, nlyr)
#resolution  : 10, 10  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : lon/lat WGS 84 
#source      : memory 
#name        :     layer 
#min value   : 0.1111111 
#max value   :  11.11111 

plot(rs)

You need terra 1.5.20 for this. That is currently the development version. The easiest way to install it on Windows or OSX is with install.packages('terra', repos='https://rspatial.r-universe.dev')

Earlier versions of terra ignore the "sum" option.

In response to your comment: I get the same results with sum and average when I use a file as data source:

xx <- writeRaster(x, "test.tif", overwrite=T)
rs <- resample(xx, r, "sum")

We can compare the result with exact extraction with polygons (that also works, but will choke R on big datasets)

p <- as.polygons(r)
e <- extract(x, p, exact=TRUE, fun=sum, na.rm=TRUE)
re <- rast(r)    
re[e[,1]] <- e[,2]

plot(rs, re, xlab="resample", ylab="extract")

edited Feb 14 '22 at 17:48

answered Feb 10 '22 at 18:24

Robert Hijmans

40,301
4
55
63

Thank you, that's amazing. Annoyingly, I need an average (not the sum) and ideally do not want to loose the cells with some NAs present which is what happens if I run the st_warp with rasters loaded as proxy (don't know why) or when I try the sum in terra::resample (above). I've noticed this doesn't happen when I try the min and the max methods in terra::resample. I understand average method is not operational yet, isn't it? Thank you again. – Tania Barychka Feb 11 '22 at 16:08
Thank you, the average was not working, and I have fixed that. You can also use the sum and divide that by the sum of all non NA values. Something like `resample(!is.na(x), r, "sum")`. With my example data, the average is always one. – Robert Hijmans Feb 11 '22 at 17:49
Thank you again. I presume that the average method only uses non-NA values, doesn't it? Curiously, there is a difference in the output of the resample with average in terra depending on whether values are in memory or not. I've encountered that with stars::st_warp too. Without all values in memory, any cells containing NA values get lost. While with all values in memory, they remain, with the averages calculated over the non-NA values (I hope/presume). Have you encountered this before? Thank you again. – Tania Barychka Feb 14 '22 at 13:35
I've added examples in the original question. Curious...no idea why this could be. – Tania Barychka Feb 14 '22 at 13:49
That is curious, as I do not see that (see my expanded answer). Perhaps there is something unexpected with your file. Can you send it to me, or create an example that reproduces this? You do not need to "presume"; the manual is pretty clear, and you can check what happens with some example data as I have shown and with your own data. If `NA` values were used, the result would have to be `NA`. – Robert Hijmans Feb 14 '22 at 17:52

Calculate number of NA and non-NA values in a raster but with new resolution and extent

1 Answers1