
This is basically the same issue raised in this question https://gis.stackexchange.com/questions/95481/in-r-set-na-cells-in-one-raster-where-another-raster-has-values and in "R - Changing specific cell values in a large raster layer". The answers to the first question did not solve my problem because I do not have a second raster to use with overlay or calc; reclassify is the only thing that works for me.

In my case, I have a raster file that is 300 MB on disk. I am applying a simple operation: replacing all cells equal to a certain value with NA. I have about 4 GB of RAM available, but the operation fails with the error "cannot allocate vector of size 4.6 GB". Even after raising my memory limit to 16 GB, I get the same error, now asking for 9.2 GB. I have tried the following two options:

r[r == 5] <- NA
values(r)[values(r) == 5] <- NA

Strangely, even a simple operation such as table(values(r)) gives the same error, while ArcMap can build the same table in a few seconds. I have already solved my immediate problem, but I am wondering: why is memory use so inefficient, and how can it be prevented or avoided? Why does raster require up to 9 GB to process a file that is 300 MB on disk? Is this a limitation of the package or of R?

Herman Toothrot

2 Answers


You can avoid loading all values into memory by using raster package functions instead. For example:

library(raster)
# small example raster filled with values 1 to 5
r <- raster(ncol=10, nrow=10)
r[] <- sample(5, ncell(r), replace=TRUE)

Instead of table(values(r)), use freq, which tabulates cell values without loading them all into memory:

freq(r)

Change value 5 to NA, either with reclassify or with subs (with subsWithNA=FALSE, values that are not matched keep their original value):

x <- reclassify(r, data.frame(from=5, to=NA))
y <- subs(r, data.frame(from=5, to=NA), subsWithNA=FALSE)

freq(x)

lbusett suggests calc, and that should also work fine.
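If you need to tune when raster switches to chunked, on-disk processing, the thresholds can be adjusted with rasterOptions. A minimal sketch; the values below are illustrative placeholders, not recommendations:

library(raster)
# lower the thresholds so raster processes in chunks (and uses temporary
# files on disk) sooner; pick values that suit your available RAM
rasterOptions(maxmemory = 1e8, chunksize = 1e7)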

Robert Hijmans

I believe the problem is that your approaches load all the raster data into memory, "duplicating" it a couple of times (r[r == 5] <- NA needs to create a copy of r) and also creating large accessory variables (e.g., the logical vector r == 5). Add to this that, on import, R automatically "casts" the values to double and discards any compression the file may have, and the memory requirements scale up rapidly.
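A rough back-of-the-envelope shows why. The cell count below is an assumption picked to match the reported error, not a figure from the question:

# a 300 MB compressed integer file can easily hold ~575 million cells
n_cells <- 575e6
# R numeric vectors store each value as an 8-byte double
n_cells * 8 / 1024^3   # ~4.3 GiB for a single in-memory copy
# r[r == 5] <- NA holds the original, the logical mask, and the modified
# copy at the same time, so peak usage is a multiple of this.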

The problem with table(values(r)) is the same, because values(r) loads every pixel's value into memory.

With moderately large rasters, it is always best to use calc or overlay, which work on chunks of lines and "swap" to disk if needed. In your case, this (taken directly from the calc help) should work:

# set cells with value 5 to NA, applied chunk by chunk
fun <- function(x) { x[x == 5] <- NA; return(x) }
r2 <- raster::calc(r, fun)
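If the result itself does not fit in memory, calc can also write straight to disk through its filename argument (the output path here is a placeholder):

# the result is written chunk by chunk instead of being kept in RAM
r2 <- raster::calc(r, fun, filename = "r_no5.tif", overwrite = TRUE)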
lbusett