12

I am processing a time series of rasters (MODIS NDVI imagery) to calculate the average and standard deviation of the series. Each yearly series is composed of 23 NDVI .tif images of 508 MB each, so the total is a hefty 11 GB to process. Below is the script for one year. I have to repeat this for a number of years.

library(raster)
library(rgeos)
filesndvi <- list.files(pattern="NDVI.tif", full.names=TRUE)
filesetndvi10 <- stack(filesndvi)
names(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)
desviondvi10 <- filesetndvi10 - avgndvi10
sumdesvioc <- sum(desviondvi10^2)
varndvi10 <- sumdesvioc / nlayers(filesetndvi10)
sdndvi10 <- sqrt(varndvi10)
cvndvi10 <- sdndvi10 / avgndvi10

The problem: the process writes accumulatively to the hard drive until it is full. I don't know where on the disk the process writes. The only way I've found to free the space is to reboot. I tried rm(), which didn't work. I tried closing RStudio, which didn't work either. I'm using R 3.0.2 with RStudio 0.98.994 on Ubuntu 14.04, on an Asus UX31 with 4 GB RAM and a 256 GB HD. Any thoughts on how to clean the disk after each year's calculation, without rebooting, would be much appreciated. Thanks

user2942623

4 Answers

14

There are two other things to consider. First, make fewer intermediate files by combining steps in calc or overlay functions (not too much scope for that here, but there is some). This can also speed up computations, as there will be less reading from and writing to disk. Second, take control of deleting specific files. In the calc and overlay functions you can provide filenames, so that you can remove the files you no longer need. But you can also delete the temp files explicitly. It is of course good practice to first remove the objects that point to these files. Here is an example based on yours.

library(raster)
# example data
set.seed(0)
ndvi <- raster(nc=10, nr=10)
n1 <- setValues(ndvi, runif(100) * 2 - 1)
n2 <- setValues(ndvi, runif(100) * 2 - 1)
n3 <- setValues(ndvi, runif(100) * 2 - 1)
n4 <- setValues(ndvi, runif(100) * 2 - 1)
filesetndvi10 <- stack(n1, n2, n3, n4)

nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)
desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2, filename='over_tmp.grd')
sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
cvndvi10 <- overlay(sdndvi10, avgndvi10, fun=function(x, y) x / y, filename='cvndvi10.grd', overwrite=TRUE)

f <- filename(avgndvi10)
rm(avgndvi10, desviondvi10_2, sdndvi10)
file.remove(c(f, extension(f, '.gri')))
file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))

To find out where temp files are written to, look at

rasterOptions()

or to get the path as a variable do:

dirname(rasterTmpFile()) 

To set the path, use

rasterOptions(tmpdir='a path')
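Putting these pieces together, here is a minimal sketch of the pattern: point the temp files at a dedicated directory so they are easy to find and wipe between runs. The rasterOptions() and rasterTmpFile() calls assume library(raster) is loaded (they are commented out here); the directory name is only an example.

```r
## dedicated temp directory for raster's intermediate files
tmp <- file.path(tempdir(), "raster_tmp")
dir.create(tmp, showWarnings = FALSE)
# rasterOptions(tmpdir = tmp)    # with library(raster)
# dirname(rasterTmpFile())       # should now return `tmp`

## ... run the yearly computation here ...

## afterwards, wipe the whole directory in one go
unlink(tmp, recursive = TRUE)
```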
Robert Hijmans
6

I struggle with the same, but have a few tricks that help. First off, get more memory. RAM and HD space are cheap and have dramatic effects when dealing with large R objects such as rasters. Secondly, use removeTmpFiles() in the raster package. You can set it to remove temp files older than a certain number of hours; e.g. removeTmpFiles(0.5) will remove temp files older than 30 minutes. Make sure you only run this at a time when the files will no longer be called on. Thirdly, use something like the snippet of rasterOptions() below. Be careful with setting memory chunk sizes; those values will NOT necessarily work for your system, but you might find something better optimized than the defaults. Finally, use rm() and gc() to clean as you cook. Hope this helps, but if you find a better solution please let me know.

drive <- "C"  # drive letter where the temp directory should live
tmpdir_name <- paste0(drive, ":/RASTER_TEMP/")
if (!file.exists(tmpdir_name)) {
    dir.create(tmpdir_name)
}

rasterOptions(datatype = "FLT4S", 
    progress = "text", 
    tmpdir = tmpdir_name, 
    tmptime = 4, 
    timer = TRUE,
    tolerance = 0.5,
    chunksize = 1e+08,
    maxmemory = 1e+09)
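The removeTmpFiles() call mentioned above fits naturally at the end of each year's iteration. A sketch of that clean-as-you-cook loop follows; the removeTmpFiles() call assumes library(raster) is loaded (commented out here), and the `years` vector and loop body are placeholders.

```r
years <- 2010:2012          # placeholder: the years to process
for (yr in years) {
  ## ... build the stack and compute the statistics for year `yr` ...
  # removeTmpFiles(h = 0.5)  # with library(raster): drop temp files > 30 min old
  gc()                       # release memory held by objects removed with rm()
}
```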
Mr.ecos
  • Thanks a lot Mr.ecos. I'll delete the temporary files. I discovered they sit in /tmp/R_raster_myuser. And yes, I'll definitively buy more RAM, or maybe a parallel computer. Cheers – user2942623 Aug 21 '14 at 13:54
  • To close the question. I've solved my problem using rasterOptions(tmpdir="path") to direct the temporary files to a 3Tb external drive. Not the most elegant solution, but it works. – user2942623 Aug 21 '14 at 22:03
  • Not sure about your last comment: writing your temp files to an external disk will likely make your computations a lot slower. Just use the commands above to delete your temp files every so often and you should be good. – Lucas Fortini Aug 22 '14 at 01:16
  • Agreed with @lucas , you have to be careful with that. When I have few choices, I make sure I use USB 3.0 and SSD or a really fast drive. But if you have only one choice and it is not a drive hooked to the system bus, then expect slower analysis because of the cumulative read/write time. Albeit slower, it still may solve the space issue. – Mr.ecos Aug 22 '14 at 01:26
  • Thanks for the comments. Speed of calculation was pretty reasonable, although not of my huge concern. External HD was connected by a USB 3.0. Manually deleting the temp files would mean I'd have to be watching the process. Writing the temp files in the external HD frees me of that job. – user2942623 Sep 01 '14 at 13:52
  • Glad to hear it! Please click the check mark for this answer if you found it helpful. Thanks! – Mr.ecos Sep 01 '14 at 13:55
  • To get the current temp directory, use dirname(rasterTmpFile()); to set it, use rasterOptions(tmpdir='a path') – Robert Hijmans Sep 03 '14 at 18:46
5

I found another way to manage this problem that was better for me, drawing on this answer. In my case, I am using parallel looping and don't want to remove all the files from the temporary directory because it could remove other processes' temp files.

@RobertH's answer, which suggests naming each temporary file individually, is good, but I wasn't sure whether manually supplying filenames forces raster to write even small files to the hard drive instead of keeping them in RAM, slowing down the process (the raster documentation says it only writes to disk if a file won't fit into RAM).

So, what I did is create a temporary directory from within the loop or parallel process that is tied to a unique name from the data that is being processed in the loop, in my case, the value of single@data$OWNER:

#creates unique filepath for temp directory
dir.create(file.path("c:/", single@data$OWNER), showWarnings = FALSE)

#sets temp directory
rasterOptions(tmpdir = file.path("c:/", single@data$OWNER))

Insert your processing code here, then at the end of the loop delete the whole folder:

#removes entire temp directory without affecting other running processes
unlink(file.path("c:/",single@data$OWNER), recursive = TRUE)
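Wrapped as a function, the per-task pattern above can be sketched as follows. This is only a sketch: `id` stands in for single@data$OWNER, `process_one` is a hypothetical name, and the rasterOptions() call assumes library(raster) is loaded (commented out here).

```r
process_one <- function(id, base = tempdir()) {
  ## temp directory unique to this task, so parallel workers don't collide
  mytmp <- file.path(base, paste0("raster_", id))
  dir.create(mytmp, showWarnings = FALSE)
  # rasterOptions(tmpdir = mytmp)   # with library(raster)

  ## ... raster processing for this task ...

  ## removes only this task's temp files, not other workers'
  unlink(mytmp, recursive = TRUE)
  invisible(NULL)
}
```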
Luke Macaulay
0

Maybe it's obvious, but another tip I found while implementing the advice in this thread is to be careful about the order in which you process the instructions. Avoid doing all the work in bulk and "cleaning everything afterwards": atomize the code and clean up the small pieces as you go. For example, instead of (from above):

[...]
nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)

desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2 , 
filename='over_tmp.grd')
sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
cvndvi10 <- overlay(sdndvi10, avgndvi10, fun=function(x, y) x / y, filename='cvndvi10.grd', overwrite=TRUE)

f <- filename(avgndvi10)
rm(avgndvi10, desviondvi10_2, sdndvi10)
file.remove(c(f, extension(f, '.gri')))
file.remove(c('over_tmp.grd', 'over_tmp.gri', 'calc_tmp.grd', 'calc_tmp.gri'))

This requires much less space, both in RAM and on disk:

[...]
nl <- nlayers(filesetndvi10)
avgndvi10 <- mean(filesetndvi10)

desviondvi10_2 <- overlay(filesetndvi10, avgndvi10, fun=function(x, y) (x - y)^2,
    filename='over_tmp.grd')

sdndvi10 <- calc(desviondvi10_2, fun=function(x) sqrt(sum(x) / nl), filename='calc_tmp.grd')
rm(desviondvi10_2)
file.remove(c('over_tmp.grd', 'over_tmp.gri'))

cvndvi10 <- overlay(sdndvi10, avgndvi10, fun=function(x, y) x / y, filename='cvndvi10.grd', overwrite=TRUE)
rm(sdndvi10, avgndvi10)
file.remove(c('calc_tmp.grd', 'calc_tmp.gri'))
Marc