I want to control where the temporary files generated by raster functions are stored. The reason is that I want to be able to remove temporary files that are specific to a process, without removing those used within other processes running in parallel. This was suggested by Luke Macaulay here.

The idea is that each process runs commands that set a process-specific tmpdir, run some raster function, store its process-specific output somewhere other than tmpDir(), and finally remove the process-specific tmpDir().
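A minimal sketch of that cycle follows. The directory names, the naming scheme based on Sys.getpid(), and the small in-memory example raster are illustrative assumptions; the actual inputs in this question are large rasters on disk.

```r
library(raster)

# Illustrative input: a small raster (assumption; real inputs are large files)
r <- raster(nrows = 10, ncols = 10)
values(r) <- runif(ncell(r))

# 1. Create a process-specific temp directory (hypothetical naming scheme)
process_tmp_dir <- file.path(tempdir(), paste0("raster_tmp_", Sys.getpid()))
dir.create(process_tmp_dir, showWarnings = FALSE)
rasterOptions(tmpdir = process_tmp_dir)

# 2. Run a raster function and write its output outside the temp directory
out_file <- file.path(tempdir(), paste0("result_", Sys.getpid(), ".grd"))
res <- calc(r, fun = function(x) x * 2, filename = out_file, overwrite = TRUE)

# 3. Remove the process-specific temp directory and everything in it
unlink(process_tmp_dir, recursive = TRUE)
```

The output file survives step 3 because it was written outside process_tmp_dir.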

But some raster operations keep storing their temp files in the "default" temp directory (which looks like C:\Users\...\Temp\RtmpaevgEe). These temp files therefore cannot be deleted at the end of each process and may eventually fill up the hard drive. This happens whether the code is run as a single process, in an iterative loop of processes, or in a parallel setting.

My code to set the process-specific temporary directory is:

library(raster)

# Define which process we are in (placeholder for this process's input raster):
  processname <- file.path("<raster_to_input_to_this_process.tif>")
# Create the path to the process-specific temp directory
  process_tmp_dir <- paste0(processname, "_Tmp")
# Create the process-specific temp directory
  dir.create(process_tmp_dir, showWarnings = FALSE)
# Set the temp directory used by raster
  rasterOptions(tmpdir = process_tmp_dir)

After this, both rasterOptions() and tmpDir() return process_tmp_dir, not the default temp directory they returned before the rasterOptions(tmpdir=process_tmp_dir) command.

Then, if I run a mask operation (Raster,Spatial-method), temp files are generated in process_tmp_dir, as expected.

But if I run calc, overlay, or aggregate, my process_tmp_dir remains empty while temp files appear in the default temp directory. Even after that, rasterOptions() and tmpDir() still return process_tmp_dir.

In every case I specified a filename argument, and the raster cannot be processed in memory:

canProcessInMemory(processname, verbose = T)
memory stats in GB
mem available: 9.66
        60%  : 5.8
mem needed   : 28.07
max allowed  : 4.66  (if available)
[1] FALSE

I wonder why these functions do not "comply" with the new tmpdir setting while mask does. (Also note that mask generates .grd temp files, which are much heavier than the .tif temp files generated by calc etc.) I would highly appreciate any suggestions as to why this is so, and what could be done to make sure the temp files of any raster function are generated in the specified tmpDir().

I can make the data available and spell out the exact raster operations if you believe this is necessary to better understand what is happening.

1 Answer

I cannot immediately think of what might cause this. So this does not answer your question directly, but it does suggest an alternative path.

I would either use the filename argument in each step and assign unique filenames using the process id, or I would store the temp file names (which might be "" if the raster is small enough to stay in memory) and then delete these files when you are done with them. Storing the filename would be required if you are using functions such as + that do not have a filename argument.
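As a rough sketch of both options, assuming a small example raster and a hypothetical output directory (neither comes from the question), this could look like:

```r
library(raster)

# Small example raster (assumption); results this small stay in memory
r <- raster(nrows = 10, ncols = 10)
values(r) <- 1:ncell(r)

# Option 1: pass an explicit filename made unique with the process id
out_dir <- file.path(tempdir(), "outputs")  # hypothetical output location
dir.create(out_dir, showWarnings = FALSE)
out_file <- file.path(out_dir, paste0("calc_", Sys.getpid(), ".grd"))
r2 <- calc(r, fun = function(x) x * 2, filename = out_file, overwrite = TRUE)

# Option 2: "+" and "*" have no filename argument, so record the temp file name
r3 <- r * 2
tmp_file <- filename(r3)  # "" when the result is held in memory

# ... use r3 here ...

# When done, drop the object and delete its temp file(s), if any were written
if (nzchar(tmp_file)) {
  rm(r3)
  # Native raster files come as a .grd/.gri pair
  tmp_files <- unique(c(tmp_file, sub("\\.grd$", ".gri", tmp_file)))
  file.remove(tmp_files[file.exists(tmp_files)])
}
```

With rasters too large for memory, filename(r3) would be a path inside the temp directory, and the cleanup branch would delete exactly that file without touching temp files of other processes.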

You can also run removeTmpFiles with an appropriate time lag, e.g. removeTmpFiles(h=0.5) at the end of the script.

Robert Hijmans
  • Thank you for the suggestions. My workaround was to simply mosaic my raster layers so that temp files did not fill up the hard drive before the computation ended. Besides, I realised that setting a specific temp dir for each process is not necessary, because in a parallel setting (with foreach at least) each process gets its own default temp dir, which can be retrieved with tmpDir() from within each process and hence removed without affecting the temp files of other processes. But the question of why some functions do not generate their temp files in a specified tmpdir remains unanswered. – Valentin Guye Apr 04 '20 at 09:52
  • May I ask for some clarification regarding the filename argument? I always pass an output path to the filename argument when I work with large rasters that cannot be read into memory. I was surprised when I first saw that sometimes (typically with the functions and raster layers described in my question) temp files were still generated although I passed a filename. Is there something I am missing regarding raster temp file generation? – Valentin Guye Apr 04 '20 at 10:08
  • There should not be; but it is difficult to comment without seeing the code. – Robert Hijmans Apr 04 '20 at 22:53