I am working with fairly large S4 objects in R (~10GB). Often, I want to load one, do something to it, then save the result. Ideally, I would just overwrite the original file using saveRDS. For example:

a = readRDS(object_file)
a = do_something(a)
saveRDS(a, file=object_file)

However, this seems unsafe: if the write is interrupted and the file gets corrupted, I lose all of my work. For example, I work remotely, and if my connection drops mid-write, I'm worried that only part of the file would be written. At the same time, I don't want to save multiple copies of the object, because then I need to manually organize them and delete old versions.

The approach I am considering is like this:

a = readRDS(object_file)
a = do_something(a)
saveRDS(a, file=temporary_file)
b = readRDS(temporary_file)
system(paste('mv', temporary_file, object_file))

Are there any problems with this? Is there a better way of validating a file and writing to it? I figure that the "mv" command is fast, so interrupting it is less of a worry. I've tried looking around, but couldn't find anything.
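
The write-then-rename idea above can be sketched in base R without shelling out to `mv`. This is only a sketch under a few assumptions: `safe_saveRDS` is a hypothetical helper name (not part of base R), the temporary file is placed in the same directory as the target so the rename stays on one filesystem, and the machine is a POSIX system where a rename within one filesystem replaces the target atomically. Validation here just means the temporary file can be read back without error, as in the approach above.

```r
# Hypothetical helper (not part of base R): write to a temporary file,
# optionally read it back, then rename it over the target path.
safe_saveRDS <- function(object, path, validate = TRUE) {
  # Put the temporary file next to the target so the rename does not
  # cross filesystems (a cross-filesystem move is a copy, not a rename).
  tmp <- tempfile(pattern = paste0(basename(path), "."),
                  tmpdir = dirname(path), fileext = ".tmp")
  # Clean up the temporary file if any step below fails.
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)

  saveRDS(object, file = tmp)

  if (validate) {
    # readRDS() raises an error on a truncated or unreadable file, which
    # aborts here before the original file is touched. For a ~10GB object
    # this costs a second read and memory for a second copy.
    invisible(readRDS(tmp))
  }

  # file.rename() returns FALSE (with a warning) if the rename fails.
  if (!file.rename(tmp, path)) {
    stop("could not rename ", tmp, " to ", path)
  }
  invisible(path)
}
```

With this in place, the last three lines of the approach above would reduce to `safe_saveRDS(a, object_file)`. If the read-back is too expensive for very large objects, `validate = FALSE` skips it and relies on the rename alone.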

adn bps
  • A better solution to your problem is to use tools that preserve your session even if you disconnect while working remotely, e.g. `screen` or `tmux` if you're working on a Linux-based server. – Scott Ritchie Mar 16 '17 at 03:23
  • Thank you, I will look into doing that. – adn bps Mar 21 '17 at 19:14
