
I'm working on a project with a rather large workspace. Unfortunately, I can't save the workspace: R freezes when I try. With a small workspace that contains just a data frame, save.image() works:

> library(dplyr); library(tidyr); library(tidyverse); library(tidytext); library(pryr)
> master = readRDS("data")
> pryr::object_size(master)
527 MB
> save.image(safe=F)
> pryr::mem_used()
682 MB
> memory.limit()
[1] 8142

It takes about 10 seconds, but it saves the 116 MB compressed .RData file just fine. If I try save.image(compress=F) instead, it takes less than a second.

> master_tidy = master %>% unnest_tokens(word, text)
> pryr::object_size(master_tidy)
565 MB
> pryr::mem_used()
758 MB

Now if I run save.image() or save.image(compress=F), R gets stuck and I have to terminate it, because the stop request doesn't work either. In Task Manager I can see that, while R is stuck, it uses 100+ MB/s of disk and about 2% CPU (depending on the type of compression), but even after 15 minutes save.image() is still running. I also see the .RDataTmp files in the directory, and I have tried save.image(safe=F) to no avail.

I find it strange that I can no longer use save.image() after calling unnest_tokens(). However, I can't reproduce the problem with the Shakespeare tidytext example, so I'm not sure what the cause is.

  • How about saving the file with `.rda` extension? Or using `saveRDS()`? – Teun Sep 16 '18 at 10:05
  • I want to save the whole workspace so I can just load the workspace whenever I open the project. I don't think changing the extension name would make much of a difference and I don't know how to save the workspace using `saveRDS()` – user6500630 Sep 16 '18 at 12:18
  • Also I can save `master` using `saveRDS` but I cannot save `master_tidy` using `saveRDS`. Something about "tidying" the object has made it impossible to save to disk. – user6500630 Sep 16 '18 at 12:29
  • After 20 minutes `master_tidy` was saved into a 119 GB file. – user6500630 Sep 16 '18 at 13:03

1 Answer


I suspect you might not love my answer here, but maybe the ideas in it can help you in the way they have helped me! The problem you are running into is demonstrating, in a concrete way, that saving an R workspace (as a way to keep track of your work or save time) isn't well-suited for a data analysis workflow.

Instead of treating an analysis as an R workspace (a "real" thing?) that you can open up in some not-quite-known state at any time, you can adopt a workflow where your R script is the "real" thing: the thing you work with, track, and save.

To quote from the ESS manual:

The source code is real. The objects are realizations of the source code. Source for EVERY user modified object is placed in a particular directory or directories, for later editing and retrieval.

Some alternative habits, borrowed from Jenny Bryan's excellent blog post:

  • Do not save .RData when you quit R and don't load .RData when you start R
  • Restart R very often and run your script from the top
  • Have an object that takes a long time to create? Write a separate script that creates it and save the object to a file using saveRDS()
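The last habit could look roughly like the sketch below. It is only an illustration, not code from the question: `build_master_tidy()` is a hypothetical stand-in for whatever slow step creates your object (for example the `unnest_tokens()` call), and the cache location in `tempdir()` is an assumption you would replace with a path inside your project.

```r
# Hypothetical stand-in for a slow step; replace with the real computation.
build_master_tidy <- function() {
  data.frame(
    id   = rep(1:3, each = 2),
    word = c("to", "be", "or", "not", "to", "be"),
    stringsAsFactors = FALSE
  )
}

# Assumed cache location; in a real project this would be a file in the project.
cache_path <- file.path(tempdir(), "master_tidy.rds")

# Build the object only if no cached copy exists, then write it to disk once.
if (!file.exists(cache_path)) {
  saveRDS(build_master_tidy(), cache_path)
}

# Every later script (or fresh R session) just reads the cached object.
master_tidy <- readRDS(cache_path)
```

With this pattern each expensive object lives in its own file, so nothing depends on save.image() or a restored workspace. Starting R with `R --no-save --no-restore-data` is one way to make sure .RData is neither written nor loaded.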
Julia Silge