
I'm working with large datasets, and quite often R produces an error telling me it can't allocate a vector of that size or that it doesn't have enough memory.

My computer has 16 GB of RAM (Windows 10) and I'm working with datasets of around 4 GB, but some operations need a lot of memory, for example converting a dataset from wide to long format. In some situations I can use gc() to release some memory, but many times it's not enough.
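For example, a reshape like the one below has to hold both the wide table and the new long table in memory at the same time. This is just a toy sketch (the table DT and its columns are made up):

    library(data.table)

    # Made-up wide table: one row per id, one sales_* column per year
    DT <- data.table(id = 1:1e6,
                     sales_2014 = rnorm(1e6),
                     sales_2015 = rnorm(1e6))

    # melt() builds the long table as a brand-new object, so for a while the
    # wide copy and the (bigger) long copy both have to fit in RAM
    long <- melt(DT, id.vars = "id",
                 variable.name = "var_year", value.name = "value")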

Sometimes I can break the dataset into smaller chunks, but sometimes I need to work with the whole table at once.

I've read that Linux users don't have this problem, but what about Windows?

I've tried setting a large pagefile on an SSD (200 GB), but I've found that R doesn't use it at all.

I can see in Task Manager that when memory consumption reaches 16 GB, R stops working. The size of the pagefile doesn't seem to make any difference.

How can I force R to use the pagefile? Do I need to compile it myself with some special flags?

PS: My experience is that deleting an object with rm() and later calling gc() doesn't recover all the memory. As I perform operations with large datasets, my computer has less and less free memory at every step, no matter whether I use gc().
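Roughly the pattern I mean, as a minimal sketch (big is just a made-up object):

    big <- matrix(rnorm(5e7), ncol = 100)  # roughly a 400 MB object
    rm(big)                                # delete the object
    gc()                                   # force garbage collection; prints what R still holds
    memory.size()                          # Windows-only: MB currently used by this R session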

PS2: I'd rather not hear trivial solutions like "you need more RAM".

PS3: I've been testing and the problem only happens in RStudio. If I use R directly it works well. Does anybody know how to do this in RStudio?

skan
  • Most likely you don't have enough physical RAM on your computer. The solution is generally to buy more RAM. If you aren't using data.table you should try that; if you still run out of memory, the problem is definitely a hardware problem that can't be fixed through coding. – Hugh Oct 05 '16 at 14:19
  • I know I need more RAM, but it's quite expensive, and it's limited to the maximum amount your motherboard can support. That's why I'm suggesting an alternative: using the pagefile (virtual memory), but that doesn't seem to work with R. Other programs can do it, or even page to disk transparently. – skan Oct 05 '16 at 14:22
  • R does the memory management for you. It doesn't release memory to the OS if it doesn't have to; instead it reuses it as required. That using `gc` helps is a myth (as long as you only use one R instance). – Roland Oct 05 '16 at 14:23
  • I explained that many times it's not enough. That's why I need to force R to use the pagefile. – skan Oct 05 '16 at 14:23
  • RAM is not expensive. If you need unusually large amounts (more than 32 GB), you can always rent it online. Even if you manage to use SSD memory, that will slow down your analyses. – Roland Oct 05 '16 at 14:24
  • @Roland, that is not a solution to the problem, it's only a stopgap. – skan Oct 05 '16 at 14:24
  • It's possible there may be no solution. – Hugh Oct 05 '16 at 14:26
  • But other applications can do it, and R can do it on Linux and OSX. Maybe there is a flag that enables it. – skan Oct 05 '16 at 14:28
  • Are you able to post an example of some code that is causing the memory error? – Hugh Oct 05 '16 at 14:28
  • You could always install Linux ... – Roland Oct 05 '16 at 14:28
  • @Hugh any code working with an object bigger than memory, or any code working with a smaller object but doing more complex operations. For example, if I want to convert a large data.table from wide to long: idvars = grep("_", names(DT), invert = TRUE); dcast(melt(DT, id.vars = idvars)[, `:=`(var = sub('_.*', '', variable), year = sub('.*_', '', variable), variable = NULL)], ... ~ var, value.var = 'value') – skan Oct 05 '16 at 14:31
  • @Hugh or if I want to detect which columns contain a certain pattern (such as dates): mydata[, lapply(.SD, function(xx) length(grep("^\\d\\d?/\\d\\d?/\\d{4}$", xx)) > 0)] – skan Oct 05 '16 at 14:33
  • @Roland, I know that too, that's why I anticipated it in the question. But this is my work computer and I'm not allowed to change the operating system, which is officially Windows. – skan Oct 05 '16 at 14:34
  • You could write to disk in intermediate steps. – Hugh Oct 05 '16 at 14:39
  • I have never tried this, but you could try increasing R_MAX_MEM_SIZE. It's possible that your pagefile will be used if you have set up Windows correctly. https://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021 – Roland Oct 05 '16 at 14:42
  • @Roland I've set R_MAX_MEM_SIZE to 128 GB but R only seems to use up to 16 GB. I've been reading and maybe it's a problem with RStudio. – skan Oct 05 '16 at 19:20
  • That's easy to check. Try with Rgui. – Roland Oct 05 '16 at 19:57
  • Have you tried setting say `memory.limit(100e3)`? – Hugh Oct 06 '16 at 00:01
  • OK, I've been testing and the problem only happens in RStudio. If I use R directly it works well. – skan Oct 06 '16 at 09:52

1 Answer


To get this working automatically every time you start RStudio: the R_MAX_MEM_SIZE approach is ignored, whether you set it as an environment variable or from inside .Rprofile.

Writing memory.limit(64000) on its own is ignored too.

The proper way is to add the following line to the .Rprofile file:

invisible(utils::memory.limit(64000))

or whatever number you want (the value is in MB).

Of course you need a pagefile that is big enough. That limit covers both free RAM and free pagefile space.
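To check that the new limit actually took effect, you can run this from the RStudio console after restarting (these memory.* helpers are Windows-only):

    memory.limit()            # should now report 64000 (MB) instead of roughly your physical RAM
    memory.size(max = TRUE)   # maximum MB this R session has obtained from the OS so far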

Using the pagefile is slower, but it will only be used when needed.

Something strange I've found is that it only lets you increase the memory limit; it doesn't allow you to decrease it.

skan