
I want to measure my peak memory usage in R so that I can allocate resources appropriately. The method must account for intermediate objects created during the analysis. For example, mx below is an 80 MB object created on every iteration of lapply but never saved as a global variable, so the measured peak should be at least 80 MB above baseline.

gc(reset = TRUE)
sum(gc()[, "(Mb)"]) # 172 MB

lapply(1:3, function(x) {
  mx <- rnorm(1e7) # 80 MB object
  mean(mx)
})

sum(gc()[, "(Mb)"]) # still 172 MB!
Jeff Bezos

3 Answers


I found what I was looking for in the peakRAM package. From the documentation:

This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.

mem <- peakRAM({
  for(i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # ~76.3 MiB (one 1e7-double vector at a time)
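
Applied to the lapply example from the question (a sketch; exact figures vary by machine), peakRAM captures the transient mx even though it is never returned:

mem <- peakRAM({
  lapply(1:3, function(x) {
    mx <- rnorm(1e7) # 80 MB intermediate, never saved globally
    mean(mx)
  })
})
mem$Peak_RAM_Used_MiB # at least ~76 MiB above baseline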
Jeff Bezos

You can use the gc function for that.

Indeed, the gc function reports the current and maximum memory used: with the default six-column output, elements 11 and 12 of the returned matrix are the maximum Ncells and Vcells figures (in Mb according to the documentation, but apparently in MiB in practice on my machine). You can reset the maximum with the parameter reset=TRUE. Here is an example:

> gc(reset=TRUE)
         used (Mb) gc trigger   (Mb) max used (Mb)
Ncells 318687 17.1     654385   35.0   318687 17.1
Vcells 629952  4.9  397615688 3033.6   629952  4.9
> a = runif(1024*1024*64)  # Should request 512 MiB from the GC (on my machine)
> gc()
           used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells   318677  17.1     654385   35.0   318834  17.1
Vcells 67738785 516.9  318092551 2426.9 67739236 516.9
> memInfo <- gc()
> memInfo[11]              # Maximum Ncells
[1] 17.1
> memInfo[12]              # Maximum Vcells
[1] 516.9
> rm(a)                    # `a` can be removed by the GC from this point
> gc(reset=TRUE)           # Reset the GC stats, including the maximums
         used (Mb) gc trigger   (Mb) max used (Mb)
Ncells 318858 17.1     654385   35.0   318858 17.1
Vcells 630322  4.9  162863387 1242.6   630322  4.9
> memInfo <- gc()
> memInfo[11]
[1] 17.1
> memInfo[12]              # The maximum has been correctly reset
[1] 4.9

In this example we can see that up to 516.9 - 4.9 = 512 MiB was allocated by the GC between the two gc calls surrounding the runif call, which is consistent with the expected result.
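To measure the peak of an arbitrary expression this way, the two calls can be wrapped in a small helper. This is a sketch built on the approach above, not part of the original answer; the indices 11 and 12 assume the six-column gc() layout shown in the transcript and may differ on builds that report extra columns:

peak_mb <- function(expr) {
  gc(reset = TRUE)   # reset the "max used" counters
  force(expr)        # evaluate the expression to be measured
  mem <- gc()
  mem[11] + mem[12]  # max Ncells (Mb) + max Vcells (Mb)
}

peak_mb(runif(1024 * 1024 * 64)) # ~534 on the machine above: 512 MiB vector + ~22 baseline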

Jérôme Richard
  • Actually I don't think this method accounts for intermediate values in functions and apply loops, see my new test above – Jeff Bezos Aug 19 '20 at 19:18
  • Does it take into account memory that is not managed by gc? Like malloc rather than R_alloc? – jangorecki Aug 26 '20 at 05:50
  • @jangorecki No, memory allocated with `malloc` should not be seen by the GC, since the GC is not explicitly told about it. However, AFAIK most persistent objects in R libraries are not allocated with a plain `malloc`, because such objects would not be collected (except at exit time, or manually by the user). Note that this may change in future releases of R, as version 4.0 seems to experiment with reference counting. – Jérôme Richard Aug 26 '20 at 18:44
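
As that comment notes, gc only tracks R-managed allocations. To also see memory that compiled code obtained with malloc, one option is to read the process's resident set size from the OS. The sketch below uses the ps package for that; this package choice is an assumption of this edit, not something suggested in the thread:

library(ps) # assumption: the ps package is installed

# OS-level resident set size of the current R process, in MiB.
# Unlike gc(), this includes memory malloc'd by compiled code,
# but it is a current value, not a peak: poll it to track a peak.
rss_mib <- function() {
  as.numeric(ps_memory_info(ps_handle())["rss"]) / 1024^2
}

before <- rss_mib()
x <- rnorm(1e7)     # 80 MB object
rss_mib() - before  # roughly 76 MiB more than before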

The object returned by lapply weighs only 488 bytes because it is summarized: garbage collection has deleted the intermediate objects after the mean calculation.
help('Memory') gives useful information on how R manages memory.
In particular, you can use object.size() to follow the size of individual objects, and memory.size() (Windows only) to see how much total memory is in use at each step:

# With mean calculation
gc(reset = T)
#>          used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 405777 21.7     831300 44.4   405777 21.7
#> Vcells 730597  5.6    8388608 64.0   730597  5.6
sum(gc()[, "(Mb)"]) 
#> [1] 27.3

l<-lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80 MB object
  mean(mx)
  print(paste('Memory used:',memory.size()))
})
#> [1] "Memory used: 271.04"
#> [1] "Memory used: 272.26"
#> [1] "Memory used: 272.26"

object.size(l)
#> 488 bytes


## Without mean calculation:
gc(reset = T)
#>          used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 464759 24.9     831300  44.4   464759 24.9
#> Vcells 864034  6.6   29994700 228.9   864034  6.6
gcinfo(T)
#> [1] FALSE
sum(gc()[, "(Mb)"]) 
#> [1] 31.5
l<-lapply(1:4, function(x) {
  mx <- replicate(10, rnorm(1e6))
  print(paste('New object size:',object.size(mx)))
  print(paste('Memory used:',memory.size()))
  mx
})
#> [1] "New object size: 80000216"
#> [1] "Memory used: 272.27"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 348.58"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 424.89"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 501.21"

object.size(l)
#> 320000944 bytes
sum(gc()[, "(Mb)"]) 
#> [1] 336.7

Created on 2020-08-20 by the reprex package (v0.3.0)

If instead of returning the mean you return the whole object, the increase in memory use is significant.
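
Note that memory.size() is Windows-only (and no longer available in recent R versions). As a cross-platform stand-in for the loop above, lobstr::mem_used() reports the total memory used by R objects; this substitution is a suggestion of this edit, not part of the original answer:

library(lobstr) # assumption: the lobstr package is installed

l <- lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80 MB object
  print(mem_used())               # current total R memory, like memory.size()
  mean(mx)
})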

Waldi
  • Is there a method for monitoring memory usage that accounts for garbage collection? Maybe something external to RStudio? – Jeff Bezos Aug 20 '20 at 02:19
  • Perhaps have a look at `help('Memory')`. To monitor current memory use you can print `memory.size()`, see my edit – Waldi Aug 20 '20 at 05:19