I am producing docx files using the officer library. The dataset is not big (a few dozen rows and a few dozen columns), but after a certain number of loop iterations, memory usage blows up and R finally crashes.

I cannot share the full code and data, so here is a kind-of skeleton of the loop. I hope it helps in understanding the issue:

library(officer)
library(ggplot2)
library(reshape2)
library(rJava)

datasource <- mydatasource
for (i in 1:nrow(datasource)) {
  datasource_case <- as.data.frame(datasource[i, ])
  datasource_case.melt <- melt(datasource_case, id.vars = c("ID"))

  assign(x = "DOC_OUTPUT", value = read_docx(path = template_file), envir = .GlobalEnv)
  mygraph <- ggplot(datasource_case.melt, ...)
  assign(x = "DOC_OUTPUT", value = body_add_gg(DOC_OUTPUT, value = mygraph), envir = .GlobalEnv)

  rm(mygraph)
  rm(datasource_case.melt)
  rm(datasource_case)

  print(DOC_OUTPUT, target = outfile)
  rm(DOC_OUTPUT, envir = .GlobalEnv)

  gc(verbose = TRUE, full = TRUE)
  .jgc(R.gc = FALSE)  # trigger Java-side garbage collection via rJava
}

So, I'm trying to destroy all the objects after using them (even though they would be overwritten anyway), and I also call gc().

Yet, gc() reports:

Garbage collection 48 = 40+4+4 (level 2) ... 
58.1 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 2
Garbage collection 58 = 48+5+5 (level 2) ... 
58.1 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 3
Garbage collection 67 = 56+5+6 (level 2) ... 
58.1 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 4
Garbage collection 76 = 63+5+8 (level 2) ... 
58.1 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 5
Garbage collection 85 = 71+5+9 (level 2) ... 
58.2 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 6
Garbage collection 94 = 79+5+10 (level 2) ... 
58.2 Mbytes of cons cells used (55%)
15.0 Mbytes of vectors used (23%)
   > 7
Garbage collection 103 = 86+6+11 (level 2) ... 
58.2 Mbytes of cons cells used (56%)
15.1 Mbytes of vectors used (23%)
   > 8
Garbage collection 112 = 94+6+12 (level 2) ... 
58.2 Mbytes of cons cells used (56%)
15.1 Mbytes of vectors used (24%)
   > 9
Garbage collection 121 = 101+7+13 (level 2) ... 
58.2 Mbytes of cons cells used (56%)
15.2 Mbytes of vectors used (24%)
   > 10
Garbage collection 130 = 109+7+14 (level 2) ... 
58.2 Mbytes of cons cells used (56%)
15.4 Mbytes of vectors used (24%)
   > 11
Garbage collection 139 = 117+7+15 (level 2) ... 
58.3 Mbytes of cons cells used (56%)
15.8 Mbytes of vectors used (25%)
   > 12
Garbage collection 148 = 124+8+16 (level 2) ... 
58.4 Mbytes of cons cells used (56%)
16.6 Mbytes of vectors used (26%)
   > 13
Garbage collection 157 = 132+8+17 (level 2) ... 
58.5 Mbytes of cons cells used (56%)
18.4 Mbytes of vectors used (29%)
   > 14
Garbage collection 166 = 140+8+18 (level 2) ... 
58.7 Mbytes of cons cells used (56%)
22.2 Mbytes of vectors used (35%)
   > 15
Garbage collection 175 = 147+9+19 (level 2) ... 
59.2 Mbytes of cons cells used (56%)
30.4 Mbytes of vectors used (47%)
   > 16
Garbage collection 184 = 155+9+20 (level 2) ... 
60.0 Mbytes of cons cells used (57%)
47.6 Mbytes of vectors used (61%)
   > 17
Garbage collection 195 = 163+10+22 (level 2) ... 
61.8 Mbytes of cons cells used (59%)
83.7 Mbytes of vectors used (64%)
   > 18
Garbage collection 208 = 173+11+24 (level 2) ... 
65.3 Mbytes of cons cells used (62%)
160.5 Mbytes of vectors used (77%)
   > 19
Garbage collection 223 = 183+13+27 (level 2) ... 
72.3 Mbytes of cons cells used (56%)
321.7 Mbytes of vectors used (80%)
   > 20
Garbage collection 235 = 189+15+31 (level 2) ... 
86.3 Mbytes of cons cells used (55%)
659.3 Mbytes of vectors used (75%)
   > 21
Garbage collection 245 = 195+15+35 (level 2) ... 
114.4 Mbytes of cons cells used (58%)
1365.0 Mbytes of vectors used (73%)
   > 22
Garbage collection 255 = 199+17+39 (level 2) ... 
170.4 Mbytes of cons cells used (59%)
2837.6 Mbytes of vectors used (72%)
   > 23
Garbage collection 264 = 203+18+43 (level 2) ... 
282.4 Mbytes of cons cells used (58%)
5904.6 Mbytes of vectors used (72%)
   > 24
Killed

How can I effectively prevent the memory usage from blowing up?

My last resort would be to move the loop body into a separate script and start a new R session for every iteration, but that wouldn't be very elegant.
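For reference, here is a minimal sketch of that fallback using the callr package (an assumption on my part; calling Rscript via system2() would work just as well). render_one() and outfiles are hypothetical stand-ins for the real loop body and output paths:

library(callr)

# Hypothetical worker: everything one iteration needs, self-contained.
render_one <- function(row, template_file, outfile) {
  library(officer)
  library(ggplot2)
  doc <- read_docx(path = template_file)
  # ... build the graph and fill in the document for this row ...
  print(doc, target = outfile)
  invisible(NULL)
}

for (i in 1:nrow(datasource)) {
  # callr::r() runs the function in a fresh R process, so everything an
  # iteration allocates is returned to the OS when that process exits.
  callr::r(render_one,
           args = list(row = datasource[i, ],
                       template_file = template_file,
                       outfile = outfiles[i]))
}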


1 Answer


(So... this is the first time I'm answering my own question.)

Earlier, I had wrapped the officer calls in custom functions, which all read and wrote global (.GlobalEnv) variables, including the one holding the large officer document.

It turns out that one of those functions made a local copy of the global variable, and that copy was:

  • Not destroyed automatically by R when the function returned
  • Not overwritten in memory when the function was run again
  • Not cleaned up by gc() (gc() was run but did not help)
  • Not removed by me either, since I had not included an rm() call

...and it kept littering memory in a sort-of exponential fashion.

In my case, the solution was to avoid making the copy that was local to the function. Alternatively, I think explicitly removing the copy with rm() before returning would have worked too.
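To illustrate, here is a minimal sketch of the two patterns (the names are hypothetical; my real helpers were more involved):

library(officer)

# Leaky pattern: the helper copies the global document into a local
# variable and writes the result back with assign(). In my code, the
# local copy kept the old document reachable across iterations.
add_section_leaky <- function() {
  doc_copy <- DOC_OUTPUT                          # local copy of the global doc
  doc_copy <- body_add_par(doc_copy, "some text")
  assign("DOC_OUTPUT", doc_copy, envir = .GlobalEnv)
  # rm(doc_copy) here would presumably have released the copy
}

# Fixed pattern: no globals and no lingering copies. The document is
# passed in as an argument and the modified document is returned.
add_section <- function(doc) {
  body_add_par(doc, "some text")
}

DOC_OUTPUT <- read_docx(path = template_file)
DOC_OUTPUT <- add_section(DOC_OUTPUT)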

Now memory usage is flat and any number of iterations can be run safely.

Lesson learned. Don't count on others cleaning up your stuff.
