
I have a list in R, my_list2 in the example below.

I want to add items to the list in a way that minimises the peak RAM usage.

Is there a more memory efficient way to do this than using the append function?

I'm aware that it's best practice to create an 'empty' list and then fill it, as per my_list in the example below, but this isn't an option as the list already exists.

# If I could create the list from scratch I'd do it like this:
my_list <- vector('list', 10)
for (i in 1:10) {
  my_list[[i]] <- i
}

# Is there a better way than the 'append' function?
my_list2 <- list(1)
for (i in 2:10) {
  my_list2 <- append(my_list2, i)
}
jruf003
  • Would you be able to extend `my_list` with more overhead than you need, and then chop it down afterwards if you didn't use the pre-allocated space? – thelatemail May 15 '23 at 03:06
  • Thanks for the suggestion, are you able to provide an example of how I could do this in a memory efficient way? – jruf003 May 15 '23 at 03:32
  • Something like: `length(my_list) <- 100` then loop and do the `my_list[[i]] <- i` loop, then `my_list[sapply(my_list, Negate(is.null))]` afterwards. I'm not sure about the memory implications, hence the comment, not an answer. – thelatemail May 15 '23 at 03:43
  • 1
    That makes sense, thanks for the elaboration/example. – jruf003 May 15 '23 at 03:48
  • @thelatemail You don't really need to pre-allocate lists. List elements need much more memory than the list itself. – Roland May 15 '23 at 09:34
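The pre-allocate-and-trim idea from the comments above can be sketched like this (the over-allocation size of 100 is arbitrary, and it assumes the existing list contains no legitimate NULL elements, since those would be trimmed too):

```r
# Grow the existing list once, fill it in place,
# then drop any unused (NULL) slots afterwards.
my_list2 <- list(1)
old_len <- length(my_list2)
length(my_list2) <- old_len + 100  # over-allocate; new slots are NULL

for (i in 2:10) {
  my_list2[[old_len + i - 1]] <- i
}

# Trim the unused pre-allocated slots
my_list2 <- my_list2[!sapply(my_list2, is.null)]
```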

2 Answers


Rather than using append() in each iteration, you could create a temporary list and append it to my_list2 only once at the end. Would this do the job for you?

Here's an example with 5k iterations in the for loop:

my_list <- list(1)
my_list2 <- list(1)

bench::mark(
  orig = {
    for (i in 2:5000) {
      my_list <- append(my_list, i)
    }
    my_list
  },
  mine = {
    tmp <- vector("list", 4999)
    for (i in 1:4999) {
      tmp[[i]] <- i + 1
    }
    append(my_list2, tmp)
  },
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 orig       420.01ms    1.69s     0.567    95.7MB     13.6
#> 2 mine         1.52ms      2ms   406.       96.8KB      0

Note that bench::mark() automatically checks that both expressions return the same output.
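If the values to add are already known up front, the loop can be skipped entirely and the single append done directly, which follows the same append-once idea (a sketch, not benchmarked here):

```r
my_list2 <- list(1)
# Build all new elements in one go and append them once
my_list2 <- append(my_list2, as.list(2:5000))
```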

bretauv
  • Thanks for this. I'm not familiar with bench::mark but I'm assuming mem_alloc is tracking memory? And assuming so is it fair to say that while your code is much faster the two approaches have similar memory? I need to reduce memory usage rather than improve speed (although faster run time would also be helpful) – jruf003 May 15 '23 at 05:53
  • 1
    Yes `mem_alloc` tracks memory. Note that my way is 1000x more memory efficient (`MB` in your case, `KB` in mine) – bretauv May 15 '23 at 05:55
  • 1
    I missed the difference in units! That's great, accepting your answer now... – jruf003 May 15 '23 at 05:58
  • 1
    yeah, watch out for the different units in the columns - why I never use bench::mark – tospig May 15 '23 at 06:58
  • The column *mem_alloc* gives how much memory is allocated, but at the same time memory is also freed at unpredictable points by `gc`. Because of this I think it is not possible to show which method is better in **peak RAM usage**. – GKi May 15 '23 at 08:34

A practical solution with low peak RAM usage can look like:

my_list <- list(1)
N <- length(my_list)
length(my_list) <- N + 9
for (i in 2:10) {
  my_list[[N + i - 1]] <- i
  #gc() #Optional
}

You can use gc to get the peak RAM usage, but the result is strongly influenced by whether a garbage collection happened during execution. To see the minimum possible peak, gctorture can be turned on, but execution then typically gets much slower. Since the result could also be influenced by the order in which the methods are called, I start a new vanilla session each time.

#Using append
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) L <- append(L, list(sample(n)))
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  344156 18.4     664228 35.5   345174 18.5
#Vcells 1215086  9.3    8388608 64.0  1265554  9.7
#Using [[<-
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) L[[length(L)+1]] <- sample(n)
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346937 18.6     664228 35.5   347919 18.6
#Vcells 1221639  9.4    8388608 64.0  1272088  9.8
#Using [[<- but resizing the list before
n <- 1e5
gctorture(on=TRUE)

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) L[[N - 1 + i]] <- sample(n)
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346564 18.6     664228 35.5   347498 18.6
#Vcells 1220761  9.4    8388608 64.0  1271479  9.8

Here append needs 8.0 Mb and [[<- 8.2 Mb, regardless of whether the list is resized in advance or not.


Doing the same without gctorture, but calling gc manually after each step, gives:

#Using append
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) {L <- append(L, list(sample(n))); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  344145 18.4     664228 35.5   372952 20.0
#Vcells 1215054  9.3    8388608 64.0  1319826 10.1
#Using [[<-
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
for (i in 2:10) {L[[length(L)+1]] <- sample(n); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  346926 18.6     664228 35.5   377474 20.2
#Vcells 1221607  9.4    8388608 64.0  1352555 10.4
#Using [[<- but resizing the list before
n <- 1e5

set.seed(0)
L <- list(sample(n))
gc(reset=TRUE)
#         used (Mb) gc trigger (Mb) max used (Mb)
#Ncells 285638 15.3     664228 35.5   285638 15.3
#Vcells 633121  4.9    8388608 64.0   633121  4.9
N <- length(L)
length(L) <- N + 9
for (i in 2:10) {L[[N - 1 + i]] <- sample(n); gc()}
gc()
#          used (Mb) gc trigger (Mb) max used (Mb)
#Ncells  347659 18.6     664771 35.6   374526 20.1
#Vcells 1223042  9.4    8388608 64.0  1273592  9.8

Here append needs 9.9 Mb, [[<- without resizing the list in advance 10.4 Mb, and [[<- with the list resized beforehand 9.7 Mb.


In case you want to know the total amount of allocated (but possibly already freed) memory, or other options, have a look at Monitor memory usage in R.

GKi