I am trying to grow a list in R where both the value and the name of each entry are held in variables, but it doesn't seem to work.

my_models_names <- names(my_models)
my_rocs=list() 
for (modl in my_models_names) {

    my_probs <- testPred[[modl]]$Y1
    my_roc <- roc(Ytst, my_probs)
    c(my_rocs, modl=my_roc) # <-- modl and my_roc are both variables
    }

My list my_rocs is empty at the end, even though I know that the loop iterates (my_roc is filled in). Why?

On a related note, is there a way to do this without looping?

Amelio Vazquez-Reina
  • reproducible example please ... ?? http://tinyurl.com/reproducible-000 ... `lapply` is the way to do the problem without (explicit) looping – Ben Bolker Feb 10 '13 at 18:29
  • Thanks @BenBolker. You are right, sorry for not having provided it. I will put one together, but in the meantime I think I found an answer on another thread. – Amelio Vazquez-Reina Feb 10 '13 at 18:40

3 Answers

Generally in R, growing objects is bad. It uses more memory than starting with the full object and filling it in. It seems you know in advance what the size of the list should be.

For example:

my_keys <- letters[1:3]
mylist <- vector(mode="list", length=length(my_keys))
names(mylist) <- my_keys

mylist
## $a
## NULL
##
## $b
## NULL
##
## $c
## NULL

You can do assignment this way:

key <- "a"
mylist[[key]] <- 5
mylist
## $a
## [1] 5
##
## $b
## NULL
##
## $c
## NULL
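
As a minimal sketch of how the preallocated list might then be filled in a loop: the keys and the `toupper()` call below are hypothetical stand-ins for the real names and the real per-entry computation.

```r
# Hypothetical keys, used only for illustration
my_keys <- c("a", "b", "c")

# Preallocate a named list of the right length
results <- vector(mode = "list", length = length(my_keys))
names(results) <- my_keys

# Fill each slot by name; the list keeps its length throughout
for (key in my_keys) {
  results[[key]] <- toupper(key)  # stand-in for a real computation
}

str(results)
```

One caveat with this pattern: assigning `NULL` with `[[<-` removes the entry instead of storing `NULL`, so it works best when every computed value is non-NULL.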
sebastian-c
  • +1. And more important than the memory usage, the constant reallocation of memory makes growing an object very slow for large datasets, by up to several orders of magnitude. – Paul Hiemstra Feb 11 '13 at 07:40
  • Growing lists is fine, in fact: https://www.r-bloggers.com/growing-list-vs-growing-queue/ – Dmitry Zotikov Mar 29 '19 at 08:38
  • @PaulHiemstra, it seems not to be an issue for lists in the current R implementation. system.time({ l = list(); for(i in 1:5000) l[[i]] = rnorm(1E4)});system.time({l = lapply(1:5000, function(i) rnorm(1E4))}) – Soren Havelund Welling Apr 04 '20 at 10:51
  • I've generally followed this "growing objects is bad" advice; but I must say I don't really understand how it makes any difference for lists. I can have a list of length 4 that takes up my entire memory and a list of length 100 that takes up a tiny fraction. The length of the list doesn't say much about its size. – Fons MA Aug 02 '21 at 05:52

I found the answer on this thread.

I can grow a list using the following generic formula:

mylist <- list()

for (key in my_keys){ 
mylist[[ key ]] <- value # value is computed dynamically
}

In my OP:

  • mylist is my_rocs
  • key is modl
  • value is my_roc
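
For context, here is a small sketch of why the original `c(my_rocs, modl=my_roc)` line left the list empty. `c()` returns a new list and does not modify its argument, so the result is lost unless it is assigned back. In addition, the `modl` in `modl=my_roc` is taken literally as the name "modl" and is not looked up as a variable. The model name and value below are made up for illustration.

```r
my_rocs <- list()
modl <- "glm"      # hypothetical model name
my_roc <- 0.87     # hypothetical stand-in for a roc() result

c(my_rocs, modl = my_roc)     # result is discarded; my_rocs is unchanged
length(my_rocs)               # still 0

tmp <- c(my_rocs, modl = my_roc)
names(tmp)                    # "modl", the literal argument name

my_rocs[[modl]] <- my_roc     # uses the value of modl as the name
names(my_rocs)                # "glm"
```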
Amelio Vazquez-Reina
  • I would really consider not growing the object, this gets really slow when `mylist` becomes big. See my answer for an example using `lapply` (which is the R-way to do it), or preallocate your `mylist` to the correct size. It might not make a difference when `mylist` is short, but in general this style is slow. – Paul Hiemstra Feb 11 '13 at 07:46
  • Can this be done with lapply() if you need to create multiple lists per key? @PaulHiemstra – Union find Mar 03 '18 at 19:32
  • growing lists seems just as fast as lapply system.time({l = lapply(letters, function(i) rnorm(1E6));names(l)=letters;force(l)}) system.time({ l = list(); for(i in letters) l[[i]] = rnorm(1E6);force(l)}) – Soren Havelund Welling Apr 04 '20 at 10:36

You can also use a more R-like solution, and use lapply:

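# Compute the ROC object for a single model, looked up by name
get_model <- function(model_name) {
    my_probs <- testPred[[model_name]]$Y1
    roc(Ytst, my_probs)
}
model_names <- names(my_models)
model_list <- lapply(model_names, get_model)
# lapply() over an unnamed character vector returns an unnamed list
names(model_list) <- model_names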

Note that this solution saves you a lot of boilerplate code. It also does not suffer from the reallocation problem that comes with growing the object, as in your solution. For large datasets, this can mean that the lapply solution is thousands of times faster.
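
One detail worth checking when using `lapply` for this: applied to an unnamed character vector, it returns an unnamed list, so the entry names have to be attached separately. The sketch below shows two ways to get a named result. The model names and `score_model()` are hypothetical placeholders for `names(my_models)` and the `roc()` call.

```r
# Hypothetical stand-ins for the real models and scoring function
model_names <- c("glm", "rf", "svm")
score_model <- function(model_name) nchar(model_name)  # placeholder for roc()

unnamed <- lapply(model_names, score_model)
names(unnamed)    # NULL

# sapply() with simplify = FALSE returns a list and, for character input,
# uses the input values as names
named <- sapply(model_names, score_model, simplify = FALSE, USE.NAMES = TRUE)
names(named)      # "glm" "rf" "svm"

# Alternative: attach the names after lapply()
named2 <- setNames(lapply(model_names, score_model), model_names)
```

Either form gives a list that can be indexed by model name, for example `named[["rf"]]`.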

Paul Hiemstra
  • how does `lapply` achieve this thousand-fold increase in speed? It cannot possibly know in advance the size of the output of each iteration of the function it is applying? – Fons MA Aug 02 '21 at 05:56