Growing a list with variable names in R

Question

I am trying to grow a list in R, where both the value and the name of each entry are held in variables, but it doesn't seem to work.

my_models_names <- names(my_models)
my_rocs=list() 
for (modl in my_models_names) {

    my_probs <- testPred[[modl]]$Y1
    my_roc <- roc(Ytst, my_probs)
    c(my_rocs, modl=my_roc) # <-- modl and my_roc are both variables
    }

My list my_rocs is empty at the end, even though I know that the loop iterates (my_roc is filled in). Why?

On a related note, is there a way to do this without looping?

reproducible example please ... ?? http://tinyurl.com/reproducible-000 ... `lapply` is the way to do the problem without (explicit) looping – Ben Bolker Feb 10 '13 at 18:29
Thanks @BenBolker. You are right, sorry for not having provided one. I will put one together, but in the meantime I think I found an answer on another thread. – Amelio Vazquez-Reina Feb 10 '13 at 18:40

score 25 · Answer 1 · answered Feb 11 '13 at 07:24


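Generally in R, growing objects is bad. Each time the object is extended, R may have to allocate a new, larger object and copy the old contents into it, which uses more memory than creating the full object once and filling it in. It seems you know in advance what the size of the list should be, so you can preallocate it.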

For example:

my_keys <- letters[1:3]
mylist <- vector(mode="list", length=length(my_keys))
names(mylist) <- my_keys

mylist
## $a
## NULL
##
## $b
## NULL
##
## $c
## NULL

You can do assignment this way:

key <- "a"
mylist[[key]] <- 5
mylist
## $a
## [1] 5
##
## $b
## NULL
##
## $c
## NULL
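
As a minimal sketch of how the preallocated list might be filled in a loop: `my_keys` and `compute_value` below are hypothetical stand-ins for the real keys and the real computation, not part of the original post. The last lines illustrate a caveat of `[[<-` that matters when a computed value can be NULL.

```r
# Hypothetical keys and a stand-in computation, for illustration only
my_keys <- c("a", "b", "c")
compute_value <- function(key) toupper(key)

# Preallocate a named list of the right length
mylist <- vector(mode = "list", length = length(my_keys))
names(mylist) <- my_keys

# Fill each slot by name
for (key in my_keys) {
    mylist[[key]] <- compute_value(key)
}
str(mylist)

# Caveat: assigning NULL with [[<- deletes the element instead of storing NULL
mylist[["a"]] <- NULL
names(mylist)   # "b" "c"

# To store an explicit NULL, assign a one-element list with single brackets
mylist["b"] <- list(NULL)
names(mylist)   # still "b" "c"
```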

answered Feb 11 '13 at 07:24

sebastian-c


+1. And more important than the memory usage, the constant reallocation of memory makes growing an object very slow for large datasets, up to several orders of magnitude. – Paul Hiemstra Feb 11 '13 at 07:40

Growing lists is fine, in fact: https://www.r-bloggers.com/growing-list-vs-growing-queue/ – Dmitry Zotikov Mar 29 '19 at 08:38
@PaulHiemstra, it seems not to be an issue for lists in the current R implementation: `system.time({ l = list(); for(i in 1:5000) l[[i]] = rnorm(1E4)}); system.time({l = lapply(1:5000, function(i) rnorm(1E4))})` – Soren Havelund Welling Apr 04 '20 at 10:51
I've generally followed this "growing objects is bad" advice; but I must say I don't really understand how it makes any difference for lists. I can have a list of length 4 that takes up my entire memory and a list of length 100 that takes up a tiny fraction. The length of the list doesn't say much about its size. – Fons MA Aug 02 '21 at 05:52

score 9 · Accepted Answer · edited May 23 '17 at 12:02


I found the answer on this thread.

I can grow a list using the following generic formula:

mylist <- list()

for (key in my_keys) {
    mylist[[key]] <- value  # value is computed dynamically
}

In my OP:

mylist is my_rocs
key is modl
value is my_roc
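
To make the difference from the original loop concrete, here is a small sketch. It uses a hypothetical `fake_roc` function and made-up model names in place of `roc()` and the real models, so it runs without any packages. It shows the two reasons the `c(my_rocs, modl=my_roc)` line has no visible effect, and the `[[<-` form that works.

```r
# Stand-ins for illustration only: not the data or functions from the question
my_keys <- c("glm", "rf", "svm")
fake_roc <- function(key) list(model = key, auc = nchar(key) / 10)

# 1. c() returns a new list; without assignment the result is discarded
my_rocs <- list()
for (modl in my_keys) {
    c(my_rocs, modl = fake_roc(modl))
}
length(my_rocs)   # 0

# 2. Even when assigned, `modl =` is taken as the literal name "modl",
#    and c() splices the elements of a list argument into the result
spliced <- c(list(), modl = fake_roc("glm"))
names(spliced)    # "modl.model" "modl.auc"

# 3. [[<- evaluates the variable and uses its value as the element name
my_rocs <- list()
for (modl in my_keys) {
    my_rocs[[modl]] <- fake_roc(modl)
}
names(my_rocs)    # "glm" "rf" "svm"
```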

edited May 23 '17 at 12:02

Community


answered Feb 10 '13 at 18:39

Amelio Vazquez-Reina



I would really consider not growing the object; this gets really slow when `mylist` becomes big. See my answer for an example using `lapply` (which is the R way to do it), or preallocate your `mylist` to the correct size. It might not make a difference when `mylist` is short, but in general this style is slow. – Paul Hiemstra Feb 11 '13 at 07:46
Can this be done with lapply() if you need to create multiple lists per key? @PaulHiemstra – Union find Mar 03 '18 at 19:32
Growing lists seems just as fast as lapply: `system.time({l = lapply(letters, function(i) rnorm(1E6)); names(l) = letters; force(l)})` `system.time({ l = list(); for(i in letters) l[[i]] = rnorm(1E6); force(l)})` – Soren Havelund Welling Apr 04 '20 at 10:36

score 3 · Answer 3 · answered Feb 11 '13 at 07:44


You can also use a more R-like solution, and use lapply:

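# Compute the ROC object for a single model, looked up by name
get_model <- function(model_name) {
    my_probs <- testPred[[model_name]]$Y1
    roc(Ytst, my_probs)
}
# lapply over a plain character vector returns an unnamed list,
# so attach the model names afterwards
model_list <- lapply(names(my_models), get_model)
names(model_list) <- names(my_models)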

Note that this solution saves you a lot of boilerplate code. It also does not suffer from the reallocation problem that your solution incurs by growing the object. For large datasets, this can mean that the lapply solution is thousands of times faster.

answered Feb 11 '13 at 07:44

Paul Hiemstra


how does `lapply` achieve this thousand-fold increase in speed? It cannot possibly know in advance the size of the output of each iteration of the function it is applying? – Fons MA Aug 02 '21 at 05:56
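
On the question in the last comment: lapply does not need to know the size of each result, only how many results there will be, and that is the length of its input. A list holds references to its elements, so a preallocated slot costs the same whatever ends up in it. A rough way to compare the three approaches on your own machine is sketched below. The keys and the `compute` function are hypothetical, and no timings are quoted because they depend on the R version and the data.

```r
# Hypothetical workload for illustration only
n <- 2000
keys <- paste0("k", seq_len(n))
compute <- function(key) nchar(key)

# Grow an empty list one named element at a time
grow <- function() {
    out <- list()
    for (key in keys) out[[key]] <- compute(key)
    out
}

# Preallocate a named list, then fill it by position
prealloc <- function() {
    out <- vector("list", n)
    names(out) <- keys
    for (i in seq_len(n)) out[[i]] <- compute(keys[i])
    out
}

# Let lapply allocate the result, then attach the names
applied <- function() {
    out <- lapply(keys, compute)
    names(out) <- keys
    out
}

# All three build the same named list
stopifnot(identical(grow(), prealloc()), identical(grow(), applied()))

# Timings vary by machine and R version; run these to see your own numbers
system.time(grow())
system.time(prealloc())
system.time(applied())
```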
