
I've been working on a few projects that require a lot of list subsetting, and while profiling my code I realised that the `object[["nameHere"]]` approach to subsetting lists is usually faster than the `object$nameHere` approach.

As an example, if we create a list with named components:

a.long.list <- as.list(1:1000)
names(a.long.list) <- paste0("something", 1:1000)

Why is this:

system.time(
    for (i in 1:10000) {
        a.long.list[["something997"]]
    }
)


user  system elapsed 
0.15    0.00    0.16 

faster than this:

system.time(
    for (i in 1:10000) {
        a.long.list$something997
    }
)

user  system elapsed 
0.23    0.00    0.23 

My question is simply this: is this behaviour universally true, so that I should avoid `$` subsetting wherever possible, or does the most efficient choice depend on other factors?
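For what it's worth, a more stable comparison than a hand-rolled `system.time` loop can be made with the `microbenchmark` package (an assumption on my part that it is available; it is a CRAN package, not part of base R):

# Sketch: benchmark the two lookup styles directly.
# Requires install.packages("microbenchmark").
library(microbenchmark)

microbenchmark(
    bracket = a.long.list[["something997"]],
    dollar  = a.long.list$something997,
    times   = 10000
)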

Jon M
  • +1. I suspect it's related to partial matching with the `$` sign. Suppose you have `my_list <- list("a" = 1, "ace" = 2)`. If you try `my_list$ac` it gets `ace`, but if you try `my_list[["ac"]]`, it finds nothing. – Frank May 18 '13 at 23:48 (see the sketch after these comments)
  • Not answering your question, but if performance were an issue, then you'd rather write a vectorized look-up `query <- sample(names(a.long.list), 1000); a.long.list[query]` to play well with your other vectorized code. – Martin Morgan May 19 '13 at 00:23
  • not ruling out the partial matching theory, but what I hope a complete answer will include is why adding `exact = FALSE` to `[[` in the OP's example does not degrade the performance. – flodel May 19 '13 at 11:48
  • If we change the number of list items to 6 and search for the last one, then `$` seems faster: `n <- 6; short <- as.list(1:n); names(short) <- paste0("something", 1:n); system.time(for (i in 1:10000) short[["something6"]]); system.time(for (i in 1:10000) short$something6)` – G. Grothendieck May 19 '13 at 12:34
  • @G.Grothendieck At least on my system, the `[[` approach is still faster than `$` for that list. I had to bump the reps up to 1000000 to get a difference between the two: elapsed 0.46 versus elapsed 0.56. – Jon M May 19 '13 at 12:47
  • Seems worth mentioning that `$` and `[[` are implemented by two entirely different C functions (both in `src/main/subset.c`). For `$`, the relevant function is [`do_subset3`](https://github.com/wch/r-source/blob/trunk/src/main/subset.c#L1057) which in turn calls [`R_subset3_dflt`](https://github.com/wch/r-source/blob/trunk/src/main/subset.c#L1106). `[[` uses another function, [`do_subset2`](https://github.com/wch/r-source/blob/trunk/src/main/subset.c#L840), which in turn calls [`do_subset2_dflt`](https://github.com/wch/r-source/blob/trunk/src/main/subset.c#L863). – Josh O'Brien May 19 '13 at 17:23
  • The comment preceding `do_subset2` notes simply: "The [[ subset operator. It needs to be fast." – Josh O'Brien May 19 '13 at 17:25
  • Also probably worth mentioning one of the newest changes in R 3.0.0: "Partial matching when using the $ operator on data frames now throws a warning and may become defunct in the future. If partial matching is intended, replace foo$bar by foo[["bar", exact = FALSE]]." – zap2008 May 21 '13 at 02:31
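To make the partial-matching behaviour from Frank's and flodel's comments concrete, here is a minimal sketch (using a small hypothetical list rather than `a.long.list`):

my_list <- list(a = 1, ace = 2)

my_list$ac                       # partial match: returns 2 (matches "ace")
try(my_list[["ac"]])             # exact match only: subscript out of bounds
my_list[["ac", exact = FALSE]]   # opt in to partial matching: returns 2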

1 Answer


The `[[` operator first goes through all elements looking for an exact match, and only then attempts a partial match. The `$` operator tries both exact and partial matching on each element in turn. If you execute:

system.time(
    for (i in 1:10000) {
        a.long.list[["something9973", exact = FALSE]]
    }
)

i.e., you perform a partial-match lookup for which there is no exact match, you will find that `$` is in fact ever so slightly faster.
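For comparison, here is the `$` version of the same failing lookup, in the same loop style (a sketch; absolute timings will vary by machine):

system.time(
    for (i in 1:10000) {
        a.long.list$something9973
    }
)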

Bojan Nikolic
  • I think this answers flodel's clarifying question about why adding `exact = FALSE` doesn't degrade performance. Anyway, I'm now convinced that in programming contexts where speed matters it is better to use `[[` unless there is a high probability of needing partial matching (which more often creates bugs in my programs than solves them). – Jon M May 29 '13 at 21:52
  • BTW, if looking for >100x performance for a 10000-element list, then convert the list with `as.environment(a.long.list)` and perform the lookup on that. Environments are implemented as hash maps, which have near-constant lookup time; linear list lookup gets proportionally slower with size (how far down the list the element is). – Soren Havelund Welling Sep 08 '22 at 10:58 (see the sketch below)