
As an example, I have a large list of vectors of various lengths (some of them NULL) and would like to find the first list element that has exactly two elements. As in this post, I know that with a list you can use a similar approach by using `sapply()` and subsetting the first result. Since the solution in the post linked above, which uses `match()`, doesn't work in this case, I'm curious whether there is a more elegant (and more computationally efficient) way to achieve this.

A reproducible example


# some example data
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
           c(letters[1:10]), c("foo", "bar"), NULL)
x

# find the first element of length 2 using sapply and sub-setting to result #1
x[sapply(x, FUN=function(i) {length(i)==2})][[1]]

Or, as in @Josh O'Brien's answer to this post,

# get the index of the first element of length 2
seq_along(x)[sapply(x, FUN=function(i) {length(i)==2})][1]
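A type-stable variant of the indexing approach can be sketched with `vapply()` and `which()` in place of `sapply()` and `seq_along()`. This is a swap-in, not code from the linked answer. `vapply()` guarantees exactly one logical per element, and `which(...)[1]` keeps only the first matching index. Like the `sapply()` version, it still tests every element before picking the first match.

```r
# Example data from the question
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

# One logical per element; FUN.VALUE makes the result type explicit
is_len2 <- vapply(x, function(i) length(i) == 2, FUN.VALUE = logical(1))

# which() returns the indices of the TRUE values; [1] keeps the first (NA if none)
first_idx <- which(is_len2)[1]
first_idx
# [1] 5
x[[first_idx]]
# [1] "we want"  "this one"
```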

Any thoughts or ideas?

Cotton.Rockwood
  • It looks like @Robert Krzyzanowski's solution is the most efficient by far. I'm not sure why the `match()` solution is slow, but it's actually the worst one. Go figure! See the benchmark comparison below for a single list of 200,000 elements. Thanks, everyone, for all the responses. – Cotton.Rockwood Aug 03 '14 at 01:43

4 Answers


Do you want this?

Find(function(i) length(i) == 2, x) # [1] "we want"  "this one"
Position(function(i) length(i) == 2, x) # [1] 5
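`Find()` and `Position()` stop at the first element for which the predicate is `TRUE`, so they never compute the lengths of the remaining elements. A minimal hand-written equivalent makes that short-circuit explicit. The helper name `first_match_pos()` is hypothetical, not a base R function.

```r
# Hypothetical helper: index of the first element of x for which the
# predicate f is TRUE, or NA_integer_ if none matches (the same value
# Position() returns by default when nothing matches).
first_match_pos <- function(f, x) {
  for (i in seq_along(x)) {
    # isTRUE() treats NA or non-logical predicate results as "no match"
    if (isTRUE(f(x[[i]]))) return(i)  # stop as soon as a match is found
  }
  NA_integer_
}

x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)
is_len2 <- function(i) length(i) == 2

first_match_pos(is_len2, x)
# [1] 5
Position(is_len2, x)
# [1] 5
```

Because the loop returns on the first hit, its run time depends on how early the first match appears rather than on the total length of the list.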
Robert Krzyzanowski

`mapply()` seems to be really quick:

> x <- rep(x, 25000)
> microbenchmark({ x[match(2, mapply(length, x))] })
# Unit: milliseconds
#       min       lq   median       uq      max neval
#  243.7502 275.8941 326.2993 337.9221 405.7011   100

Also check `x[mapply(length, x) == 2][[1]]`.

Here's a different way with `sapply()`:

>  x[sapply(x, length) == 2][[1]]
# [1] "we want"  "this one"
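On R 3.2.0 or later (released after this thread), base R also has `lengths()`, which returns the length of every list element in a single vectorised call. Swapping it in for `sapply(x, length)` is a sketch of a tidier variant, not something proposed in this answer. It still examines every element, so it does not short-circuit the way `Find()` does.

```r
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

# Integer vector holding the length of each list element
len <- lengths(x)
len
# [1]  0  0  1  6  2 10  2  0

# match() returns the position of the first 2 (NA if there is none)
x[[match(2L, len)]]
# [1] "we want"  "this one"
```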

This next one is interesting.

> x[ grep("2", summary(x)[,1])[1] ]
# [[1]]
# [1] "we want"  "this one"
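One caveat with `grep("2", ...)`: it is a substring match on the `Length` column, so an element of length 12 or 20 would also be picked up. The sketch below converts that column to integers and matches exactly. It assumes, as in this example, that `summary()` on a list returns a character matrix whose first column holds the element lengths.

```r
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

# First column of summary() on a list holds each element's length as text;
# as.integer() ignores any leading padding added when the column is formatted
len <- as.integer(summary(x)[, 1])

# Exact comparison avoids matching lengths such as 12 or 20
x[match(2L, len)]
# [[1]]
# [1] "we want"  "this one"
```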
Rich Scriven
  • Excellent! It looks like there are a lot of good ways to do this. This is a nice solution, but it's really a less verbose version of mine. Given that efficiency is a concern (I'm doing this for many lists that are each 10,000+ items long), it may not be as good. I'd like the operation to stop as soon as the first matching element is reached. Thanks for the answer, though! – Cotton.Rockwood Aug 03 '14 at 01:07
  • Check out `summary(x)`; it has a `Length` column. – Rich Scriven Aug 03 '14 at 01:13
  • Interesting regarding `summary()` having the `Length` column... a good nugget to tuck away in the back of your mind. The function using that method is very slow, however, at about 681,000 microseconds on the benchmark. +1 for creativity, though! – Cotton.Rockwood Aug 03 '14 at 03:09
  • You don't need the `{}` around the microbenchmark expression, FYI. – Robert Krzyzanowski Aug 03 '14 at 03:48
  • I think the microbenchmark you did for `mapply` must have been for a much smaller list. See the updated benchmarks below. – Cotton.Rockwood Aug 03 '14 at 04:20
  • Wow... if that difference is from the machine specs, I need to think about a new computer! I thought mine was decent, but... Just to be sure, you ran it on `rep(x, 25000)`? – Cotton.Rockwood Aug 03 '14 at 16:36
  • That makes more sense now... right in line with the benchmark results I got. Note that the units for yours are `milliseconds` while mine are `microseconds`. – Cotton.Rockwood Aug 04 '14 at 16:40
  • That's because there's a faster answer in yours – Rich Scriven Aug 04 '14 at 16:42

I benchmarked each of the suggested solutions on a single list of 200,000 elements (28.8 Mb) made from `rep(x, 25000)`, i.e. the `x` list from my example repeated 25,000 times. Here are the results:

> microbenchmark(Find(function(i) length(i) == 2, x),
                  x[sapply(x, length) == 2][[1]],
                  x[sapply(x, FUN=function(i) {length(i)==2})][[1]],
                  x[[match(2,lapply(x,length))]],
                  x[match(2, mapply(length, x))],
                  x[mapply(length, x) == 2][[1]])
Unit: microseconds
                                                  expr        min         lq      median          uq        max neval
                   Find(function(i) length(i) == 2, x)     89.104    107.531    112.8955    119.6605    466.045   100
                        x[sapply(x, length) == 2][[1]] 166539.621 185113.274 193224.0270 209923.2405 378499.180   100
x[sapply(x, FUN = function(i) {length(i) == 2 })][[1]] 279596.600 301976.512 310928.3845 322857.7610 484233.342   100
                      x[[match(2, lapply(x, length))]] 378391.882 388831.223 398639.1430 415137.0565 591727.647   100
                        x[match(2, mapply(length, x))] 207324.777 225027.221 235982.9895 249744.3525 422451.010   100
                        x[mapply(length, x) == 2][[1]] 205649.537 223045.252 236039.6710 249529.5245 411916.734   100
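Before reading too much into the timings, it may be worth confirming what each benchmarked expression returns. The quick check below is a sketch that only compares return values on the same `rep(x, 25000)` list and does not time anything. It shows that `x[match(2, mapply(length, x))]` uses single brackets and so returns a one-element list, whereas the other expressions return the character vector itself.

```r
x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)
x <- rep(x, 25000)  # 200,000 elements, as in the benchmark

target <- c("we want", "this one")

results <- list(
  find         = Find(function(i) length(i) == 2, x),
  sapply_len   = x[sapply(x, length) == 2][[1]],
  sapply_fun   = x[sapply(x, FUN = function(i) {length(i) == 2})][[1]],
  match_lapply = x[[match(2, lapply(x, length))]],
  mapply_eq    = x[mapply(length, x) == 2][[1]]
)

# Each of these returns the character vector itself
all(vapply(results, identical, logical(1), target))
# [1] TRUE

# Single-bracket indexing keeps the list wrapper around the element
str(x[match(2, mapply(length, x))])
```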

Thanks for the quick and informative responses!

Cotton.Rockwood

Using `match()` can work:

match(2,lapply(x,length))
#[1] 5
x[[match(2,lapply(x,length))]]
#[1] "we want"  "this one"
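One thing to watch with the `match()` approach: if no element has length 2, `match()` returns `NA`, so `x[[...]]` would be indexing with `NA` rather than a valid position. A minimal guard is sketched below. The wrapper name `first_of_length()` is hypothetical, and it uses `vapply()` in place of `lapply()` so that `match()` compares against a plain integer vector. When nothing matches it returns `NULL`, which is also what `Find()` returns by default in that case.

```r
# Hypothetical wrapper: first element of x whose length is n, or NULL if none
first_of_length <- function(x, n) {
  i <- match(n, vapply(x, length, integer(1)))  # NA when no length equals n
  if (is.na(i)) NULL else x[[i]]
}

x <- list(NULL, NULL, NA, rep("foo", 6), c("we want", "this one"),
          c(letters[1:10]), c("foo", "bar"), NULL)

first_of_length(x, 2)
# [1] "we want"  "this one"
first_of_length(x, 3)
# NULL
```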
thelatemail
  • I think that, like @Robert Krzyzanowski's response, this should be more efficient... I will look in more detail at how `match()` works. Thanks, @thelatemail! – Cotton.Rockwood Aug 03 '14 at 01:10