How does `lapply` extract sub-elements from a list? More specifically, how does `lapply` extract sub-elements from a list of lists versus a list of vectors? Even more specifically, suppose I have the following:
```r
my_list_of_lists <- list(list(a = 1, b = 2), list(a = 2, c = 3), list(b = 4, c = 5))
my_list_of_lists[[1]][["a"]] # just checking
# [1] 1
# that's what I expected
```
and apply the following:
```r
lapply(my_list_of_lists, function(x) x[["a"]])
# [[1]]
# [1] 1
#
# [[2]]
# [1] 2
#
# [[3]]
# NULL
```
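The output above is consistent with a simple model in which `lapply` calls `FUN` once per element, passing the single element `X[[i]]`, and collects the results in a list. As a minimal sketch, that model can be written out as an explicit loop. `my_lapply` is a hypothetical name used only for illustration; it is not how base R implements `lapply` (which hands the loop to `.Internal(lapply(X, FUN))`), and it skips the edge cases base R handles.

```r
# Hypothetical re-implementation of the mental model: call FUN once per
# element, passing the single element X[[i]], and collect the results.
my_lapply <- function(X, FUN, ...) {
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    # out[i] <- list(...) keeps a NULL result; out[[i]] <- NULL would drop the slot
    out[i] <- list(FUN(X[[i]], ...))
  }
  names(out) <- names(X)
  out
}

my_list_of_lists <- list(list(a = 1, b = 2), list(a = 2, c = 3), list(b = 4, c = 5))
via_loop   <- my_lapply(my_list_of_lists, function(x) x[["a"]])
via_lapply <- lapply(my_list_of_lists, function(x) x[["a"]])
identical(via_loop, via_lapply)
# [1] TRUE
```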
So `lapply` extracts the `a` element from each of the 3 sublists and returns the results as the elements of a length-3 list (`NULL` for the third sublist, which has no `a`). At this point, my mental model is the following: `lapply` applies `FUN` to each element of `my_list`, returning `FUN(my_list[[i]])` for `i` in `1:3`. Great! So I expect my mental model to work for lists of vectors as well. For example,
```r
my_list_of_vecs <- list(c(a = 1, b = 2), c(a = 2, c = 3), c(b = 4, c = 5))
my_list_of_vecs[[1]][["a"]] # just checking
# [1] 1
# that's what I expected
```
and apply the following:
```r
lapply(my_list_of_vecs, function(x) x[["a"]])
# Error in x[["a"]] : subscript out of bounds
# Wait... What!?
```
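One way to narrow this down is to apply the mental model by hand, one element at a time, and record what each call does. This sketch uses a plain `for` loop and wraps each call in `tryCatch` so that all three outcomes are visible instead of stopping at the first error; `f` is simply a name given to the anonymous function used above.

```r
my_list_of_vecs <- list(c(a = 1, b = 2), c(a = 2, c = 3), c(b = 4, c = 5))
f <- function(x) x[["a"]]

# Call f on each element separately, as FUN(X[[i]]), and keep either the
# returned value or the error message for that element.
results <- vector("list", length(my_list_of_vecs))
for (i in seq_along(my_list_of_vecs)) {
  results[[i]] <- tryCatch(
    f(my_list_of_vecs[[i]]),
    error = function(e) paste("error:", conditionMessage(e))
  )
}
str(results)
```

Run this way, the first two calls return `1` and `2`; only the third call, on `c(b = 4, c = 5)`, which has no element named `a`, produces the error.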
What's going on here!? Shouldn't this just work? I found a section in `help(lapply)` which might be relevant:
> For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g., bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[i]], ...), with i replaced by the current (integer or double) index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required to ensure that method dispatch for is.numeric occurs correctly.
I really don't know how to make sense of this.
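One concrete way to see what the help page means is to ask `FUN` for its own call with `sys.call()`. In this sketch each invocation should report the call `FUN(X[[i]], ...)`, that is, a single `[[` extraction with a scalar index `i`, which is the form `help(lapply)` describes. It is meant for inspection only.

```r
# Each invocation of FUN returns the call lapply constructed for it
calls <- lapply(list(10, 20), function(x) sys.call())
print(calls[[1]])
```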
I think it's related to the fact that you can use both `[[` and `[` to extract single elements from a vector, but you can ONLY use `[` to extract ranges of elements. For example,
```r
my_list_of_vecs[[1]][1:2]
# a b
# 1 2
my_list_of_vecs[[1]][[1:2]]
# Error in my_list_of_vecs[[1]][[1:2]] :
# attempt to select more than one element in vectorIndex
```
So under the hood, `lapply` must be using `function(x) x[["a"]]` over a range. Is that right?
Debugging doesn't help me here, since these functions rely on `.Internal` functions.
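Separately from ranges, `[` and `[[` also differ in how they treat a name that is not present, and that behavior differs between atomic vectors and lists. This sketch compares the two operators on a named atomic vector and a named list with the same contents, mirroring the third element of each structure above; `v`, `l` and the helper `try_get` are hypothetical names introduced here for illustration.

```r
v <- c(b = 4, c = 5)      # atomic vector, like my_list_of_vecs[[3]]
l <- list(b = 4, c = 5)   # list, like my_list_of_lists[[3]]

# Return the error message instead of stopping, so every line runs
try_get <- function(expr) tryCatch(expr, error = function(e) conditionMessage(e))

v["a"]             # `[` on an atomic vector, absent name: NA (named <NA>)
try_get(v[["a"]])  # `[[` on an atomic vector, absent name: an error
l["a"]             # `[` on a list, absent name: a length-1 list holding NULL
l[["a"]]           # `[[` on a list, absent name: NULL
```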