I don't really know about your double-for loop, but there are a couple of much more efficient ways to approach this type of problem.
Vectorization is something that R does very well. So well, in fact, that the brute-force methods that feel natural in some other languages still work in R but are significantly slower than the vectorized equivalent.
Side note: R's for loops used to be less efficient than they are now, so many people still strongly discourage their use in favor of functions from the apply family. Two points: that claim is no longer true; and that is a different type of looping construct than I'm talking about here. So when I discourage for loops in this case, it is in favor of vectorizing the math, not applying it.
Here's some data:
my_list <- list(
c(1, 12, 23, 34, 38),
c(2, 12, 21, 38, 47, 56, 71),
c(14, 22, 81, 88, 91, 94)
)
I'll demonstrate on a single vector of this list:
v <- my_list[[1]]
v
I interpret what you said as v[i+1] - v[i] for each index i except the last (there is no element after the last one, so v[i+1] would be NA there). As a vector operation, this is "start with all numbers except the first, then subtract all numbers except the last".
v[-1]
# [1] 12 23 34 38
v[-length(v)]
# [1] 1 12 23 34
v[-1] - v[-length(v)]
# [1] 11 11 11 4
This is effectively
c(12, 23, 34, 38) - c(1, 12, 23, 34)
c(12-1, 23-12, 34-23, 38-34)
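As a quick sanity check, the vectorized subtraction can be compared against an explicit element-by-element loop. This is only a sketch: loop_diff is a hypothetical helper written for this comparison, not something from your code.

```r
v <- c(1, 12, 23, 34, 38)

# Hypothetical helper: the element-by-element loop, for comparison only
loop_diff <- function(vec) {
  out <- numeric(max(length(vec) - 1, 0))  # one result per adjacent pair
  for (i in seq_along(out)) {
    out[i] <- vec[i + 1] - vec[i]
  }
  out
}

# Vectorized form: everything but the first, minus everything but the last
vec_diff <- v[-1] - v[-length(v)]

identical(loop_diff(v), vec_diff)
# [1] TRUE
```

Both give the same numbers; the vectorized form just does the subtraction in one step.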
Now that we know how to do this efficiently once, let's streamline that operation and map it to each vector within the list. R does have a function that does this for us:
diff(v)
# [1] 11 11 11 4
but in case your future needs include more specific (non-general) operations, we could write our own function for this specific operation:
my_func <- function(vec) vec[-1] - vec[-length(vec)]
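To illustrate why a hand-written function can be worth having, here is a small sketch that first confirms my_func agrees with diff, then adapts the same indexing to a different operation. The name my_ratio is hypothetical and only here for illustration.

```r
v <- c(1, 12, 23, 34, 38)
my_func <- function(vec) vec[-1] - vec[-length(vec)]

# Same result as the built-in
all.equal(my_func(v), diff(v))
# [1] TRUE

# Same indexing idea, different operation: ratio of each value to the one before it
my_ratio <- function(vec) vec[-1] / vec[-length(vec)]
my_ratio(c(1, 2, 4, 8))
# [1] 2 2 2
```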
Now here is a classic use of one of the mapping functions: lapply applies a single function to each element of a list, and returns a same-length list with the return values.
Side note: when I need to decide between for and lapply (for instance), I ask myself if I care about the result of the calculation on each element (such as this case, where I want the diff of each vector), or if I'm just interested in the side effect (e.g., plotting something, saving files). If the former, then lapply or its kin is appropriate; if the latter, a for loop is often the better fit. This is not a 100% reliable heuristic, but it's generally pretty good.
lapply(my_list, my_func)
# [[1]]
# [1] 11 11 11 4
# [[2]]
# [1] 10 9 17 9 9 15
# [[3]]
# [1] 8 59 7 3 3
(Similarly, lapply(my_list, diff) works.) There are similar *apply* functions with slightly different benefits, requirements, and limitations. (There are also several tutorials that already go into it, and SO is not intended to be a tutorial site.)
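As a brief taste of how the siblings differ (a sketch only, not a tutorial): sapply tries to simplify the result to a vector, while vapply does the same but makes you declare what each result should look like. The summaries computed here (largest gap, number of gaps) are just illustrative choices.

```r
my_list <- list(
  c(1, 12, 23, 34, 38),
  c(2, 12, 21, 38, 47, 56, 71),
  c(14, 22, 81, 88, 91, 94)
)

# sapply simplifies to a plain vector when every result has length 1
sapply(my_list, function(x) max(diff(x)))
# [1] 11 17 59

# vapply checks each result against a declared template (here: one integer)
vapply(my_list, function(x) length(diff(x)), integer(1))
# [1] 4 6 5
```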
I really do discourage the use of for loops here, partly in favor of lapply, partly in favor of vectorization. But to help you understand why your implementation did not work:
- if you need to iterate over each element of a list:
  - it is preferred to not hard-code 1:29; instead use something that depends on the list itself, such as length(my_list), so 1:length(my_list) might seem appropriate (as you correctly use in your second loop), but ...
  - it can happen that the list is at some point of length 0, and for (i in 1:0) does not do what one would hope. To be clear, I would hope that it would do nothing, but 1:0 resolves into a vector of length 2 with values 1 and 0 (and this is just wrong in most cases that use this flow control). I recommend replacing for (i in 1:length(my_list)) with for (i in seq_along(my_list)) or for (i in seq_len(length(my_list))). (seq_along provides indices along a vector/list, and gives no numbers if its argument is length 0; seq_len gives a 0-length vector if its argument is 0. Both can be found in ?seq.)
- when i is 1 and j is 2, you store list(12-1) in res[1]; when j is 3, you overwrite res[1] with list(23-12), so you've lost your previous calculations in vector 1. This is why each element in your list is length 1.
- your inner loop (j) is going all the way to the end of a vector (length(my_list[[i]])); at that point, my_list[[i]][j+1] is pointing beyond the end of the vector, so it resolves to NA (try my_list[[1]][999999]), which is why all values in res are NA. To fix this, either use 1:(length(my_list[[i]])-1) or preferably seq_along(my_list[[i]])[-1] to drop the first index (so we'll do (j) - (j-1) instead of (j+1) - (j)).
  - If you must preserve the (j+1) - (j) indexing logic, then use something like seq_along(my_list[[i]])[-length(my_list[[i]])] or head(seq_along(my_list[[i]]), n=-1), where n=-1 means all but the last one.
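The indexing pitfalls in the list above are easy to see at the console. A short sketch, where empty is just a hypothetical zero-length list standing in for the "list is length 0" case:

```r
my_list <- list(
  c(1, 12, 23, 34, 38),
  c(2, 12, 21, 38, 47, 56, 71),
  c(14, 22, 81, 88, 91, 94)
)
empty <- list()  # hypothetical zero-length input

1:length(empty)          # counts down from 1 to 0, so a loop body runs twice
# [1] 1 0
seq_along(empty)         # no indices, so a loop body never runs
# integer(0)
seq_len(length(empty))
# integer(0)

my_list[[1]][999999]     # indexing past the end gives NA, not an error
# [1] NA

head(seq_along(my_list[[1]]), n = -1)  # all indices but the last
# [1] 1 2 3 4
```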
This is a corrected version of your code:
resouter <- list()
for (i in seq_along(my_list)) {
  resinner <- numeric(0)
  # start j at the second index so that j-1 is always valid
  for (j in seq_along(my_list[[i]])[-1]) {
    resinner[j] <- my_list[[i]][j] - my_list[[i]][j-1]
  }
  resouter[[i]] <- resinner[-1] # since j starts at 2, first one is always NA
}
resouter
# [[1]]
# [1] 11 11 11 4
# [[2]]
# [1] 10 9 17 9 9 15
# [[3]]
# [1] 8 59 7 3 3
But I think that lapply(my_list, my_func) or even lapply(my_list, diff) is much more succinct (and faster).