I have a list of 29 vectors (each of a different length) like this:

my_list
[1] 1 12 23 34 38 
[2] 2 12 21 38 47 56 71  
 .
 .
[29] 14 22 81 88 91 94   

I need to compute the difference between consecutive elements (element i+1 minus element i) for each vector of the list (my_list). Example:

my_list
[1] (12-1) (23-12)  (34-23) (38-34)
[2] (12-2) (21-12)  (38-21) (47-38) (56-47) (71-56)
 .
 .
[29] (22-14) (81-22)  (88-81) (91-88) (94-91) 

I tried a for loop:

res <- list()
for(i in 1:29) {
    for(j in 1:length(my_list[[i]])){
        my_res <- list(my_list[[i]][j+1] - my_list[[i]][j])
        res[i] <- my_res
    }
}

But the result gives only the first value for each vector of the list:

res
[1] 11
[2] 10
 .
 .
[29] 8

Is there a way to do this with apply-like functions?

Micha Wiedenmann

1 Answer

I don't really know about your double-for loop, but there are a couple of much-more-efficient ways to approach this type of problem.

Vectorization is something that R does very well. The brute-force, element-by-element methods that are natural in some other languages still work in R, but they are significantly slower than the vectorized equivalent.

Side note: R's for loops used to be less efficient than they are now, so many people still strongly discourage their use in favor of functions from the apply family. Two points: that performance argument no longer holds; and the apply family is a different type of looping construct from what I'm talking about here. So when I discourage for loops in this case, it is in favor of vectorizing the math, not applying it.

Here's some data:

my_list <- list(
  c(1, 12, 23, 34, 38),
  c(2, 12, 21, 38, 47, 56, 71),
  c(14, 22, 81, 88, 91, 94)
)

I'll demonstrate on a single vector of this list:

v <- my_list[[1]]
v
# [1]  1 12 23 34 38

I interpret what you said as v[i+1] - v[i] for each i in the sequence of indices (except the last one, since v[length(v)+1] is past the end of the vector and returns NA in R). To do this as a vector operation, think of it as "start with all numbers except the first, then subtract all numbers except the last".

v[-1]
# [1] 12 23 34 38
v[-length(v)]
# [1]  1 12 23 34
v[-1] - v[-length(v)]
# [1] 11 11 11  4

This is effectively

c(12, 23, 34, 38) - c(1, 12, 23, 34)
c(12-1, 23-12, 34-23, 38-34)
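
To make the comparison concrete, here is a minimal sketch that computes the same differences element by element and then in one vectorized step. The names loop_diff and vec_diff are only illustrative.

```r
# Same example vector as above
v <- c(1, 12, 23, 34, 38)

# Element-by-element version: preallocate, then fill one difference at a time
loop_diff <- numeric(length(v) - 1)
for (i in seq_len(length(v) - 1)) {
  loop_diff[i] <- v[i + 1] - v[i]
}

# Vectorized version: a single subtraction over two shifted copies of v
vec_diff <- v[-1] - v[-length(v)]

identical(loop_diff, vec_diff)
# [1] TRUE
```

Both give the same answer; the vectorized form simply hands the whole subtraction to R at once instead of indexing one element per iteration.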

Now that we know how to do this efficiently once, let's streamline that operation and map it to each vector within the list. R does have a function that does this for us:

diff(v)
# [1] 11 11 11  4

but in case your future needs include more specific (non-general) operations, we could write our own function for this specific operation:

my_func <- function(vec) vec[-1] - vec[-length(vec)]

Now here is a classic use of one of the mapping functions: lapply applies a single function to each element of a list, and returns a same-length list with the return values.

Side note: when I need to decide between for and lapply (for instance), I ask myself if I care about the calculation on each element (such as this case, where I want the diff of the vector), or if I'm just interested in the side-effect (e.g., plotting something, saving files). If the former, then lapply or its kin is appropriate; if the latter, often for loops. This is not a 100% heuristic, but it's generally pretty good.

lapply(my_list, my_func)
# [[1]]
# [1] 11 11 11  4
# [[2]]
# [1] 10  9 17  9  9 15
# [[3]]
# [1]  8 59  7  3  3

(Similarly, lapply(my_list, diff) works.) There are similar *apply* functions with slightly different benefits, requirements, and limitations. (There are also several tutorials that already go into it, and SO is not intended to be a tutorial-site.)
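
As a small illustration of how the relatives of lapply differ, here is a sketch using sapply and vapply. The "largest gap per vector" calculation and the names max_gap_s and max_gap_v are mine, chosen only because they return one number per vector.

```r
my_list <- list(
  c(1, 12, 23, 34, 38),
  c(2, 12, 21, 38, 47, 56, 71),
  c(14, 22, 81, 88, 91, 94)
)

# sapply simplifies when it can: one number per element gives a plain vector
max_gap_s <- sapply(my_list, function(v) max(diff(v)))
max_gap_s
# [1] 11 17 59

# vapply does the same, but checks every result against a template
# (here: exactly one numeric value) and errors if one does not match
max_gap_v <- vapply(my_list, function(v) max(diff(v)), numeric(1))
max_gap_v
# [1] 11 17 59
```

For the original problem the results have different lengths, so they cannot be simplified into a vector or matrix, which is why lapply (always returning a list) is the natural fit there.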


I really do discourage the use of for loops here, partly in favor of lapply and partly in favor of vectorization. Still, to help you understand why your implementation did not work:

  • if you need to iterate over each element of a list:
    • it is preferred to not hard-code 1:29, instead use something that depends on the vector itself, such as length(my_list), so 1:length(my_list) might seem appropriate (as you correctly use in your second loop), but ...
    • sooner or later such a list ends up with length 0, and for (i in 1:0) does not do what one would hope. To be clear, I would hope that it would do nothing, but 1:0 resolves to a vector of length 2 with values 1 and 0 (and this is just wrong in most cases that use this flow control). I recommend replacing for (i in 1:length(my_list)) with for (i in seq_along(my_list)) or for (i in seq_len(length(my_list))). (seq_along provides indices along a vector/list and gives no numbers if that list is length 0; seq_len gives a 0-length vector if its argument is 0. Both can be found in ?seq.)
  • when i is 1 and j is 1, you store list(12-1) in res[1]; when j is 2, you overwrite res[1] with list(23-12), so you've lost your previous calculation for vector 1. This is why each element in your list is length 1.
  • your inner loop (j) goes all the way to the end of the vector (length(my_list[[i]])); at that point, my_list[[i]][j+1] points beyond the end of the vector, so it resolves to NA (try my_list[[1]][999999]). Because that last iteration overwrites whatever was stored before, all values in res end up NA. To fix this, either use 1:(length(my_list[[i]])-1) or preferably seq_along(my_list[[i]])[-1] to drop the first index (so we'll do (j) - (j-1) instead of (j+1) - (j)).
    • If you must preserve the (j+1) - (j) indexing logic, then use something like seq_along(my_list[[i]])[-length(my_list[[i]])] or head(seq_along(my_list[[i]]),n=-1), where n=-1 means all but the last one.
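
The pitfalls in the list above can be checked directly at the console. This is a minimal sketch; the variable named empty is only for illustration.

```r
empty <- list()

1:length(empty)            # counts down from 1 to 0, so a loop would run twice
# [1] 1 0
seq_along(empty)           # no indices, so a loop body would never run
# integer(0)

v <- c(1, 12, 23, 34, 38)
v[length(v) + 1]           # indexing past the end gives NA, not an error
# [1] NA

seq_along(v)[-1]           # indices for v[j] - v[j-1]
# [1] 2 3 4 5
head(seq_along(v), n = -1) # indices for v[j+1] - v[j]
# [1] 1 2 3 4
```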

This is a corrected version of your code:

resouter <- list()
for (i in seq_along(my_list)) {
  resinner <- numeric(0)
  for (j in seq_along(my_list[[i]])[-1]) {
    resinner[j] <- my_list[[i]][j] - my_list[[i]][j-1]
  }
  resouter[[i]] <- resinner[-1] # since j starts at 2, first one is always NA
}
resouter
# [[1]]
# [1] 11 11 11  4
# [[2]]
# [1] 10  9 17  9  9 15
# [[3]]
# [1]  8 59  7  3  3

But I think that lapply(my_list, my_func), or even lapply(my_list, diff), is much more succinct (and faster).

r2evans