Here is some code that generates a list of data.frames and then converts that list into a new list in which each element is itself a list holding the corresponding row of each data frame.

For example:
- l1 has length 10, and each element is a data.frame with 1000 rows.
- l2 is a list of length 1000 (nrow(l1[[k]])), and each element is a list of length 10 (length(l1)) containing the row vectors from the elements of l1.

l1 <- vector("list", length= 10)
set.seed(65L)
for (i in 1:10) {
  l1[[i]] <- data.frame(matrix(rnorm(10000),ncol=10))
}

l2 <- vector(mode="list", length= nrow(l1[[1]]))
for (i in 1:nrow(l1[[1]])) {
  l2[[i]] <- lapply(l1, function(l) return(unlist(l[i,])))
}

Edit: To clarify how l1 relates to l2, here is language-agnostic code.

for (j in 1:length(l1)) {
  for (i in 1:nrow(l1[[1]])) { # where nrow(l1[[1]]) == nrow(l1[[k]]) for k = 2, ..., 10
    l2[[i]][[j]] <- l1[[j]][i,]
  }
}

How do I speed up the creation of l2 via vectorization or parallelization? The problem I'm having is that parallel::parLapplyLB splits lists, but I don't want to split the list l1; I want to split the rows within each element of l1. An intermediate solution would vectorize my current approach by replacing the for-loop with one of the *apply functions. That could then be extended to a parallel solution as well.
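A minimal sketch of that intermediate step, and of one way to parallelize over rows rather than over l1, might look like the following. The toy sizes, the helper name `row_list`, and the variables `rows` and `l2_par` are illustrative assumptions, not part of my real code; the two-worker cluster is also just an example.

```r
library(parallel)

# Toy input with the same shape as in the question, but smaller:
# 3 data frames, each with 5 rows and 4 columns.
set.seed(65L)
l1 <- lapply(1:3, function(k) data.frame(matrix(rnorm(20), ncol = 4)))

rows <- seq_len(nrow(l1[[1]]))

# Hypothetical helper: row i of every data frame, as a list of vectors.
row_list <- function(i, dfs) lapply(dfs, function(df) unlist(df[i, ]))

# Same result as the for-loop, written with lapply().
l2 <- lapply(rows, row_list, dfs = l1)

# Parallel variant: the row indices are distributed across workers,
# while l1 is passed whole to every call.
cl <- makeCluster(2L)
l2_par <- parLapplyLB(cl, rows, row_list, dfs = l1)
stopCluster(cl)

identical(l2, l2_par)
```

This only moves the loop into lapply(); the per-row data.frame indexing is unchanged, so any gain would have to come from the workers, and the cost of shipping l1 to them may well outweigh it for small inputs.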

If I solve this on my own before an acceptable solution, I'll post my answer here.

alexwhitworth
  • Don't assign to the global environment first, add to the list, and then remove from global environment. Assign straight to the list (in your for loop `l1[[i]] <- ...`) or just `l1 = replicate(10, data.frame(matrix(rnorm(10000), ncol = 10)), simplify = F)` – Gregor Thomas Mar 11 '16 at 17:17
  • Do your real data frames all have the same number of columns? – Tensibai Mar 11 '16 at 17:21
  • @Tensibai each element of `l1` is guaranteed to have the same number of rows but not the same number of columns – alexwhitworth Mar 11 '16 at 17:22
  • I don't quite get what you're trying to end up with. Is it a transposition of each df in l1? – Tensibai Mar 11 '16 at 17:24
  • @Tensibai I'm splitting each element of `l1` into its component rows. So, rather than having a list of length 10 where each element has 1000 rows, I have a list of length 1000, where each element is a list of length 10 – alexwhitworth Mar 11 '16 at 17:28
  • Right, just a comment, not an answer. – Gregor Thomas Mar 11 '16 at 17:29
  • @Alex what I didn't get is what's in the list of length 10: 10 vectors of 1000 values from the l1 df columns? – Tensibai Mar 11 '16 at 17:32
  • Ok got it, each l2 entry is a list of the same row index for each df in l1... i.e: `l2[[2]][[3]] <- l1[[3]][2,]` (right ?) – Tensibai Mar 11 '16 at 17:43
  • @Tensibai Yes-- your most recent comment. I can clarify in the post – alexwhitworth Mar 11 '16 at 17:44

1 Answer

I would break the structure completely and rebuild the second list via split(). This approach needs much more memory than the original one, but at least for the given example it is more than 10x faster:

sgibb <- function(x) {
  ## get the lengths of all data.frames (equal to `sapply(x, ncol)`)
  n <- lengths(x)
  ## destroy the list structure
  y <- unlist(x, use.names = FALSE)
  ## generate row indices (stores the information which row the element in y
  ## belongs to)
  rowIndices <- unlist(lapply(n, rep.int, x=1L:nrow(x[[1L]])))
  ## split y first by rows
  ## and subsequently loop over these lists to split by columns
  lapply(split(y, rowIndices), split, f=rep.int(seq_along(n), n))
}

alex <- function(x) {
  l2 <- vector(mode="list", length= nrow(x[[1]]))
  for (i in 1:nrow(x[[1]])) {
    l2[[i]] <- lapply(x, function(l) return(unlist(l[i,])))
  }
  l2
}

## check.attributes = FALSE is needed because the names differ
all.equal(alex(l1), sgibb(l1), check.attributes=FALSE)

library(rbenchmark)
benchmark(alex(l1), sgibb(l1), order = "relative", replications = 10)
#       test replications elapsed relative user.self sys.self user.child sys.child
#2 sgibb(l1)           10   0.808    1.000     0.808        0          0         0
#1  alex(l1)           10  11.970   14.814    11.972        0          0         0
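
To see why the row indices line up with the flattened values, here is a small sketch on toy data. The names `toy`, `byRow` and `res` are illustrative only, and `rep.int(seq_len(...), sum(n))` is used as a shorthand that should give the same indices as the `lapply()` call in `sgibb()`.

```r
# Two toy data frames with 2 rows each, and 2 and 3 columns respectively.
toy <- list(data.frame(a = 1:2, b = 3:4),
            data.frame(c = 5:6, d = 7:8, e = 9:10))

n <- lengths(toy)                    # columns per data frame: 2 3
y <- unlist(toy, use.names = FALSE)  # column by column: 1 2 3 ... 10

# Each column contributes the row numbers 1, 2 in order.
rowIndices <- rep.int(seq_len(nrow(toy[[1]])), sum(n))

# One flat vector per row, covering all data frames.
byRow <- split(y, rowIndices)

# Cut each flat row back into one piece per data frame.
res <- lapply(byRow, split, f = rep.int(seq_along(n), n))

res[["2"]][["2"]]  # row 2 of the second data frame
```

Because unlist() coerces everything to a common type, this relies on all columns being of the same type, as they are in the example from the question.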
sgibb