1

I have been searching in vain to find a solution to efficiently combine specific columns from separate data frames or from separate objects within the same list into new data frames. I have 5 data frames that all have the same row names (e.g. x, y, z) and the same column names (V1, V2, V3,...V1000).

   V1   V2   V3   V4...      V1    V2   V3   V4...
x   1    5    8    9      x   2     7    9    5
y   2    7    4    8      y   4     6    5    6
z   4    4    5    9      z   3     4    8    7

What I need to do is to extract V1 from all five frames and make a new data frame with those five V1 columns, and then do the same for the remaining V2-V1000 to get something like:

   V1   V1   V1   V1   V1         V2    V2   V2   V2   V2
x   1    5    8    9    5      x   2     7    9    5    5
y   2    7    4    8    8      y   4     6    5    6    4
z   4    4    5    9    7      z   3     4    8    7    7

Eventually, I need to apply another function to all 1000 of the newly created data frames so if there is a way to loop this merging process and place the 1000 new frames into a new set of frames, that would be ideal.

I have tried various forms of merge, cbind, sapply and other solutions I have found suggested here and elsewhere on the web. The best I've been able to come up with is getting all five data frames into a single list and then using sapply to merge V1 from each object using sapply(y, "[[",2) where y is the list and 2 corresponds to the V1 column in each object. However, I can't seem to get this looped. I could use this code to manually create the 1000 new data frames but that would take forever.

Any suggestions or directions to other answers that might work would be appreciated!

zc1
  • 13
  • 4

1 Answers1

2

We can loop through the column names and extract the columns from the 'data.frame's and cbind it.

nm1 <- paste0("V", 1:1000)
lst <- lapply(nm1, function(x) 
           cbind(df1[x], df2[x], df3[x], df4[x], df5[x]))

The output is a list of data.frames. It is better to keep them as a list instead of creating separate objects in the global environment. But if we need so,

 list2env(setNames(lst, paste0("df_new", seq_along(lst))),
               envir = .GlobalEnv)

data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:10, 3*1000, 
 replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
  paste0("V", 1:1000))) )
df2 <- as.data.frame(matrix(sample(0:10, 3*1000, 
 replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
     paste0("V", 1:1000))) )
df3 <- as.data.frame(matrix(sample(0:10, 3*1000,
   replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
   paste0("V", 1:1000))) )
df4 <- as.data.frame(matrix(sample(0:10, 3*1000, 
   replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'),
   paste0("V", 1:1000))) )

 df5 <- as.data.frame(matrix(sample(0:10, 3*1000, 
    replace=TRUE), ncol=1000, dimnames=list(c('x', 'y', 'z'), 
   paste0("V", 1:1000))) )
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks, this does get me the 1000 new desired data frames into a list. However, I'm having trouble figuring out how to apply a function across each object the list. The function is called phyl.pca. If I pull an individual object out and specify it as a data frame it will run, but it won't seem to run across them all within the list. – zc1 Jan 18 '16 at 17:14
  • Thanks, I guess the issue is that the objects in lst have class "list" and phyl.pca doesn't like that. If I manually change one of them to a data frame it does run. I have played around with how to change all of the objects in lst to data.frame, but haven't figured out a solution that works. – zc1 Jan 18 '16 at 20:18
  • @zc1 The `lst` is a `list` Inside the `lst` it is all `data.frame`. You can check it by `lapply(lst, class)` I am not sure whether you tried `lapply(lst, phyl.pca)`. Without showing the function, it is hard to comment. – akrun Jan 19 '16 at 03:02
  • Thanks! Got it to work. Still a little slow in figuring these things out. I think there was an issue with how I specified the function, but it is working now! – zc1 Jan 19 '16 at 20:58