0

Apologies, returning to lists and dataframes in R after a while, so have forgotten my way about. Suppose that I have several dataframes in a list:

d2<- data.frame(week=c("12th","13th","14th"),value=c(1,20,100))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,30))
dfList<- list(d1,d2,d3)

dfList
[[1]]
  week value
1 12th     1
2 13th    10
3 14th    15

[[2]]
  week value
1 12th     1
2 13th    20
3 14th   100

[[3]]
  week value
1 12th     1
2 13th   220
3 14th    30

And I would like to have a final dataframe with combined data, the shape of which is as follows

finalDf<- data.frame(week=c("12th","13th","14th"),value1=c(1,20,100),value2=c(1,10,15),value3=c(1,220,30))

  week value1 value2 value3
1 12th      1      1      1
2 13th     20     10    220
3 14th    100     15     30

How could I achieve the above form of data? Also, what if my initial dataframes also have NAs, which I would like to remove prior to achieving the final form of data?

Many, many thanks.

info_seekeR
  • 1,296
  • 1
  • 15
  • 33

5 Answers5

5

I see the cbind strategies, but they may fail if there are missing values, so I thought a merge approach should be illustrated:

 Reduce( function(x,y) merge(x, y, by="week"), dfList)
  week value.x value.y value
1 12th       1       1     1
2 13th      10      20   220
3 14th      15     100    30

If you wanted to keep all the potential NA values then perhaps adding an , all.x=TRUE argument would be needed.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
3
> cbind(dfList[[1]], lapply(dfList[2:3], `[`, "value"))
  week value value value
1 12th     1     1     1
2 13th    10    20   220
3 14th    15   100    30

It seems the numbering of your dataframes is not the same in your data setup and your intended result, but any version of this code (changing the relevant extraction indices in [ and [[) will get you to your intended structure.

Thomas
  • 43,637
  • 12
  • 109
  • 140
2

You could try:

 library(plyr)
 join_all(dfList, by="week")
 #  week value value value
 #1 12th     1     1     1
 #2 13th    10    20   220
 #3 14th    15   100    30

Also, works with NA Using @Frank's data

  res <- join_all(dfList, by="week")
  res
  #  week value value value
  #1 12th     1    NA     1
  #2 13th    10    NA   220
  #3 14th    15    NA    NA

  str(res)
 #'data.frame': 3 obs. of  4 variables:
 # $ week : Factor w/ 3 levels "12th","13th",..: 1 2 3
 #$ value: num  1 10 15
 #$ value: logi  NA NA NA
 #$ value: num  1 220 NA  ##numeric columns
akrun
  • 874,273
  • 37
  • 540
  • 662
2

There are some very nice answers already, but here's another:

Step 1: Combine your data.frames into a long list:

dfDF <- do.call(rbind, dfList)

Step 2: Add a "time" variable that indicates the list from which the data came. There are a few ways to do that....

with(dfDF, ave(as.character(week), week, FUN = seq_along))
# [1] "1" "1" "1" "2" "2" "2" "3" "3" "3"

rep(sequence(length(dfList)), vapply(dfList, nrow, 1L))
# [1] 1 1 1 2 2 2 3 3 3

dfDF$time <- with(dfDF, ave(as.character(week), week, FUN = seq_along))

Step 3: Use dcast to go from "long" to "wide".

library(reshape2)
dcast(dfDF, week ~ time, value.var = "value")
#   week  1   2   3
# 1 12th  1   1   1
# 2 13th 10  20 220
# 3 14th 15 100  30
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
1
df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)
df
#  week value value.1 value.2
#1 12th     1       1       1
#2 13th    10      20     220
#3 14th    15     100      30

If you want the value columns to be numeric instead of characters:

df[2:4] <- sapply(df[2:4], as.numeric)
df
#  week value value.1 value.2
#1 12th     1       1       1
#2 13th    10      20     220
#3 14th    15     100      30

Also, works with NAs:

d2<- data.frame(week=c("12th","13th","14th"),value=c(NA,NA,NA))
d1<- data.frame(week=c("12th","13th","14th"),value=c(1,10,15))
d3<- data.frame(week=c("12th","13th","14th"),value=c(1,220,NA))
dfList<- list(d1,d2,d3)

df <- data.frame(t(unique(t(do.call(cbind, dfList)))), stringsAsFactors = FALSE)
df
#  week value value.1 value.2
#1 12th     1    <NA>       1
#2 13th    10    <NA>     220
#3 14th    15    <NA>    <NA>
Jota
  • 17,281
  • 7
  • 63
  • 93