Add columns from different data frames and stack on two indicators

Question

We'd like to combine the columns of one data frame with the matching columns from several other data frames. Our main data frame, predict, looks as follows:

>predict
 x1    x2    x3
 1     1     1
 0     1     0
 1     1     0
 1     1     0
 0     0     1

(There may be more columns, depending on the number of prediction runs.)

Our goal is to combine this data frame with the y-columns from three different test data frames (df_1, df_2 and df_3), which all have the same structure. The needed columns are accessed through df_1$y[test] (test is a logical vector that identifies the 5 values matching our x-values) and have the same structure as the x-columns from predict.

The desired output would look like this:

>predict_test
 x1    x2    x3    y1    y2    y3 
 1     1     1     1     1     1
 0     1     0     0     0     0
 1     1     0     0     1     0
 1     1     0     1     1     1
 0     0     1     0     0     1
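
For reference, a table like the one above could be assembled roughly as follows. This is only a sketch: the contents of df_1, df_2, df_3 and of the test vector are made-up stand-ins (the real ones are not shown here), chosen so that the result reproduces the desired output.

```r
# The predictions, as shown in the question
predict <- data.frame(x1 = c(1, 0, 1, 1, 0),
                      x2 = c(1, 1, 1, 1, 0),
                      x3 = c(1, 0, 0, 0, 1))

# Hypothetical test data frames: only the y column matters here
df_1 <- data.frame(y = c(1, 0, 1, 0, 1, 0, 0))
df_2 <- data.frame(y = c(1, 0, 0, 1, 1, 0, 1))
df_3 <- data.frame(y = c(1, 0, 1, 0, 1, 1, 0))

# Hypothetical logical vector selecting the 5 rows that match the x-values
test <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)

# Pull y[test] out of every test data frame and name the pieces y1, y2, y3
y_cols <- lapply(list(df_1, df_2, df_3), function(d) d$y[test])
names(y_cols) <- paste0("y", seq_along(y_cols))

# Bind the y-columns to the right of the x-columns
predict_test <- cbind(predict, as.data.frame(y_cols))
predict_test
```

Keeping the test data frames in a list means the same code should still work when more prediction runs are added.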

In the next step we need to stack the x-columns into one column and the y-columns into another in order to do evaluations. It is important to stack them in the correct order, i.e. x2 under x1 and x3 under x2, and likewise for the y-columns.

>predict_test_stack
 x_all y_all
 1     1
 0     0
 1     0
 1     1
 0     0
 1     1
 1     0
 1     1
 1     1
 0     0
 1     1
 0     0
 0     0
 0     1
 1     1

This probably works with melt, but we don't know how to apply it while indicating two different id variables.

Thanks for your help.

moodymudskipper · Accepted Answer · 2017-08-07T12:21:59.690


data

df1 <- read.table(text = "x1    x2    x3
1     1     1
0     1     0
1     1     0
1     1     0
0     0     1",stringsAsFactors = FALSE,header=TRUE)

df2 <- read.table(text = "y1    y2    y3
1     1     1
0     0     0
0     1     0
1     1     1
0     0     1",stringsAsFactors = FALSE,header=TRUE)

solution

We bind the data.frames side by side, then unlist the result and reshape it into a matrix with one column per original data.frame. Finally, we derive the column names by stripping the digit from the first column name of each data.frame.

list1 <- list(df1,df2)
side_by_side <- data.frame(list1)
#   x1 x2 x3 y1 y2 y3
# 1  1  1  1  1  1  1
# 2  0  1  0  0  0  0
# 3  1  1  0  0  1  0
# 4  1  1  0  1  1  1
# 5  0  0  1  0  0  1

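# unlist() walks the columns in order (x1, x2, x3, y1, y2, y3); filling a matrix
# with one column per input data.frame stacks the x's in column 1 and the y's in column 2
output <- data.frame(matrix(unlist(side_by_side),ncol = length(list1)))
# name each stacked column after its data.frame's first column, minus the digit
names(output) <- sapply(list1,function(x){sub("[[:digit:]]","",names(x)[1])})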
#     x  y
# 1   1  1
# 2   0  0
# 3   1  0
# 4   1  1
# 5   0  0
# 6   1  1
# 7   1  0
# 8   1  1
# 9   1  1
# 10  0  0
# 11  1  1
# 12  0  0
# 13  0  0
# 14  0  1
# 15  1  1
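
If the combined table already exists, a shorter variant is to unlist the x-columns and the y-columns separately. This is a sketch that assumes the columns are numeric and that the x- and y-columns can be told apart by their name prefix; predict_test, x_all and y_all are the names used in the question.

```r
# The combined table from the question
predict_test <- data.frame(x1 = c(1, 0, 1, 1, 0),
                           x2 = c(1, 1, 1, 1, 0),
                           x3 = c(1, 0, 0, 0, 1),
                           y1 = c(1, 0, 0, 1, 0),
                           y2 = c(1, 0, 1, 1, 0),
                           y3 = c(1, 0, 0, 1, 1))

# Select columns by prefix; grep() returns them in their original order
x_cols <- grep("^x", names(predict_test))
y_cols <- grep("^y", names(predict_test))

# unlist() on a data.frame concatenates its columns one after another,
# so x2 ends up under x1 and x3 under x2 (same for the y's)
predict_test_stack <- data.frame(
  x_all = unlist(predict_test[x_cols], use.names = FALSE),
  y_all = unlist(predict_test[y_cols], use.names = FALSE)
)
predict_test_stack
```

This relies on there being as many x-columns as y-columns; otherwise data.frame() will complain about differing numbers of rows.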

edited Aug 07 '17 at 12:21

answered Aug 06 '17 at 21:38


Thanks @Moody_mudskipper. I learned some useful basic stuff from that. One more question: what does the `sub` in the sapply command exactly do? – Dima Aug 06 '17 at 22:22
It replaces the first digit it finds with the empty string in the first column name of the relevant `data.frame`. I had made a copy-paste mistake and the printed names of the output were wrong; I have replaced them with x and y now :). – moodymudskipper Aug 07 '17 at 12:23
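
A quick illustration of what the `sub` call does on its own. The column names here are just examples. Note that `sub` only replaces the first match, whereas `gsub` replaces all of them, which matters once there are ten or more columns.

```r
# sub() removes only the first digit it finds
name_one_digit  <- sub("[[:digit:]]", "", "x1")    # "x"
name_two_digits <- sub("[[:digit:]]", "", "x10")   # "x0"

# gsub() removes every digit
name_all_digits <- gsub("[[:digit:]]", "", "x10")  # "x"

c(name_one_digit, name_two_digits, name_all_digits)
```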
One more thing I just noticed is that creating the *output* using `unlist` transforms my original x values from (0,1) into (1,2). Any suggestion? – Dima Aug 07 '17 at 22:38
make sure to use the parameter `stringsAsFactors = FALSE` each time you define a `data.frame`, convert to a `data.frame`, or read an external file (`read.csv` etc) – moodymudskipper Aug 07 '17 at 22:41
the quick fix is to use `unlist(as.numeric(as.character(...)))` but try to solve it upstream – moodymudskipper Aug 07 '17 at 22:43
The problem occurs only in the last step, after stacking the columns. `stringsAsFactors = FALSE` unfortunately doesn't help much. The `unlist(as.numeric(as.character(...)))` produces a warning message, saying *NAs introduced by coercion*. Anyway, I just retransform the values back to 0 and 1. Suggestions for a more elegant solution are welcome :) – Dima Aug 08 '17 at 23:25
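
One possible explanation, assuming the columns had become factors somewhere along the way: turning a factor into numbers returns its internal level codes (1, 2) rather than its labels (0, 1). The warning probably comes from applying `as.character` to the whole data frame instead of to each column. A sketch of a per-column conversion done before stacking, where `f_df` and `to_num` are made-up names for illustration:

```r
# Hypothetical data frame whose 0/1 columns have become factors
f_df <- data.frame(x1 = factor(c(1, 0, 1)),
                   x2 = factor(c(1, 1, 0)))

# Converting a factor directly yields the level codes, not the labels
codes <- as.numeric(f_df$x1)   # 2 1 2

# Going through character first recovers the original values
to_num <- function(col) as.numeric(as.character(col))
clean <- data.frame(lapply(f_df, to_num))

# The cleaned columns can now be stacked safely
stacked <- unlist(clean, use.names = FALSE)   # 1 0 1 1 1 0
stacked
```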
