3

The problem

Hi all,

I am trying to join a few dataframes together dynamically. For me that means that I have a dataframe that I start with df_A, to which I want to join multiple other dataframesdf_B1, df_B2,df_B3, etc..

df_A contains a column for each of the df_B... tables to join against. Column_join_B1, Column_join_B2, Column_join_B3, etc. (Although in reality these have obscure names). These names are also in a vector df_A_join_names.

df_B1, df_B2, df_B3, etc.. are stored in a list df_B, which I understand is good practice to do :). This is also how I access them in my loop.

Each of these has two columns. One with the value to join against df_A The other with information.

I even tried renaming the first column to match the column in df_A before the join, but to no avail.

What I am trying

left_join() does not allow me simply use by = c(df_A_join_names[1], "Column_join_A") so I have to use setNames, but I cannot get this to work.

Below a function which I want to iterate in a loop:

my_join <- function(df_a, df_b, a_name, b_name){ 

  df_joined <- left_join(df_a, df_b, 
                            by = setNames(b_name, a_name))
  return(df_joined)
}

I want to use this function in a loop to join all my df_B... dataframes against df_A.

for (i in 1: length(df_A_join_names)){

    df_A <- my_join(df_a = df_A,
                    df_b = df_B[i],
                    a_name = as.character(df_A_join_names[i]),
                    b_name = "Column_join_A"
    )
}

Running this I get:

 Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "list" 

Some stuff to play with

#Making df_A
A_a <- seq(1,10, by = 1)
Column_join_B1 <- seq(11,20, by = 1)
Column_join_B2 <- seq(21,30, by = 1)
df_A <- data.frame(cbind(A_a, Column_join_B1, Column_join_B2) )

#Making df_B
Column_join_A <- seq(11,20, by = 1)
B_a <- LETTERS[1:10]
df_B1 <- data.frame(Column_join_A, B_a )

Column_join_A <- seq(21,30, by = 1)
B_b <- LETTERS[11:20]
df_B2 <- data.frame(Column_join_A, B_b)

# In my own code I make this using a loop. maybe not the prettiest.
df_B <- list()
df_B[[1]] <- df_B1
df_B[[2]] <- df_B2

df_A_join_names <- c("Column_join_B1", "Column_join_B2")

References

I'm trying to apply this:

Dplyr join on by=(a = b), where a and b are variables containing strings?

I'm curious to hear what you guys think!

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • Do you want to join all datasets in `df_B` into `df_A` to generate one new data? Or join each dataset in `df_B` into `df_A` respectively to get multiple data? – Darren Tsai Nov 14 '19 at 15:41
  • I was looking to join all datasets in ```df_B``` to ```df_A``` to got one single dataframe. – tuliptrader Nov 15 '19 at 07:41

3 Answers3

1

You were very close! The only thing you might need to change is the way you reference the data frame under list df_B. df_B[1] will still be a list, df_B[[1]] will return a data frame. I ran the code below and it worked for me.

for (i in 1: length(df_A_join_names)){

  df_A <- my_join(df_a = df_A,
                  df_b = df_B[[i]],
                  a_name = as.character(df_A_join_names[i]),
                  b_name = "Column_join_A"
  )
}
1

There's no need for building a specific function, you can simply use SetNames within left_join function:

df_B_join_name <- "Column_join_A"

for (i in 1: length(df_A_join_names)){
  df_A <- left_join(df_A, df_B[[i]], by=c(setNames(nm = df_A_join_names[i], df_B_join_name)))
}   
MarkusN
  • 3,051
  • 1
  • 18
  • 26
1

First, manage to rename the first column in df_B to match the column in df_A. So df_B will look like this:

# [[1]]
#    Column_join_B1 B_a
# 1              11   A
# 2              12   B
# .              .    .
# .              .    .
# .              .    .
# 
# [[2]]
#    Column_join_B2 B_b
# 1              21   K
# 2              22   L
# .              .    .
# .              .    .
# .              .    .

Next, use Reduce() in base or reduce() in purrr to iterate the manipulation of left_join. You even don't need to use the for loop.

Reduce(left_join, df_B, init = df_A)

#    A_a Column_join_B1 Column_join_B2 B_a B_b
# 1    1             11             21   A   K
# 2    2             12             22   B   L
# 3    3             13             23   C   M
# 4    4             14             24   D   N
# 5    5             15             25   E   O
# 6    6             16             26   F   P
# 7    7             17             27   G   Q
# 8    8             18             28   H   R
# 9    9             19             29   I   S
# 10  10             20             30   J   T
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51