
I can join two datasets whose key variables have different names using dplyr::left_join(..., by = c("name1" = "name2")).

I want to join using character objects, left_join(..., by = c(nameOb1 = nameOb2)). Oddly, this works for by = c("name1" = nameOb2), but not for by = c(nameOb1 = "name2").

Why is this?

Replication of my issue below. Many thanks.

Generate data

    library(dplyr)

    orig <- tibble(name1 = c("a", "b", "c"),
                   n     = c(10, 20, 30))

    tojoin <- tibble(name2 = c("a", "b", "c"),
                     pc    = c(.4, .1, .2))    

Works: using character strings for the by arguments

    left_join(orig, tojoin, by = c("name1" = "name2"))

    # A tibble: 3 x 3
      name1     n    pc
      <chr> <dbl> <dbl>
    1 a        10   0.4
    2 b        20   0.1
    3 c        30   0.2

Does not work: using object as the character string for the first by argument

    firstname <- "name1"

    left_join(orig, tojoin, by = c(firstname = "name2"))

    # Error: `by` can't contain join column `firstname` which is missing from LHS
    # Call `rlang::last_error()` to see a backtrace

Works: using object as the character string for the second by argument

    secondname <- "name2"

    left_join(orig, tojoin, by = c("name1" = secondname))

    # A tibble: 3 x 3
      name1     n    pc
      <chr> <dbl> <dbl>
    1 a        10   0.4
    2 b        20   0.1
    3 c        30   0.2

Packages:

dplyr 0.8.0.1

wfmackey
  • I'm not sure, but I think it has to do with this: `# Note that only the key from the LHS is kept`. Now since `firstname` isn't available in LHS, you need to devise some mechanism to match it. My efforts have been futile. – NelsonGon Feb 22 '19 at 10:06
  • Not an answer to your question but a workaround: `left_join(orig, tojoin, by = 'names<-'("name2", firstname))` – markus Feb 22 '19 at 10:17

1 Answer


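Hi, the `left_join` function needs a named character vector in the `by` argument when the column names differ: each name is a column in the left table and each value is the matching column in the right table. In your second try:

    firstname <- "name1"
    left_join(orig, tojoin, by = c(firstname = "name2"))

the name of the vector element is the literal string "firstname", not the value stored in the variable `firstname`, so the join looks for a column called `firstname` in `orig` and fails. To solve this, first build the named character vector and then pass it to the `by` argument of the join function: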

    firstname <- "name1"
    join_cols <- c("name2")
    names(join_cols) <- firstname

    dplyr::left_join(orig, tojoin, by = join_cols)
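
To see the difference for yourself, you can inspect the names that `c()` actually produces. This is only a small base R sketch; the object names `literal` and `evaluated` are made up for illustration, and `setNames()` is shown as one possible one-line alternative to assigning `names()` afterwards.

```r
firstname <- "name1"

# c() takes the text left of `=` literally as the element name;
# it does not look up the variable `firstname`.
literal <- c(firstname = "name2")
names(literal)    # "firstname"

# On the right of `=` the variable is evaluated as usual.
secondname <- "name2"
evaluated <- c("name1" = secondname)
names(evaluated)  # "name1"

# setNames(values, names) evaluates both arguments, so the name
# can come from a variable.
join_cols <- setNames("name2", firstname)
names(join_cols)  # "name1"
```

Passing `by = join_cols` to `left_join()` should then behave like `by = c("name1" = "name2")`.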
NelsonGon
Freakazoid
  • The question is **why** it does not work for `firstname` but works for `secondname`. – Sotos Feb 22 '19 at 09:56
  • My first sentence answers this question! – Freakazoid Feb 22 '19 at 09:57
  • You built the wrong character vector. When you create it with `c(firstname = "name2")`, the name of the entry is the literal text `firstname`, not the string stored in the variable `firstname`. In your second example, the variable `secondname` is the vector entry, so the value is the string stored in `secondname`, and you set the name of that entry with the string "name1". Hope you get the point. – Freakazoid Feb 22 '19 at 10:08
  • Thanks for the solution @Freakazoid -- that works, and I now understand why. – wfmackey Feb 22 '19 at 13:02