18

Is there a way to ignore case sensitvity when doing joins using dplyr? left, inner, and full?

I see it works with select but this often a huge pain for me. I know I can convert the columns toupper or tolower before hand, but this would be a helpful work around.

runningbirds
  • 6,235
  • 13
  • 55
  • 94
  • Can we have a example please. Is this what you meant? https://gist.github.com/jimhester/a060323a05b40c6ada34 – Alex May 20 '16 at 06:42
  • 1
    Maybe check the fuzzyjoin package? It is a variation on dplyr joins and some of the functions there seem related to this. – aosmith May 20 '16 at 16:57

2 Answers2

14

I don't think there's a straightforward way to get around using tolower or toupper to tidy the data first. That said, an inline mutate (inside the join) would leave the original data untouched if that is preferred.

X %>% left_join(Y %>% mutate(id = tolower(id)), by = "id")

It works, but we might as well have created a tidy Y with ids to match X in the first place (in my opinion).

AndrewGB
  • 16,126
  • 5
  • 18
  • 49
Damian
  • 1,385
  • 10
  • 10
  • 3
    rather than mess with mutating individual columns, I'd probably do something like `...%>% setNames(., tolower(names(.))) %>% ...` to get them all – Gregor Thomas Jul 19 '17 at 21:41
10

dplyr does not have such an option yet, but with fuzzyjoin you can do this very easily:

require(fuzzyjoin)
regex_inner_join(x,y,by="id", ignore_case =TRUE)

if you have different column names in x and y:

regex_inner_join(x,y,by=c("xid"="yid"), ignore_case =TRUE)
Ibo
  • 4,081
  • 6
  • 45
  • 65