
I've got some sample data:

data1 = data.frame(name = c("cat", "dog", "parrot"), freq = c(1,2,3))
data2 = data.frame(name = c("Cat", "snake", "Dog"), freq2 = c(2,3,4))
data1$name = as.character(data1$name)
data2$name = as.character(data2$name)

which I want to join, but e.g. "cat" and "Cat" should be treated as the same value. I thought of using tolower to first determine the entries that appear in both data frames:

in_both = data1[(tolower(data1$name) %in% tolower(data2$name)),]

Then I want to join with data2, but that doesn't work because the names don't match:

library(dplyr)
left_join(in_both, data2)

Is there a way to join by using tolower?

WinterMensch
    Why not clean `data2` before performing the join? E.g. `data2$name <- tolower(data2$name)` and then `merge(data1, data2, by = "name", all.x = TRUE)`. – AshOfFire Apr 26 '18 at 10:44
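The suggestion in this comment can be sketched in base R as follows. This is only an illustration: it assumes the sample data from the question (with the parenthesis in `data2` placed after `"Dog"`) and works on a copy, here given the hypothetical name `data2_lower`, so the original `data2` is left untouched.

```r
# Sample data from the question (parenthesis in data2 corrected)
data1 <- data.frame(name = c("cat", "dog", "parrot"), freq = c(1, 2, 3),
                    stringsAsFactors = FALSE)
data2 <- data.frame(name = c("Cat", "snake", "Dog"), freq2 = c(2, 3, 4),
                    stringsAsFactors = FALSE)

# Lowercase the key on a copy (hypothetical name) instead of altering data2
data2_lower <- data2
data2_lower$name <- tolower(data2_lower$name)

# all.x = TRUE keeps every row of data1, like a left join
joined <- merge(data1, data2_lower, by = "name", all.x = TRUE)
joined
#     name freq freq2
# 1    cat    1     2
# 2    dog    2     4
# 3 parrot    3    NA
```

Rows of `data1` without a partner in `data2` (here `parrot`) are kept with `NA` in `freq2`; drop `all.x = TRUE` to keep only the matches.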

2 Answers


Why not write a small function around dplyr that lowercases the name column of the left data frame and then performs the join?

A custom function gives you more control, and you don't have to repeat the same steps for every join.

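library(dplyr)

# Lowercase the join key of the left data frame, then inner-join on it.
# Assumes the key in `right` is already lowercase (true for data1 here).
f_dplyr <- function(left, right) {
  left$name <- tolower(left$name)
  inner_join(left, right, by = "name")
}

f_dplyr(data2, data1)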

Result

  name freq2 freq
1  cat     2    1
2  dog     4    2
Rana Usman
  • This is solid, but one potential issue with this is that you may want to retain some of the non-lowercase values in the column you join on. You may want the final output of the `name` column to be `Cat` and `Dog`. – ForceLeft415 Jan 24 '20 at 00:08
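One way to address the point raised in this comment, as a minimal sketch rather than a definitive solution: match on a temporary lowercase key and drop it afterwards, so the original spelling in `data2` survives. The column name `name_key` is a hypothetical helper introduced only for this example, and the data frames are the ones from the question (with the parenthesis in `data2` corrected).

```r
library(dplyr)

# Sample data from the question (parenthesis in data2 corrected)
data1 <- data.frame(name = c("cat", "dog", "parrot"), freq = c(1, 2, 3),
                    stringsAsFactors = FALSE)
data2 <- data.frame(name = c("Cat", "snake", "Dog"), freq2 = c(2, 3, 4),
                    stringsAsFactors = FALSE)

# `name_key` is a hypothetical helper column used only for matching
result <- data2 %>%
  mutate(name_key = tolower(name)) %>%
  inner_join(
    # drop data1's own `name` so only data2's spelling is kept
    data1 %>% mutate(name_key = tolower(name)) %>% select(-name),
    by = "name_key"
  ) %>%
  select(-name_key)

result
#   name freq2 freq
# 1  Cat     2    1
# 2  Dog     4    2
```

Because both sides are lowercased only in the helper column, neither input data frame is modified and the `name` column in the output keeps the capitalization from `data2`.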

If you don't want to alter your original data2, as @AshOfFire suggested, you can lowercase the values in name inside a pipe (%>%) and then perform the join. Note that str_to_lower() comes from the stringr package; base R's tolower() would work here as well:

library(dplyr)
library(stringr)  # provides str_to_lower()

data2 %>%
  mutate(name = str_to_lower(name)) %>%
  inner_join(data1, by = "name")

  name freq2 freq
1  cat     2    1
2  dog     4    2
tifu