Here is my example:
id <- 1:5
names_1 <- c("hannah", "marcus", "fred", "joe", "lara")
df_1 <- data.frame(id, names_1)
df_1$phonenumberFound <- NA
names_2 <- c("hannah", "markus", "fredd", "joey", "paul", "mary", "olivia")
phone <- c(123, 234, 345, 456, 567, 678, 789)
df_2 <- data.frame(names_2, phone)
What I want to achieve is:
If a name in df_2 matches a name in df_1 at least approximately (e.g. "markus" for "marcus"), I want to add the corresponding phone number to df_1.
Basically, it's a kind of fuzzy left join, but I have not managed to get it working.
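To make the desired result concrete, here is a minimal base R sketch on the toy data. It is only an illustration, not a confirmed solution: it assumes that "approximately" means an edit distance of at most 1 (the `max_dist` threshold is my assumption), and it builds the full distance matrix with `adist`, which is fine for 5 x 7 names but is unlikely to scale.

```r
# Toy data from the example above
df_1 <- data.frame(id = 1:5,
                   names_1 = c("hannah", "marcus", "fred", "joe", "lara"))
df_2 <- data.frame(names_2 = c("hannah", "markus", "fredd", "joey",
                               "paul", "mary", "olivia"),
                   phone = c(123, 234, 345, 456, 567, 678, 789))

max_dist <- 1  # assumption: at most one edit counts as "approximately equal"

# Edit-distance matrix: one row per name in df_1, one column per name in df_2
d <- utils::adist(as.character(df_1$names_1), as.character(df_2$names_2))

# Column index of the closest df_2 name for each df_1 name (first one on ties)
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Keep the phone number only if the closest name is close enough, else NA
df_1$phonenumberFound <- ifelse(best_dist <= max_dist, df_2$phone[best], NA)
df_1
```

With this threshold, "hannah", "marcus", "fred" and "joe" pick up the numbers of "hannah", "markus", "fredd" and "joey", while "lara" stays NA because its closest candidate ("mary") is two edits away.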
In fact, my real df_1 has 30,000 rows and my real df_2 has 500,000 rows. Is there a fast way to do this?
Thank you!
EDIT:
I need to change and clarify my example because I'm running into memory issues with the answers provided so far. (I'm using a Windows laptop with 16 GB RAM.)
id_1 <- 1:30000
names_1 <- sample(c("hannah", "marcus", "fred", "joe", "lara"), 30000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
df_1 <- data.frame(id_1, names_1)
df_1$numberFound <- NA
id_2 <- 1:500000
names_2 <- sample(c("hannah", "markus", "paul", "mary", "olivia"), 500000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
anyNumber <- sample(c(123, 234, 345, 456, 567), 500000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
df_2 <- data.frame(id_2, names_2, anyNumber)
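The memory problem is easy to see from the sizes: a dense 30,000 x 500,000 distance matrix would need roughly 120 GB at 8 bytes per cell. As a rough sketch of one way around that, the names could be deduplicated before any distance is computed, so that only unique names are compared. The threshold `max_dist`, the helper column `matched_name`, and the choice to take the first matching row of df_2 when a name occurs many times are all assumptions for illustration, and the approach only helps when the number of distinct names is much smaller than the number of rows.

```r
# Size of a dense 30,000 x 500,000 matrix of doubles, in GB
30000 * 500000 * 8 / 1e9   # 120

set.seed(1)  # only so that this sketch is reproducible
df_1 <- data.frame(id_1 = 1:30000,
                   names_1 = sample(c("hannah", "marcus", "fred", "joe", "lara"),
                                    30000, replace = TRUE))
df_2 <- data.frame(id_2 = 1:500000,
                   names_2 = sample(c("hannah", "markus", "paul", "mary", "olivia"),
                                    500000, replace = TRUE),
                   anyNumber = sample(c(123, 234, 345, 456, 567),
                                      500000, replace = TRUE))

max_dist <- 1  # assumption: at most one edit counts as a match

# Compare unique names only (5 x 5 here instead of 30,000 x 500,000)
u1 <- unique(as.character(df_1$names_1))
u2 <- unique(as.character(df_2$names_2))
d <- utils::adist(u1, u2)

# Closest df_2 name per unique df_1 name, or NA if it is too far away
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(u1), best)]
matched <- ifelse(best_dist <= max_dist, u2[best], NA)

# Map the result back to all rows of df_1 via exact lookups
df_1$matched_name <- matched[match(df_1$names_1, u1)]

# Assumption: if a name occurs several times in df_2, take its first row
df_1$numberFound <- df_2$anyNumber[match(df_1$matched_name, df_2$names_2)]
head(df_1)
```

In this clarified example only "hannah" and "marcus" find a partner within one edit; "fred", "joe" and "lara" have no close name in df_2 and keep NA.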
Any helpful comments and answers are highly appreciated.