Complete self-promotion, but I have written an R package, zoomerjoin, which uses MinHashing to fuzzily join large datasets without having to compare all pairs of rows between the two dataframes. This means you can merge moderately sized dataframes (millions of rows) in seconds or minutes on a modern data-science laptop without running out of memory.
The example you give will still run slowly, as there are a lot of "correct" matches to check (each row in df_1 will match thousands of rows in df_2), but in most use cases the package will let you fuzzily match dataframes with millions of rows together.
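To give some intuition for why this avoids the all-pairs comparison, here is a toy sketch of the MinHash idea (my own illustration, not zoomerjoin's internals; the helper minhash_slot is hypothetical): two strings agree on a given signature slot with probability equal to the Jaccard similarity of their character n-gram sets, so hashing rows into buckets keyed on those slots means you only ever compare rows that land in the same bucket.

# Toy MinHash, for intuition only -- not zoomerjoin's actual implementation.
minhash_slot <- function(s, seed) {
  set.seed(seed)
  perm <- sample(128)              # random permutation standing in for a hash function
  min(perm[unique(utf8ToInt(s))])  # MinHash of the string's set of character 1-grams
}

seeds <- 1:4
sig <- function(s) sapply(seeds, minhash_slot, s = s)

sig("marcus")
sig("markus")  # agrees with sig("marcus") on most slots: 1-gram Jaccard is 5/7
sig("olivia")  # almost never agrees: 1-gram Jaccard with "marcus" is only 1/10

An LSH join inserts every row into one bucket per slot (or per band of slots) and compares only within buckets, which is what lets it skip the all-pairs scan.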
Here's how I would use the package to join two smaller versions of the example dataframes you have provided:
library(tidyverse)
library(fuzzyjoin)

# You must have the Rust compiler installed to install this package;
# see https://github.com/beniaminogreen/zoomerjoin for more details.
devtools::install_github("beniaminogreen/zoomerjoin")
library(zoomerjoin)
n <- 500

# Simulate a smaller version of the first dataframe
id_1 <- 1:n
names_1 <- sample(c("hannah", "marcus", "fred", "joe", "lara"), n, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
df_1 <- data.frame(id_1, names_1)
df_1$numberFound <- NA

# Simulate a smaller version of the second dataframe
id_2 <- 1:n
names_2 <- sample(c("hannah", "markus", "paul", "mary", "olivia"), n, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
anyNumber <- sample(c(123, 234, 345, 456, 567), n, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
df_2 <- data.frame(id_2, names_2, anyNumber)
# Fuzzily join on the name columns: n_gram_width sets the size of the
# character shingles, and threshold is the Jaccard similarity two names
# must exceed to be considered a match.
joined_df_s <- lsh_inner_join(df_1, df_2, by = c("names_1" = "names_2"), n_gram_width = 1, threshold = .5)
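If you want to sanity-check the output on small data, one option (a sketch; max_dist = 1 is an illustrative choice, not an exact equivalent of a Jaccard threshold of .5) is to compare against fuzzyjoin's exhaustive string-distance join:

head(joined_df_s)

# Edit distance and Jaccard similarity are different criteria, so the two
# joins will agree only approximately.
joined_fuzzy <- stringdist_inner_join(df_1, df_2, by = c("names_1" = "names_2"), max_dist = 1)
nrow(joined_df_s)
nrow(joined_fuzzy)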
The package is still in development (although its outputs have been tested against fuzzyjoin's for accuracy), so please don't hesitate to file an issue on GitHub if there are areas of the documentation that are unclear, or if you find any bugs.