I need to match on addresses from two different data frames, which are a bit messy. The street number needs to be an exact match (because being one character off here is a completely different house), but I want to allow for errors in the street name. I've seen fuzzy match examples (like this answer), but I'm not sure how to pair it with an exact match as well.
I know I could match it fuzzy on the street name, and then drop if the street numbers aren't matches, but the full datasets that I am using have over a million observations and tons of similar addresses on the same street. So, I don't know if that's the best way to go about it. So my questions are twofold: can I match both on the fuzzy street name, and exact on the street number at the same time, and is that computationally better than matching on the street name and then dropping if the street numbers don't match?
ID1 = c("1A", "1B", "1C", "1D", "1E", "1F", "1G")
StreetNum = c("123", "234", "235", "492", "231", "492", "231")
StreetName = c("Mulburry St", "Willow St", "Willow St", "Basin st", "Farrow st", "Little st", "Little st")
df1 = data.frame(ID1, StreetNum, StreetName)
ID2 = c("2A", "2B", "2C", "2D", "2E", "2F", "2G")
StreetNum = c("123", "234", "235", "492", "231", "492", "231")
StreetName = c("Mulbury St", "Willow St", "Wilow St", "Basin st", "Farraow st", "Ltitle st", "Little st")
df2 = data.frame(ID2, StreetNum, StreetName)
example <- stringdist_left_join(df1,df2,by=c("StreetName"),max_dist=2)
example <- subset(example, StreetNum.x == StreetNum.y)