joining addresses - need fuzzy street name match and exact street number match

Question

I need to match on addresses from two different data frames, which are a bit messy. The street number needs to be an exact match (because being one character off here is a completely different house), but I want to allow for errors in the street name. I've seen fuzzy match examples (like this answer), but I'm not sure how to pair it with an exact match as well.

I know I could match it fuzzy on the street name, and then drop if the street numbers aren't matches, but the full datasets that I am using have over a million observations and tons of similar addresses on the same street. So, I don't know if that's the best way to go about it. So my questions are twofold: can I match both on the fuzzy street name, and exact on the street number at the same time, and is that computationally better than matching on the street name and then dropping if the street numbers don't match?

ID1 = c("1A", "1B", "1C", "1D", "1E", "1F", "1G")
StreetNum = c("123", "234", "235", "492", "231", "492", "231")
StreetName = c("Mulburry St", "Willow St", "Willow St", "Basin st", "Farrow st", "Little st", "Little st")

df1 = data.frame(ID1, StreetNum, StreetName)

ID2 = c("2A", "2B", "2C", "2D", "2E", "2F", "2G")
StreetNum = c("123", "234", "235", "492", "231", "492", "231")
StreetName = c("Mulbury St", "Willow St", "Wilow St", "Basin st", "Farraow st", "Ltitle st", "Little st")

df2 = data.frame(ID2, StreetNum, StreetName)

example <- stringdist_left_join(df1,df2,by=c("StreetName"),max_dist=2)
example <- subset(example, StreetNum.x == StreetNum.y)

Fuzzy matching is expensive compared to exact matching. You should be hoping that the algorithm will match exact street numbers first, and then only attempt to fuzzy match those candidates. If you can pose additional exact match restrictions - e.g. maybe the first letter of the street name must be an exact match, that will gain a lot of efficiency if you can do more exact matching up front. — Gregor Thomas, Feb 19 '22 at 18:04

joining addresses - need fuzzy street name match and exact street number match

0 Answers0