I need to compare two unstructured addresses and be able to identify if they are the same (or similar enough).
Scenario
- Address is supplied by the end user in plain text.
- There is nothing to help the user write it in a more identifiable manner (no autocomplete, nothing; just an empty textbox).
- "#102 Nice-Looking Street, Gotham City, NY" should match with "Nice Loking St., Gotham City, New York, apt 102".
- Using a third-party service is not an option.
- Search is not a problem. I already have the two strings. What I need is to check whether they represent the same address, despite their differences in structure.
What I have found
I know we can use some fuzzy string matching for this kind of comparison, with some tolerance for misspellings, but...
- There are some equivalent keywords (for instance, "Street" vs. "St.", "#102" vs. "apt 102", or "NY" vs. "New York") that should not lower the similarity score.
- Some words can appear in a different order (like the apartment in the example above).
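To make these two points concrete, here is a minimal Python sketch of the kind of approach I have in mind, not a finished solution. It expands known abbreviations, sorts the tokens so that word order stops mattering, and then compares the results with the standard library's `difflib.SequenceMatcher`. The `SYNONYMS` table and the `normalize` and `similarity` names are my own hypothetical placeholders, and a real table would need far more entries.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table (an assumption for illustration only).
# An empty string means "drop this token": the unit number that follows
# it already carries the information.
SYNONYMS = {
    "st": "street",
    "apt": "",
    "ny": "new york",
}


def normalize(address: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, expand abbreviations, sort tokens."""
    text = re.sub(r"[^a-z0-9\s]", " ", address.lower())
    tokens = []
    for token in text.split():
        # A replacement may be several words ("new york") or none at all ("").
        tokens.extend(SYNONYMS.get(token, token).split())
    return sorted(tokens)  # sorting makes the comparison order-insensitive


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; higher means the addresses look more alike."""
    left = " ".join(normalize(a))
    right = " ".join(normalize(b))
    return SequenceMatcher(None, left, right).ratio()


if __name__ == "__main__":
    first = "#102 Nice-Looking Street, Gotham City, NY"
    second = "Nice Loking St., Gotham City, New York, apt 102"
    print(similarity(first, second))
```

With this sketch the two example addresses differ only by the misspelled "Loking" after normalization, so they score close to 1. Its weak spots are obvious, though: sorting can place a misspelled token far from its correct counterpart, and the synonym table has to be maintained by hand, which is exactly the part I would rather not reinvent.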
I do not want to reinvent the wheel. This seems like a common problem in many contexts, and I suspect there is an existing algorithm that (perhaps with slight modifications) would fit this scenario.
Thanks in advance