What is the best distance metric for places

Question

I'm searching for a "good", easy metric to recognize similar places in user input so I can avoid creating duplicates.

Levenshtein distance works well for typos like

bakery

bekerry

(Levenshtein distance: 2)
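For reference, Levenshtein distance is usually computed with the Wagner-Fischer dynamic-programming recurrence. The sketch below is a plain-Python version (not taken from any particular library) that keeps only two rows of the table:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer, two rows)."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(levenshtein("bakery", "bekerry"))  # 2
```

It runs in O(len(a) * len(b)) time, which is fine for short place names.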

But it "fails" for swapped words

St Ursula School

School St. Ursula

(Levenshtein distance: 15)
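One common workaround for reordered words (a technique not mentioned in the original post) is to normalize each name and sort its words before measuring the distance, so word order no longer matters. A minimal sketch, assuming that lowercasing and dropping punctuation are acceptable normalizations for your data; `token_sort_key` is a hypothetical helper name:

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (bool adds as 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def token_sort_key(name: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, sort the words."""
    words = re.findall(r"\w+", name.lower())
    return " ".join(sorted(words))


a, b = "St Ursula School", "School St. Ursula"
print(levenshtein(a, b))                                  # 15
print(token_sort_key(a), "|", token_sort_key(b))          # school st ursula | school st ursula
print(levenshtein(token_sort_key(a), token_sort_key(b)))  # 0
```

Sorting discards word order entirely, so "Ursula School St" would map to the same key as well. For deduplicating place names that is usually acceptable, but it is a trade-off.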

and for additions

Serious Business

Serious Business Incorporated

(Levenshtein distance: 13)
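With additions, edit distance grows with every added character, even though all the original words are still there. A word-set measure may suit this case better. The sketch below (an illustration, not something proposed in the thread) computes Jaccard similarity, |A ∩ B| / |A ∪ B|, and the overlap coefficient, |A ∩ B| / min(|A|, |B|), over lowercased word tokens; any threshold you apply to these scores is your own assumption to tune:

```python
import re


def words(name: str) -> set[str]:
    """Lowercased word tokens with punctuation removed."""
    return set(re.findall(r"\w+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """|A & B| / |A | B| over the two word sets (1.0 if both are empty)."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def overlap(a: str, b: str) -> float:
    """|A & B| / min(|A|, |B|): 1.0 whenever one name's words are a subset of the other's."""
    wa, wb = words(a), words(b)
    smaller = min(len(wa), len(wb))
    return len(wa & wb) / smaller if smaller else 1.0


a, b = "Serious Business", "Serious Business Incorporated"
print(round(jaccard(a, b), 3))  # 0.667
print(overlap(a, b))            # 1.0
```

The overlap coefficient can be too permissive: "Business" vs "Serious Business" also scores 1.0, so it works best combined with another check.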

Strikes me that you are trying to work out what the place names mean. You probably need a simple parser to read the names. In real life, "small street, SE1" and "small street, E1" are often confused. I wouldn't expect an automated process to be perfect. – Vorsprung, Feb 03 '16 at 16:26
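Along the lines of the comment's "simple parser", a rough sketch might reduce each name to a canonical key: lowercase it, strip punctuation, drop a small set of filler words, and sort what remains. The `NOISE_WORDS` set and the helper names are purely illustrative assumptions. As the comment warns, no such rule set is perfect: the parser cannot tell whether "SE1" vs "E1" is a typo or a genuinely different place, and exact key matching does not catch typos like "bekerry" on its own.

```python
import re

# Illustrative assumption: words that rarely distinguish one place from another.
NOISE_WORDS = {"inc", "incorporated", "ltd", "limited", "the"}


def canonical_key(name: str) -> tuple[str, ...]:
    """Parse a place name into a sorted tuple of significant lowercase words."""
    tokens = re.findall(r"\w+", name.lower())
    return tuple(sorted(t for t in tokens if t not in NOISE_WORDS))


def same_place(a: str, b: str) -> bool:
    """Treat two names as duplicates when their canonical keys match exactly."""
    return canonical_key(a) == canonical_key(b)


print(same_place("St Ursula School", "School St. Ursula"))              # True
print(same_place("Serious Business", "Serious Business Incorporated"))  # True
print(same_place("small street, SE1", "small street, E1"))              # False
```

In practice you might run a Levenshtein comparison on the canonical keys instead of requiring an exact match, so that small typos are tolerated as well.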

score 0 · Answer 1 · answered Feb 03 '16 at 18:51


I think using a raw distance metric alone will be hard. You probably want to use some NLP methods (for example, with NLTK) to do NER (named entity recognition), then compare the recognized entities.


Gang Su
