Python difflib: sequence similarity above cutoff point, but no result on get_close_matches()

Question

So I'm using difflib to find the same streets written in different formats. Here's the one pair that really bugs me: '1-й Лихачевский переулок' and 'Переулок Лихачевский 1-й'.

I calculate the sequence similarity like this:

import difflib
s = difflib.SequenceMatcher(None, "1-й Лихачевский переулок", "Переулок Лихачевский 1-й")
s.ratio()

That gives me a result of 0.5416666666666666. Good enough, eh? But okay, the default cutoff for get_close_matches() is 0.6, so I do this:

difflib.get_close_matches('1-й Лихачевский переулок', 'Переулок Лихачевский 1-й', cutoff=0.5)

No results! In fact, there are no results even if I set cutoff to 0.1.

What am I missing?

score 0 · Accepted Answer · answered Jul 02 '17 at 19:01

The second argument to get_close_matches() is a sequence of strings to match against, not an individual string. So, e.g., pass a list:

>>> difflib.get_close_matches('1-й Лихачевский переулок', ['Переулок Лихачевский 1-й'], cutoff=0.5)
['Переулок Лихачевский 1-й']

As is, you passed a string, which is treated as a sequence of individual characters.
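
A minimal sketch of why even cutoff=0.1 comes back empty, using the two strings from the question (the variable names are just for illustration). ratio() is 2*M/T, where M is the number of matching characters and T is the combined length of both sequences. A single-character candidate scored against the 24-character query can reach at most 2*1/25 = 0.08, which is below 0.1.

```python
import difflib

query = "1-й Лихачевский переулок"
candidate = "Переулок Лихачевский 1-й"

# Passing a bare string: difflib iterates over it, so every
# "possibility" it considers is a single character.
as_string = difflib.get_close_matches(query, candidate, cutoff=0.1)

# Best score any single character of the candidate reaches against the
# full query: M is at most 1 and T is 1 + 24, so the ceiling is 0.08.
best_char_ratio = max(
    difflib.SequenceMatcher(None, ch, query).ratio() for ch in candidate
)

# Wrapping the string in a list makes it one whole possibility.
as_list = difflib.get_close_matches(query, [candidate], cutoff=0.5)

print(as_string)        # empty list: no single character clears 0.1
print(best_char_ratio)
print(as_list)          # the full street name is returned
```

The same applies to any iterable of strings (tuple, set, generator): each element is compared as a whole, so a single candidate still needs to be wrapped.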

Tim Peters

Welp, that explains everything. Thanks! – Huita Jul 02 '17 at 19:04
You're welcome :-) If your question has been answered, you should "accept" the answer you like best: https://stackoverflow.com/help/someone-answers – Tim Peters Jul 02 '17 at 19:06
