Specific Approximate Matching in Python

Asked Oct 15 '20 at 00:46

Active Oct 15 '20 at 00:46

Viewed 99 times

PROBLEM

I want to implement a type of specific approximate matching of two sentences in Python.

Example -

s_1 = "I hope you are safe from COVID-19 today"
s_2 = "I hope you're safe from COVID 19 today"

score = get_similarity(s_1, s_2)

s_1 = "I allow account access to facebook"
s_2 = "I allow an account access to face book"

score = get_similarity(s_1, s_2)

APPROACH

I tried using FuzzyWuzzy's partial ratio, but I have observed that even if s_2 is "I allow an account access to", without the "face book", it still gives a high similarity score.
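To illustrate the behaviour, here is a minimal sketch that uses the standard library's `difflib.SequenceMatcher` as a stand-in. It is not FuzzyWuzzy's actual code. `full_ratio` and `window_partial_ratio` are hypothetical helper names, and the sliding-window logic is only an assumption about how a partial match of this kind behaves: the shorter string is scored against the best same-length window of the longer one, so a truncated sentence is not penalised for the missing tail.

```python
from difflib import SequenceMatcher


def full_ratio(a: str, b: str) -> float:
    """Similarity computed over both strings in their entirety."""
    return SequenceMatcher(None, a, b).ratio()


def window_partial_ratio(a: str, b: str) -> float:
    """Best similarity of the shorter string against any same-length
    window of the longer string (simplified partial matching)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0.0
    n = len(shorter)
    return max(
        SequenceMatcher(None, shorter, longer[i:i + n]).ratio()
        for i in range(len(longer) - n + 1)
    )


s_1 = "I allow account access to facebook"
truncated = "I allow an account access to"

# The partial score ignores the missing "facebook" tail,
# while the whole-string score is lowered by it.
partial_score = window_partial_ratio(s_1, truncated)
full_score = full_ratio(s_1, truncated)
print(partial_score, full_score)
```

In this sketch the partial score comes out higher than the whole-string score for the truncated sentence, which matches what I am seeing.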

ASK

Is there a better way to take the similarity of the entire sentence into account?

NOTE - s_2 might or might not be a transcription from a video file, so I will have to account for that delta when getting the precise text. For example, FACEBOOK can be transcribed as FACE BOOK.
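One possible way to absorb that kind of transcription delta, sketched here with the standard library only. This is an assumption about what `get_similarity` could look like, not a settled solution: `normalize` is a hypothetical helper that lowercases the text and drops everything except ASCII letters and digits, so "FACE BOOK" and "COVID-19" collapse to the same form as "FACEBOOK" and "COVID 19", and the comparison then runs over the whole normalized string.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (drops spaces,
    hyphens, and apostrophes)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def get_similarity(a: str, b: str) -> float:
    """Whole-string similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


s_1 = "I allow account access to facebook"
s_2 = "I allow an account access to face book"
truncated = "I allow an account access to"

print(get_similarity(s_1, s_2))        # full sentence
print(get_similarity(s_1, truncated))  # sentence without "face book"
```

The trade-off is that removing all whitespace also discards word boundaries, so unrelated adjacent words can merge. A token-based comparison might be preferable if that matters.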

asked Oct 15 '20 at 00:46

Adhish Thite

In fuzzywuzzy there is fuzz.ratio, which calculates a distance based on the whole sentence. It probably makes sense to do some preprocessing, e.g. lowercasing both strings. – maxbachmann Oct 15 '20 at 12:22
Even after preprocessing, `fuzzywuzzy` marks `s_1` and `s_2` as similar – Adhish Thite Oct 15 '20 at 22:09
