
PROBLEM

I want to implement a specific kind of approximate matching between two sentences in Python.

Example -

s_1 = "I hope you are safe from COVID-19 today"
s_2 = "I hope you're safe from COVID 19 today"

score = get_similarity(s_1, s_2)

OR

s_1 = "I allow account access to facebook"
s_2 = "I allow an account access to face book"

score = get_similarity(s_1, s_2)


APPROACH

I tried using FuzzyWuzzy's partial ratio, but I have observed that even if s_2 is "I allow an account access to" (without the "face book"), it still gives a high similarity score.
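This behaviour follows from what a partial ratio measures: the shorter string is scored against its best-matching fragment of the longer one, so a prefix of s_1 can look like a perfect match. The sketch below illustrates this with the standard library's `difflib`, using a truncated copy of s_1. `partial_ratio_approx` is a hypothetical sliding-window helper written for illustration. It is not FuzzyWuzzy's actual implementation.

```python
from difflib import SequenceMatcher


def full_ratio(a: str, b: str) -> float:
    """Similarity of the two whole strings, scaled to 0-100."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def partial_ratio_approx(a: str, b: str) -> float:
    """Hypothetical partial-ratio stand-in: the best full_ratio of the
    shorter string against every same-length window of the longer one."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0.0
    width = len(shorter)
    return max(
        full_ratio(shorter, longer[i:i + width])
        for i in range(len(longer) - width + 1)
    )


s_1 = "I allow account access to facebook"
truncated = "I allow account access to"  # "facebook" dropped entirely

print(partial_ratio_approx(s_1, truncated))  # 100.0, an exact window exists
print(round(full_ratio(s_1, truncated), 1))  # 84.7, the missing tail costs points
```

Scoring the whole string does not make the missing word fatal, but it at least stops a fragment from counting as a perfect match.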



ASK

Is there a better way to take the similarity of the entire sentence into account?


NOTE - s_2 may or may not be a transcription of a video file, so I have to account for the resulting differences from the exact text. For example, FACEBOOK can be transcribed as FACE BOOK.
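One way to handle both the whole-sentence requirement and the transcription note is to normalize both strings and then score the entire strings rather than the best-matching fragment. The sketch below uses the standard library's `difflib.SequenceMatcher` in place of FuzzyWuzzy. `normalize` is a hypothetical helper and `get_similarity` reuses the name from the question. The normalization rule (lowercase, strip everything except ASCII letters and digits) is an assumption about what should count as "the same" in your transcripts.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lowercase, then drop every character that is not an ASCII letter or
    # digit, so "COVID-19" / "COVID 19" and "facebook" / "face book" become
    # identical. Assumes English, ASCII-only text.
    return re.sub(r"[^0-9a-z]", "", text.lower())


def get_similarity(s_1: str, s_2: str) -> float:
    # Hypothetical implementation of the function from the question:
    # a ratio in [0, 1] computed over the WHOLE normalized strings, so a
    # missing word lowers the score instead of being ignored.
    a, b = normalize(s_1), normalize(s_2)
    if not a and not b:
        return 1.0  # treat two empty strings as identical
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("I hope you are safe from COVID-19 today",
     "I hope you're safe from COVID 19 today"),
    ("I allow account access to facebook",
     "I allow an account access to face book"),
    ("I allow account access to facebook",
     "I allow an account access to"),
]
for s_1, s_2 in pairs:
    print(round(get_similarity(s_1, s_2), 3))
# prints 0.984, 0.967 and 0.808, one per line
```

With this sketch the two transcription-style variants score about 0.97 to 0.98, while the sentence missing "face book" drops to about 0.81. You would still need to choose a threshold that fits your data. Contractions such as "you're" versus "you are" are not expanded; they simply cost a character or two.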

Adhish Thite
  • In fuzzywuzzy there is fuzz.ratio, which calculates a distance based on the whole sentence. It probably makes sense to do some preprocessing as well, e.g. lowercasing both strings. – maxbachmann Oct 15 '20 at 12:22
  • Even after preprocessing, `fuzzywuzzy` marks `s_1` and `s_2` as similar – Adhish Thite Oct 15 '20 at 22:09
