I have a function that calculates fuzzywuzzy scores for two texts:
def fuzzywuzzy(text_1, text_2):
    scores = {
        'ratio': fuzz.ratio(tn.normalize_title(text_1), tn.normalize_title(text_2)) / 100,
        'partial_ratio': fuzz.partial_ratio(tn.normalize_title(text_1), tn.normalize_title(text_2)) / 100,
        'token_sort_ratio': fuzz.token_sort_ratio(tn.normalize_title(text_1), tn.normalize_title(text_2)) / 100,
        'token_set_ratio': fuzz.token_set_ratio(tn.normalize_title(text_1), tn.normalize_title(text_2)) / 100}
    return scores
As can be seen from the above code, I normalize text 1 and 2 before calculating the scores.
The fuzzywuzzy function is called here:
event['scores'] = fuzzywuzzy(v_data['text1'], event['_source']['event_record']['text2'])
I need to modify the query so that the scores are returned only if the token_set_ratio score is greater than 0.99. I am applying this code to 2000+ records.
Any ideas would be appreciated.
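To make the requirement concrete, here is a minimal sketch of the kind of threshold check I have in mind. It is not a definitive implementation. The names `scores_above_threshold`, `attach_scores`, `events`, and `score_fn` are hypothetical and not part of my real code; `score_fn` stands for the scoring function shown earlier, and `events` for whatever iterable holds the 2000+ records.

```python
def scores_above_threshold(text_1, text_2, score_fn,
                           key="token_set_ratio", threshold=0.99):
    """Return the score dict only if scores[key] > threshold, else None.

    score_fn is assumed to return a dict of scores in the range 0..1,
    like the scoring function in the question.
    """
    scores = score_fn(text_1, text_2)
    return scores if scores[key] > threshold else None


def attach_scores(events, text_1, score_fn, threshold=0.99):
    """Yield only the events whose token_set_ratio clears the threshold.

    Matching events get the score dict stored under event['scores'];
    non-matching events are skipped and left unmodified.
    """
    for event in events:
        # Same nested lookup as in the original call site.
        text_2 = event["_source"]["event_record"]["text2"]
        scores = scores_above_threshold(
            text_1, text_2, score_fn, threshold=threshold
        )
        if scores is not None:
            event["scores"] = scores
            yield event
```

It would be used as `matches = list(attach_scores(events, v_data['text1'], fuzzywuzzy))`. One thing I noticed: as far as I know, the fuzzywuzzy ratio functions return integers from 0 to 100, so after dividing by 100 a score greater than 0.99 can only be exactly 1.0.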