I'm trying to implement a name duplications for one of our use case.
Here I have a set of 10 names along with their index column as below.
Here I would like to calculate fuzzy metrics(Levenshtein,JaroWinkler) per each of name combinations using a rapidxfuzz module as below.
from rapidfuzz import fuzz
from rapidfuzz.distance import Levenshtein,JaroWinkler
round(Levenshtein.normalized_similarity(name_0,name_1),5)
round(JaroWinkler.similarity(name_0,name_1),5)
For example: idx-0 name Mallesham Yamulla to be paired with names having indexes sequence (1,9) names[(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9)] and calculate their levenshtein and Jarowrinkler similar percentages.
Next idx-1 name with names index sequence (2,9), idx-2 with name index sequence (3,9), idx-3 with (4,9) so on so forth till (8,9)
The expected output would be :