Suppose I have a list of sports like this:
sports=["futball","fitbal","football","tennis","tenis","tenisse","footbal","zennis","ping-pong"]
I would like to create a DataFrame that matches each element of sports with its closest match if the fuzzy-matching score is greater than 0.5 (that is, above 50 on the 0-100 scale that fuzz.ratio returns), and otherwise matches it with itself. (I want to use the function fuzzywuzzy.fuzz.ratio(x, y) for that.)
The result should look like this:
pd.DataFrame({"sport":sports,"closest_match":["futball","futball","football","tennis","tennis","tennis","futball","tennis","ping-pong"]})
       sport  closest_match
0    futball        futball
1     fitbal        futball
2   football       football
3     tennis         tennis
4      tenis         tennis
5    tenisse         tennis
6    footbal        futball
7     zennis         tennis
8  ping-pong      ping-pong
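To make the rule concrete, here is a minimal sketch of its most literal reading: for each word, take the highest-scoring other word in the list, and keep it only if the score is above the threshold. The names closest_matches, ratio, scorer and threshold are hypothetical, and ratio is a difflib-based stand-in (an assumption, used so the snippet runs with the standard library only); fuzzywuzzy.fuzz.ratio could be passed as scorer instead, though its scores may differ slightly.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in scorer (assumption): similarity on a 0-100 scale,
    # in the spirit of fuzzywuzzy.fuzz.ratio but built on difflib.
    return 100 * SequenceMatcher(None, a, b).ratio()


def closest_matches(words, scorer=ratio, threshold=50):
    # For each word, pick the best-scoring OTHER word in the list;
    # fall back to the word itself when no score exceeds the threshold.
    result = []
    for i, word in enumerate(words):
        others = [w for j, w in enumerate(words) if j != i]
        best = max(others, key=lambda w: scorer(word, w), default=word)
        result.append(best if scorer(word, best) > threshold else word)
    return result


sports = ["futball", "fitbal", "football", "tennis", "tenis",
          "tenisse", "footbal", "zennis", "ping-pong"]
matches = closest_matches(sports)

# With pandas available, the two columns could then be assembled as:
# pd.DataFrame({"sport": sports, "closest_match": matches})
```

Because this reading pairs every word with its nearest neighbour other than itself, it sends tenis and zennis to tennis and leaves ping-pong unchanged, but it does not by itself pick one canonical spelling per group as in the table above.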
Thanks