I have some data containing spelling errors. I'm correcting them and scoring how close the spelling is using the following code:
import pandas as pd
import difflib
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
B = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']),
'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])}
df_B = pd.DataFrame(B)
# Define the function that corrects the spelling:
def Spelling(ask):
    # Keep only the best of the (up to three) close matches
    return difflib.get_close_matches(ask, Li_A, n=3, cutoff=0.5)[0]
df_B['Correct one'] = df_B['one'].apply(Spelling)
# Define the function that Scores the spelling:
def Spell_Score(row):
    # Similarity ratio between the misspelled word and its correction
    return difflib.SequenceMatcher(None, row['one'], row['Correct one']).ratio()
df_B['Score'] = df_B.apply(Spell_Score, axis=1)
This outputs the correct spelling and the score:
df_B
      one     two Correct one     Score
a  potat0  po1ato      potato  0.833333
b  toma3o  2omato      tomato  0.833333
c  s5uash  squ0sh      squash  0.833333
d   ap8le   2pple       apple  0.800000
e    pea7    p3ar        pear  0.750000
How can I add columns giving the second- and third-highest-scoring matches and their scores?
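One direction that might work, as a minimal sketch rather than a definitive answer: score every candidate in Li_A against the misspelled word with SequenceMatcher, sort by score, and keep the top three. The helper name top_matches and the column names "Match N" / "Score N" are made up for illustration. This ranks by ratio() directly instead of going through get_close_matches, so no cutoff applies and every row gets three candidates, even poor ones.

```python
import difflib
import pandas as pd

Li_A = ["potato", "tomato", "squash", "apple", "pear"]
df_B = pd.DataFrame(
    {"one": ["potat0", "toma3o", "s5uash", "ap8le", "pea7"],
     "two": ["po1ato", "2omato", "squ0sh", "2pple", "p3ar"]},
    index=["a", "b", "c", "d", "e"],
)

def top_matches(word, candidates=Li_A, n=3):
    """Hypothetical helper: return the n best (score, candidate) pairs, best first."""
    scored = [(difflib.SequenceMatcher(None, word, c).ratio(), c) for c in candidates]
    # sorted() is stable, so ties keep their order from the candidate list
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]

# One list of (score, candidate) pairs per row of column 'one'
ranked = df_B["one"].apply(top_matches)

# Spread the ranked pairs into "Match N" / "Score N" columns (names are assumptions)
for rank in range(3):
    df_B[f"Match {rank + 1}"] = ranked.apply(lambda pairs, r=rank: pairs[r][1])
    df_B[f"Score {rank + 1}"] = ranked.apply(lambda pairs, r=rank: pairs[r][0])
```

"Match 1" and "Score 1" play the role of the "Correct one" and "Score" columns above. To drop weak candidates, the pairs could be filtered on a minimum score before slicing, at the cost of handling rows that end up with fewer than three matches.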