Building on the answer you gave, you can use the pandas apply, stack and groupby functions to speed up your code. Suppose you have input such as:
import pandas as pd
from fuzzywuzzy import fuzz
df = pd.DataFrame({'Merchant details': ['Alpha co', 'Bravo co'],
                   'Comments': ['electionsss are around',
                                'vote in eelecttions']})
For the column 'Comments', you can create a temporary MultiIndex DataFrame containing one word per row by splitting each comment and calling the stack function:
df_temp = pd.DataFrame(
    {'split_comments': df['Comments'].str.split(' ', expand=True).stack()})
Then create a column holding the corrected word (following your idea), using apply and a comparison against fuzz.ratio:
df_temp['corrected_comments'] = df_temp['split_comments'].apply(
    lambda wd: 'election' if fuzz.ratio(wd, 'election') > 75 else wd)
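To see how the threshold behaves on the sample words, here is a minimal sketch. It assumes difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio, so it runs without fuzzywuzzy installed; the helper names similarity and correct are hypothetical, and the scores may differ slightly from what fuzz.ratio reports.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Hypothetical stand-in for fuzz.ratio: similarity score from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def correct(word, target='election', threshold=75):
    """Return target when word is close enough to it, otherwise word unchanged."""
    return target if similarity(word, target) > threshold else word


# Print each sample word with its score and the word that would be kept.
for wd in ['electionsss', 'eelecttions', 'around']:
    print(wd, similarity(wd, 'election'), correct(wd))
```

With this stand-in, both misspellings score above 75 and are replaced, while unrelated words such as 'around' are left alone. The threshold of 75 is the one from your idea; lowering it catches more misspellings but risks rewriting unrelated words.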
Finally, write the corrected data back to the Comments column of df using the groupby and join functions:
df['Comments'] = df_temp.reset_index().groupby('level_0').apply(
    lambda wd: ' '.join(wd['corrected_comments']))
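For reference, the three steps can also be written more compactly. This is only a sketch of the same idea, not a replacement for the code above: it assumes a pandas version that has Series.explode (0.25 or later), groups on index level 0 instead of calling reset_index, and again uses a hypothetical difflib-based similarity helper in place of fuzz.ratio so that it runs without fuzzywuzzy.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    """Hypothetical stand-in for fuzz.ratio: similarity score from 0 to 100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


df = pd.DataFrame({'Merchant details': ['Alpha co', 'Bravo co'],
                   'Comments': ['electionsss are around',
                                'vote in eelecttions']})

# One word per row; the original row label is repeated for each word.
words = df['Comments'].str.split().explode()

# Replace words that are close enough to 'election'.
corrected = words.apply(
    lambda wd: 'election' if similarity(wd, 'election') > 75 else wd)

# Join the words of each original row back into one string.
df['Comments'] = corrected.groupby(level=0).agg(' '.join)

print(df)
```

If fuzzywuzzy is available, replacing similarity with fuzz.ratio should give the behaviour described in the answer. Note that apply still calls the scoring function once per word, so the gain comes from avoiding explicit Python loops over rows, not from vectorising the fuzzy match itself.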