I have a DataFrame with a 'Name' column. It contains multiple similar entries with small inconsistencies, and I want to merge each group of similar entries into one. I am a beginner in data analysis and came across the fuzzywuzzy module. This is what I tried:
import fuzzywuzzy.fuzz
import fuzzywuzzy.process

names = list(data['Name'].unique())

def replace_matches(df, column, matching_string, min_ratio=90):
    # all distinct values currently in the column
    strings = df[column].unique()
    for i in matching_string:
        # the 5 closest candidates for this name, as (string, score) pairs
        matches = fuzzywuzzy.process.extract(i, strings, limit=5, scorer=fuzzywuzzy.fuzz.token_sort_ratio)
        close_matches = [match[0] for match in matches if match[1] >= min_ratio]
        matched_rows = df[column].isin(close_matches)
        df.loc[matched_rows, column] = matching_string
    return df
I call the function like this:
replace_matches(df=data, column='Name', matching_string=names)
but it raises ValueError: Must have equal len keys and value when setting with an iterable.
What is wrong here? Is there a more efficient way to find and merge all the similar entries in a column?
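A possible reading of the error, offered as a sketch rather than a definitive answer: the assignment `df.loc[matched_rows, column] = matching_string` passes the whole list of names as the new value, whereas the loop variable `i` (a single string) looks like what was intended. One way to avoid per-row assignment altogether is to build a name-to-canonical-name dictionary first and apply it in one step. The example below assumes the standard library's `difflib` as a stand-in for fuzzywuzzy so that it runs without extra packages; `similarity` and `build_canonical_map` are hypothetical helper names, and the sample names are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 score comparing lower-cased, token-sorted strings.

    This only approximates the idea behind a token-sort ratio; it is not
    fuzzywuzzy's implementation.
    """
    norm_a = " ".join(sorted(a.lower().split()))
    norm_b = " ".join(sorted(b.lower().split()))
    return 100 * SequenceMatcher(None, norm_a, norm_b).ratio()


def build_canonical_map(names, min_ratio=90):
    """Map each name to the first earlier name it closely matches."""
    canonical = []  # representatives chosen so far
    mapping = {}
    for name in names:
        for rep in canonical:
            if similarity(name, rep) >= min_ratio:
                mapping[name] = rep  # a single string, not a list
                break
        else:
            # no close match found: this name becomes its own representative
            canonical.append(name)
            mapping[name] = name
    return mapping


# Illustrative data only
names = ["John Smith", "Smith John", "john smith", "Jane Doe"]
mapping = build_canonical_map(names)
```

With pandas, such a dictionary could then be applied as `data['Name'] = data['Name'].map(mapping)`, since every unique name appears as a key. Note that the first spelling encountered becomes the representative, so the result depends on the order of `names`.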