I know this question has been asked in some form before, so apologies. I'm trying to fuzzy match list 1 (sample_name) to list 2 (actual_name). List 2 has significantly more names than list 1, and the fuzzy matching is not working well. I've tried multiple fuzzy match methods (partial, set_token), but I keep running into issues because many of the names in list 2 are very similar to each other. Is there any way to improve the matching here? Ideally I want a new dataframe with the name from list 1, the matched name from list 2, and the match score in a third column. Any help would be much appreciated. Thanks.
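This is what I have used so far:

    from fuzzywuzzy import fuzz  # assumed import (thefuzz exposes the same fuzz module)

    df1 = sample_df['sample_name'].to_list()
    df2 = actual_df['actual_name'].to_list()

    response = {}
    for name_to_find in df1:
        for name_master in df2:
            # keep the first name in list 2 that scores above 90, then stop looking
            if fuzz.partial_ratio(name_to_find, name_master) > 90:
                response[name_to_find] = name_master
                break

    for key, value in response.items():
        print('sample name: ' + key + ', actual_name: ' + value)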
Sample data:

sample_name | actual_name |
---|---|
jtsports | JT Sports LLC |
tombaseball | Tom Baseball Inc. |
context express | Context Express LLC |
zb sicily | ZB Sicily LLC |
lightening express | Lightening Express LLC |
fire roads | Fire Road Express |
N/A | Earth Treks |
N/A | TS Sports LLC |
N/A | MM Baseball Inc. |
N/A | Contact Express LLC |
N/A | AB Sicily LLC |
N/A | Lightening Roads LLC |
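
One direction that might help, shown here only as a rough sketch: instead of stopping at the first name that clears a threshold, score every name in list 2 and keep the highest-scoring one along with its score. To keep the example self-contained it uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer rather than `fuzz.partial_ratio`, so the scores will not be the same as the ones fuzzywuzzy reports. The helper names `normalize` and `best_match` are made up for illustration, and stripping spaces and punctuation before comparing is an assumption about what suits this data.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and keep only letters and digits: 'JT Sports LLC' -> 'jtsportsllc'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_match(name, candidates):
    """Return (candidate, score) for the candidate with the highest score (0-100)."""
    target = normalize(name)
    scored = [
        (cand, round(100 * SequenceMatcher(None, target, normalize(cand)).ratio(), 1))
        for cand in candidates
    ]
    # max() keeps the first candidate if two scores tie
    return max(scored, key=lambda pair: pair[1])


# Data copied from the table above
sample_names = [
    "jtsports", "tombaseball", "context express",
    "zb sicily", "lightening express", "fire roads",
]
actual_names = [
    "JT Sports LLC", "Tom Baseball Inc.", "Context Express LLC", "ZB Sicily LLC",
    "Lightening Express LLC", "Fire Road Express", "Earth Treks", "TS Sports LLC",
    "MM Baseball Inc.", "Contact Express LLC", "AB Sicily LLC", "Lightening Roads LLC",
]

# One row per sample name: (sample_name, actual_name, score)
rows = [(name, *best_match(name, actual_names)) for name in sample_names]
for row in rows:
    print(row)
```

The `rows` list can be passed to `pandas.DataFrame(rows, columns=['sample_name', 'actual_name', 'score'])` to get the three-column dataframe. With fuzzywuzzy itself, `process.extractOne(name, actual_names, scorer=...)` also returns the single best match together with its score, which may be a closer fit than the nested loop with `break`.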