
I have a DataFrame that looks like this:

name      col1            col2                      col3 
company1  Banking         Finance                   B&F
company2  Utilities       Utilities                 NaN
company3  Transportation  Pipeline Transportation   Utilities
company4  Consulting      Tech                      Insurance

Is there a way to do a fuzzy match between col1, col2 and col3 within each row and store the score in a new column? I am using fuzzywuzzy with pandas.

The output should look something like this:

name      col1            col2                      col3           Score 
company1  Banking         Finance                   B&F             23 
company2  Utilities       Utilities                 NaN             71
company3  Transportation  Pipeline Transportation   Utilities       54
company4  Consulting      Tech                      Insurance        2

(I just put in random values for the score, so they are not accurate.) I couldn't find a question like this here, so if one exists please let me know.

Thank you

  • Could anyone suggest how to fuzzy match between 2 columns only? I want to compare across columns only within the same row – Mehul Gupta Jul 23 '18 at 20:44

2 Answers


Use

from fuzzywuzzy import fuzz
df['score_1_2'] = df[['col1', 'col2']].apply(lambda row: fuzz.ratio(row['col1'], row['col2']), axis=1)

if you want to compute the score for col1 and col2. You could also calculate the mean over all column pairs (1-2, 2-3, 1-3) if that is meaningful for your data; it depends on what you are trying to accomplish.
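The mean-over-pairs idea could be sketched roughly as below. This is only an illustration under a few assumptions: `ratio` is a standard-library stand-in built on `difflib` so the snippet runs without extra packages (its scores may differ slightly from `fuzz.ratio`), `mean_pairwise_score` is a hypothetical helper name, and missing values such as NaN are simply skipped rather than scored.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def ratio(a, b):
    """Stand-in for fuzz.ratio: 0-100 similarity computed with difflib."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def mean_pairwise_score(values, scorer=ratio):
    """Average scorer(a, b) over every pair of non-missing strings.

    Non-string entries (NaN, None) are dropped first; returns None
    when fewer than two strings remain, since no pair can be scored.
    """
    present = [v for v in values if isinstance(v, str)]
    scores = [scorer(a, b) for a, b in combinations(present, 2)]
    return mean(scores) if scores else None


# Rows taken from the question (col1, col2, col3); NaN marks a missing value.
rows = [
    ("Banking", "Finance", "B&F"),
    ("Utilities", "Utilities", float("nan")),
    ("Transportation", "Pipeline Transportation", "Utilities"),
    ("Consulting", "Tech", "Insurance"),
]
scores = [mean_pairwise_score(row) for row in rows]
```

With a DataFrame, something like `df['Score'] = df[['col1', 'col2', 'col3']].apply(lambda row: mean_pairwise_score(row, fuzz.ratio), axis=1)` should give the same per-row result using fuzzywuzzy's scorer. Whether skipping NaN (rather than counting it as a zero score) is right depends on what the score is meant to express.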

Viktor

I don't know whether your use case fits fuzzywuzzy's ratio functions: all the examples I have seen generate a similarity score from two strings, not three (I haven't used the library myself).

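But assuming it does make sense, just assign the score to a new column in your DataFrame. Here is some pseudocode (your DataFrame is called df here, and your_fuzzy_function is a placeholder for whatever scoring function you choose, applied row by row):

df['score'] = df.apply(lambda row: your_fuzzy_function(row['col1'], row['col2'], row['col3']), axis=1)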

smj
  • How would I use fuzzywuzzy to compare 2 columns? I want to compare across columns only within the same row – Mehul Gupta Jul 23 '18 at 20:16