
I want to compute `fuzz.ratio` between strings stored in two dataframes. Say I have two dataframes: `df` with columns `A`, `B` and `bt_df` with columns `A1`, `B1`. I want to compare column `df['B']` with `bt_df['B1']` and, for each value in `B1`, return the best matching score and the corresponding id from `df['A']`.

df
Out[8]: 
                  A            B
0  11111111111111111111  Cheesesalad
1  22222222222222222222       Cheese
2  33333333333333333333        salad
3  44444444444444444444     BMWSalad
4  55555555555555555555          BMW
5  66666666666666666666        Apple
6  77777777777777777777    Apple####
7  88888888888888888888    Macrooni!

bt_df
Out[9]: 
    A1        B1
0   180336       NaN
1   154263    Cheese
2   130876     Salad
3   204430  Macrooni
4   153490       NaN
5    48879       NaN
6   185495       NaN
7   105099       NaN
8     8645     Apple
9    54038       NaN
10  156523       NaN
11   18156       BWM

Hence the result should be:
B1            matchedstring   score   id
Cheese       Cheese           100     22222222222222222222
.....
.....

Thanks in advance.
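A minimal sketch of the comparison using only the standard library. It assumes that `difflib.SequenceMatcher.ratio()` scaled to 0-100 is an acceptable stand-in for `fuzz.ratio`; scores may differ slightly from fuzzywuzzy's, especially when python-Levenshtein is installed. `ratio` and `best_matches` are hypothetical helper names, the case-insensitive comparison is an added assumption so that `Salad` can match `salad`, and the ids are written as strings. With the real dataframes you would pass `bt_df['B1']` as the queries and `zip(df['A'], df['B'])` as the candidates.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity on a 0-100 scale (assumed stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(queries, candidates, lowercase=True):
    """Return (query, matched_text, score, id) for each non-missing query.

    queries:    iterable of strings, e.g. bt_df['B1']; None/NaN are skipped
    candidates: iterable of (id, text) pairs, e.g. zip(df['A'], df['B'])
    lowercase:  compare case-insensitively (assumption, so 'Salad' ~ 'salad')
    """
    # Keep only candidates that actually have text
    candidates = [(cid, text) for cid, text in candidates if isinstance(text, str)]
    rows = []
    for query in queries:
        # Missing values: None, or float NaN as pandas stores them
        if not isinstance(query, str):
            continue
        q = query.lower() if lowercase else query
        best_score, best_id, best_text = -1, None, None
        for cid, text in candidates:
            t = text.lower() if lowercase else text
            score = ratio(q, t)
            # Strict '>' keeps the first candidate when scores tie
            if score > best_score:
                best_score, best_id, best_text = score, cid, text
        rows.append((query, best_text, best_score, best_id))
    return rows


# Sample data copied from the question (ids written as strings)
df_rows = [
    ("11111111111111111111", "Cheesesalad"),
    ("22222222222222222222", "Cheese"),
    ("33333333333333333333", "salad"),
    ("44444444444444444444", "BMWSalad"),
    ("55555555555555555555", "BMW"),
    ("66666666666666666666", "Apple"),
    ("77777777777777777777", "Apple####"),
    ("88888888888888888888", "Macrooni!"),
]
bt_b1 = [None, "Cheese", "Salad", "Macrooni", None, None,
         None, None, "Apple", None, None, "BWM"]

matches = best_matches(bt_b1, df_rows)
print("B1", "matchedstring", "score", "id", sep="\t")
for row in matches:
    print(*row, sep="\t")
```

Every `B1` value is scored against every `B` value, so the cost grows with len(bt_df) × len(df). That is fine for frames this size but slow for large ones. To get a dataframe back, `pd.DataFrame(matches, columns=['B1', 'matchedstring', 'score', 'id'])` should work.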

User1090
What have you tried doing so far? Did you have a look at the [`difflib`](https://docs.python.org/2/library/difflib.html) library? – Nickil Maveli Aug 25 '16 at 19:08
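Following the `difflib` suggestion: `difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)` returns up to `n` candidates whose similarity ratio is at least `cutoff`, best first. It returns only the matched strings, not scores or ids, so this sketch maps each `B` string back to its `A` id with a dict, assuming the `B` values are unique. `b_to_id` is a hypothetical name, and unlike `fuzz.ratio` the cutoff is on a 0-1 scale (0.6 corresponds to a score of 60). The comparison here is case-sensitive.

```python
from difflib import get_close_matches

# B -> A lookup built from df (assumes the B values are unique)
b_to_id = {
    "Cheesesalad": "11111111111111111111",
    "Cheese": "22222222222222222222",
    "salad": "33333333333333333333",
    "BMWSalad": "44444444444444444444",
    "BMW": "55555555555555555555",
    "Apple": "66666666666666666666",
    "Apple####": "77777777777777777777",
    "Macrooni!": "88888888888888888888",
}

best = {}
for query in ["Cheese", "Salad", "Macrooni", "Apple", "BWM"]:
    # n=1 keeps only the single best candidate above the cutoff
    hits = get_close_matches(query, list(b_to_id), n=1, cutoff=0.6)
    best[query] = (hits[0], b_to_id[hits[0]]) if hits else None
print(best)
```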
