Get the most similar word based on a Pandas DataFrame and a list

Question

I want to add a new column to my dataframe containing the word most similar to the value in a pandas column (in this case col1). My actual dataframe is:

And I have the following list:

['Product_A1', 'Product_B1', 'Product_C']

And my output should be:

For that I am using the following code (I'm just printing the results):

import pandas as pd
import difflib
d = {'col1': ['Product_Z1', 'Product_A', 'Product_B'], 'col2': [1, 2, 3]}
df = pd.DataFrame(data=d)
products_list = ['Product_A', 'Product_B', 'Product_C']
print(difflib.get_close_matches(df['col1'], products_list))

However, I always get an empty list...

What am I doing wrong?

Thanks!

If there is a certain pattern, you could compare a subset of the strings or use a regex. — jimfawkes, Nov 22 '19 at 20:29
Answer below by Hugo Salvador has embedded in it the answer to your "what am I doing wrong" question, but just for clarification: the first arg to `get_close_matches` needs to be a string, not a list of strings. — RishiG, Nov 22 '19 at 20:37

Hugo Salvador · Accepted Answer · 2019-11-25T15:50:01.873


Try this:

df['col3'] = df['col1'].apply(lambda x: difflib.get_close_matches(x, products_list, cutoff=0.9))
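Note that the line above stores a list of matches in each row. If a single word per row is wanted, as the question describes, one possible approach is sketched below. The helper name `best_match` is hypothetical, and a plain Python list stands in for `df['col1']` so the snippet runs without pandas. The `cutoff=0.9` simply mirrors the value used above.

```python
import difflib


def best_match(word, candidates, cutoff=0.6):
    """Return the closest candidate to `word`, or None if none reaches `cutoff`.

    `best_match` is an illustrative helper, not part of difflib or pandas.
    """
    # n=1 asks difflib for at most one match: the highest-scoring one.
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


products_list = ['Product_A', 'Product_B', 'Product_C']
col1_values = ['Product_Z1', 'Product_A', 'Product_B']  # stand-in for df['col1']

# One string at a time goes in as the first argument.
col3_values = [best_match(value, products_list, cutoff=0.9) for value in col1_values]
print(col3_values)  # [None, 'Product_A', 'Product_B']
```

With the dataframe from the question, the same helper could be applied as df['col3'] = df['col1'].apply(lambda x: best_match(x, products_list, cutoff=0.9)). Lowering the cutoff lets looser matches through, so 'Product_Z1' would then also receive a value.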

The issue with your solution is that the get_close_matches function treats the whole column as a single item and looks for that item in products_list. Add the lines below and see:

In [8]: products_list2 = [['Product_Z1', 'Product_A', 'Product_B'], ['test']]

In [9]: difflib.get_close_matches(df['col1'], products_list2)
Out[9]: [['Product_Z1', 'Product_A', 'Product_B']]
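The same behaviour can be reproduced without pandas. In the sketch below a plain list stands in for `df['col1']` (an assumption made to keep the snippet self-contained). Passed as the first argument, the list is compared as one three-element sequence against each nine-character string in `products_list`, so nothing is similar enough to be returned. Passing one string at a time gives the expected matches.

```python
import difflib

products_list = ['Product_A', 'Product_B', 'Product_C']
col1_values = ['Product_Z1', 'Product_A', 'Product_B']  # stand-in for df['col1']

# Whole column as the "word": a 3-element sequence of strings is compared
# against sequences of characters, so no candidate reaches the default cutoff.
whole_column_result = difflib.get_close_matches(col1_values, products_list)
print(whole_column_result)  # []

# One string at a time: each value is compared character by character.
per_value_result = {
    value: difflib.get_close_matches(value, products_list, n=1)
    for value in col1_values
}
print(per_value_result['Product_A'])  # ['Product_A']
print(per_value_result['Product_B'])  # ['Product_B']
```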

edited Nov 25 '19 at 15:50

answered Nov 22 '19 at 20:35



Good answer as far as code goes. Could be slightly improved by explicitly answering the question "what went wrong?" and including a link to the `difflib.get_close_matches` documentation. – RishiG Nov 22 '19 at 20:39
