You could use fuzzywuzzy to measure how similar the values are. For example, `fuzz.ratio(str1, str2)` returns a score from 0 to 100 that indicates how similar the two strings are. The library offers other scorers as well (such as `fuzz.partial_ratio`, `fuzz.token_sort_ratio` and `fuzz.token_set_ratio`), so you will have to find out which one suits your use case best.
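To get a feel for these scores without installing anything, here is a rough stand-in built on the standard library's `difflib.SequenceMatcher`, which fuzzywuzzy falls back to when python-Levenshtein is not installed (with python-Levenshtein the numbers can differ slightly). The helpers `ratio` and `token_sort_ratio` are hypothetical names for this sketch. The token-sort helper is a simplified take on the idea behind `fuzz.token_sort_ratio` and skips fuzzywuzzy's own string preprocessing.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    # Hypothetical helper: 0-100 similarity, computed like fuzz.ratio's difflib fallback
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a: str, b: str) -> int:
    # Hypothetical, simplified token-sort scorer: lowercase, sort the words, then compare,
    # so word order no longer affects the score
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))

    return ratio(normalize(a), normalize(b))


print(ratio("Car with batteries", "Car batteries"))  # 84
print(ratio("Headphones Sony", "Headphones Sony"))  # 100
print(token_sort_ratio("Sony Headphones", "Headphones Sony"))  # 100
```

With a threshold of 80, "Car with batteries" and "Car batteries" (score 84) would count as the same product.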
The example below uses `fuzz.ratio(str1, str2)` and treats any ratio above 80 as a match:
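```python
# pip install fuzzywuzzy python-levenshtein
# or: conda install -c conda-forge fuzzywuzzy python-levenshtein
import io

import pandas as pd
from fuzzywuzzy import fuzz

# skipinitialspace=True strips the blank after each comma, so the second
# column is named "Cost" rather than " Cost"
df1 = pd.read_csv(io.StringIO("""
Product Name, Cost
Car with batteries, 2
Headphones Sony, 3
"""), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO("""
Product Name, Cost
Car batteries, 2
Headphones Sony, 3
"""), skipinitialspace=True)

COLUMN_NAME = "Product Name"
ACCEPTED_RATIO = 80


def match(left, right):
    # True when the similarity score (0-100) is strictly above the threshold
    return fuzz.ratio(left, right) > ACCEPTED_RATIO


rsuffix = "_r"
# join aligns rows on the index, so row i of df1 is compared with row i of df2;
# the overlapping df2 columns get the "_r" suffix
compared = df1.join(df2, rsuffix=rsuffix)
compared["Matches"] = compared.apply(
    lambda row: match(row[COLUMN_NAME], row[f"{COLUMN_NAME}{rsuffix}"]),
    axis=1,
)
# drop the helper columns that came from df2
compared = compared.drop(
    columns=[c for c in compared.columns if c.endswith(rsuffix)]
)
```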
And the output of `print(compared)` would be:

```
         Product Name  Cost  Matches
0  Car with batteries     2     True
1     Headphones Sony     3     True
```