I'm trying to compute the Jaro similarity between two strings. Each string resides in a separate column of my dataframe.
String 1 = df['name_one']
String 2 = df['name_two']
When I try to run my string similarity logic:
from pyjarowinkler import distance
df['distance'] = df.apply(lambda d: distance.get_jaro_distance(str(d['name_one']),str(d['name_two']),winkler=True,scaling=0.1), axis=1)
I get the following error:
**error: JaroDistanceException: Cannot calculate distance from NoneType (str, str)**
Great, so there is a NoneType somewhere in the columns. The first thing I do is check for it:
maskone = df['name_one'] == None
df[maskone]
masktwo = df['name_two'] == None
df[masktwo]
This finds no None values. I'm scratching my head at this point, but I proceed to clean the two columns anyway:
df['name_one'] = df['name_one'].fillna('').astype(str)
df['name_two'] = df['name_two'].fillna('').astype(str)
And yet, I'm still getting:
error: JaroDistanceException: Cannot calculate distance from NoneType (str, str)
Am I removing NoneTypes correctly?
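For reference, a minimal sketch of a more thorough check. It rests on two assumptions: that an elementwise `== None` comparison does not flag missing values in pandas the way `isna()` does, and that the library may reject empty strings as well as None (not confirmed here). Note that `fillna('')` turns missing cells into empty strings, and `str(None)` is the non-empty string `'None'`. The sample data is made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame(
    {
        "name_one": ["alice", None, "carol"],
        "name_two": ["alicia", "bob", ""],
    },
    dtype=object,
)

# Elementwise comparison with None: count how many rows it flags.
eq_none_hits = int((df["name_one"] == None).sum())  # noqa: E711

# isna() is the pandas way to flag None/NaN cells.
missing = df["name_one"].isna() | df["name_two"].isna()

# After fillna(''), formerly missing cells are empty strings,
# so also flag rows where either name is empty or whitespace only.
cleaned = df[["name_one", "name_two"]].fillna("").astype(str)
empty = (cleaned["name_one"].str.strip() == "") | (
    cleaned["name_two"].str.strip() == ""
)

# Rows that deserve a closer look before computing the distance.
print(df[missing | empty])
```

If rows show up here, one option would be to skip them (or assign a default score) before calling the distance function, rather than relying on `fillna('')` alone.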