I want to calculate the Levenshtein distance between the strings in two columns of a dataframe. The dataframe looks like this (this is only part of the dataframe; it has approximately 4000 rows).
I want to use the Levenshtein package to get the distance between each string in column "Source1" and the corresponding string in "Source2".
Here is my code so far:
#import packages
import pandas as pd
import pyodbc
import Levenshtein as lev
import numpy as np
#Read excel file
df = pd.read_excel(xxx)
df.head(10)
#define arrays
a = df.Source1.to_numpy()
b = df.Source2.to_numpy()
#calculate Levenshtein distance between two arrays
for i,k in zip(a, b):
    print(lev(i, k))
I get the following error:
TypeError                                 Traceback (most recent call last)
Input In [79], in <cell line: 6>()
      5 #calculate Levenshtein distance between two arrays
      6 for i,k in zip(a, b):
      7     # print(type(i), type(k))
----> 8     print(lev(i, k))
TypeError: 'module' object is not callable
Can anyone please advise?
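Jaay helped me in the comments. The problem was that lev is the imported module (import Levenshtein as lev), not a function, so it cannot be called directly. The solution is to call the module's distance function instead: print(lev.distance(i, k))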
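For anyone hitting the same error, here is a minimal sketch of what goes wrong. It uses the standard-library math module as a stand-in for Levenshtein (an assumption made only so the snippet runs without extra packages); the names m, message and result are just for illustration.

```python
import math as m  # stand-in for `import Levenshtein as lev`; any module alias behaves the same

# Calling the module alias itself fails, just like lev(i, k) in the question.
try:
    m(4)
    message = None
except TypeError as exc:
    message = str(exc)

# Calling a function that lives inside the module works, like lev.distance(i, k).
result = m.sqrt(4)

print(message)
print(result)
```

With the Levenshtein package installed, the same pattern applies: lev is the module and lev.distance is the function, so the loop body becomes print(lev.distance(i, k)).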