How to compute the word-by-word edit distance and count the zero-distance matches in a column

Question

I have two sets of words: descriptions stored in a DataFrame column, and a separate list of words. I need to compute the Levenshtein distance between each word in a description and each word in the list, then return the number of comparisons whose distance equals 0.

import pandas as pd


definitions=['very','similarity','seem','scott','hello','names']

# initialize list of lists 
data = [['hello my name is Scott'], ['I went to the mall yesterday'], ['This seems very similar']] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Descriptions']) 

# print dataframe. 
df

I want a new column that holds, for each row, the number of (description word, list word) pairs whose Levenshtein distance is 0:

df['lev_count_0'] = ...  # count of zero-distance pairs for each row

For example, comparing the first word of the first row against every word in the list gives:

edit_distance("hello", "very")        # equals 4
edit_distance("hello", "similarity")  # equals 9
edit_distance("hello", "seem")        # equals 4
edit_distance("hello", "scott")       # equals 5
edit_distance("hello", "hello")       # equals 0
edit_distance("hello", "names")       # equals 5

So for the first row, df['lev_count_0'] should be 1, because comparing every word in the description against every word in definitions yields exactly one distance of 0 ("hello" against "hello"). "Scott" and "scott" differ in case, so their distance is 1, not 0.

Descriptions              | lev_count_0
hello my name is Scott    |      1
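
As a sanity check on the distances listed above, here is a minimal pure-Python sketch of the Levenshtein distance with unit costs for insertion, deletion and substitution. The function name `levenshtein` is an assumption made for illustration; it stands in for whatever `edit_distance` function is actually used (for example the one shipped with NLTK) and is not meant as a replacement for it.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # and the prefix b[:j]; initially the processed prefix of `a` is empty.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


definitions = ['very', 'similarity', 'seem', 'scott', 'hello', 'names']

# Distance from "hello" to every word in the list
distances = {word: levenshtein("hello", word) for word in definitions}
print(distances)
# {'very': 4, 'similarity': 9, 'seem': 4, 'scott': 5, 'hello': 0, 'names': 5}
```

Only the "hello" entry is 0, which is why the expected count for the first row is 1.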

what have you tried so far? When does it fail? Are you asking us to write the code for you or fix something in your code? — 0x5050, Aug 24 '19 at 05:26
@0x5050 basically nothing comes to my mind on the how. I am learning to code python, and yes I am asking how to write it since I can't. — ScottUrbina, Aug 24 '19 at 14:15

score 0 · Answer 1 · answered Aug 24 '19 at 18:41

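My solution: split each description into words, compare every word against every dictionary entry with NLTK's edit_distance, and count the comparisons that return 0.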

from nltk import edit_distance
import pandas as pd


data = [['hello my name is Scott'], ['I went to the mall yesterday'], ['This seems very similar']] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Descriptions']) 

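# Word list from the question; the comparison is case-sensitive
dictionary = ['very', 'similarity', 'seem', 'scott', 'hello', 'names']


def lev_dist(description):
    """Count the (word, dictionary entry) pairs whose edit distance is 0."""
    count = 0
    # Split the description on whitespace to get its words
    for word in description.split():
        for entry in dictionary:
            if edit_distance(word, entry) == 0:
                count += 1
    return count


# Apply the function to every row of the Descriptions column
df['count_lev_0'] = df.Descriptions.apply(lev_dist)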
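
A possible simplification, offered as a sketch rather than as part of the original answer: a Levenshtein distance of 0 means the two strings are identical, so when only the zero-distance count is needed, a plain equality check gives the same result without computing any distances. The helper name `count_exact_matches` is hypothetical, and the data below is copied from the question.

```python
def count_exact_matches(description, definitions):
    """Count (word, definition) pairs that are identical, i.e. edit distance 0."""
    words = description.split()
    # Same nested comparison as the loop-based version, using == instead of
    # an edit-distance call; the check stays case-sensitive.
    return sum(1 for word in words for definition in definitions
               if word == definition)


definitions = ['very', 'similarity', 'seem', 'scott', 'hello', 'names']
descriptions = [
    'hello my name is Scott',
    'I went to the mall yesterday',
    'This seems very similar',
]

counts = [count_exact_matches(text, definitions) for text in descriptions]
print(counts)  # [1, 0, 1]
```

On the DataFrame this could be applied with `df['Descriptions'].apply(lambda text: count_exact_matches(text, definitions))`. The edit-distance version remains the better fit if the threshold may later change to something other than 0, for example counting words within distance 1.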
