Hello, I have a specific string and I am trying to calculate its edit distance to each line of a text file. I also want to count how many times each line occurs, and then sort the results.

str= "Hello"

and the text file I am comparing it with, named xfile, is:

"hola"
"how are you"
"what is up"
"everything good?"
"hola"
"everything good?"
"what is up?"
"okay"
"not cool"
"not cool"

I want to make a dictionary that compares the string with every line in xfile and gives each line's edit distance and count. For now, I am able to get the key and distance, but not the count. Can someone please suggest how to do it?

My code is:

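import editdistance

data = "Hello"

utterances = {}

# readFile is assumed to be the already-opened xfile (not shown in the question)
for line in readFile:
    dist = editdistance.eval(data, line)
    utterances[line] = dist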
girlwhocodes
  • If the size of the xlist is small, you can convert it to a dictionary with associated counts via Counter https://docs.python.org/2/library/collections.html#collections.Counter – tRosenflanz Aug 01 '18 at 17:12
  • Is `edit-distance` an external library? – Gigaflop Aug 01 '18 at 17:12
  • Yes! It's imported via the line `import editdistance` – girlwhocodes Aug 01 '18 at 17:22
  • I gave an example of the xlist. The actual list has more than 2000 instances – girlwhocodes Aug 01 '18 at 17:23
  • Are you comparing `str="Hello"` to every line in xfile, getting the edit distance of those comparisons? – Nick Aug 01 '18 at 17:46
  • @NickPredey yes! That's exactly what I am doing! For now I have all the edit distances for the xfile, but I want the count of each line too. For example, {"everything good?", 14, 2}, where 14 is the edit distance and 2 is the count – girlwhocodes Aug 01 '18 at 17:49

1 Answer

For every utterance you can have a dictionary containing the distance and count:

import editdistance

data = 'Hello'

utterances = {}

xlist = [
    'hola',
    'how are you',
    'what is up',
    'everything good?',
    'hola',
    'everything good?',
    'what is up?',
    'okay',
    'not cool',
    'not cool',
]

for line in xlist:
    if line not in utterances:
        utterances[line] = {
            'distance': editdistance.eval(data, line),
            'count': 1
        }
    else:
        utterances[line]['count'] += 1
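
The same dictionary can also be built with collections.Counter, as suggested in the comments, so that the distance is computed only once per distinct line. The following is a minimal sketch, not a definitive implementation. The levenshtein function is a hypothetical pure-Python stand-in, written here only so the example runs without the external editdistance package; it is assumed to return the same values as editdistance.eval for plain strings.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance via dynamic programming (insert, delete, substitute).

    Hypothetical stand-in for editdistance.eval so the sketch has no
    external dependency.
    """
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


data = 'Hello'

xlist = [
    'hola',
    'how are you',
    'what is up',
    'everything good?',
    'hola',
    'everything good?',
    'what is up?',
    'okay',
    'not cool',
    'not cool',
]

# Count every distinct line once, then compute one distance per distinct line.
counts = Counter(xlist)
utterances = {
    line: {'distance': levenshtein(data, line), 'count': n}
    for line, n in counts.items()
}
```

When reading from a real file instead of a list, each line usually ends with a newline character, so stripping it (for example with line.rstrip('\n')) before counting keeps it from affecting the distance.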

Then if you need the utterances sorted by distance or count you can use an OrderedDict:

from collections import OrderedDict

sorted_by_distance = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['distance']))
sorted_by_count = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['count']))
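
If one ordering should combine both fields, the sort key can be a tuple. The sketch below sorts by count (highest first) and breaks ties by the smaller distance. The records are hand-written, illustrative values and not real edit distances; the keys 'alpha', 'beta', 'gamma' and 'delta' are hypothetical. On Python 3.7 and later a plain dict preserves insertion order, so dict() can be used in place of OrderedDict there.

```python
# Hand-written illustrative records; the numbers are made up for the demo
# and are not real edit distances.
utterances = {
    'alpha': {'distance': 4, 'count': 1},
    'beta': {'distance': 2, 'count': 3},
    'gamma': {'distance': 2, 'count': 1},
    'delta': {'distance': 7, 'count': 3},
}

# Most frequent first; ties are broken by the smaller distance.
by_count_then_distance = sorted(
    utterances.items(),
    key=lambda item: (-item[1]['count'], item[1]['distance']),
)

# On Python 3.7+ a plain dict keeps insertion order, so this stays sorted.
ordered = dict(by_count_then_distance)

print(list(ordered))  # ['beta', 'delta', 'gamma', 'alpha']
```
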
marcossf