
I am using difflib to compare a word against a list of candidate words (a sort of dictionary lookup). When I use the best match from difflib as a key into a dictionary, I get a KeyError. Can anyone explain why this happens? When I don't use difflib, everything works fine.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import difflib
import operator

lst = ['król']
word = 'król'

dct = {}
for order in lst:
    word_match_ratio = difflib.SequenceMatcher(None, word, order).ratio()

    dct[order] = word_match_ratio
    print order
    print('%s %s' % (order, word_match_ratio))


sorted_matching_words = sorted(dct.items(), key=operator.itemgetter(1))
sorted_matching_words = str(sorted_matching_words.pop()[:1])
x = len(sorted_matching_words) - 3
word = sorted_matching_words[3:x]

print word


def translate(someword):
    someword = trans_dct[someword]
    print(someword)
    return someword

trans_dct = {
    "król": 'king'
}
print trans_dct
word = translate(word)

Expected output: king

Instead of that I get:

Traceback (most recent call last):
  File "D:/Python/Testing stuff.py", line 64, in <module>
    word = translate(word)
  File "D:/Python/Playground/Testing stuff.py", line 56, in translate
    someword = trans_dct[someword]
KeyError: 'kr\\xf3l'

I don't understand why this happens. It looks like difflib is doing something weird, because when I do something like this:

uni = 'kr\xf3l'
print uni


def translate(word):
    word = dct1[word]
    print(word)
    return word

dct1 = {
    "król": 'king'
}
print dct1
word = translate('kr\xf3l')
print word

Everything works as intended.

Gunnm

1 Answer


The problem is not with difflib but with how the word is extracted:

sorted_matching_words = sorted(dct.items(), key=operator.itemgetter(1))
# sorted_matching_words = [(u'kr\xf3l', 1.0)]

sorted_matching_words = str(sorted_matching_words.pop()[:1])
# sorted_matching_words = "(u'kr\\xf3l',)"

x = len(sorted_matching_words) - 3
word = sorted_matching_words[3:x]
# word = 'kr\\xf3l'
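The annotated steps above can be reproduced in isolation. This is only a sketch and assumes Python 3, whereas the question runs on Python 2: the built-in ascii() stands in for Python 2's repr(), and the slice bounds differ from the question's because Python 3 prints no u prefix.

```python
# Python 3 sketch; ascii() mimics Python 2's repr(), which escapes non-ASCII text.
best = ('król',)                  # the one-element tuple left after .pop()[:1]

as_text = ascii(best)             # text with quotes, parentheses and a literal backslash
sliced = as_text[2:-3]            # strip "('" and "',)" by position, as the question does

trans_dct = {'król': 'king'}

print(len(best[0]), len(sliced))  # 4 7
print(sliced in trans_dct)        # False
print(best[0] in trans_dct)       # True
```

The sliced text is seven characters long (k, r, backslash, x, f, 3, l), so it cannot equal the four-character key.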

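You should not convert the tuple to a string with str(). Calling str() on a tuple formats each element with its __repr__ method, which is why the non-ASCII character comes out as the escape sequence \xf3 (displayed as \\xf3 in the KeyError message). Slicing that text gives a string that is different from the original key. Instead, just take the first element of the tuple: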

In [34]: translate(sorted_matching_words[-1][0])
king
Out[34]: u'king'
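If the sorted list is not needed for anything else, the best match can also be picked directly with max(). The following is a minimal sketch assuming Python 3, not the asker's code: best_match is a hypothetical helper name, and the second dictionary entry is made up for illustration.

```python
import difflib


def best_match(word, candidates):
    """Return the candidate most similar to `word` (hypothetical helper)."""
    ratios = {c: difflib.SequenceMatcher(None, word, c).ratio() for c in candidates}
    # max() over the keys, ranked by ratio, returns the key object itself,
    # so no str() conversion or slicing is involved.
    return max(ratios, key=ratios.get)


trans_dct = {'król': 'king', 'królowa': 'queen'}  # illustrative data
match = best_match('król', list(trans_dct))
print(trans_dct[match])
```

Because the returned value is one of the original candidate strings, it can be used as a dictionary key without any further processing.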
awesoon
Specifically, change `sorted_matching_words = str(sorted_matching_words.pop()[:1])` and the next two lines to just `word = sorted_matching_words.pop()[0]` instead of cutting off the brackets of the tuple. – Tadhg McDonald-Jensen Mar 18 '16 at 17:48