Levenshtein cost settings

Question

I've been asked to guess the user's intention when part of the expected data is missing. For example, if I expect either "very well" or "not very well" but receive only "not", I should flag it as "not very well".

The Levenshtein distance between "not" and "very well" is 9, and the distance between "not" and "not very well" is 10, so the plain distance favors the wrong candidate. I think I'm actually trying to drive a screw with a wrench, but our team has already agreed to use Levenshtein for this case.

Given the problem above, is there any way to get sensible results by changing the insertion, replacement, and deletion costs?

P.S. I'm not looking for a hack for this particular example. I want something that generally works as expected and also gives a better result in cases like this one.

Soundex might be a better algorithm: https://en.wikipedia.org/wiki/Soundex. Both "not" and "cup" have the same Levenshtein distance. IMO, "if (str.match(/^\s*[nN]/)) {str='not very well'} else {str='very well'}" is simpler. - glenn jackman, Feb 12 '14 at 14:56
@glennjackman I agree with you 100%. That's what I proposed, but the argument was that it might not work as expected with languages other than English. Thanks anyway, I will bring it up again with our team. - Mahdi, Feb 12 '14 at 15:34

score 0 · Answer 1 · answered Apr 27 '14 at 03:22

If a replacement costs 2 rather than 1, the Levenshtein distance between "not" and "very well" is 12, not 9. The alignment is:

------not
very well

So there are 6 insertions with a total cost of 6 (cost 1 for each insertion), and 3 replacements with a total cost of 6 (cost 2 for each replacement). The total cost is 12.

The Levenshtein distance between "not" and "not very well" is 10. The alignment is:

not----------
not very well

This involves only 10 insertions and no replacements, so the cost is 10. Since 10 is less than 12, you can choose "not very well" as the best match.
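The effect of the replacement cost can be checked with a short dynamic-programming sketch in plain Python. The function name weighted_levenshtein and its keyword parameters are made up for this illustration and are not part of any library; the costs used are the ones described above (1 per insertion or deletion, 1 or 2 per replacement).

```python
def weighted_levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Cost of turning `source` into `target` with configurable edit costs.

    Hypothetical helper for illustration; not taken from any library.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost      # delete every character of source[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_cost      # insert every character of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or replacement
            )
    return dist[-1][-1]


for sub_cost in (1, 2):
    print(sub_cost,
          weighted_levenshtein("not", "very well", sub_cost=sub_cost),
          weighted_levenshtein("not", "not very well", sub_cost=sub_cost))
# 1 9 10
# 2 12 10
```

With unit costs "very well" wins (9 against 10), which is the problem described in the question. Raising the replacement cost to 2 flips the order (12 against 10), because replacing unrelated characters is no longer cheaper than inserting the missing ones.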

The cost and alignment can be computed with the htql module for Python:

import htql
a=htql.Align()
a.align('not', 'very well')
# (12.0, ['------not', 'very well'])
a.align('not', 'not very well')
# (10.0, ['not----------', 'not very well'])
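If htql is not available, picking the best candidate can be sketched in plain Python as well. This is only a rough outline under the same cost assumptions (insertion and deletion 1, replacement 2); edit_cost and best_match are hypothetical names chosen for this example.

```python
from functools import lru_cache


def edit_cost(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Weighted edit cost of turning `source` into `target` (illustrative helper)."""

    @lru_cache(maxsize=None)
    def cost(i, j):
        # cost(i, j) = cheapest way to turn source[:i] into target[:j]
        if i == 0:
            return j * ins_cost
        if j == 0:
            return i * del_cost
        step = 0 if source[i - 1] == target[j - 1] else sub_cost
        return min(
            cost(i - 1, j) + del_cost,   # deletion
            cost(i, j - 1) + ins_cost,   # insertion
            cost(i - 1, j - 1) + step,   # match or replacement
        )

    return cost(len(source), len(target))


def best_match(observed, candidates, **costs):
    """Return the cheapest candidate plus all (cost, candidate) pairs, cheapest first."""
    scored = sorted((edit_cost(observed, c, **costs), c) for c in candidates)
    return scored[0][1], scored


best, scored = best_match("not", ["very well", "not very well"])
print(best)    # not very well
print(scored)  # [(10, 'not very well'), (12, 'very well')]
```

Note that this always returns the nearest candidate, even for unrelated input: "cup" maps to "very well" at cost 12, the same cost "not" has against "very well". If that matters, a cutoff on the cost may be worth adding.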
