1

For example, A dictionary is provided as nba_dict = {'lakers':100,'Warriors':99}. The keys of the dict are manually input by users. And we can see there exist a typo diff. Upper case for 'Warriors' and lower case for 'lakers'. The goal is to get the value by the keys. But I want to have a function to handle these typo errors, For example, if I do nba_dict['Aker'] and it will provide me 100.

  • You could easily make a function to automatically capitalize the first letter of whatever string the user inputs. To detect typos is more difficult, you might start by simply checking if the given string is contained within any keys in the existing dictionary. – Richard McCormick Aug 23 '22 at 00:17
  • So what you want is beyond just capitalization. It is a fuzzy match. See if [this helps](https://stackoverflow.com/a/48017010/758174). – Pierre D Aug 23 '22 at 00:25
  • Not matching the post's origin expectation, but I may suggest using input validation for the keys if it's feasible. – zhugen Aug 23 '22 at 01:03

1 Answers1

1

You can use Levenshtein distance to calculate the minimum number of "single edits" to change the first string into the second one. There is a convenient implementation in C with Python interface:

pip install python-Levenshtein

Then you can compare the distance between your query and all the keys, find the minimum one and return the corresponding value. You can also return an error if the minimal distance is above an empirical threshold to filter "too bad" requests.

from Levenshtein import distance as lev_dist


class FlexibleDict(dict):

    def __init__(self, _threshold=3, **kwargs):  # change the threshold
        super().__init__(**kwargs)
        self.threshold = _threshold

    def __getitem__(self, query):
        if query in self:
            return super().__getitem__(query)
        keys = list(self.keys())
        min_dist, best_key = lev_dist(query, keys[0]), keys[0]
        for key in keys[1:]:
            dist = lev_dist(query, key)
            if dist < min_dist:
                min_dist, best_key = dist, key
        if min_dist <= self.threshold:
            return super().__getitem__(best_key)
        raise ValueError(f"key '{query}' not found")

nba_dict = FlexibleDict(lakers=100, warriors=90)
# to change the threshold
# nba_dict = FlexibleDict(_threshold=2, lakers=100, warriors=90)


print(nba_dict['Aker'])  # 100
print(nba_dict['gege'])  # error

Except for initialization, FlexibleDict objects can be manipulated exactly as vanilla dict objects.

Valentin Goldité
  • 1,040
  • 4
  • 13