I have a problem where I need to do a fuzzy lookup in a hash map, i.e. return the value whose key most closely resembles the query, in my case measured by the Levenshtein distance.
My current approach is to subclass dict with a special lookup method that scores the query against every key using Levenshtein.ratio (a similarity, so higher is better), then returns the value of the key with the highest score. Basically this:
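
    import Levenshtein

    class FuzzyLookupDict(dict):
        def fuzzy_lookup(self, query):
            # Score every key against the query (linear scan over all keys)
            levs = [(key, Levenshtein.ratio(query, key)) for key in self]
            # ratio is a similarity, so the best match has the highest score
            key, score = max(levs, key=lambda lev: lev[1])
            return self[key]
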
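
To make the behaviour concrete without requiring the Levenshtein package, here is a self-contained sketch of the same idea. The edit_distance helper and the sample data are made up for this illustration and are not part of my real code. It is a plain dynamic-programming distance, so the best match is the key with the smallest distance rather than the highest ratio. I also assume an empty dict should raise KeyError.

```python
def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance (illustrative stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


class FuzzyLookupDict(dict):
    def fuzzy_lookup(self, query):
        # Assumption: an empty dict has no closest key, so raise KeyError
        if not self:
            raise KeyError(query)
        # Linear scan: one distance computation per key
        best_key = min(self, key=lambda key: edit_distance(query, key))
        return self[best_key]


# Hypothetical sample data
colors = FuzzyLookupDict(apple="red", banana="yellow", cherry="dark red")
print(colors.fuzzy_lookup("bananna"))  # prints "yellow"
```

Every lookup costs one distance computation per key, which is the part I am unsure will scale.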
Is this a good approach or is there a better solution that I haven't thought of?