
I have a list that contains several strings, and a dictionary whose keys are strings containing wildcards and whose values are integers.

For example:

```python
list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*': 3, 'shirt*': 7, 'snowboard*': 1}
```

I would like to go through list1, check whether a key in dict1 matches each string (taking the wildcard into account), and get the corresponding value from dict1. In this case that would be 3, because 'tomatoes' matches 'tomato*'.

Is there a way to iterate over list1, check whether one of the dict1 keys (with its wildcard) matches the current string, and return the corresponding value from dict1?

I know I could iterate over dict1 and compare each key with the elements of list1. But in my case the dict is very large, and I also have a lot of lists to go through, so looping through the whole dictionary for every string would take too much time. I thought about turning the keys into a list as well and finding wildcard matches with a list comprehension and fnmatch(), but then I couldn't use the returned match to look up the value in the dict (because of the wildcard).
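For reference, here is a minimal sketch of the linear scan described above (my own illustration, not code from the thread). Iterating over `dict1.items()` instead of a separate list of keys keeps each pattern paired with its value, so the value lookup should not be a problem; the cost is still one pattern test per key for every word, which is exactly what gets too slow for a large dict. `lookup_linear` is a hypothetical helper name, and `fnmatch.fnmatchcase` is chosen so matching is case-sensitive on every OS.

```python
from fnmatch import fnmatchcase

list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*': 3, 'shirt*': 7, 'snowboard*': 1}


def lookup_linear(word, patterns, default=None):
    """Return the value of the first pattern matching `word`, else `default`.

    Every key is tested, so each call costs O(len(patterns)) comparisons.
    """
    for pattern, value in patterns.items():
        # fnmatchcase understands '*', '?' and [...] and never normalizes case
        if fnmatchcase(word, pattern):
            return value
    return default


matches = {word: lookup_linear(word, dict1) for word in list1}
print(matches)  # {'i': None, 'like': None, 'tomatoes': 3}
```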

trotta
  • A dictionary isn't much use here: whichever of those methods you use, you'll have an `O(n)` scan of all the keys, at which point a hashmap isn't really helping. Perhaps you could have a canonicalisation function, e.g. one that takes `'tomatoes'` and returns the root `'tomato'`, then use *that* as the key? – jonrsharpe Aug 06 '18 at 10:48
  • Can the wildcard appear inside a word or at the beginning? – awesoon Aug 06 '18 at 10:50
  • @jonrsharpe There are lemmatizers and stemmers (if that's what you mean) that I could use, at least for English, to reduce words to their root (e.g. `'tomatoes'` to `'tomato'`). But since I'm working with several languages, this is not an option. – trotta Aug 06 '18 at 10:53
  • @soon wildcards only appear at the end of words. – trotta Aug 06 '18 at 10:53
  • You could try a trie. If the wildcard only appears at the end of the word, it is pretty easy to implement. Take a look at this (http://pygtrie.readthedocs.io/en/latest/), but I am not sure whether it supports wildcards; you may need to implement them yourself. – awesoon Aug 06 '18 at 10:56
  • Tries are the way to go here, yes. If the only wildcard you have is a trailing `*` with the usual meaning "anything here", you don't need to implement it explicitly in the trie: just insert the prefix (the key without the `*`) and put the leaf node where it ends. For lookups, stop at the first leaf node you find (if any). – GPhilo Aug 06 '18 at 10:59
  • Thanks, I'll have a look into trie! – trotta Aug 06 '18 at 11:12
  • Possible duplicate of [Searching strings with . wildcard](https://stackoverflow.com/questions/4869589/searching-strings-with-wildcard) – awesoon Aug 06 '18 at 11:16
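The trie approach GPhilo describes in the comments can be sketched with plain nested dicts. This is a minimal sketch, independent of the answer's `Trie` class, and it assumes (as the asker confirmed) that every key ends in a single `*` with no other wildcard: only the literal prefix is inserted, and a sentinel key marks where a pattern ends. `build_trie`, `trie_lookup` and the `_LEAF` sentinel are illustrative names of my own.

```python
_LEAF = object()  # sentinel key meaning "a pattern ends at this node" (my own convention)


def build_trie(patterns):
    """Build a nested-dict trie from keys such as 'tomato*'.

    Assumes the only wildcard is one trailing '*': only the literal prefix
    is stored, and the value is attached to the node where that prefix ends.
    """
    root = {}
    for pattern, value in patterns.items():
        node = root
        for char in pattern.rstrip('*'):
            node = node.setdefault(char, {})
        node[_LEAF] = value
    return root


def trie_lookup(trie, word, default=None):
    """Walk `word` down the trie and return the value of the first leaf met,
    i.e. the shortest matching prefix; return `default` if there is none.
    Uses dict.get, so unknown words never add nodes to the trie."""
    node = trie
    for char in word:
        if _LEAF in node:
            return node[_LEAF]
        node = node.get(char)
        if node is None:
            return default
    return node.get(_LEAF, default)


list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*': 3, 'shirt*': 7, 'snowboard*': 1}

trie = build_trie(dict1)
print([trie_lookup(trie, word) for word in list1])  # [None, None, 3]
```

Each lookup visits at most `len(word)` nodes, so its cost no longer depends on how many keys `dict1` has. Because it stops at the first leaf, it returns the shortest matching pattern: if both `'tom*'` and `'tomato*'` were keys, `'tomatoes'` would get the value of `'tom*'`. Preferring the longest match would mean remembering the last leaf seen and continuing the walk.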

1 Answer


Here is a trie implemented using only Python's standard library (`collections.defaultdict`) that should help:

```python
from collections import defaultdict


class Trie(defaultdict):
    def __init__(self, value=None):
        # A Trie is essentially a hash table of hash tables. Missing children are
        # created as Trie(value), so they inherit this node's value: that is what
        # lets d["ab"] = 3 also answer d["abcde"].
        super().__init__(lambda: Trie(value))
        self.__value = value

    @property
    def value(self):
        return self.__value

    def __getitem__(self, key):
        node = self
        if len(key) > 1:  # allows trie["abc"] instead of trie["a"]["b"]["c"]
            for char in key:
                node = node[char]
            return node
        else:
            # actual getitem routine; defaultdict creates and stores missing children,
            # so looking up a word with no stored prefix adds nodes whose value is None
            return defaultdict.__getitem__(self, key)

    def __setitem__(self, key, value):
        node = self
        if len(key) > 1:  # allows trie["abc"] = 1 instead of trie["a"]["b"]["c"] = 1
            for char in key[:-1]:
                node = node[char]
            node[key[-1]] = value
        else:  # actual setitem routine
            if isinstance(value, int):
                value = Trie(value)  # wrap a plain int in a node that carries it
            defaultdict.__setitem__(self, key, value)

    def __str__(self):
        return str(self.__value)


d = Trie()
d["ab"] = 3
print(d["abcde"])        # prints 3
print(d["abcde"].value)  # prints 3
```
hyloop
  • Thanks! I needed a small workaround to deal with non-existing words, but it works! – trotta Aug 06 '18 at 13:24
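Since the wildcard only ever appears at the end of a key, another option that needs no trie class at all is to strip the `*` and probe an ordinary dict with each prefix of the word. This is my own alternative, not something proposed in the thread; words without a matching prefix simply return a default, and lookups never modify the table, so non-existing words need no special handling. `build_prefix_table` and `lookup_by_prefix` are hypothetical helper names.

```python
list1 = ['i', 'like', 'tomatoes']
dict1 = {'tomato*': 3, 'shirt*': 7, 'snowboard*': 1}


def build_prefix_table(patterns):
    """Map each key's literal prefix (trailing '*' removed) to its value.

    Assumes every key ends in '*' and contains no other wildcard.
    """
    return {pattern.rstrip('*'): value for pattern, value in patterns.items()}


def lookup_by_prefix(table, word, default=None):
    """Probe the prefixes of `word`, shortest first, and return the value of
    the first one present in `table`; return `default` if none is.
    Only reads from `table`, so unknown words leave it unchanged."""
    for end in range(len(word) + 1):
        prefix = word[:end]
        if prefix in table:
            return table[prefix]
    return default


table = build_prefix_table(dict1)
results = [lookup_by_prefix(table, word) for word in list1]
print(results)  # [None, None, 3]
```

Each word costs at most `len(word) + 1` dict lookups (plus the slicing), independent of the size of `dict1`. Checking the shortest prefixes first mirrors GPhilo's "stop at the first leaf" rule; iterating over the prefixes in reverse would prefer the longest match instead.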