0

Suppose I have a list of 300k+ unique items:

mylist = ["door", "mango", "rose", "orange", "car", "knowledge", "flower", ...., 300k+ items]

userinput = input()

Now, if the user enters a jumbled version of "knowledge", e.g. "dngwekleo", the program should find the matching word in mylist and print "knowledge" as output.

My code works fine as long as the input word is at most 7 letters long. I generate all permutations of the input and check each one against mylist. But once the input length reaches 8-10 letters, there are too many permutations and Python takes far too long (10, 20, even 30 minutes) to produce the output.

Please help me get the answer faster, in something like 10-15 seconds. I have been trying for 20 days.

DYZ
sam

3 Answers

2

As a starting point, you can build a lookup dictionary whose keys are each word's characters in sorted order and whose values are the original words,
e.g. {"deegklnow": "knowledge"}

my_list = ["door", "mango", "rose", "orange", "car", "knowledge", "flower"]

lookup = {"".join(sorted(x)): x for x in my_list}

print(lookup.get("".join(sorted("dngwekleo"))))
print(lookup.get("".join(sorted("eosr"))))
print(lookup.get("".join(sorted("rca"))))

knowledge
rose
car
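One caveat with the dict above: if two words in the list are anagrams of each other, the later one silently overwrites the earlier one. A minimal sketch of a variant that keeps every match is below. The helper names `build_anagram_index` and `unscramble` are made up for illustration, and "odor" is added to the sample list only to show a collision.

```python
from collections import defaultdict

def build_anagram_index(words):
    """Map each sorted-letter key to every word that shares it."""
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word))].append(word)
    return index

def unscramble(index, scrambled):
    """Return all known words that are anagrams of `scrambled` (possibly empty)."""
    return index.get("".join(sorted(scrambled)), [])

# Built once up front; each lookup afterwards only sorts the input word.
words = ["door", "odor", "mango", "rose", "car", "knowledge"]
index = build_anagram_index(words)

print(unscramble(index, "dngwekleo"))  # ['knowledge']
print(unscramble(index, "rood"))       # ['door', 'odor']
print(unscramble(index, "xyz"))        # []
```

Building the index is a single pass over the list, so the cost is paid once rather than on every query.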
sushanth
1

You can count the letters in each word in the original list and in the input. If the counts match, one word is a permutation of the other.

from collections import Counter
# Pre-calculate the dictionaries
counts = [Counter(word) for word in mylist]

userinput = input()
count = Counter(userinput)
if count in counts:
    print("Found it!")

For large lists, you can reduce lookup time by precomputing a set of frozensets of letter-count pairs, one per word. Membership in a set is a hash lookup rather than a linear scan of the list:

counts = {frozenset(Counter(word).items()) for word in mylist}
count = frozenset(Counter(userinput).items())
if count in counts: ...
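The set above only tells you whether a match exists. Since the question asks for the matching word itself, one possible extension is to use the same frozenset as a dict key. This is a sketch only; `letter_signature` and `by_signature` are names invented here, and it keeps just one word per signature.

```python
from collections import Counter

def letter_signature(word):
    """Hashable summary of a word: the set of (letter, count) pairs."""
    return frozenset(Counter(word).items())

mylist = ["door", "mango", "rose", "orange", "car", "knowledge", "flower"]

# Precompute once: signature -> original word.
by_signature = {letter_signature(word): word for word in mylist}

match = by_signature.get(letter_signature("dngwekleo"))
print(match)  # knowledge
```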
DYZ
0

Edit: after thinking about it for a bit, I think DYZ's answer might be faster.

Note: I'm assuming that it is acceptable to do some precomputation on the list of words, and that only the lookup time after that really matters.

To expand on the idea of DYZ:

  • count the occurrence of each letter
  • use that count to update a hash value
  • do this for each word in the list to get a dict with key: hash, value: word (or a list of words, since e.g. "cart" and "trac" have the same character counts)
  • then hash the user input the same way and look it up in the dict

Example implementation of a hash function:

import hashlib
import string

def get_char_count_hash(input_string):
    char_count_hash = hashlib.sha256()

    for char in string.ascii_lowercase:
        char_count = input_string.count(char)
        # hashlib needs bytes in Python 3; the comma keeps counts such as 1,11 and 11,1 distinct
        char_count_hash.update(f"{char_count},".encode())

    return char_count_hash.hexdigest()

Note: you can probably cut down on the precompute time by optimizing the hash function a bit.
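One possible simplification, sketched below, is to skip SHA-256 entirely and use the tuple of 26 letter counts as the dict key, since tuples are hashable. This is a swapped-in key, not the hash function above. `char_count_key` and `build_index` are hypothetical names, and the sketch assumes the words contain only lowercase ASCII letters.

```python
import string
from collections import defaultdict

def char_count_key(word):
    """Tuple of per-letter counts for a..z; any other character is ignored."""
    return tuple(word.count(ch) for ch in string.ascii_lowercase)

def build_index(words):
    """Precompute key -> list of words sharing the same letter counts."""
    index = defaultdict(list)
    for word in words:
        index[char_count_key(word)].append(word)
    return index

index = build_index(["cart", "trac", "knowledge", "rose"])

print(index.get(char_count_key("dngwekleo"), []))  # ['knowledge']
print(index.get(char_count_key("rtca"), []))       # ['cart', 'trac']
```

Each value is a list, so words with identical counts such as "cart" and "trac" are all returned.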

0x6d64