
Given a vocabulary dict {'A': 3, 'B': 4, 'C': 5, 'AB': 6} and a sentence that should be segmented: ABCAB.

I need to create all possible segmentations of this sentence, such as [['A', 'B', 'C', 'A', 'B'], ['A', 'B', 'C', 'AB'], ['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B']].

That's what I have:

def find_words(sentence):   
    for i in range(len(sentence)):

        for word_length in range(1, max_word_length + 1):

            word = sentence[i:i+word_length]
            print(word)

            if word not in test_dict:
                continue

            if i + word_length <= len(sentence):
                if word.startswith(sentence[0]) and word not in words and word not in ''.join(words):
                    words.append(word)
                else:
                    continue

                next_position = i + word_length

                if next_position >= len(sentence):
                    continue
                else:
                    find_ngrams(sentence[next_position:])

    return words

But it returns only one list.

I was also looking for something useful in itertools but couldn't find anything obviously applicable. Might've missed it, though.


3 Answers


Try all possible prefixes and recursively do the same for the rest of the sentence.

VOC = {'A', 'B', 'C', 'AB'}  # could be a dict

def parse(snt):
    if snt == '': 
        yield []
    for w in VOC:
        if snt.startswith(w):
            for rest in parse(snt[len(w):]):
                yield [w] + rest

print(list(parse('ABCAB')))

# [['AB', 'C', 'AB'], ['AB', 'C', 'A', 'B'],
# ['A', 'B', 'C', 'AB'], ['A', 'B', 'C', 'A', 'B']]
  • I have a new problem: I have a huge dict (6MB) and thousands of sentences (30MB) which should be parsed. Do you think this method will need long time to process? Cuz I waited for over 8h yesterday and it was still not finished. @VPfB – muc777 May 13 '18 at 07:20
  • @Y.River Some optimisation is surely possible, but otherwise I don't know of a different, more efficient approach. I'd suggest measuring the time for a few average sentences; then you could estimate the time needed to process thousands of sentences. You could also add some kind of counter to monitor the progress. – VPfB May 13 '18 at 12:47
  • Yep, i will try it. Thank u! – muc777 May 13 '18 at 20:31
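Regarding the performance question in the comments: one optimisation worth trying is memoising the recursion, since every suffix of the sentence is otherwise re-parsed once per path that reaches it. A minimal sketch (assuming the same small vocabulary; results are returned as tuples so they can be cached):

```python
from functools import lru_cache

VOC = {'A', 'B', 'C', 'AB'}

@lru_cache(maxsize=None)
def parse_memo(snt):
    """Return every segmentation of snt as a tuple of word-tuples."""
    results = []
    if snt == '':
        results.append(())  # one way to segment the empty string
    for w in VOC:
        if snt.startswith(w):
            # Each segmentation of the remainder extends to one of snt.
            for rest in parse_memo(snt[len(w):]):
                results.append((w,) + rest)
    return tuple(results)

print(parse_memo('ABCAB'))
```

Note that memoisation only reduces repeated work; when a sentence genuinely has exponentially many segmentations, listing them all is unavoidably slow, so measuring on a few average sentences (as suggested above) is still the right first step.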

Although not the most efficient solution, this should work:

from itertools import product

dic = {'A': 3, 'B': 4, 'C': 5, 'AB': 6}
sentence = 'ABCAB'
choices = list(dic.keys())
prod = []

# A segmentation can use at most len(sentence) words.
for a in range(1, len(sentence) + 1):
    prod += list(product(choices, repeat=a))

# Keep only the word sequences that spell out the sentence.
result = list(filter(lambda x: ''.join(x) == sentence, prod))
print(result)

# prints [('AB', 'C', 'AB'), ('A', 'B', 'C', 'AB'), ('AB', 'C', 'A', 'B'), ('A', 'B', 'C', 'A', 'B')]
  • Thank you, but what I need is to segment a sentence. The dictionary will be quite large and offers many different words, only some of which will be used. @ninesalt – muc777 May 13 '18 at 10:25

Use itertools.permutations to generate all unique orderings of the words.

import itertools

d = {'A': 3, 'B': 4, 'C': 5, 'AB': 6}

l = list(d.keys())

print(list(itertools.permutations(l)))

[('A', 'B', 'C', 'AB'), ('A', 'B', 'AB', 'C'), ('A', 'C', 'B', 'AB'), ('A', 'C', 'AB', 'B'), ('A', 'AB', 'B', 'C'), ('A', 'AB', 'C', 'B'), ('B', 'A', 'C', 'AB'), ('B', 'A', 'AB', 'C'), ('B', 'C', 'A', 'AB'), ('B', 'C', 'AB', 'A'), ('B', 'AB', 'A', 'C'), ('B', 'AB', 'C', 'A'), ('C', 'A', 'B', 'AB'), ('C', 'A', 'AB', 'B'), ('C', 'B', 'A', 'AB'), ('C', 'B', 'AB', 'A'), ('C', 'AB', 'A', 'B'), ('C', 'AB', 'B', 'A'), ('AB', 'A', 'B', 'C'), ('AB', 'A', 'C', 'B'), ('AB', 'B', 'A', 'C'), ('AB', 'B', 'C', 'A'), ('AB', 'C', 'A', 'B'), ('AB', 'C', 'B', 'A')]
