Convert ngrams into a frequency dictionary in Python

Question

Can anybody help with a function that converts the following ngrams into the result below? It should join the first N-1 elements of each ngram into a single key and count how often each distinct successor (the Nth element) occurs. I was thinking of some nested for loops, but I am struggling to build the structure. Thanks a lot!

ngrams = [['will', 'leave', 'florida'], ['will', 'leave', 'nyc'], ['will', 'leave', 'florida'], ['wont', 'leave', 'florida']]

The result should be:

{'will leave': {'florida': 2, 'nyc': 1}, 'wont leave': {'florida': 1}}
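A minimal sketch of such a function, assuming every ngram is a list of at least two strings. It slices off the last element instead of unpacking, so it should work for any N, and it uses `collections.defaultdict` with `collections.Counter` so no keys need to be known in advance. The name `successor_counts` is hypothetical.

```python
from collections import Counter, defaultdict


def successor_counts(ngrams):
    """Map each (N-1)-word prefix to a count of the words that follow it.

    Assumes each ngram is a list of strings with length N >= 2.
    """
    counts = defaultdict(Counter)
    for ngram in ngrams:
        prefix = ' '.join(ngram[:-1])   # first N-1 elements, space-separated
        counts[prefix][ngram[-1]] += 1  # tally the Nth element (the successor)
    # Convert to plain dicts so the result prints like the expected output
    return {prefix: dict(counter) for prefix, counter in counts.items()}


ngrams = [['will', 'leave', 'florida'], ['will', 'leave', 'nyc'],
          ['will', 'leave', 'florida'], ['wont', 'leave', 'florida']]
print(successor_counts(ngrams))
# {'will leave': {'florida': 2, 'nyc': 1}, 'wont leave': {'florida': 1}}
```

Because the key is built from `ngram[:-1]`, the same function would handle 4-grams, for example, by producing three-word prefixes.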

What is the issue, exactly? Have you tried anything, done any research? — AMC, Feb 04 '20 at 05:15

score 0 · Answer 1 · answered Feb 03 '20 at 21:19

Here is one approach:

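ngrams = [['will', 'leave', 'florida'], ['will', 'leave', 'nyc'], ['will', 'leave', 'florida'], ['wont', 'leave', 'florida']]

# Start empty; each "first two words" key is added the first time it appears
dct = {}

for a, b, c in ngrams:
    key = a + ' ' + b           # the first N-1 (here 2) elements, joined by a space
    dct.setdefault(key, {})     # create the inner dict on first sight of this key
    if c in dct[key]:
        dct[key][c] += 1        # successor seen before: increment its count
    else:
        dct[key][c] = 1         # first occurrence of this successor

print(dct)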

{'will leave': {'florida': 2, 'nyc': 1}, 'wont leave': {'florida': 1}}
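As an alternative sketch (not part of the original answer), the counting can be done in a single pass with `collections.Counter` over flat `(prefix, successor)` pairs, and the nested dictionary built afterwards. This again assumes each ngram is a list of at least two strings.

```python
from collections import Counter

ngrams = [['will', 'leave', 'florida'], ['will', 'leave', 'nyc'],
          ['will', 'leave', 'florida'], ['wont', 'leave', 'florida']]

# Count each (prefix, successor) pair in one pass over the ngrams
pair_counts = Counter((' '.join(g[:-1]), g[-1]) for g in ngrams)

# Regroup the flat pair counts into {prefix: {successor: count}}
nested = {}
for (prefix, successor), n in pair_counts.items():
    nested.setdefault(prefix, {})[successor] = n

print(nested)
# {'will leave': {'florida': 2, 'nyc': 1}, 'wont leave': {'florida': 1}}
```

Keeping the flat `pair_counts` around can be convenient if you later need the counts keyed by the full pair rather than nested.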
