Frequency Distribution of Bigrams

Question

I have done the following:

import nltk


words = nltk.corpus.brown.words()
freq = nltk.FreqDist(words)

This lets me find the frequency of individual words in the Brown corpus, for example:

freq["the"]
62713

Now I want to find the frequency of specific bigrams, so I tried:

bigrams = nltk.bigrams(words)
freqbig = nltk.FreqDist(bigrams)

But whichever bigram I look up, I always get 0. For example:

freqbig["the man"]
0

What am I doing wrong?

nikeros · Accepted Answer · 2021-12-13T17:51:40.867

The keys of freqbig are tuples, not strings, because nltk.bigrams yields each bigram as a tuple of two words:

freqbig[("the", "man")]

OUTPUT

If you would rather pass strings, you could write a small helper function that does the conversion:

def get_frequency(my_string):
    # Split "the man" into ("the", "man") so it matches the tuple keys of freqbig.
    return freqbig[tuple(my_string.split())]
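
To see the behaviour without downloading the corpus, here is a minimal sketch that uses only the standard library. It assumes a made-up word list in place of nltk.corpus.brown.words(), uses zip as a stand-in for nltk.bigrams, and uses collections.Counter (the class nltk.FreqDist is built on) in place of FreqDist. The names bigram_counts and get_bigram_frequency are hypothetical, chosen for illustration.

```python
from collections import Counter

# Toy word list standing in for nltk.corpus.brown.words() (assumption for illustration).
words = ["the", "man", "saw", "the", "man", "and", "the", "dog"]

# Pair each word with its successor; every bigram is a tuple of two words.
bigram_counts = Counter(zip(words, words[1:]))


def get_bigram_frequency(text):
    """Look up a whitespace-separated bigram such as "the man"."""
    return bigram_counts[tuple(text.split())]


print(bigram_counts[("the", "man")])    # tuple key matches: 2
print(bigram_counts["the man"])         # string key was never stored: 0
print(get_bigram_frequency("the man"))  # helper converts to a tuple: 2
```

A missing key returns 0 instead of raising KeyError, which is why looking up the string "the man" fails silently.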

edited Dec 13 '21 at 17:51

answered Dec 13 '21 at 16:27
