I want to build a matrix of bigram counts from my bigram model. How can I do that? Suggestions that fit my existing code would be appreciated.
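import codecs
from collections import Counter

import nltk

# Read the whole file and split it into whitespace-separated tokens
# (assigning inside a per-line loop would keep only the last line)
with codecs.open("Pezeshki339.txt", 'r', 'utf8') as file:
    token = file.read().split()

# 80/20 train/test split
spl = 80 * len(token) // 100
train = token[:spl]
test = token[spl:]
print(len(test))
print(len(train))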
cn = Counter(train)
# Drop the rare words (seen only once) but keep the remaining tokens in their
# original order, so that the bigrams are pairs of adjacent words
known_words = [word for word in train if cn[word] > 1]
bigram = nltk.bigrams(known_words)
frequency = nltk.FreqDist(bigram)
for f in frequency:
    print(f, frequency[f])
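I need something like this:

      w1        w2        w3        ...   wn
w1    n(w1w1)   n(w1w2)   n(w1w3)   ...   n(w1wn)
w2    n(w2w1)   n(w2w2)   n(w2w3)   ...   n(w2wn)
w3    .
.
.
.
wn

The same pattern applies to all rows and columns: the cell in row wi and column wj holds n(wiwj), the number of times the bigram (wi, wj) occurs.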
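
For reference, one way such a matrix could be built is sketched below. This is only a minimal sketch using the standard library, not necessarily the best fit for a large corpus. The helper names `bigram_matrix` and `print_matrix` and the toy `sample` list are made up for illustration. `Counter(zip(tokens, tokens[1:]))` counts the same adjacent pairs that `nltk.FreqDist(nltk.bigrams(tokens))` would, so a list such as `known_words` could be passed in instead of `sample`.

```python
from collections import Counter


def bigram_matrix(tokens):
    """Return (vocab, matrix) with matrix[i][j] = n(vocab[i] vocab[j])."""
    # Count adjacent pairs (the same pairs nltk.bigrams(tokens) yields)
    counts = Counter(zip(tokens, tokens[1:]))
    # Fix an order for the rows and columns
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    # Dense V x V table initialised to zero
    matrix = [[0] * len(vocab) for _ in vocab]
    for (w1, w2), n in counts.items():
        matrix[index[w1]][index[w2]] = n
    return vocab, matrix


def print_matrix(vocab, matrix):
    """Print the table with a header row and one labelled row per word."""
    width = max((len(word) for word in vocab), default=1) + 2
    print(" " * width + "".join(word.rjust(width) for word in vocab))
    for word, row in zip(vocab, matrix):
        print(word.ljust(width) + "".join(str(n).rjust(width) for n in row))


# Hypothetical toy input standing in for known_words
sample = "a b a b c a".split()
vocab, matrix = bigram_matrix(sample)
print_matrix(vocab, matrix)
```

The table is dense, so it needs V x V cells for a vocabulary of V words. For a large vocabulary, a dictionary of counts or a sparse matrix would probably be more practical.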