I am following the tutorial here: https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-language-model-nlp-python-code/#h2_5 to create a language model, specifically the part about the N-gram language model.

This is the completed code:

from nltk.corpus import reuters
from nltk import bigrams, trigrams
from collections import Counter, defaultdict

# Create a placeholder for the model
model = defaultdict(lambda: defaultdict(lambda: 0))

# Count frequency of co-occurrence
for sentence in reuters.sents():
    for w1, w2, w3 in trigrams(sentence, pad_right=True, pad_left=True):
        model[(w1, w2)][w3] += 1

# Let's transform the counts to probabilities
for w1_w2 in model:
    total_count = float(sum(model[w1_w2].values()))
    for w3 in model[w1_w2]:
        model[w1_w2][w3] /= total_count

input = input("Hi there! Please enter an incomplete sentence and I can help you\
 finish it!\n").lower().split()

print(model[tuple(input)])

To get output from the model, the website does this: print(dict(model["the", "price"])), but I want to generate output from a user-entered sentence. When I write print(model[tuple(input)]), it gives me an empty defaultdict.
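
To make the problem concrete, here is a made-up example of what I'm seeing (the words are invented, not real output from my run):

words = "what will the price".lower().split()

# Looking the model up with the full tuple of input words prints an empty
# defaultdict, while the two-word lookup from the website works.
print(model[tuple(words)])          # defaultdict(<lambda>, {})  <- empty
print(dict(model["the", "price"]))  # non-empty, like on the website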

Disregard this (keeping for history):

How do I give it the list I create from the input? model is a dictionary, and I've read that using a list as a key isn't a good idea, but that's exactly what they're doing? And I'm assuming mine doesn't work because I'm passing it a list? Would I have to iterate through the words to get results?

As a side note, is this model considering the sentence as a whole to predict the next word, or just the last word?

Helana Brock
  • 1. The dictionary is using tuples, not lists (see `model[(w1, w2)][w3]`...). 2. From the call to `trigrams` I can only conclude that it uses trigrams, i.e. it computes the probability of a word given the occurrence of the two previous ones. – ichramm Oct 08 '21 at 01:22
  • @JuanR I totally missed that it was using tuples! And yes, trigram would suggest that as well. Thank you for pointing all that out!! – Helana Brock Oct 08 '21 at 01:24

1 Answer

I had to give the model just the last two words from the list, not the entire thing; slicing with [-2:] works even when the input is only two words. Like so:

model[tuple(input[-2:])]
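
If you want to actually finish the sentence rather than just print the probability dictionary, a rough sketch like this should work (this is my own extension, not part of the tutorial, and it assumes the input has at least two words):

sentence = list(input)                      # the word list read from the user above

for _ in range(20):                         # safety cap so the loop always stops
    candidates = model.get(tuple(sentence[-2:]))
    if not candidates:                      # unseen two-word context
        break
    next_word = max(candidates, key=candidates.get)   # most probable next word
    if next_word is None:                   # pad_right=True marks sentence ends with None
        break
    sentence.append(next_word)

print(" ".join(sentence))

Greedy picking like this tends to repeat the most common phrases, so you could also sample the next word from candidates (e.g. with random.choices, weighting by the probabilities) if you want more varied completions.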
Helana Brock