Hi, I am trying to generate n-grams and (n-1)-grams in order to compute n-gram probabilities. However, the (n-1)-gram counts miss the last element of each sublist. Can somebody help me figure out where I am going wrong?

Input:
input1 = [['A', 'B', 'C', 'D', 'E'],
          ['D', 'E', 'C', 'D', 'E'],
          ['A', 'C', 'D', 'D']]

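n = 2
ngram = {}
history = {}

for line in input1:
    for i in range(len(line)-n+1):
        # n-gram starting at position i
        g = ' '.join(line[i:i+n])
        ngram.setdefault(g, 0)
        ngram[g] += 1
        # (n-1)-gram starting at the same position i
        h = ' '.join(line[i:i+n-1])
        history.setdefault(h, 0)
        history[h] += 1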

The output for the (n-1)-grams, i.e. history, is as follows: {'D': 4, 'A': 2, 'C': 3, 'B': 1, 'E': 1}

However, it should be {'D': 4, 'A': 2, 'C': 3, 'B': 1, 'E': 3}

Can someone help me debug this? Thanks.

user3320097

1 Answer

It's not completely clear what you're trying to do. Are you trying to create a dictionary that maps all n-grams of order n to their frequencies, where n can be set to 1 for unigrams, 2 for bigrams, etc.? If so, all you need is:

input1 = [['A', 'B', 'C', 'D', 'E'],
      ['D', 'E', 'C', 'D', 'E'],
      ['A', 'C', 'D', 'D']]

n = 1
ngram = {}

for line in input1:
    for i in range (len(line)-n+1):
        g = ' '.join(line[i:i+n])
        ngram.setdefault(g, 0)
        ngram[g] += 1

This gives {'A': 2, 'B': 1, 'C': 3, 'D': 5, 'E': 3} when n = 1, and {'A B': 1, 'B C': 1, 'C D': 3, 'D E': 3, 'E C': 1, 'A C': 1, 'D D': 1} when n is changed to 2, etc.
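If you also need the (n-1)-gram counts, note that a loop bounded by range(len(line)-n+1) stops one position before the last (n-1)-gram of each line, so slicing line[i:i+n-1] inside it will probably never reach the final element. One option is to run the same counting loop twice, once per order. A minimal sketch, assuming n = 2; count_ngrams is a helper name I made up for illustration, and Counter is used in place of setdefault:

```python
from collections import Counter

def count_ngrams(lines, n):
    """Count every n-gram of order n across all token lists."""
    counts = Counter()
    for line in lines:
        # A line of length L contains L - n + 1 n-grams of order n.
        for i in range(len(line) - n + 1):
            counts[' '.join(line[i:i + n])] += 1
    return counts

input1 = [['A', 'B', 'C', 'D', 'E'],
          ['D', 'E', 'C', 'D', 'E'],
          ['A', 'C', 'D', 'D']]

n = 2  # assumed order, chosen for illustration
ngram = count_ngrams(input1, n)        # bigram counts
history = count_ngrams(input1, n - 1)  # unigram counts, each with its own loop bound

print(dict(history))
# {'A': 2, 'B': 1, 'C': 3, 'D': 5, 'E': 3}
```

Because each call computes its own range, the last token of every line is included in history.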

Gabriel