I have a 3D NumPy array A representing a trigram language model, so $A[w_i, w_{i-1}, w_{i-2}]$ is the probability $P(w_i|w_{i-1},w_{i-2})$, where the $w$ are consecutive words. I want to extract the probabilities of all trigrams in a sequence of words. Currently I am using the following:

probas = []
for i in range(2, len(words)):
    # A is indexed as A[w_i, w_{i-1}, w_{i-2}]
    probas.append(A[words[i], words[i-1], words[i-2]])

Here words holds the indices of the words in some vocabulary. My question is: can this be efficiently vectorized with NumPy indexing? The word list can be very long, and looping this way could be prohibitively expensive because it will be done for many sequences.
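For reference, here is a minimal sketch of what a vectorized lookup could look like, assuming words is (or can be converted to) a 1-D integer NumPy array and that A is indexed as A[w_i, w_{i-1}, w_{i-2}]. The vocabulary size V, the random table A, and the sample words below are made up purely for illustration. It uses integer array indexing with three shifted slices of words and compares the result against the explicit loop.

```python
import numpy as np

# Hypothetical stand-ins for the real data (assumptions for illustration only)
rng = np.random.default_rng(0)
V = 5                                    # made-up vocabulary size
A = rng.random((V, V, V))                # stand-in for the trigram table
words = np.array([0, 3, 1, 4, 2, 2, 0])  # made-up sequence of word indices

# Integer array indexing: three aligned index arrays, one per axis of A.
# Element k of the result is A[words[k+2], words[k+1], words[k]].
probas = A[words[2:], words[1:-1], words[:-2]]

# Reference result from the explicit loop, for comparison
loop_probas = np.array(
    [A[words[i], words[i - 1], words[i - 2]] for i in range(2, len(words))]
)
```

The result has len(words) - 2 entries, one per trigram. If words starts out as a Python list, np.asarray(words) would be needed before slicing it this way.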

JAV
