How to create a transition matrix from dictionary with transitions and counts

Question

I am trying to create an HMM and I want to create my transition matrix but i'm not sure how. I have a dictionary with transitions and the probability of those transitions occuring which looks as follows (only larger):

{(1, 2): 0.0035842293906810036, (2, 3): 0.0035842293906810036, (3, 4): 0.0035842293906810036, (4, 5): 0.0035842293906810036, (5, 6): 0.0035842293906810036, (6, 7): 0.0035842293906810036, (7, 8)}

which I defined as follows:

# create a list of bigrams
bigrams = []
for i in range(len(integer_list)):
    if i+1 in range(len(integer_list)):
        bigrams.append((integer_list[i], integer_list[i+1]))

# Create a dictionary containing the counts each bigram occurs in the dataset
bigrams_dict = Counter(bigrams)
values = bigrams_dict.values()

# create a dictionary containing the probability of a word occurring. <- initial probs
frequencies = {key:float(value)/sum(counts_dict.values()) for (key,value) in counts_dict.items()}
frequency_list = []
for value in frequencies.values():
    frequency_list.append(value)

Now I would like to make a transition matrix out of this, which would be a multi dimensional array, but I'm not sure how to do this. Could anyone please help me.

An example of what the transition matrix would look something like this (only with more states of course):


   0   1/3  2/3
   0   2/3  1/3
   1    0    0

Can you give an example of such a matrix? I thought I knew what you wanted, but then you said "multi dimensional array", so I'm not confident that my thought is correct. — shadowtalker, Jun 11 '20 at 12:10
Yeah, so basically what I want is something like this: ``` #array([[ 0. , 0.33333333, 0.66666667], # [ 0. , 0.66666667, 0.33333333], # [ 1. , 0. , 0. ]]) ``` Where for example coordinate (0,1) in the matrix would be the probability acquired from the values of the key (0,1) in the dictionary. Hopefully that makes some sense — WibeMan, Jun 11 '20 at 12:25
I edited my original question so hopefully it makes more sense now — WibeMan, Jun 11 '20 at 12:30

score 3 · Accepted Answer · answered Jun 11 '20 at 12:35

The general procedure is just to pre-define a matrix of zeroes with the correct dimensions, then fill it one element at a time. Don't overthink this kind of task.

For example, if you know that you have exactly 8 states, you can construct the matrix like this, using your frequencies dict:

import numpy as np

n_states = 8
transitions = np.zeroes((n_states, n_states), dtype=np.float)

for (state1, state2), probability in frequencies.items():
    transitions[state1, state2] = probability

For a very large number of states, this could take a while, depending on how fast your computer is.

If you do not know the total number of states, you can estimate it by computing the largest state number in your data:

from itertools import chain

n_states = max(chain.from_iterable(frequencies.keys()))

How to create a transition matrix from dictionary with transitions and counts

1 Answers1