
So I am trying to train a hidden Markov model on a very large feature array: 700 x (400 x 4122), where each 400x4122 sub-array is a sequence of observed samples across 400 time stamps with 4122 features. There are 700 such sequences in total, which amounts to ~45GB of memory when concatenated. My question is: how do you work with an array of this size?

In the hmmlearn Python package, one typically works with multiple sequences as follows:

    x1 = ...  # a 400x4122 sequence
    x2 = ...  # another 400x4122 sequence
    ...
    xn = ...  # the 700th 400x4122 sequence

    X = np.concatenate([x1, x2, ..., xn])
    lengths = [len(x1), len(x2), ..., len(xn)]

    model = GaussianHMM(n_components=6, ...).fit(X, lengths=lengths)

In other words, one needs to concatenate the entire set of sequences and feed it into the training function. However, I was wondering whether there is a way to feed one 400x4122 sequence at a time, as the entire concatenated array is far too large to work with.
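
One workaround I have been considering, since hmmlearn's fit() accepts any array-like and (as far as I can tell) exposes no public API for feeding sequences incrementally, is to build the concatenated array on disk with np.memmap, so the full ~45GB never has to sit in RAM at once. Below is a minimal sketch of that idea; load_sequence is a hypothetical helper standing in for however each 400x4122 sequence is actually produced:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    n_seq, T, d = 700, 400, 4122

    # On-disk buffer; np.memmap behaves like an ndarray but pages data
    # from disk instead of holding all of it in memory.
    X = np.memmap("sequences.dat", dtype=np.float32, mode="w+",
                  shape=(n_seq * T, d))

    lengths = []
    for i in range(n_seq):
        x_i = load_sequence(i)  # hypothetical loader returning one (400, 4122) array
        X[i * T:(i + 1) * T] = x_i
        lengths.append(len(x_i))
    X.flush()

    model = GaussianHMM(n_components=6)
    model.fit(X, lengths=lengths)

Note this only keeps the raw observations out of RAM: each EM iteration still reads through the entire array, so training will likely be I/O-bound, but it avoids materializing the full concatenation in memory.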

Thanks in advance.

Andy
