
So I am trying to train a hidden Markov model on a very large feature array: 700 x (400 x 4122), where each 400x4122 sub-array is a sequence of observed samples across 400 time stamps with 4122 features. There are 700 such sequences in total, which amounts to ~45GB of memory when concatenated. My question is: how do you work with an array of this size?

In the hmmlearn Python package, one typically works with multiple sequences as follows:

    x1 = ...  # a 400x4122 sequence
    x2 = ...  # another 400x4122 sequence
    ...
    xn = ...  # the 700th 400x4122 sequence

    X = np.concatenate([x1, x2, ..., xn])
    lengths = [len(x1), len(x2), ..., len(xn)]

    model = GaussianHMM(n_components=6, ...).fit(X, lengths=lengths)

In other words, one needs to concatenate the entire set of sequences and feed it into the training function. However, I was wondering whether there is a way to feed one 400x4122 sequence at a time, as the entire concatenated array is far too large to work with.
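
One workaround I have been considering, since hmmlearn's fit() accepts any array-like and (as far as I can tell) exposes no public API for feeding sequences incrementally, is to build the concatenated array on disk with np.memmap, so the full ~45GB never has to sit in RAM at once. Below is a minimal sketch of that idea; load_sequence is a hypothetical helper standing in for however each 400x4122 sequence is actually produced:

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    n_seq, T, d = 700, 400, 4122

    # On-disk buffer; np.memmap behaves like an ndarray but pages data
    # from disk instead of holding all of it in memory.
    X = np.memmap("sequences.dat", dtype=np.float32, mode="w+",
                  shape=(n_seq * T, d))

    lengths = []
    for i in range(n_seq):
        x_i = load_sequence(i)  # hypothetical loader returning one (400, 4122) array
        X[i * T:(i + 1) * T] = x_i
        lengths.append(len(x_i))
    X.flush()

    model = GaussianHMM(n_components=6)
    model.fit(X, lengths=lengths)

Note this only keeps the raw observations out of RAM: each EM iteration still reads through the entire array, so training will likely be I/O-bound, but it avoids materializing the full concatenation in memory.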

Thanks in advance.

Andy
