How to parse input characters when using DiscreteDistribution

Asked Jul 10 '17 at 18:42

Active Jul 24 '17 at 19:30

Viewed 174 times

I would like to use the Hidden Markov Model implemented by Pomegranate (a Python API, https://pomegranate.readthedocs.io/en/latest/index.html), and I would like to initialize my Markov model by specifying a discrete distribution.

Since the distribution is discrete, when I fit the learned model to new data (of string type), I may encounter characters that do not appear in the distributions of my learned model. Is there a way to 'parse' my input/distribution so that anything not in my 'learned' distribution is classified into a new group with an assigned probability?

For example, I may want to define a discrete distribution like this to avoid the problem:

d1 = DiscreteDistribution({'A' : 0.35, 'B' : 0.20, 'C' : 0.05, 'the-rest-of-char' : 0.40})

So basically, how can I define something like a regular expression when passing the discrete distribution to the HMM?
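To make the idea concrete, here is a minimal sketch of the behaviour I am after, written in plain Python rather than with any Pomegranate call. It maps every character outside the known set to the catch-all label before the sequence reaches the model. The helper name `map_to_known` and the constant `OTHER` are hypothetical names made up for illustration; they are not part of Pomegranate.

```python
# Catch-all label, taken from the example distribution above.
OTHER = 'the-rest-of-char'


def map_to_known(sequence, known_symbols, other=OTHER):
    """Return a copy of sequence where unknown symbols become `other`.

    Hypothetical helper for illustration; not a Pomegranate function.
    """
    return [sym if sym in known_symbols else other for sym in sequence]


# Probabilities copied from the example distribution above.
probabilities = {'A': 0.35, 'B': 0.20, 'C': 0.05, OTHER: 0.40}

# Every explicitly listed symbol except the catch-all label itself.
known = set(probabilities) - {OTHER}

observed = list('ABXCZA')  # 'X' and 'Z' are not in the distribution
mapped = map_to_known(observed, known)
print(mapped)
# ['A', 'B', 'the-rest-of-char', 'C', 'the-rest-of-char', 'A']
```

With this kind of preprocessing, the mapped list would be given to the model in place of the raw characters, so the model only ever sees symbols it has a probability for. What I do not know is whether Pomegranate can do this grouping for me.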

Any help is appreciated!

edited Jul 24 '17 at 19:30

Wiktor Stribiżew

asked Jul 10 '17 at 18:42

starry1990

0 Answers