I would like to use a Hidden Markov Model implemented with Pomegranate (a Python library: https://pomegranate.readthedocs.io/en/latest/index.html), and I would like to initialize the model by specifying a discrete distribution.
Because the distribution is discrete, when I fit the learned model on new data (of string type), I may encounter characters that do not appear in the model's distributions. Is there a way to 'parse' my input/distribution so that anything not in my 'learned' distribution is assigned to a new group with its own probability?
For example, I may want to define a discrete distribution like this to avoid the problem:
d1 = DiscreteDistribution({'A' : 0.35, 'B' : 0.20, 'C' : 0.05, 'the-rest-of-char' : 0.40})
So basically, how can I define something like a regular expression (a catch-all pattern) when specifying the discrete distribution for the HMM?
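To make the idea concrete, here is a minimal sketch of the behavior I am after, written in plain Python with no Pomegranate calls. It collapses every unknown symbol into the catch-all key before the sequence would reach the model. The names `KNOWN_SYMBOLS`, `REST`, and `collapse_unknown` are hypothetical, made up for illustration only.

```python
# Hypothetical helper for illustration; none of these names come from pomegranate.
KNOWN_SYMBOLS = {'A', 'B', 'C'}   # symbols with an explicit probability
REST = 'the-rest-of-char'         # catch-all key, same as in the distribution above


def collapse_unknown(sequence, known=KNOWN_SYMBOLS, rest=REST):
    """Replace every symbol that is not in `known` with the catch-all `rest` key."""
    return [symbol if symbol in known else rest for symbol in sequence]


observed = collapse_unknown("ABXCZ")
# observed == ['A', 'B', 'the-rest-of-char', 'C', 'the-rest-of-char']
```

After this step the sequence only contains keys that exist in the distribution, so I assume it could be passed to the model in place of the raw string. I am wondering whether there is a built-in way to get the same effect.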
Any help is appreciated!