
I am trying to train an HMM to estimate the model parameters for a part-of-speech tagging problem.

I am using the PythonHMM package from the following resource: https://github.com/jason2506/PythonHMM

The original training data looks like this:

Sr.No.  Observations
1       killer/N clown/N
2       killer/N problem/N
3       crazy/A problem/N
4       crazy/A clown/N
5       problem/N crazy/A clown/N
6       clown/N crazy/A killer/N

As instructed for training a model with PythonHMM, I have created a list of sequences from the original data, where each sequence is a (state list, symbol list) pair. It looks like this:

sequences = [
    (['N', 'N'], ['killer', 'clown']),
    (['N', 'N'], ['killer', 'problem']),
    (['A', 'N'], ['crazy', 'problem']),
    (['A', 'N'], ['crazy', 'clown']),
    (['N', 'A', 'N'], ['problem', 'crazy', 'clown']),
    (['N', 'A', 'N'], ['clown', 'crazy', 'killer']),
]

I then call the 'train' function of hmm (after importing hmm.py):

model_hmm = hmm.train(sequences)

and I get the following error:


ValueError                                Traceback (most recent call last)
<ipython-input-41-24d7c607e58c> in <module>()
----> 1 model_hmm = hmm.train(sequences)

/home/sk/hmm.py in train(sequences, delta, smoothing)
     95         for _, symbol_list in sequences:
     96             model.learn(symbol_list, smoothing)
---> 97             new_likelihood += log(model.evaluate(symbol_list))
     98 
     99         new_likelihood /= length

ValueError: math domain error

I cannot figure out why this error occurs. Is there a problem with how I pass the sequences to the train function, or is it something else?

I also could not find any example of training an HMM for this type of problem. Please help me resolve this error.
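
For context on the error message: a ValueError of this kind is what Python's math.log raises when its argument is zero or negative, so line 97 of the traceback would fail if model.evaluate(symbol_list) returned 0 (assuming the log used in hmm.py is math.log). The sketch below reproduces that failure in isolation. safe_log is a hypothetical helper, not part of PythonHMM, and the traceback alone does not establish why evaluate would return 0.

```python
import math

def safe_log(p):
    """Return log(p), or -inf when p is not positive (hypothetical helper)."""
    return math.log(p) if p > 0 else float("-inf")

# math.log rejects a zero argument with ValueError, the same exception
# type shown at line 97 of the traceback.
try:
    math.log(0.0)
    raised = False
except ValueError:
    raised = True

print(raised, safe_log(0.0), safe_log(1.0))
```

A guard like this only avoids the crash; it does not explain a zero likelihood, which would still need to be traced inside the model.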

2 Answers


The hmmlearn implementation already supports training an HMM with multiple observation sequences. See "train hmm with multiple sequences".
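
As a rough sketch of what "multiple sequences" means in hmmlearn: its fit(X, lengths) method takes all sequences concatenated into one array, plus a list of per-sequence lengths. The stdlib-only snippet below arranges the symbol sequences from the question in that layout. The integer encoding and the variable names are assumptions for illustration, and since fit is unsupervised, the N/A tags are not used.

```python
# Symbol sequences from the question (tags dropped: fit() is unsupervised).
symbol_sequences = [
    ["killer", "clown"],
    ["killer", "problem"],
    ["crazy", "problem"],
    ["crazy", "clown"],
    ["problem", "crazy", "clown"],
    ["clown", "crazy", "killer"],
]

# Map each distinct word to an integer id (assumed encoding for a discrete HMM).
words = sorted({w for seq in symbol_sequences for w in seq})
vocab = {word: i for i, word in enumerate(words)}

# Concatenate all sequences into one column of ids and record each length.
X = [[vocab[w]] for seq in symbol_sequences for w in seq]
lengths = [len(seq) for seq in symbol_sequences]

# With hmmlearn and numpy installed, these would be passed along the lines of:
#   model.fit(numpy.array(X), lengths)
print(len(X), lengths)
```

The lengths list is what lets the library split the concatenated array back into the six original sequences.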

Wenmin Wu

The nltk library has an HMM model that does exactly what you are trying to do.

See the following link for a better understanding: https://gist.github.com/blumonkey/007955ec2f67119e0909
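
For labelled data like that in the question, supervised HMM training comes down to counting tag starts, tag-to-tag transitions, and tag-to-word emissions, then normalising the counts. The following is a minimal stdlib sketch of that idea with add-one smoothing. It is not the code from the linked gist or from nltk, and the function and variable names are illustrative only.

```python
from collections import Counter, defaultdict

sequences = [
    (["N", "N"], ["killer", "clown"]),
    (["N", "N"], ["killer", "problem"]),
    (["A", "N"], ["crazy", "problem"]),
    (["A", "N"], ["crazy", "clown"]),
    (["N", "A", "N"], ["problem", "crazy", "clown"]),
    (["N", "A", "N"], ["clown", "crazy", "killer"]),
]

def train_supervised(sequences, smoothing=1.0):
    """Estimate start, transition and emission probabilities from counts."""
    states = sorted({t for tags, _ in sequences for t in tags})
    symbols = sorted({w for _, words in sequences for w in words})
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for tags, words in sequences:
        start[tags[0]] += 1                      # first tag of the sequence
        for prev, cur in zip(tags, tags[1:]):    # consecutive tag pairs
            trans[prev][cur] += 1
        for tag, word in zip(tags, words):       # tag/word pairs
            emit[tag][word] += 1

    def normalise(counts, keys):
        # Additive smoothing keeps unseen events at a non-zero probability.
        total = sum(counts[k] for k in keys) + smoothing * len(keys)
        return {k: (counts[k] + smoothing) / total for k in keys}

    start_p = normalise(start, states)
    trans_p = {s: normalise(trans[s], states) for s in states}
    emit_p = {s: normalise(emit[s], symbols) for s in states}
    return start_p, trans_p, emit_p

start_p, trans_p, emit_p = train_supervised(sequences)
print(start_p)
```

Because every probability is smoothed above zero, the likelihood of any sequence over the known vocabulary stays positive, so taking its logarithm is well defined.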

Pramod Munemanik
  • Thanks for your suggestion, but I also want to explore HMMs for other information extraction problems, such as address segmentation and resume parsing. That is why I was trying to implement this. – user6568159 Jan 11 '18 at 10:51
  • Refer to this paper: https://arxiv.org/pdf/1603.01360.pdf (the LSTM-CRF model is very good for sequential tagging). – Pramod Munemanik Jan 11 '18 at 12:01