
I am trying to save an NLTK HMM tagger with pickle, as shown below, but it raises the following error. Please suggest a solution.

>>> import nltk
>>> import pickle
>>> brown_a = nltk.corpus.brown.tagged_sents()[:300]
>>> hmm_tagger=nltk.HiddenMarkovModelTagger.train(brown_a)
>>> sent = nltk.corpus.brown.sents()[400]
>>> hmm_tagger.tag(sent)
[(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
>>> f = open('my_tagger.pickle', 'wb')
>>> pickle.dump(hmm_tagger, f)

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    pickle.dump(hmm_tagger, f)
  File "C:\Python27\lib\pickle.py", line 1376, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 669, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function estimator at 0x0575F6F0>: it's not found as nltk.tag.hmm.estimator
>>> 

I am on Python 2.7.11 with NLTK 3.1 on MS Windows 10.

Thanks in advance.

Sergei Lebedev
Coeus2016

1 Answer


Why do you want to pickle the model? Training on the Brown corpus is extremely fast. And if you want a better part-of-speech tagger, consider https://spacy.io/, which is easy to use from Python, has great pickling support, and produces state-of-the-art results. HMM taggers lag far behind the state of the art nowadays.

Anyway, this is an NLTK bug. The default estimator is defined inside the _train function, so pickle cannot look it up by name as nltk.tag.hmm.estimator. Three options:

  1. Report the bug to NLTK and/or fix it by moving the estimator function out of the _train function to module level (so that pickle can find it as nltk.tag.hmm.estimator).
  2. Provide your own estimator function, defined at the top level of your own module, so that pickle can find it there.
  3. Use a pickle alternative such as dill or cloudpickle: they may be able to handle this estimator function.
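
To see why options 1 and 2 help, here is a minimal sketch using only the standard library. It does not touch NLTK: ToyModel, module_level_estimator, make_nested_estimator and can_pickle are hypothetical names made up for this illustration, and the arithmetic inside the estimators is a placeholder. It only shows that pickle stores functions by name, so a function defined inside another function cannot be saved, while one defined at module level can.

```python
import pickle


def module_level_estimator(counts, bins):
    """Hypothetical estimator defined at module level; pickle can find it by name."""
    return sum(counts.values()) + 0.1 * bins


def make_nested_estimator():
    """Return an estimator defined inside a function, like the default one in the traceback."""
    def estimator(counts, bins):
        return sum(counts.values()) + 0.1 * bins
    return estimator


class ToyModel(object):
    """Hypothetical stand-in for a trained model that keeps a reference to its estimator."""
    def __init__(self, estimator):
        self.estimator = estimator


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Python 2 raises PicklingError for a nested function;
        # Python 3 raises AttributeError or PicklingError depending on the version.
        return False


nested_ok = can_pickle(ToyModel(make_nested_estimator()))
module_ok = can_pickle(ToyModel(module_level_estimator))
print(nested_ok, module_ok)
```

For option 2, this suggests keeping your estimator at the top level of a module that is also importable when you load the pickle, since pickle only records the module and function name.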

Here's how to dump your tagger using dill:

import nltk
import dill

brown_a = nltk.corpus.brown.tagged_sents()[:300]
hmm_tagger = nltk.HiddenMarkovModelTagger.train(brown_a)
sent = nltk.corpus.brown.sents()[400]
hmm_tagger.tag(sent)
# [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]

with open('my_tagger.dill', 'wb') as f:
    dill.dump(hmm_tagger, f)

Now you can load the tagger:

import dill
import nltk

with open('my_tagger.dill', 'rb') as f:
    hmm_tagger = dill.load(f)

# A fresh session has no `sent` yet, so fetch the test sentence again.
sent = nltk.corpus.brown.sents()[400]
hmm_tagger.tag(sent)
# [(u'He', u'PPS'), (u'is', u'BEZ'), (u'not', u'*'), (u'interested', u'VBN'), (u'in', u'IN'), (u'being', u'NN'), (u'named', u'IN'), (u'a', u'AT'), (u'full-time', u'JJ'), (u'director', u'NN'), (u'.', u'.')]
Quentin Pradet