
I am trying to use NLTK with the conll2002 corpus, following the instructions from

How to improve dutch NER chunkers in NLTK

I ran the following command in the directory where I unpacked NLTK-Trainer:

python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename /nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

I found the pickle file (conll2002_ned_NaiveBayes.pickle) and copied it to the directory C:\Users\Administrator\AppData\Roaming\nltk_data\chunkers. This is also where nltk.download() stores its packages.

Then I tried to execute the following code:

import nltk

from nltk.corpus import conll2002

tokenizer = nltk.data.load('tokenizers/punkt/dutch.pickle')
tagger = nltk.data.load('taggers/conll2002_ned_IIS.pickle')
chunker = nltk.data.load('chunkers/conll2002_ned_NaiveBayes.pickle')

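# Evaluate the tagger on the first 1000 tagged test sentences.
test_sents = conll2002.tagged_sents(fileids="ned.testb")[0:1000]

print("tagger accuracy on test-set: " + str(tagger.evaluate(test_sents)))

# Evaluate the chunker on the first 1000 chunked test sentences.
test_sents = conll2002.chunked_sents(fileids="ned.testb")[0:1000]

print(chunker.evaluate(test_sents))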

But after running this code I get the following error:

LookupError: Resource u'taggers/conll2002_ned_IIS.pickle' not found. Please ....

I have tried to download all the packages and models with the nltk.download() GUI, but I still get the same error.

Does anyone have an idea how to solve this problem? Many thanks.

Erik

Erik hoeven
  • Have you run the `python train_chunker.py conll2002 ...` command as specified in the linked question? – jfs Jan 11 '15 at 19:02
  • Sebastian, I have run the following command in the directory where I unpacked NLTK-Trainer: python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename /nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle. But I still get the error: Resource u'taggers/dutch.pickle' not found. Please use the NLTK Downloader to obtain the resource: >>> nltk.download() Searched in: – Erik hoeven Jan 14 '15 at 16:56
  • If the command creates pickle files, make sure to copy them into the corresponding subdirectories of the nltk_data directory. – jfs Jan 15 '15 at 14:13
  • Sebastian, yes I did! See the update in the question. But I still get the same message. – Erik hoeven Jan 15 '15 at 15:53

1 Answer


You have to train both the tagger AND the chunker. First, the chunker:

python train_chunker.py conll2002 --fileids ned.train --classifier NaiveBayes --filename ~/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

This gives:

loading conll2002
using chunked sentences from ned.train
15806 chunks, training on 15806
training ClassifierChunker with ['NaiveBayes'] classifier
Constructing training corpus for classifier.
Training classifier (202644 instances)
training NaiveBayes classifier
evaluating ClassifierChunker
ChunkParse score:
    IOB Accuracy:  95.4%
    Precision:     66.9%
    Recall:        71.9%
    F-Measure:     69.3%
dumping ClassifierChunker to /home/hugo/nltk_data/chunkers/conll2002_ned_NaiveBayes.pickle

And now train the tagger:

python train_tagger.py conll2002 --fileids ned.train --classifier IIS --filename ~/nltk_data/chunkers/conll2002_ned_IIS.pickle

Which gives:

loading conll2002
using tagged sentences from ned.train
15806 tagged sents, training on 15806
training AffixTagger with affix -3 and backoff <DefaultTagger: tag=-None->
training <class 'nltk.tag.sequential.UnigramTagger'> tagger with backoff <AffixTagger: size=3988>
training <class 'nltk.tag.sequential.BigramTagger'> tagger with backoff <UnigramTagger: size=7799>
training <class 'nltk.tag.sequential.TrigramTagger'> tagger with backoff <BigramTagger: size=1451>
training ['IIS'] ClassifierBasedPOSTagger
Constructing training corpus for classifier.
Training classifier (202644 instances)
training IIS classifier
  ==> Training (10 iterations)
evaluating ClassifierBasedPOSTagger
accuracy: 0.980666
dumping ClassifierBasedPOSTagger to /home/hugo/nltk_data/chunkers/conll2002_ned_IIS.pickle

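This takes some time. Note that the tagger command above writes conll2002_ned_IIS.pickle into the chunkers directory, while the code in the question loads it as 'taggers/conll2002_ned_IIS.pickle'. Either move the file into ~/nltk_data/taggers/ (or train with --filename ~/nltk_data/taggers/conll2002_ned_IIS.pickle), or load it from 'chunkers/conll2002_ned_IIS.pickle' instead. After that you should be good to go.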
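If the LookupError persists, it can help to check where each pickle actually ended up. The following is a minimal sketch using only the standard library; the helper names `locate` and `find_misplaced` are hypothetical and not part of NLTK, and the search directory `~/nltk_data` is an assumption (with NLTK installed, `nltk.data.path` holds the list of directories that are really searched, and on Windows this includes the AppData\Roaming\nltk_data directory mentioned in the question).

```python
import os


def locate(resource, search_dirs):
    """Return the first existing file for `resource` under `search_dirs`, else None."""
    for base in search_dirs:
        # Resource names use forward slashes, e.g. "taggers/foo.pickle".
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.isfile(candidate):
            return candidate
    return None


def find_misplaced(resource, search_dirs):
    """Return paths where a file with the same name sits in another subdirectory."""
    name = os.path.basename(resource)
    expected = locate(resource, search_dirs)
    hits = []
    for base in search_dirs:
        for root, _dirs, files in os.walk(base):
            if name in files:
                path = os.path.join(root, name)
                if path != expected:
                    hits.append(path)
    return hits


if __name__ == "__main__":
    # Assumption: the default per-user data directory on Linux/macOS.
    search_dirs = [os.path.expanduser("~/nltk_data")]
    for resource in ("taggers/conll2002_ned_IIS.pickle",
                     "chunkers/conll2002_ned_NaiveBayes.pickle"):
        print(resource, "->", locate(resource, search_dirs) or "NOT FOUND")
        for path in find_misplaced(resource, search_dirs):
            print("  same file name found elsewhere:", path)
```

A resource reported as NOT FOUND together with a "found elsewhere" line pointing into chunkers/ would indicate that the tagger pickle only needs to be moved to the taggers/ subdirectory.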

Hugo Koopmans