
I want to implement a part-of-speech tagger, but I don't know where to get a large amount of training data. Where can I find some? Thanks!

tianzhi0549

2 Answers


There's a training set and a test set from the chunking shared task of the CoNLL-2000 conference here:

http://www.cnts.ua.ac.be/conll2000/chunking/
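
The CoNLL-2000 files are plain text with one token per line in the form `word POS-tag chunk-tag` (e.g. `Confidence NN B-NP`), and a blank line between sentences. For POS tagging you only need the first two columns. A minimal sketch, assuming the file has already been downloaded and decompressed; the function name `read_conll2000` is hypothetical, not part of any library:

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # [(word, pos_tag), ...]


def read_conll2000(lines: Iterable[str]) -> List[Sentence]:
    """Group 'word POS chunk' lines into sentences of (word, POS) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Keep word and POS tag; the chunk tag (third column) is ignored here.
        current.append((parts[0], parts[1]))
    if current:  # the file may not end with a blank line
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = """Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP

He PRP B-NP
rose VBD B-VP
. . O
"""
    for sent in read_conll2000(sample.splitlines()):
        print(sent)
```

With the real data you would pass an open file instead, e.g. `with open("train.txt") as f: sents = read_conll2000(f)` (the file name is whatever you decompressed the download to).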

Others have used this data to train part-of-speech taggers:

https://code.google.com/p/miralium/wiki/PosTaggerTutorial
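
Before training a real model on this data, it can help to have a baseline to compare against. The sketch below is a most-frequent-tag baseline, a much simpler technique than whatever the tutorial linked above uses: each known word gets the tag it carried most often in training, and unseen words get the overall most common tag. It assumes the data is already a list of sentences of `(word, tag)` pairs; all function names are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # [(word, tag), ...]


def train_baseline(sentences: Sequence[Sentence]) -> Tuple[Dict[str, str], str]:
    """Map each word to its most frequent tag; also return the overall most frequent tag."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for sent in sentences:
        for word, tag in sent:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    fallback = overall.most_common(1)[0][0]  # used for words never seen in training
    return lexicon, fallback


def tag_words(words: Sequence[str], lexicon: Dict[str, str], fallback: str) -> Sentence:
    """Tag each word with its lexicon entry, or the fallback tag if unseen."""
    return [(w, lexicon.get(w, fallback)) for w in words]


def accuracy(gold: Sequence[Sentence], lexicon: Dict[str, str], fallback: str) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold:
        predicted = tag_words([w for w, _ in sent], lexicon, fallback)
        for (_, gold_tag), (_, guess) in zip(sent, predicted):
            correct += gold_tag == guess
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    train = [
        [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
        [("the", "DT"), ("old", "JJ"), ("cat", "NN"), ("sleeps", "VBZ")],
        [("dog", "NN"), ("food", "NN")],
    ]
    lexicon, fallback = train_baseline(train)
    # "purrs" was never seen, so it gets the fallback tag NN.
    print(tag_words(["the", "cat", "purrs"], lexicon, fallback))
```

On real data you would train on sentences from the CoNLL-2000 training file and call `accuracy` on the test file; any proper tagger should clearly beat this number.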


https://catalog.ldc.upenn.edu/LDC99T42 <--- The Penn Treebank (Treebank-3) from the LDC. They want $1,700.00, or $850.00 if you have a reduced license :-(

https://www.kaggle.com/nltkdata/penn-tree-bank <--- You gotta love Kaggle!
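
If the download contains bracketed parse trees (the Penn Treebank `.mrg` style, where every word appears as a `(TAG word)` leaf), you can pull out tagged sentences without a full tree parser. A minimal sketch, assuming balanced parentheses and one top-level tree per sentence; it drops `-NONE-` leaves, which are empty elements (traces) rather than real words. The function names are hypothetical:

```python
import re
from typing import List, Tuple

# A leaf is "(TAG word)" where neither part contains whitespace or parentheses.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def split_trees(text: str) -> List[str]:
    """Split a file's text into its top-level bracketed trees."""
    trees, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                trees.append(text[start:i + 1])
    return trees


def tagged_words(tree: str) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs from one tree, skipping -NONE- empty elements."""
    return [(word, tag) for tag, word in LEAF.findall(tree) if tag != "-NONE-"]


if __name__ == "__main__":
    text = """( (S (NP-SBJ (NNP Pierre) (NNP Vinken))
     (VP (VBZ is) (NP-PRD (-NONE- *T*-1))) (. .)) )
( (S (NP (PRP It)) (VP (VBD rose)) (. .)) )"""
    for tree in split_trees(text):
        print(tagged_words(tree))
```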

https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/version/4 <--- You gotta love Kaggle even more!
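
That entity-annotated corpus also carries POS tags, so it can double as tagger training data. The sketch below assumes a CSV with the columns `Sentence #`, `Word`, `POS` and `Tag`, where `Sentence #` is filled in only on the first token of each sentence and left empty on the rest; check the header of the version you download, since this layout is an assumption. The function name is hypothetical:

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_entity_csv(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Group CSV rows into sentences of (word, POS) pairs."""
    sentences: List[List[Tuple[str, str]]] = []
    for row in csv.DictReader(lines):
        # Assumed layout: a non-empty "Sentence #" cell starts a new sentence.
        if row["Sentence #"].strip() or not sentences:
            sentences.append([])
        sentences[-1].append((row["Word"], row["POS"]))
    return sentences


if __name__ == "__main__":
    sample = """Sentence #,Word,POS,Tag
Sentence: 1,Thousands,NNS,O
,of,IN,O
,demonstrators,NNS,O
Sentence: 2,Families,NNS,O
,of,IN,O
"""
    for sent in read_entity_csv(io.StringIO(sample)):
        print(sent)
```

For the real file, pass `open(path, newline="")`; if it fails to decode as UTF-8, try adding `encoding="latin-1"`.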

Rusty Nail