I want to implement a part-of-speech tagger, but I don't know where I can get a large amount of training data. Thanks!
4
https://www.google.com/search?q=pos+corpus – tripleee Aug 16 '14 at 13:36
2 Answers
5
There's a training set and testing set from the chunking shared task of the CoNLL-2000 conference here:
http://www.cnts.ua.ac.be/conll2000/chunking/
Others have used this data to train part-of-speech taggers.
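For anyone who wants to turn those files into tagger training data, here is a minimal sketch of a reader. It assumes the usual layout of the chunking files (one token per line as `word POS chunk`, separated by whitespace, with a blank line between sentences). The function name `read_conll2000` and the inline sample are made up for illustration; the sample is not copied from the corpus.

```python
from typing import Iterable, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def read_conll2000(lines: Iterable[str]) -> List[TaggedSentence]:
    """Parse 'word POS chunk' lines into sentences of (word, POS) pairs.

    Hypothetical helper: the chunk column is read but discarded, since
    only the POS column is needed to train a part-of-speech tagger.
    """
    sentences: List[TaggedSentence] = []
    current: TaggedSentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, _chunk = line.split()
        current.append((word, pos))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Illustrative sample in the assumed layout (not real corpus lines).
SAMPLE = """\
The DT B-NP
dog NN I-NP
barks VBZ B-VP
. . O

She PRP B-NP
sleeps VBZ B-VP
. . O
"""

sentences = read_conll2000(SAMPLE.splitlines())
print(len(sentences), sentences[0][:2])
```

With the real data you would pass an open file instead of `SAMPLE.splitlines()`, for example `read_conll2000(open("train.txt", encoding="utf-8"))`, assuming you have downloaded and decompressed the training file under that name.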
The link has changed to https://www.clips.uantwerpen.be/conll2000/chunking/ – Rouzbeh Sep 11 '17 at 17:58
3
https://catalog.ldc.upenn.edu/LDC99T42 <--- They want $1700.00 or $850.00 if you have a Reduced-License :-(
https://www.kaggle.com/nltkdata/penn-tree-bank <--- You gotta love Kaggle!
https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/version/4 <--- You gotta love Kaggle even more!
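Whichever corpus you end up with, a quick sanity check is a most-frequent-tag baseline. The sketch below assumes you have already loaded the data as lists of `(word, tag)` pairs. The names `train_baseline` and `tag_words` and the toy sentences are hypothetical, and this is only a baseline to compare against, not a full tagger.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def train_baseline(sentences: List[TaggedSentence]) -> Tuple[Dict[str, str], str]:
    """Learn each word's most frequent tag, plus a fallback tag."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    all_tags: Counter = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            all_tags[tag] += 1
    model = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    # Unseen words get the most frequent tag in the training data.
    default_tag = all_tags.most_common(1)[0][0]
    return model, default_tag


def tag_words(words: List[str], model: Dict[str, str], default_tag: str) -> TaggedSentence:
    """Tag each word by lookup, falling back to default_tag."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


# Toy training data, invented for illustration only.
toy = [
    [("the", "DT"), ("dog", "NN"), ("chases", "VBZ"), ("cat", "NN")],
    [("dog", "NN"), ("food", "NN"), ("smells", "VBZ")],
]
model, default_tag = train_baseline(toy)
print(tag_words(["The", "dog", "sleeps"], model, default_tag))
```

Here "sleeps" never appears in the toy data, so it falls back to the most common tag. On a real corpus, the accuracy of this baseline on a held-out test set gives you a number that your own tagger should beat.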

Rusty Nail