Adding a corpus to the NLTK corpora and importing it

Asked Nov 13 '16 at 23:33

Active Nov 13 '16 at 23:33

I have created a corpus consisting of a collection of .txt files, and would like to start using NLTK (Python) on them.

I have navigated to /nltk_data/corpora/, where NLTK keeps the pre-defined corpora it ships with (gutenberg, shakespeare, etc.).

However, when I added my own file (containing several .txt files) there and tried to import it using

from nltk.corpus import [name of my file]

I receive this message back:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    from nltk.corpus import [name of my file]
ImportError: cannot import name '[name of my file]'

What am I doing wrong? Also, how can I load my corpus when it is stored in some other location?

Thanks.

Newman

You don't add new corpora in `nltk_data`. See the code in the linked question's text for how to set up your own corpus. – alexis Nov 14 '16 at 21:15
@alexis Thanks for the link, once I've done this do you have any pointers on how I can use the NLTK functions with my new corpus? It's not as easy or straightforward as with the built-in corpora. – Newman Nov 14 '16 at 23:56
Look at the snippet in the question. Once you've created a `PlaintextCorpusReader` object, you can use the methods `fileids()`, `sents()`, and `words()` just like with the built-in objects. – alexis Nov 15 '16 at 00:11
@alexis Yeah got it working, thanks for your help! – Newman Nov 15 '16 at 00:13
Create PlaintextCorpusReader and pass the corpus_root to it. Src: Section 1.9 in http://www.nltk.org/book/ch02.html – Sisay Chala Jun 02 '17 at 07:47
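
For anyone landing here later, the approach described in the comments might look roughly like the sketch below. The temporary folder, the file names first.txt and second.txt, and the helper load_corpus are all made up for illustration; in practice corpus_root would point at your own folder of .txt files, which can live anywhere and does not need to be inside nltk_data.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader.plaintext import PlaintextCorpusReader


def load_corpus(corpus_root):
    """Hypothetical helper: wrap every .txt file under corpus_root in a reader."""
    # The second argument is a regular expression selecting the files to include.
    return PlaintextCorpusReader(str(corpus_root), r".*\.txt")


# Stand-in for a real corpus folder (assumption: yours already exists on disk).
corpus_root = Path(tempfile.mkdtemp())
(corpus_root / "first.txt").write_text("Hello world. This is a test.", encoding="utf8")
(corpus_root / "second.txt").write_text("Another small file.", encoding="utf8")

my_corpus = load_corpus(corpus_root)

# Same reader methods as the built-in corpora such as gutenberg.
print(my_corpus.fileids())                 # names of the files in the corpus
print(list(my_corpus.words("first.txt")))  # tokens from a single file
print(len(my_corpus.words()))              # token count across all files
```

Note that sents() is left out here on purpose: it relies on the Punkt sentence tokenizer, so it may first require fetching that model with nltk.download().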
