How do I use the corpus I have created in python?

Question

I have built a corpus called abc, but I am unable to load it in Python.

The problems I am facing:

1) Should I place my self-built corpus in the same location as the pre-built corpora?

1.a) If so, why am I not able to use this command? (Let's say the location is 'LOCATION'.)

abc = nltk.data.find('LOCATION\abc')

1.b) In fact,

from nltk import abc

throws this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name abc

2) What other ways are there to load the corpus I have created?

Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name abc — user3771993, Mar 27 '16 at 16:14
Mainly, I want to know how to load a corpus that I have created — user3771993, Mar 27 '16 at 16:16
Please update the question with such explanations. It will attract more potential helpers if they can see code that is supposed to work but fails at an easy-to-identify location, so that answering is a matter of consulting rather than puzzle solving... — flaschbier, Mar 27 '16 at 16:19
If you want to contribute a corpus to NLTK, please take a look at https://github.com/nltk/nltk/wiki/Adding-a-Corpus. If you would like to create a new corpus API using NLTK objects/functions, see http://stackoverflow.com/questions/4951751/creating-a-new-corpus-with-nltk — alvas, Mar 27 '16 at 23:35

score 0 · Answer 1 · edited May 23 '17 at 12:15

I think you're looking for the first or the second answer of this other question.

Anyway, this is a quick way to do it:

import nltk
from nltk.corpus import PlaintextCorpusReader

corpus_root = './'  # Directory that contains your corpus files
newcorpus = PlaintextCorpusReader(corpus_root, '.*')  # Regex selecting the files to include
newcorpus.words('file-1.txt')  # Tokenized words of a single file
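
For a version that can be run without any existing files, here is a minimal sketch. It assumes NLTK is installed; the temporary directory, the file names and the sample sentences are made up for illustration and are not part of the question.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Build a throwaway corpus directory (hypothetical file names and contents).
corpus_root = tempfile.mkdtemp()
samples = {
    'file-1.txt': 'Hello world. This is my corpus.',
    'file-2.txt': 'A second document.',
}
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), 'w', encoding='utf-8') as f:
        f.write(text)

# Only pick up .txt files instead of everything in the directory.
newcorpus = PlaintextCorpusReader(corpus_root, r'.*\.txt')

fileids = newcorpus.fileids()                    # files matched by the pattern
words = list(newcorpus.words('file-1.txt'))      # tokens of one file
print(fileids)
print(words)
```

Restricting the pattern to `.txt` is a design choice of this sketch: with `'.*'` every file under the root, including scripts or hidden files, becomes part of the corpus.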

And no, putting your own corpus in NLTK's data directory is not a great idea. There is no technical reason against it; it simply keeps your data separate from what ships with the toolkit.
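
As for point 1.a in the question, one possible contributing factor (an assumption, since the real path was not shown) is the backslash in 'LOCATION\abc': in a normal Python string literal, `\a` is an escape sequence for the BEL control character, so the string does not contain the path you typed. A small sketch using only the standard library, with 'LOCATION' kept as the placeholder from the question:

```python
import ntpath  # Windows path rules, used here so the demo behaves the same on any OS

# 'LOCATION' is the placeholder from the question, not a real directory.
plain = 'LOCATION\abc'                    # \a is parsed as the BEL control character
raw = r'LOCATION\abc'                     # a raw string keeps the backslash
joined = ntpath.join('LOCATION', 'abc')   # builds the path without hand-typed separators

print(repr(plain))
print(repr(raw))
print(joined == raw)
```

In real code `os.path.join` would normally be used instead of `ntpath.join`; forward slashes are also accepted in file paths on Windows.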
