I have built a corpus called abc, but I am unable to load it in Python.

The problems I am facing:

1) Should I place my self-built corpus in the same location as the pre-built corpora?

1.a) If so, why am I not able to use this command? (Let's say the location is 'LOCATION'.)

abc = nltk.data.find('LOCATION\abc')

1.b) In fact,

 from nltk import abc

throws this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name abc

2) What other ways are there to load the corpus I have created?

  • Please outline what "are not working" is supposed to mean. – flaschbier Mar 27 '16 at 16:12
  • Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name abc – user3771993 Mar 27 '16 at 16:14
  • Mainly, I want to know how to load the corpus that I have created – user3771993 Mar 27 '16 at 16:16
  • Please update the question with such explanations. It will attract more potential helpers when they see something that is supposed to work but fails at an easy-to-identify location in the code, so that it's a consulting business, not puzzling... – flaschbier Mar 27 '16 at 16:19
  • If you want to contribute a corpus to NLTK, please take a look at https://github.com/nltk/nltk/wiki/Adding-a-Corpus. If you would like to create a new corpus API using NLTK objects/functions, see http://stackoverflow.com/questions/4951751/creating-a-new-corpus-with-nltk – alvas Mar 27 '16 at 23:35

1 Answer

I think you're looking for the first or the second answer of this other question.

Anyway, this is a quick way to do it:

from nltk.corpus import PlaintextCorpusReader

corpus_root = './'  # directory that contains your corpus files
newcorpus = PlaintextCorpusReader(corpus_root, '.*')  # regex selecting the files to include
newcorpus.words('file-1.txt')  # words of a single file in the corpus
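
For a version that can be run without any existing files, here is a minimal sketch. The temporary directory, the file names, and the sample sentences are assumptions made up for illustration; only `PlaintextCorpusReader`, `fileids()`, and `words()` come from NLTK. It relies on the reader's default word tokenizer, which should not need any downloaded NLTK data.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader

# Hypothetical corpus: two small text files in a throwaway directory.
corpus_root = tempfile.mkdtemp()
samples = {
    "file-1.txt": "Hello corpus world.",
    "file-2.txt": "A second document.",
}
for name, text in samples.items():
    with open(os.path.join(corpus_root, name), "w", encoding="utf-8") as fh:
        fh.write(text)

# Restrict the reader to .txt files instead of matching everything with '.*'.
newcorpus = PlaintextCorpusReader(corpus_root, r".*\.txt")

file_ids = newcorpus.fileids()                       # files the reader picked up
first_words = list(newcorpus.words("file-1.txt"))    # tokens of one file

print(file_ids)
print(first_words)
```

Narrowing the pattern to `.*\.txt` is a design choice rather than a requirement: with `'.*'` and a root of `'./'`, the reader would also pick up scripts and any other files in the working directory.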

And no, putting your own corpus in NLTK's data directory does not seem like a good idea. There is no particular technical reason; it simply keeps your data separate from what ships with the toolkit.
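
If you still want `nltk.data.find` to locate the corpus, one possible approach is to keep it in a directory of your own and add that directory to `nltk.data.path`. As far as I know, `nltk.data.find` looks up resource names such as `corpora/<name>` relative to the entries of that list, which would also explain why passing a plain filesystem location does not work. The sketch below assumes a hypothetical corpus named `my_corpus` and a temporary data directory; neither comes from the question.

```python
import os
import tempfile

import nltk
from nltk.corpus import PlaintextCorpusReader

# Hypothetical private data directory laid out like NLTK's own: <root>/corpora/<name>/
data_root = tempfile.mkdtemp()
corpus_dir = os.path.join(data_root, "corpora", "my_corpus")
os.makedirs(corpus_dir)
with open(os.path.join(corpus_dir, "notes.txt"), "w", encoding="utf-8") as fh:
    fh.write("Keep your own data separate.")

# Register the private directory so NLTK searches it as well.
nltk.data.path.append(data_root)

# Resource names use forward slashes and are relative to the search path entries.
found = nltk.data.find("corpora/my_corpus")

# The returned path pointer can be used as the root of a corpus reader.
reader = PlaintextCorpusReader(found, r".*\.txt")
print(reader.fileids())
print(list(reader.words()))
```

Note that `from nltk import my_corpus` would still fail here: importing only works for names that NLTK itself defines, so the reader object is what you work with.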

Alex