How can I load my txt corpora using NLTK?

Asked Oct 22 '17 at 15:16

Active Oct 22 '17 at 15:19

Viewed 56 times

I'm really new to Python and I'm having trouble loading my corpus with NLTK. My Python version is 3.6.3 and I installed NLTK using pip install. I have tested NLTK with short sentences and it has worked so far, but I cannot use my own .txt corpus, which is stored on my PC. I have tried something like this:

import codecs
import nltk

text = codecs.open('myfilename.txt','r','utf-8')

But then I get all kinds of errors. I think it has something to do with my file location, but I can't find an nltk_data folder anywhere. Any help will be greatly appreciated. Thanks.

edited Oct 22 '17 at 15:19

cs95

asked Oct 22 '17 at 15:16

sebastian diaz

You should describe your `all kinds of errors`. But in general you are going about it right: once you have opened the file, you can use `import nltk; nltk.Text(text.read().split())` – Arpit Goyal Oct 22 '17 at 15:50
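A minimal sketch of the `nltk.Text` suggestion, assuming the file contents have already been read into a string. The sample string `raw` is a made-up stand-in for the real file, and whitespace splitting is only the crudest possible tokenization:

```python
import nltk

# Hypothetical stand-in for the contents of the asker's file
raw = "the cat sat on the mat"

# Whitespace tokenization, as in the comment; needs no NLTK data downloads
tokens = raw.split()
text = nltk.Text(tokens)

token_count = len(text)        # number of tokens in the text
the_count = text.count("the")  # occurrences of one token

print(token_count, the_count)
```

Note that `nltk.Text` expects a sequence of tokens, so a file object has to be read into a string and split first.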
Your code snippet usefully shows that the problem is unrelated to NLTK; the file you are trying to read is not UTF-8 encoded. Try "latin-1"? (Better yet: find out the correct encoding, and use that.) – alexis Oct 22 '17 at 19:10
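One way to act on the encoding advice, sketched with the standard library only. The helper name `read_text` and the candidate list are assumptions for illustration, and the demo writes a throwaway latin-1 file rather than touching any real corpus:

```python
import os
import tempfile


def read_text(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as handle:
                return handle.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode " + path)


# Demo: a throwaway file deliberately written as latin-1
demo_path = os.path.join(tempfile.mkdtemp(), "myfilename.txt")
with open(demo_path, "w", encoding="latin-1") as handle:
    handle.write("caf\u00e9 corpus")

text, used = read_text(demo_path)
print(used, repr(text))
```

Because latin-1 maps every byte to some character, it never raises a decoding error, so it only makes sense as the last candidate. A successful latin-1 read does not prove the file really is latin-1.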
After figuring out exactly where your file is and what its encoding is, take a look at https://stackoverflow.com/questions/4951751/creating-a-new-corpus-with-nltk – alvas Oct 22 '17 at 23:29
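For a folder of plain-text files, NLTK's `PlaintextCorpusReader` is one option. The sketch below assumes a hypothetical corpus folder (a temporary directory here) holding a single UTF-8 file. In practice `corpus_root` would be the folder where the .txt files actually live, which does not need to be inside `nltk_data`:

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Hypothetical corpus folder containing one small .txt file
corpus_root = tempfile.mkdtemp()
with open(os.path.join(corpus_root, "myfilename.txt"), "w", encoding="utf-8") as handle:
    handle.write("NLTK can read plain text files.")

# Pick up every .txt file in the folder; encoding must match the files on disk
corpus = PlaintextCorpusReader(corpus_root, r".*\.txt", encoding="utf-8")

file_ids = corpus.fileids()
words = list(corpus.words("myfilename.txt"))

print(file_ids)
print(words)
```

`words()` uses the reader's default word tokenizer and should work without extra downloads, whereas `sents()` relies on the Punkt sentence tokenizer data, which has to be fetched separately with `nltk.download`.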

0 Answers