I am using NLTK 3.0 with Python 3.4, and I cannot do POS tagging because of the error shown below.
I have read all the posts I could find about similar problems, but none of them solved mine. Most of them say that upgrading to NLTK 3.0 fixes the problem, but I already have NLTK 3.0. According to these posts, a change in NLTK's `data.py` solves the problem, but the NLTK developers discourage doing that.
Here is my code:
```python
from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize

pos_tag(word_tokenize("John's big idea isn't all that bad."))
```
And here is the error:
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
```
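The sketch below reproduces one mechanism that yields exactly this message. It assumes (this is not confirmed in the post) that the tagger model loaded by `pos_tag` was pickled under Python 2. Python 3's `pickle.loads` decodes Python 2 `str` objects with the `'ASCII'` codec by default, so any non-ASCII byte raises `UnicodeDecodeError`. Passing `encoding="latin1"` decodes the same bytes without error. `PY2_PICKLE`, `load_default`, and `load_latin1` are hypothetical names, and the bytes are a hand-built stand-in, not NLTK's actual model file.

```python
import pickle

# Hypothetical hand-built protocol-2 pickle of a Python 2 byte string holding
# the single byte 0xcb. Opcodes: PROTO 2, SHORT_BINSTRING (length 1), STOP.
# This is an illustrative stand-in, not NLTK's real tagger pickle.
PY2_PICKLE = b"\x80\x02U\x01\xcb."


def load_default(data):
    """Unpickle with Python 3's default encoding for Python 2 str ('ASCII')."""
    return pickle.loads(data)


def load_latin1(data):
    """Unpickle, decoding Python 2 str objects as Latin-1 instead of ASCII."""
    return pickle.loads(data, encoding="latin1")


if __name__ == "__main__":
    try:
        load_default(PY2_PICKLE)
    except UnicodeDecodeError as exc:
        # Same message as in the question:
        # 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)
        print(exc)
    # ascii() keeps the output printable even on an ASCII-only console.
    print(ascii(load_latin1(PY2_PICKLE)))  # '\xcb' (the character Ë)
```

Latin-1 maps every byte value to a code point, so it never fails to decode. Whether that is the right text encoding for a given pickle is a separate question.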
Is there any way to make this work without modifying `data.py`?
Any ideas would be appreciated.