
(Python 3.5) My program hangs at "pos = nltk.pos_tag(words)" when I run the code below. This is a new problem: I fixed an earlier error by downgrading to NLTK 3.1, but when I launched the program again a few days later, nothing happened. The program keeps running with no result, stuck at nltk.pos_tag(), and it shows no errors until I close it. I didn't have this problem before and I have no idea what causes it. I've tried changing almost everything in the tagging loop, but the result is always the same.

import nltk
import random
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from nltk.classify import ClassifierI
from statistics import mode
from nltk.tokenize import word_tokenize

class VoteClassifier(ClassifierI):
    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)
        return mode(votes)

    def confidence(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)

        choice_votes = votes.count(mode(votes))
        conf = choice_votes / len(votes)
        return conf

short_pos = open("short_reviews/positive.txt","r").read()
short_neg = open("short_reviews/negative.txt","r").read()

all_words = []
documents = []

allowed_word_types = ["J"]

for p in short_pos.split('\n'):
    documents.append( (p, "pos") )
    words = word_tokenize(p)
    pos = nltk.pos_tag(words)
    for w in pos:
        if w[1][0] in allowed_word_types:
            all_words.append(w[0].lower())

for p in short_neg.split('\n'):
    documents.append( (p, "neg") )
    words = word_tokenize(p)
    pos = nltk.pos_tag(words)
    for w in pos:
        if w[1][0] in allowed_word_types:
            all_words.append(w[0].lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:5000]

If anyone has a clue about what is causing the problem, I would really appreciate it. I have been struggling with this for over a week. Thanks in advance.

Stack Over
  • Not sure if it's a version issue, but is [this](http://stackoverflow.com/questions/35827859/python-nltk-pos-tag-throws-urlerror) of any help? – avip Mar 12 '16 at 07:52
  • Thanks a lot, this really solved my problem. Who would have thought of downgrading to NLTK 3.1? I wouldn't have thought of that in a million years. – Stack Over Mar 12 '16 at 18:26
  • Very welcome, glad it helped. FWIW, I had the same problem and a downgrade also helped :) – avip Mar 12 '16 at 19:02
  • Voting to close as a duplicate of the question found by @avip. – alexis Mar 12 '16 at 20:58
  • Hey @avip, for some reason neither downgrading nor upgrading NLTK works any more. When I run the previous program now, I get nothing: the program never stops running and nothing happens, unlike the first time. Why is that? – Stack Over Mar 18 '16 at 19:38

1 Answer


See NLTK v3.2: Unable to nltk.pos_tag()


If you want to stay on NLTK 3.2 rather than downgrade to v3.1, you can use this "hack", which loads the tagger model explicitly:

>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
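To plug a tagger built this way into the loops from the question, one option is to pass the tokenizer and tagger in as arguments, so the model is loaded once and reused for every line. The sketch below is only an illustration: `collect_words_by_tag`, `toy_tokenize`, `toy_tag` and `TOY_TAGS` are hypothetical names, and the toy tokenizer and tagger are stand-ins so the example runs without NLTK installed. Whether reusing one tagger speeds things up depends on how your NLTK version implements `pos_tag`, so treat that as an assumption to check.

```python
def collect_words_by_tag(lines, tokenize, tag, allowed_prefixes=("J",)):
    """Collect lowercased words whose POS tag starts with an allowed prefix.

    `tokenize` maps a string to a list of tokens and `tag` maps a list of
    tokens to (word, tag) pairs; both are supplied by the caller.
    """
    words = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. the last item of split('\n')
        for word, pos in tag(tokenize(line)):
            # pos[:1] is safe even if a tag is an empty string
            if pos[:1] in allowed_prefixes:
                words.append(word.lower())
    return words


# Hypothetical stand-ins so the sketch runs without NLTK.
TOY_TAGS = {"quick": "JJ", "lazy": "JJ", "fox": "NN", "dog": "NN"}


def toy_tokenize(text):
    return text.split()


def toy_tag(tokens):
    # Unknown words default to "NN" in this toy tagger.
    return [(t, TOY_TAGS.get(t.lower(), "NN")) for t in tokens]


adjectives = collect_words_by_tag(
    ["The quick brown fox", "", "the lazy dog"], toy_tokenize, toy_tag
)
print(adjectives)  # ['quick', 'lazy']
```

With NLTK available, the same function could presumably be called as collect_words_by_tag(short_pos.split('\n'), word_tokenize, tagger.tag), where tagger is the PerceptronTagger loaded above.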
alvas