I am trying to use multithreading to speed up processing. I use `WordNetLemmatizer` to lemmatize the words, which are then passed to SentiWordNet to calculate the sentiment of the text. My sentiment analysis function, which uses `WordNetLemmatizer`, is as follows:

import nltk
from nltk.corpus import sentiwordnet as swn

def SentimentA(doc, file_path):
    sentences = nltk.sent_tokenize(doc)
    # print(sentences)
    stokens = [nltk.word_tokenize(sent) for sent in sentences]
    taggedlist = []
    for stoken in stokens:
        taggedlist.append(nltk.pos_tag(stoken))
    wnl = nltk.WordNetLemmatizer()
    score_list = []
    for idx, taggedsent in enumerate(taggedlist):
        score_list.append([])
        for idx2, t in enumerate(taggedsent):
            newtag = ''
            lemmatized = wnl.lemmatize(t[0])
            if t[1].startswith('NN'):
                newtag = 'n'
            elif t[1].startswith('JJ'):
                newtag = 'a'
            elif t[1].startswith('V'):
                newtag = 'v'
            elif t[1].startswith('R'):
                newtag = 'r'
            else:
                newtag = ''
            if (newtag != ''):
                synsets = list(swn.senti_synsets(lemmatized, newtag))

                score = 0
                if (len(synsets) > 0):
                    for syn in synsets:
                        score += syn.pos_score() - syn.neg_score()
                    score_list[idx].append(score / len(synsets))
    return SentiCal(score_list)

When I run 4 threads, the first 3 fail with the following error, while the last thread works perfectly:

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

I have already tried importing the NLTK package locally, as suggested in this NLTK issue, and I have tried the solution given on this page.

  • `LazyCorpusLoader` should evaluate before Pool =) I'll answer today ~10hrs later if no one answers. – alvas May 31 '18 at 00:01

1 Answer

Quick hack:

import nltk
from nltk.corpus import sentiwordnet as swn
# Do this first: accessing the corpus once forces the
# LazyCorpusLoader to "materialize" into the real corpus reader.
next(swn.all_senti_synsets()) 

# Your other code here. 
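
The hack above can be combined with a thread pool roughly as follows. This is a minimal sketch, not a confirmed fix: `warm_up_nltk_corpora` and `run_in_threads` are hypothetical helper names, the use of `concurrent.futures.ThreadPoolExecutor` is an assumption (the question does not show how its threads are started), and the NLTK corpora are assumed to be downloaded already.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_up_nltk_corpora():
    """Force NLTK's lazy corpus loaders to load in the main thread."""
    # Imported here so the helper below can be read and tested without NLTK.
    from nltk.corpus import sentiwordnet as swn
    from nltk.corpus import wordnet as wn

    # Touching each corpus once replaces the lazy proxy with the real reader.
    next(swn.all_senti_synsets())
    next(wn.words())


def run_in_threads(worker, docs, warm_up=warm_up_nltk_corpora, max_workers=4):
    """Call warm_up() once, then map worker over docs on a thread pool."""
    warm_up()  # must finish before any worker thread starts
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as docs.
        return list(pool.map(worker, docs))
```

With the function from the question, a call might look like `run_in_threads(lambda doc: SentimentA(doc, None), docs)`, where `docs` is a list of document strings. The point of the design is only that the corpora are loaded once, before any thread touches them.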

More details later... Still typing

alvas
  • When I use the above hint, I get the error: `AttributeError: 'SentiWordNetCorpusReader' object has no attribute 'words'`. WordNet does have a `words` attribute, though, which I used the same way you've shown above, and I am still getting the same error. Strangely, both threads run when I print the `lemmatized` variable in my code. – Shivansh bhandari May 31 '18 at 01:10
  • Sorry it should be `next(swn.all_senti_synsets())` for sentiwordnet and `next(wn.words())` for wordnet. – alvas May 31 '18 at 01:39
  • When I increase the number of threads and run it on a larger number of documents, some of the threads still stop because of the same error. Could you suggest a workaround for this? – Shivansh bhandari Jun 01 '18 at 09:42
  • Remember that Python's multithreading is sort of fake, and serialized outputs are passed around. It's not easy to debug what's wrong unless you share the full code and dataset so that someone else can try to replicate the problem. – alvas Jun 02 '18 at 05:53