1

I am new to NLTK and am trying to return the collocation output. I get the output, but I also get None along with it. Below are my code, input, and output.

import nltk
from nltk.corpus import stopwords


def performBigramsAndCollocations(textcontent, word):
    stop_words = set(stopwords.words('english'))
    pattern = r'\w+'
    tokenizedwords = nltk.regexp_tokenize(textcontent, pattern)
    for i in range(len(tokenizedwords)):
        tokenizedwords[i] = tokenizedwords[i].lower()
    tokenizedwordsbigrams = nltk.bigrams(tokenizedwords)
    tokenizednonstopwordsbigrams = [ (w1, w2) for w1, w2 in tokenizedwordsbigrams if w1 not in stop_words and w2 not in stop_words]
    cfd_bigrams = nltk.ConditionalFreqDist(tokenizednonstopwordsbigrams)
    mostfrequentwordafter = cfd_bigrams[word].most_common(3)
    tokenizedwords = nltk.Text(tokenizedwords)
    collocationwords = tokenizedwords.collocations()
    return mostfrequentwordafter, collocationwords


if __name__ == '__main__':
    textcontent = input()

    word = input()


    mostfrequentwordafter, collocationwords = performBigramsAndCollocations(textcontent, word)
    print(sorted(mostfrequentwordafter, key=lambda element: (element[1], element[0]), reverse=True))
    print(sorted(collocationwords))

input :Thirty-five sports disciplines and four cultural activities will be offered during seven days of competitions. He skated with charisma, changing from one gear to another, from one direction to another, faster than a sports car. Armchair sports fans settling down to watch the Olympic Games could be for the high jump if they do not pay their TV licence fee. Such invitationals will attract more viewership for sports fans by sparking interest among sports fans. She barely noticed a flashy sports car almost run them over, until Eddie lunged forward and grabbed her body away. And he flatters the mother and she kind of gets prissy and he talks her into going for a ride in the sports car.

sports

output:
sports car; sports fans.

[('fans', 3), ('car', 3), ('disciplines', 1)]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-191-40624b3de987> in <module>
     43     mostfrequentwordafter, collocationwords = performBigramsAndCollocations(textcontent, word)
     44     print(sorted(mostfrequentwordafter, key=lambda element: (element[1], element[0]), reverse=True))
---> 45     print(sorted(collocationwords))

TypeError: 'NoneType' object is not iterable

Can you please help me resolve this issue?

Seaver Olson
sk2020
  • I think the error is on line 44, where you use lambda. Can you try running this instead and tell me the output? `print(sorted(mostfrequentwordafter, key=lambda element: ((element[1], element[0]), reverse=True)))` – Seaver Olson Aug 09 '20 at 03:08
  • It gives me a syntax error with this statement. – sk2020 Aug 09 '20 at 17:35

5 Answers

1

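collocations() prints the collocations and returns None, which is why sorted(collocationwords) raises the TypeError. I faced the same issue recently and was able to resolve it by using collocation_list(), which returns the collocations instead of printing them. Try this approach.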

collocationwords = tokenizedwords.collocation_list()
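
A plain-Python sketch of the symptom in the question, where the collocations appear on screen but the variable holds None. The two functions here are hypothetical stand-ins written for illustration, not NLTK's own code; they only show the difference between a function that prints and one that returns.

```python
import contextlib
import io


def print_pairs(pairs):
    """Hypothetical stand-in: prints its result and implicitly returns None."""
    print("; ".join(pairs))


def list_pairs(pairs):
    """Hypothetical stand-in: returns the result so the caller can use it."""
    return list(pairs)


pairs = ["sports fans", "sports car"]

# Capture what the printing function writes to stdout.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    printed_result = print_pairs(pairs)

returned_result = list_pairs(pairs)

# printed_result is None, so sorted(printed_result) would raise
# "TypeError: 'NoneType' object is not iterable".
# returned_result is a real list and sorts without error.
sorted_pairs = sorted(returned_result)
```

The text was printed either way; only the second function hands back something that sorted() can iterate over.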
1

Use the code below; it should work.

import nltk
from nltk.corpus import stopwords


def performBigramsAndCollocations(textcontent, word):
    # Tokenize on word characters, drop empty matches, and lowercase.
    tokenizedword = nltk.regexp_tokenize(textcontent, pattern=r'\w*', gaps=False)
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']
    tokenizedwordsbigrams = nltk.bigrams(tokenizedwords)
    # Keep only bigrams in which neither word is a stop word.
    stop_words = stopwords.words('english')
    tokenizednonstopwordsbigrams = [(w1, w2) for w1, w2 in tokenizedwordsbigrams
                                    if w1 not in stop_words and w2 not in stop_words]
    cfd_bigrams = nltk.ConditionalFreqDist(tokenizednonstopwordsbigrams)
    mostfrequentwordafter = cfd_bigrams[word].most_common(3)
    tokenizedwords = nltk.Text(tokenizedwords)
    # collocation_list() returns the collocations instead of printing them.
    collocationwords = tokenizedwords.collocation_list()

    return mostfrequentwordafter, collocationwords
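
To see what the bigram filtering and cfd_bigrams[word].most_common(3) compute, here is a rough standard-library sketch of the same idea. It is an approximation for illustration only: the function name, the tiny stop-word set, and the sample sentence are all assumptions, and it does not reproduce NLTK's ConditionalFreqDist.

```python
from collections import Counter, defaultdict

# Hypothetical miniature stop-word set; the answer uses stopwords.words('english').
STOP_WORDS = {"a", "the", "for", "in", "than"}


def most_frequent_after(tokens, word, n=3):
    """Count the words that follow `word`, skipping bigrams containing stop words."""
    followers = defaultdict(Counter)
    # zip(tokens, tokens[1:]) yields consecutive pairs, i.e. the bigrams.
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 not in STOP_WORDS and w2 not in STOP_WORDS:
            followers[w1][w2] += 1
    return followers[word].most_common(n)


# Made-up sample text, already lowercased and split into tokens.
tokens = "a sports car faster than the sports car for sports fans".split()
result = most_frequent_after(tokens, "sports")
```

Here "car" follows "sports" twice and "fans" once, so those are the counts the function reports.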
    
harshad_
1

collocation_list() alone did not help: in my case it returned (w1, w2) pairs rather than strings. Joining each pair as below worked for me.

collocationwords1 = tokenizedwords.collocation_list()

# Join each (w1, w2) pair into a single "w1 w2" string.
collocationwords = []
for item in collocationwords1:
    newitem = item[0] + " " + item[1]
    collocationwords.append(newitem)
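
The same joining step can be written more compactly with a list comprehension. This is a minimal sketch on hard-coded sample pairs (taken from the collocations shown in the question's output), assuming the items are two-element tuples of strings.

```python
# Sample pairs shaped like the (w1, w2) tuples joined in the loop above.
pairs = [("sports", "car"), ("sports", "fans")]

# " ".join(pair) builds "w1 w2" from each tuple.
collocationwords = [" ".join(pair) for pair in pairs]
```

The result is a list of plain strings, which can be passed to sorted() directly.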
Peter Csala
Suchitra U
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 17 '22 at 19:27
0

import nltk
from nltk.corpus import stopwords


def performBigramsAndCollocations(textcontent, word):
    tokenizedword = nltk.regexp_tokenize(textcontent, pattern=r'\w*', gaps=False)
    tokenizedwords = [x.lower() for x in tokenizedword if x != '']
    tokenizedwordsbigrams = nltk.bigrams(tokenizedwords)
    stop_words = stopwords.words('english')
    tokenizednonstopwordsbigrams = [(w1, w2) for w1, w2 in tokenizedwordsbigrams
                                    if w1 not in stop_words and w2 not in stop_words]
    cfd_bigrams = nltk.ConditionalFreqDist(tokenizednonstopwordsbigrams)
    mostfrequentwordafter = cfd_bigrams[word].most_common(3)
    tokenizedwords = nltk.Text(tokenizedwords)
    collocationwords1 = tokenizedwords.collocation_list()

    # Join each (w1, w2) pair into a single "w1 w2" string.
    collocationwords = []
    for item in collocationwords1:
        newitem = item[0] + " " + item[1]
        collocationwords.append(newitem)

    return mostfrequentwordafter, collocationwords

##this code worked for me

-1

key is a function that sorted() applies to each item to compute the value it sorts by. With key=lambda element: (element[1], element[0]), each item is compared by its count first and then by its word. If you would rather pick the fields from a list of indices, try something like this. Note that this may not be exactly correct, as it is 7 am and I just woke up; I will edit it later if it does not work for you.

mylist = [1, 0]  # indices to sort by: count first, then word
print(sorted(mostfrequentwordafter, key=lambda element: tuple(element[i] for i in mylist), reverse=True))
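
As a small check of how a tuple key behaves, this sketch sorts the three counts shown in the question's output with the same (count, word) key. The sort_key function and the calls list are names introduced here for illustration; they record how often the key function is invoked.

```python
# The (word, count) pairs from the question's output, in arbitrary order.
mostfrequentwordafter = [("disciplines", 1), ("car", 3), ("fans", 3)]

calls = []


def sort_key(element):
    """Same key as the lambda in the question: count first, then word."""
    calls.append(element)
    return (element[1], element[0])


ordered = sorted(mostfrequentwordafter, key=sort_key, reverse=True)
```

The key function runs once per item, and ties on the count are broken by the word, which gives the order printed in the question.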
Seaver Olson
  • if this does not work please take a look at `https://stackoverflow.com/questions/8966538/syntax-behind-sortedkey-lambda` – Seaver Olson Aug 09 '20 at 19:24
  • It doesn’t make sense to offer a solution and then say that “well, if it doesn’t work, try this one instead”. If you’re not sure, don’t throw pasta at the wall. – Abhijit Sarkar Aug 09 '20 at 19:32